= cloud-image-uploader SOP

Upload Cloud images to public clouds after they are built in Koji.

Source code: https://pagure.io/cloud-image-uploader

== Contact Information

Owner::
  Cloud SIG, Jeremy Cline (jcline)
Contact::
  #cloud:fedoraproject.org (Matrix)
Servers::
  - https://console-openshift-console.apps.ocp.stg.fedoraproject.org/project-details/ns/cloud-image-uploader[Stage]
  - https://console-openshift-console.apps.ocp.fedoraproject.org/project-details/ns/cloud-image-uploader[Production]

Purpose::
  Upload Cloud images to public clouds.

== Description

cloud-image-uploader is an AMQP message consumer (run via
`fedora-messaging consume`) that processes Pungi compose messages published on
the `org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a
compose enters the `FINISHED` or `FINISHED_INCOMPLETE` state, the service
downloads any images in the compose and uploads them to the relevant cloud
provider by running an Ansible playbook. Consult the `playbooks` directory in
the source repository or Python package to see the playbooks.
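
The selection logic described above can be sketched as a small, hypothetical
filter. It assumes the message body exposes the compose state under a `status`
key and treats `*` in the topic pattern as matching exactly one dot-separated
word, as AMQP topic bindings do. The function names are illustrative and are
not taken from the cloud-image-uploader code.

```python
"""Hypothetical sketch of how a compose message could be filtered.

Assumption: the Pungi message body carries the compose state under "status".
The real consumer in the cloud-image-uploader repository may differ.
"""

TOPIC_PATTERN = "org.fedoraproject.*.pungi.compose.status.change"
UPLOAD_STATES = frozenset({"FINISHED", "FINISHED_INCOMPLETE"})


def topic_matches(pattern: str, topic: str) -> bool:
    """AMQP-style topic match where '*' stands for exactly one word ('#' is not handled)."""
    pattern_words = pattern.split(".")
    topic_words = topic.split(".")
    if len(pattern_words) != len(topic_words):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_words, topic_words))


def should_upload(topic: str, body: dict) -> bool:
    """Return True when the message announces a compose whose images should be uploaded."""
    return topic_matches(TOPIC_PATTERN, topic) and body.get("status") in UPLOAD_STATES


if __name__ == "__main__":
    topic = "org.fedoraproject.prod.pungi.compose.status.change"
    print(should_upload(topic, {"status": "FINISHED"}))  # True
    print(should_upload(topic, {"status": "STARTED"}))  # False
```

Compose states such as `STARTED` or `DOOMED` are ignored, so only composes that
have produced their images lead to uploads.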

The service does not accept any incoming connections and depends only on the
RabbitMQ message broker and the relevant cloud provider's APIs.

It requires a few gigabytes of temporary space to download the images before
uploading them to the cloud provider. It is heavily I/O bound; the most
computationally expensive thing it does is decompressing the images.
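
Streaming the decompression keeps memory use flat, which leaves temporary disk
space as the main resource to watch. The following is a minimal sketch that
assumes xz-compressed images; the compression format, the `decompress_image`
name, and the free-space check are illustrative assumptions, not the service's
actual implementation.

```python
import lzma
import shutil
import tempfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time so memory use stays bounded


def decompress_image(src: Path, dest: Path, min_free_bytes: int = 0) -> int:
    """Stream-decompress an xz-compressed image from src to dest.

    Hypothetical helper: refuses to start if the destination filesystem has
    fewer than min_free_bytes available, and returns the decompressed size.
    """
    free = shutil.disk_usage(dest.parent).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free in {dest.parent}")
    with lzma.open(src, "rb") as compressed, dest.open("wb") as out:
        shutil.copyfileobj(compressed, out, CHUNK_SIZE)
    return dest.stat().st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "image.raw.xz"
        src.write_bytes(lzma.compress(b"\0" * 4096))
        print(decompress_image(src, Path(tmp) / "image.raw"))  # prints 4096
```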

== General Configuration

The Fedora Ansible repository contains the
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-image-uploader[OpenShift
application definition]. The playbook that creates the OpenShift application is
located at `playbooks/openshift-apps/cloud-image-uploader.yml`.

Within the container image, configuration is provided via
`/etc/fedora-messaging/config.toml`. Additionally, secrets may be provided via
environment variables; these are documented in the relevant cloud sections.

== Deploying

The service is built as a single container image, and its deployment
configuration runs one pod.

=== Staging

The staging BuildConfig builds a container image from
https://pagure.io/cloud-image-uploader/tree/main[the main branch]. Builds are
not triggered automatically; start one manually from either the web UI or the
CLI.

=== Production

The production BuildConfig builds a container image from
https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
staging, you need to trigger a build manually. After deploying to staging,
"promote" the changes by fast-forwarding the prod branch to main, pushing it,
and then triggering a production build:

....
$ git checkout prod && git merge --ff-only main
....

=== Azure

Images are uploaded whenever a compose contains `vhd-compressed` images. They
are first uploaded to a container in the storage account and then imported
into an Image Gallery.
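
To check by hand whether a given compose would trigger an Azure upload, the
image list in the compose metadata can be filtered by type. This sketch assumes
Pungi's `compose/metadata/images.json` layout, where `payload.images` maps
variant to architecture to a list of image entries with `type` and `path` keys.
The `azure_images` name and the sample paths are hypothetical.

```python
def azure_images(images_json: dict) -> list[str]:
    """Return the paths of all vhd-compressed images in a Pungi images.json document."""
    paths = []
    for arches in images_json["payload"]["images"].values():
        for images in arches.values():
            paths.extend(image["path"] for image in images if image.get("type") == "vhd-compressed")
    return paths


# Made-up sample shaped like the assumed images.json layout.
SAMPLE = {
    "payload": {
        "images": {
            "Cloud": {
                "x86_64": [
                    {"type": "vhd-compressed", "path": "Cloud/x86_64/images/example-azure.vhdfixed.xz"},
                    {"type": "qcow2", "path": "Cloud/x86_64/images/example.qcow2"},
                ],
                "aarch64": [
                    {"type": "vhd-compressed", "path": "Cloud/aarch64/images/example-azure.vhdfixed.xz"},
                ],
            }
        }
    }
}

if __name__ == "__main__":
    print(azure_images(SAMPLE))
```

For a real compose, pass the result of `json.load()` on its `images.json`; an
empty list means no Azure upload would be attempted.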

Credentials for Azure are provided through environment variables, which are
read by the
https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
Ansible collection].

==== Image Cleanup

Image cleanup is automated.

The storage account is configured to delete any blob in the container older
than one week, so it should require no manual attention. Nothing in the
container is needed after the VHD has been imported into the Image Gallery.

Images in the Gallery are cleaned up by the image uploader after a new image
has been uploaded. For complete details on the image cleanup policy, refer to
the consumer code; at the time of writing, the policy is as follows:

- Any image with an end-of-life date in the past is removed.

- Within an image definition, only the latest 7 images marked as "excluded
from latest = True" are retained. When an image is marked as "excluded from
latest = False", new virtual machines that don't reference an explicit image
version boot from the newest image (by semantic version) that is not excluded
from latest. All images are uploaded with "excluded from latest = True" and are
only marked as "excluded from latest = False" after testing.

- Only the latest 7 images in the Rawhide image definitions are retained,
regardless of whether they are marked "excluded from latest = False".

At the moment, testing and promotion to "excluded from latest = False" is a
manual process, but it will be automated to run regularly in the future
(perhaps weekly).
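
The retention rules above can be expressed as a small pure function, which can
help when reasoning about what the next cleanup run will delete. This is a
hypothetical sketch, not the consumer's implementation: the `ImageVersion` type
and `images_to_delete` name are invented here, version names are assumed to be
dot-separated integers, and the 7-image limit is assumed to be applied after
end-of-life images have been removed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

KEEP = 7  # number of images retained by the count-based rules


@dataclass(frozen=True)
class ImageVersion:
    name: str  # gallery image version, e.g. "40.20240501.0"
    excluded_from_latest: bool
    end_of_life: Optional[datetime] = None


def version_key(image: ImageVersion) -> tuple[int, ...]:
    """Sort key treating the version name as dot-separated integers (semver-style)."""
    return tuple(int(part) for part in image.name.split("."))


def is_expired(image: ImageVersion, now: datetime) -> bool:
    return image.end_of_life is not None and image.end_of_life < now


def images_to_delete(images: list[ImageVersion], now: datetime, rawhide: bool = False) -> list[ImageVersion]:
    """Return the image versions the policy described above would remove."""
    expired = [image for image in images if is_expired(image, now)]
    live = [image for image in images if not is_expired(image, now)]
    # Rawhide keeps the newest KEEP images regardless of the flag; other
    # definitions only limit the images still excluded from "latest".
    candidates = live if rawhide else [image for image in live if image.excluded_from_latest]
    candidates = sorted(candidates, key=version_key, reverse=True)
    return expired + candidates[KEEP:]
```

Images already promoted to "excluded from latest = False" are never counted
against the limit outside Rawhide, so a promoted image survives until its
end-of-life date passes.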

==== Authentication

The following environment variables are used:

....
AZURE_SUBSCRIPTION_ID - Identifies the subscription within the Azure tenant (our tenant has only one).
AZURE_CLIENT_ID - The application ID used during authentication.
AZURE_SECRET - The application secret used during authentication.
AZURE_TENANT - Identifies the Azure tenant.
....
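
Before digging into playbook failures, it can be worth confirming that all four
variables are actually set in the pod. A minimal sketch; the
`missing_azure_settings` helper is hypothetical and only the variable names
come from the list above.

```python
import os
from collections.abc import Mapping

REQUIRED_AZURE_VARIABLES = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_SECRET",
    "AZURE_TENANT",
)


def missing_azure_settings(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required Azure variables that are unset or empty."""
    return [name for name in REQUIRED_AZURE_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_azure_settings()
    # Print only variable names, never values, so secrets do not end up in logs.
    print("missing: " + ", ".join(missing) if missing else "all Azure variables are set")
```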

If you have access to the Fedora Project tenant, these values are available in
the https://portal.azure.com[web portal] under the Microsoft Entra ID service,
in the "App registrations" tab (a client secret's value is only shown when the
secret is created). To manage resources from the command line, install the
Azure CLI with `dnf install azure-cli`. All commands below assume you have
logged in with `az login`.

There are two app registrations, `fedora-cloud-image-uploader` and
`fedora-cloud-image-uploader-staging`. Each was created with a command like the
following (shown for production):

....
$ az ad app create --display-name fedora-cloud-image-uploader
....

==== Authorization

Images are placed in two resource groups (Azure's containers for arbitrary
resources): `fedora-cloud-staging` is used for the staging deployment, and
`fedora-cloud` is used for the production deployment.

The app registrations are granted access to their respective resource group by
assigning them a role on the resource group. The role definition can be viewed
with:

....
$ az role definition list --name "Image Uploader"
....

This role is then assigned to the app registration (shown here for
production) with:

....
$ az role assignment create --assignee "fedora-cloud-image-uploader" \
    --role "Image Uploader" \
    --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
....

If additional permissions are required, the role definition can be updated to
include them.
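
For the staging registration, the same command applies with
`fedora-cloud-image-uploader-staging` and the `fedora-cloud-staging` resource
group. The pairing can be captured in a small hypothetical Python helper that
only builds the argument list and does not run `az`; the mapping mirrors the
names above, while the helper name and the `SUBSCRIPTION_ID` placeholder are
invented.

```python
import shlex

# Deployment -> (app registration, resource group), as described above.
DEPLOYMENTS = {
    "staging": ("fedora-cloud-image-uploader-staging", "fedora-cloud-staging"),
    "production": ("fedora-cloud-image-uploader", "fedora-cloud"),
}


def role_assignment_command(deployment: str, subscription_id: str) -> list[str]:
    """Build the `az role assignment create` argv for one deployment."""
    app_registration, resource_group = DEPLOYMENTS[deployment]
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        "az", "role", "assignment", "create",
        "--assignee", app_registration,
        "--role", "Image Uploader",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # "SUBSCRIPTION_ID" is a placeholder; review the quoted command before running it.
    print(shlex.join(role_assignment_command("staging", "SUBSCRIPTION_ID")))
```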

==== Credential rotation

Credentials currently expire and must be rotated periodically. To issue a new
secret via the CLI:

....
$ az ad app list -o table  # Find the application to issue new secrets for and set CLIENT_ID to its "Id" field
$ touch azure_secret
$ chmod 600 azure_secret
$ SECRET_NAME="Some useful name for the secret"
$ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" \
    --years 1 --query password --output tsv > azure_secret
....

Because `--append` is used, existing secrets remain valid; once the deployment
uses the new value through `AZURE_SECRET`, the old credential can be removed.
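
The `touch` and `chmod` steps ensure the secret file is readable only by its
owner. If you script the rotation in Python instead, the file can be created
with restrictive permissions in a single step that also refuses to overwrite an
existing file. A minimal sketch; the `write_secret` helper is hypothetical and
does not call Azure.

```python
import os
from pathlib import Path


def write_secret(path: Path, secret: str) -> None:
    """Create path with mode 0600 (before umask) and write secret to it.

    O_EXCL makes this fail with FileExistsError if the file already exists,
    so an old secret is never silently overwritten.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secret)
```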