diff --git a/modules/sysadmin_guide/pages/cloud-image-uploader.adoc b/modules/sysadmin_guide/pages/cloud-image-uploader.adoc
new file mode 100644
index 0000000..0336076
--- /dev/null
+++ b/modules/sysadmin_guide/pages/cloud-image-uploader.adoc
@@ -0,0 +1,164 @@
+= cloud-image-uploader SOP
+
+Upload Cloud images to public clouds after they are built in Koji.
+
+Source code: https://pagure.io/cloud-image-uploader
+
+== Contact Information
+
+Owner::
+  Cloud SIG, Jeremy Cline (jcline)
+Contact::
+  #cloud:fedoraproject.org (Matrix)
+Servers::
+  - https://console-openshift-console.apps.ocp.stg.fedoraproject.org/project-details/ns/cloud-image-uploader[Stage]
+  - https://console-openshift-console.apps.ocp.fedoraproject.org/project-details/ns/cloud-image-uploader[Production]
+
+Purpose::
+  Upload Cloud images to public clouds.
+
+== Description
+
+cloud-image-uploader is an AMQP message consumer (run via `fedora-messaging
+consume`) that processes Pungi compose messages published on the
+`org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a compose
+enters the `FINISHED` or `FINISHED_INCOMPLETE` state, the service downloads
+any images in the compose and uploads them to the relevant cloud provider by
+running an Ansible playbook. The playbooks live in the `playbooks` directory
+of the source repository and are also shipped in the Python package.
+
+The service does not accept any incoming connections and only depends on the
+RabbitMQ message broker and the relevant cloud provider's APIs.
+
+It requires a few gigabytes of temporary space to download the images before
+uploading them to the cloud provider. It is heavily I/O bound; the most
+computationally expensive thing it does is decompress the images.
+
+== General Configuration
+
+The Fedora Ansible repository contains the
+https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-image-uploader[OpenShift
+application definition].
+The playbook that creates the OpenShift application is located at
+`playbooks/openshift-apps/cloud-image-uploader.yml` in the same repository.
+
+Within the container image, configuration is provided via
+`/etc/fedora-messaging/config.toml`. Secrets are provided via environment
+variables, which are listed in the relevant cloud provider sections.
+
+== Deploying
+
+The service is built into a single container image and runs as one pod in its
+deployment configuration.
+
+=== Staging
+
+The staging BuildConfig builds a container from
+https://pagure.io/cloud-image-uploader/tree/main[the main branch]. You need to
+trigger a build manually, either from the web UI or the CLI.
+
+=== Production
+
+The production BuildConfig builds a container from
+https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
+staging, you need to trigger a build manually. After deploying to staging,
+"promote" the changes by fast-forwarding the prod branch to main, then push
+the prod branch and trigger a production build:
+
+....
+$ git checkout prod && git merge --ff-only main
+....
+
+=== Azure
+
+Images are uploaded whenever a compose contains `vhd-compressed` images. They
+are first uploaded to a container in the storage account and then imported
+into an Image Gallery.
+
+Credentials for Azure are provided using environment variables (see
+Authentication below) and are used by the
+https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
+Ansible collection].
+
+==== Image Cleanup
+
+Image cleanup is automated.
+
+The storage account is configured to delete any blob in the container that is
+older than one week, so it should require no manual attention. Nothing in the
+container is needed after the VHD has been imported into the Image Gallery.
+
+Images in the Gallery are cleaned up by the image uploader after a new image
+has been uploaded. For complete details on the image cleanup policy, refer to
+the consumer code; at the time of writing, the policy is as follows:
+
+- Any image whose end-of-life date is in the past is removed.
+
+- Within each image definition, only the latest 7 images marked "excluded
+  from latest = True" are retained. New virtual machines that don't reference
+  an explicit image version boot the newest image (by semantic version) that
+  is marked "excluded from latest = False". All images are uploaded with
+  "excluded from latest = True" and are only marked "excluded from latest =
+  False" after testing.
+
+- Only the latest 7 images in the Rawhide image definitions are retained,
+  regardless of whether they are marked "excluded from latest = False".
+
+At the moment, testing and promotion to "excluded from latest = False" is a
+manual process. It is expected to be automated to run regularly (perhaps
+weekly) in the future.
+
+==== Authentication
+
+The following environment variables are used:
+
+....
+AZURE_SUBSCRIPTION_ID - Identifies the subscription within an Azure tenant (our tenant has only one).
+AZURE_CLIENT_ID - The application (client) ID used during authentication.
+AZURE_SECRET - The application secret used during authentication.
+AZURE_TENANT - Identifies the Azure tenant.
+....
+
+If you have access to the Fedora Project tenant, the client and tenant IDs are
+available in the https://portal.azure.com[web portal] under the Microsoft
+Entra ID service in the "App registrations" tab, and the subscription ID is
+listed under "Subscriptions". Secret values are only displayed once, when they
+are created (see Credential rotation). To manage things via the CLI, install
+it with `dnf install azure-cli`. All commands below assume you've logged in
+with `az login`.
+
+There are two app registrations, `fedora-cloud-image-uploader` and
+`fedora-cloud-image-uploader-staging`. These were created by running the
+following (shown for production):
+
+....
+$ az ad app create --display-name fedora-cloud-image-uploader
+....
+
+==== Authorization
+
+Images are placed in two resource groups (containers for arbitrary resources).
+`fedora-cloud-staging` is used for the staging deployment, and `fedora-cloud`
+is used for the production deployment.
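+
+To see what each resource group contains, such as the Image Gallery that the
+uploader imports into, commands along these lines should work for anyone with
+read access to the subscription. This is an optional check and not part of
+the deployment; the output depends on the current state of the tenant:
+
+....
+$ az group show --name fedora-cloud-staging
+$ az sig list --resource-group fedora-cloud --output table
+$ az resource list --resource-group fedora-cloud --output table
+....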
+
+The app registrations are granted access to their respective resource group by
+assigning them a role scoped to that resource group. The custom "Image
+Uploader" role definition can be viewed with:
+
+....
+$ az role definition list --name "Image Uploader"
+....
+
+The role is assigned to the app registration's service principal with the
+command below, where `$CLIENT_ID` is the application (client) ID. The example
+is for production; use the `fedora-cloud-staging` resource group as the scope
+for the staging app. `az ad app create` does not create a service principal,
+so if the app registration does not have one yet, create it first with
+`az ad sp create --id "$CLIENT_ID"`.
+
+....
+$ az role assignment create --assignee "$CLIENT_ID" \
+    --role "Image Uploader" \
+    --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
+....
+
+If additional permissions are required, add them to the "Image Uploader" role
+definition (for example with `az role definition update`).
+
+==== Credential rotation
+
+The app registration secrets expire, so they must be rotated periodically. The
+command below issues a new secret that is valid for one year. Because of
+`--append`, existing secrets remain valid until they expire. To do so via the
+CLI:
+
+....
+$ az ad app list -o table # Find the application to issue a new secret for and set CLIENT_ID to its "AppId" field
+$ touch azure_secret
+$ chmod 600 azure_secret
+$ SECRET_NAME="Some useful name for the secret"
+$ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" --years 1 --query password --output tsv > azure_secret
+....
+
+The contents of `azure_secret` then replace the `AZURE_SECRET` value provided
+to the deployment.
diff --git a/modules/sysadmin_guide/pages/index.adoc b/modules/sysadmin_guide/pages/index.adoc
index 033681a..79969dd 100644
--- a/modules/sysadmin_guide/pages/index.adoc
+++ b/modules/sysadmin_guide/pages/index.adoc
@@ -82,6 +82,7 @@ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
 * xref:bodhi.adoc[Bodhi Infrastructure - Releng]
 * xref:bugzilla2fedmsg.adoc[Bugzilla 2 Fedmsg]
 * xref:bugzilla2fedmsg.adoc[bugzilla2fedmsg]
+* xref:cloud-image-uploader.adoc[Cloud Image Uploader]
 * xref:collectd.adoc[Collectd]
 * xref:compose-tracker.adoc[Compose Tracker]
 * xref:contenthosting.adoc[Content Hosting Infrastructure]