From 8f3a5c0d55f57206445fc56421b66f2c975d38ba Mon Sep 17 00:00:00 2001
From: Jeremy Cline <jeremycline@linux.microsoft.com>
Date: Tue, 29 Apr 2025 17:26:39 -0400
Subject: [PATCH] cloud-image-uploader: Add initial sections for AWS, GCP,
 containers

Also includes updates to reflect the new deployment style, and a brief
guide on testing in staging.

Signed-off-by: Jeremy Cline <jeremycline@linux.microsoft.com>
---
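For reference, a minimal sketch of the staging test flow documented in this
patch. It is illustrative only: it assumes the compose message body exposes
its state under a `status` key, and the helper names `is_ready` and
`reconsume_command` are hypothetical and not part of cloud-image-uploader.
The configuration path is the placeholder used in the documentation.

```python
# States that trigger an upload, as listed in the documentation.
READY_STATES = {"FINISHED", "FINISHED_INCOMPLETE"}


def is_ready(body):
    """Return True when a compose message body reports an actionable state.

    Assumption: the state is stored under the "status" key of the body.
    """
    return body.get("status") in READY_STATES


def reconsume_command(message_id, conf_path):
    """Build the environment and argv for replaying one message by ID.

    This only assembles the command shown in the staging guide; it does
    not run it.
    """
    env = {"FEDORA_MESSAGING_CONF": conf_path}
    argv = ["fedora-messaging", "reconsume", message_id]
    return env, argv


if __name__ == "__main__":
    env, argv = reconsume_command(
        "<message-id>", "/etc/fedora-messaging/service-config.toml"
    )
    print(env, argv)
```

Inside a staging debug terminal, the resulting `env` and `argv` could be passed
to `subprocess.run(argv, env={**os.environ, **env})` after checking the
message with `is_ready`.
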
 .../pages/cloud-image-uploader.adoc           | 119 ++++++++++++++++--
 1 file changed, 107 insertions(+), 12 deletions(-)
diff --git a/modules/sysadmin_guide/pages/cloud-image-uploader.adoc b/modules/sysadmin_guide/pages/cloud-image-uploader.adoc
index 0336076..7656686 100644
--- a/modules/sysadmin_guide/pages/cloud-image-uploader.adoc
+++ b/modules/sysadmin_guide/pages/cloud-image-uploader.adoc
@@ -23,9 +23,7 @@ cloud-image-uploader is an AMQP message consumer (run via `fedora-messaging
 consume`) that processes Pungi compose messages published on the
 `org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a compose
 enters the `FINISHED` or `FINISHED_INCOMPLETE` states, the service downloads
-any images in the compose and uploads it to the relevant cloud provider by
-running an Ansible playbook. Consult the `playbooks` directory in the source
-repository or Python package to see the playbooks.
+any images in the compose and uploads them to the relevant cloud provider.
 
 The service does not accept any incoming connections and only depends on the
 RabbitMQ message broker and the relevant cloud provider's APIs.
@@ -41,13 +39,23 @@ https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-im
 application definition]. The playbook to create the OpenShift application is
 located at `playbooks/openshift-apps/cloud-image-uploader.yml`.
 
-Within the container image, configuration is provided via
-`/etc/fedora-messaging/config.toml`. Additionally, secrets may be provided via
-environment variables and are noted in the relevant cloud sections.
+The Ansible playbook creates multiple fedora-messaging configuration files from
+the `config.toml` template. All application configuration is either in the
+fedora-messaging configuration file or in environment variables. The
+environment variables are used for secrets and vary based on which service the
+container handles.
+
+The fedora-messaging configuration file in use by a container is defined in the
+`FEDORA_MESSAGING_CONF` environment variable.
 
 == Deploying
 
-The service contains a single image and one pod in its deployment configuration.
+The OpenShift deployment consists of a single image and multiple containers using
+that image, one container for each content type (containers, azure, aws, and
+gcp). The containers differ only in the secret volumes mounted, the secrets
+injected via environment variables, and the `FEDORA_MESSAGING_CONF`
+environment variable, which points to one of the fedora-messaging configurations
+in `/etc/fedora-messaging/`.
 
 === Staging
 
@@ -55,9 +63,20 @@ The staging BuildConfig builds a container from
 https://pagure.io/cloud-image-uploader/tree/main[the main branch]. You need to
 trigger a build manually, either from the web UI or the CLI.
 
+Although composes are not done in staging, it's still possible to test in
+staging manually. First, start a debug terminal to enter a running container.
+Next, find an AMQP message for a
+https://apps.fedoraproject.org/datagrepper/v2/search?topic=org.fedoraproject.prod.pungi.compose.status.change[production
+compose] in the `FINISHED` or `FINISHED_INCOMPLETE` state. You can trigger the
+fedora-messaging consumer to process the message by running:
+
+....
+FEDORA_MESSAGING_CONF=/etc/fedora-messaging/service-config.toml fedora-messaging reconsume <message-id>
+....
+
 === Production
 
-The staging BuildConfig builds a container from
+The production BuildConfig builds a container from
 https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
 staging, you need to trigger a build manually. After deploying to staging, the
 main branch can be merged into the production branch to "promote" it:
@@ -72,10 +91,8 @@ Images are uploaded whenever a compose that contains `vhd-compressed` images.
 Images are first uploaded to a container in the storage account and then
 imported into an Image Gallery.
 
-Credentials for Azure are provided using environment variables. The credentials
-are used by the
-https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
-Ansible collection].
+Credentials for Azure are provided using environment variables and are
+discovered by the Azure Python SDK automatically.
 
 ==== Image Cleanup
 
@@ -155,6 +172,7 @@ with additional permission.
 ==== Credential rotation
 
 At the moment, credentials are set to expire and will need to be periodically rotated. To do so via the CLI:
+
 ....
 $ az ad app list -o table  # Find the application to issue new secrets for and set CLIENT_ID to its "Id" field
 $ touch azure_secret
@@ -162,3 +180,80 @@ $ chmod 600 azure_secret
 $ SECRET_NAME="Some useful name for the secret"
 $ az ad app credential reset --id $CLIENT_ID --append --display-name $SECRET_NAME --years 1 --query password --output tsv > azure_secret
 ....
+
+=== AWS
+
+AWS images are uploaded by this service to the Fedora AWS account. Cleanup is
+handled by the general Fedora AWS resource cleaner, which uses the tags applied
+to a resource to determine when to remove it.
+
+Images are first uploaded to the `fedora-s3-bucket-fedimg` S3 bucket, and then
+imported as EC2 snapshots to the region configured in the `base_region` setting
+of the `consumer_config.aws` section. The snapshot is then replicated to all
+the regions listed in the `ami_regions` setting.
+
+==== New Regions
+
+In the event that a new region becomes available and users want Fedora Cloud
+images there, simply add the new region to the `ami_regions` list.
+
+
+=== Containers
+
+Containers are pushed to the `registry.fedoraproject.org` and `quay.io/fedora/`
+registries. These include the Fedora Toolbox, Fedora and Fedora Minimal, ELN,
+and Atomic Desktop images.
+
+==== Adding New Container Images
+
+The configuration contains a mapping of variants to registry repositories in
+the `consumer_config.container.repos` configuration section. In order to handle
+a new container image, a new mapping should be added to this dictionary.
+
+=== Google Compute Engine
+
+Google Compute Engine images are published under the `fedora-cloud` project in
+Google Cloud Platform. The flow is similar to other clouds: the tarball is
+uploaded to the `fedora-cloud-image-upload` bucket and then imported as a
+machine image. The bucket has a lifecycle configuration that deletes an object 3
+days after it has been created, so old tarballs are cleaned up automatically
+after being imported.
+
+==== Credentials
+
+The service uses the
+`fedora-image-uploader@fedora-cloud.iam.gserviceaccount.com` service account.
+New credentials can be issued for that account under the IAM & Admin panel,
+although the current credentials do not expire.
+
+==== Permissions
+
+The service account is assigned the `Fedora Image Uploader` role, which should
+grant it the minimal permissions required to manage images. The current
+permission list is as follows:
+
+  - compute.globalOperations.get
+  - compute.images.create
+  - compute.images.createTagBinding
+  - compute.images.delete
+  - compute.images.deleteTagBinding
+  - compute.images.deprecate
+  - compute.images.get
+  - compute.images.getFromFamily
+  - compute.images.list
+  - compute.images.listEffectiveTags
+  - compute.images.listTagBindings
+  - compute.images.setLabels
+  - compute.images.update
+  - compute.images.useReadOnly
+  - resourcemanager.projects.get
+
+In the event that the application requires new permissions, edit the `Fedora
+Image Uploader` role to include the new permissions.
+
+==== Cleanup
+
+Machine images are labeled with their `end-of-life` date. After this date
+is reached, the image is removed. Images are uploaded as "deprecated" by
+default. Every two weeks, an image in an Image Family is promoted and marked as
+not deprecated. Deprecated images are removed after two weeks.