Add an SOP for cloud-image-uploader
This is just the basics for now. Container configurations, AWS, and GCP need sections (once the image uploader supports those clouds).
parent 475e4023ad
commit 9f6ee3c568
2 changed files with 165 additions and 0 deletions

modules/sysadmin_guide/pages/cloud-image-uploader.adoc (normal file, 164 lines)
@@ -0,0 +1,164 @@
= cloud-image-uploader SOP

Upload Cloud images to public clouds after they are built in Koji.

Source code: https://pagure.io/cloud-image-uploader

== Contact Information

Owner::
  Cloud SIG, Jeremy Cline (jcline)
Contact::
  #cloud:fedoraproject.org (Matrix)
Servers::
- https://console-openshift-console.apps.ocp.stg.fedoraproject.org/project-details/ns/cloud-image-uploader[Stage]
- https://console-openshift-console.apps.ocp.fedoraproject.org/project-details/ns/cloud-image-uploader[Production]

Purpose::
  Upload Cloud images to public clouds.

== Description

cloud-image-uploader is an AMQP message consumer (run via `fedora-messaging
consume`) that processes Pungi compose messages published on the
`org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a compose
enters the `FINISHED` or `FINISHED_INCOMPLETE` state, the service downloads
any images in the compose and uploads them to the relevant cloud provider by
running an Ansible playbook. Consult the `playbooks` directory in the source
repository or Python package to see the playbooks.

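The filtering step described above can be sketched as follows. This is an illustrative sketch only, not the consumer's actual code: the function names are hypothetical, and it assumes the message body carries the compose state in a `status` field. Consult the consumer source for the real logic.

```python
# Hypothetical sketch of the "should this compose be processed?" decision.
# Assumption: the message body exposes the compose state as body["status"].

TOPIC_PATTERN = "org.fedoraproject.*.pungi.compose.status.change"
FINISHED_STATES = {"FINISHED", "FINISHED_INCOMPLETE"}


def topic_matches(topic: str, pattern: str = TOPIC_PATTERN) -> bool:
    """Match an AMQP topic where `*` stands for exactly one dot-separated word.

    Only the `*` wildcard is handled; `#` is not needed for this pattern.
    """
    topic_words = topic.split(".")
    pattern_words = pattern.split(".")
    if len(topic_words) != len(pattern_words):
        return False
    return all(p in ("*", t) for t, p in zip(topic_words, pattern_words))


def should_process(topic: str, body: dict) -> bool:
    """Return True only for compose messages in a finished state."""
    return topic_matches(topic) and body.get("status") in FINISHED_STATES
```

For example, `should_process("org.fedoraproject.prod.pungi.compose.status.change", {"status": "FINISHED"})` evaluates to `True`, while a compose in any other state is ignored.
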
The service does not accept any incoming connections and only depends on the
RabbitMQ message broker and the relevant cloud provider's APIs.

It requires a few gigabytes of temporary space to download the images before
uploading them to the cloud provider. It is heavily I/O bound and the most
computationally expensive thing it does is decompress the images.

== General Configuration

The Fedora Ansible repository contains the
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-image-uploader[OpenShift
application definition]. The playbook to create the OpenShift application is
located at `playbooks/openshift-apps/cloud-image-uploader.yml`.

Within the container image, configuration is provided via
`/etc/fedora-messaging/config.toml`. Additionally, secrets may be provided via
environment variables and are noted in the relevant cloud sections.

== Deploying

The service contains a single image and one pod in its deployment configuration.

=== Staging

The staging BuildConfig builds a container from
https://pagure.io/cloud-image-uploader/tree/main[the main branch]. You need to
trigger a build manually, either from the web UI or the CLI.

=== Production

The production BuildConfig builds a container from
https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
staging, you need to trigger a build manually. After deploying to staging, the
main branch can be merged into the prod branch to "promote" it:

....
$ git checkout prod && git merge --ff-only main
....

=== Azure

Images are uploaded whenever a compose contains `vhd-compressed` images.
Images are first uploaded to a container in the storage account and then
imported into an Image Gallery.

Credentials for Azure are provided using environment variables. The credentials
are used by the
https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
Ansible collection].

==== Image Cleanup

Image clean-up is automated.

The storage account is configured to delete any blob in the container older
than 1 week and should require no manual attention. Nothing in the container is
required after the VHD is imported to the Image Gallery.

Images in the Gallery are cleaned up by the image uploader after a new image
has been uploaded. For complete details on the image cleanup policy refer to
the consumer code, but at the time of this writing the policy is as follows:

- Any image that has an end-of-life field that is in the past is removed.

- Only the latest 7 images that are marked as "excluded from latest = True"
  within an image definition are retained. When an image is marked as "excluded
  from latest = False", new virtual machines that don't reference an explicit
  image version will boot using the newest such image (following semver). All
  images are uploaded with "excluded from latest = True" and are only marked as
  "excluded from latest = False" after testing.

- Only the latest 7 images in the Rawhide image definitions are retained,
  regardless of whether they are marked "excluded from latest = False".

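The policy above can be sketched as a pure function. This is a hedged illustration, not the consumer's implementation: `GalleryImage`, `images_to_delete`, and their fields are hypothetical names, versions are assumed to be comparable `(major, minor, patch)` tuples, and the sketch treats the three rules literally as listed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GalleryImage:
    """Hypothetical stand-in for one image version in an image definition."""
    version: Tuple[int, int, int]
    excluded_from_latest: bool
    end_of_life: Optional[datetime] = None


def images_to_delete(images: List[GalleryImage], now: datetime,
                     rawhide: bool = False, keep: int = 7) -> List[GalleryImage]:
    """Return the images in one image definition that the policy would remove."""
    # Rule 1: anything whose end-of-life date is in the past is removed.
    expired = [i for i in images if i.end_of_life is not None and i.end_of_life < now]
    live = [i for i in images if i not in expired]

    # Rule 3: Rawhide counts every image. Rule 2: other definitions only count
    # images still marked "excluded from latest = True".
    candidates = live if rawhide else [i for i in live if i.excluded_from_latest]

    # Keep the newest `keep` candidates; everything older is removed.
    newest_first = sorted(candidates, key=lambda i: i.version, reverse=True)
    return expired + newest_first[keep:]
```

With ten untested images in a non-Rawhide definition, the three oldest would be returned for deletion, while a promoted image ("excluded from latest = False") is left alone until it reaches its end-of-life date.
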

At the moment, testing and promotion to "excluded from latest = False" is a
manual process. In the future it will be automated to happen regularly
(weekly, perhaps).

==== Authentication

The following environment variables are used:

....
AZURE_SUBSCRIPTION_ID - Identifies the subscription within an Azure tenant (our tenant only has 1)
AZURE_CLIENT_ID - The application ID used during authentication.
AZURE_SECRET - The application secret used during authentication.
AZURE_TENANT - Identifies the Azure tenant.
....

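When debugging a pod that fails to authenticate, a quick check that all four variables are set can help. The following is a minimal sketch; the helper name `missing_azure_settings` is hypothetical and is not part of the uploader.

```python
import os
from typing import List, Mapping, Optional

# The four variables listed above.
REQUIRED_AZURE_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_SECRET",
    "AZURE_TENANT",
)


def missing_azure_settings(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_settings()
    if missing:
        raise SystemExit("Missing Azure settings: " + ", ".join(missing))
    print("All Azure settings are present.")
```

The function only reports variable names and never prints their values, so it is safe to run in a terminal that is being logged.
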

If you have access to the Fedora Project tenant, these values are available in
the https://portal.azure.com[web portal] under the Microsoft Entra ID service
in the "App registrations" tab. To manage these resources from the CLI, install
it with `dnf install azure-cli`. All commands below assume you've logged in
with `az login`.

There are two app registrations, `fedora-cloud-image-uploader` and
`fedora-cloud-image-uploader-staging`. These were created by running the
following (with the corresponding display name for staging):

....
$ az ad app create --display-name fedora-cloud-image-uploader
....

==== Authorization

Images are placed in two resource groups (containers for arbitrary resources).
`fedora-cloud-staging` is used for the staging deployment, and `fedora-cloud`
is used for the production deployment.

The app registrations are granted access to their respective resource group by
assigning them a role on the resource group. The role definition can be seen with:

....
$ az role definition list --name "Image Uploader"
....

This role is then assigned to the app registration with:

....
$ az role assignment create --assignee "fedora-cloud-image-uploader" \
    --role "Image Uploader" \
    --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
....

If additional permissions are required, update the role definition to include
them.

==== Credential rotation

At the moment, credentials are set to expire and need to be rotated
periodically. To do so via the CLI:

....
# Find the application to issue new secrets for and set CLIENT_ID to its "Id" field
$ az ad app list -o table
# Create the output file with restrictive permissions before writing the secret to it
$ touch azure_secret
$ chmod 600 azure_secret
$ SECRET_NAME="Some useful name for the secret"
$ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" --years 1 --query password --output tsv > azure_secret
....