Add an SOP for cloud-image-uploader
This is just the basics for now. Container configurations, AWS, and GCP need sections (once the image uploader supports those clouds).
parent
475e4023ad
commit
9f6ee3c568
2 changed files with 165 additions and 0 deletions
164
modules/sysadmin_guide/pages/cloud-image-uploader.adoc
Normal file
@ -0,0 +1,164 @
= cloud-image-uploader SOP

Upload Cloud images to public clouds after they are built in Koji.

Source code: https://pagure.io/cloud-image-uploader

== Contact Information

Owner::
Cloud SIG, Jeremy Cline (jcline)
Contact::
#cloud:fedoraproject.org (Matrix)
Servers::
- https://console-openshift-console.apps.ocp.stg.fedoraproject.org/project-details/ns/cloud-image-uploader[Stage]
- https://console-openshift-console.apps.ocp.fedoraproject.org/project-details/ns/cloud-image-uploader[Production]

Purpose::
Upload Cloud images to public clouds.

== Description

cloud-image-uploader is an AMQP message consumer (run via `fedora-messaging
consume`) that processes Pungi compose messages published on the
`org.fedoraproject.*.pungi.compose.status.change` AMQP topic. When a compose
enters the `FINISHED` or `FINISHED_INCOMPLETE` state, the service downloads
any images in the compose and uploads them to the relevant cloud provider by
running an Ansible playbook. Consult the `playbooks` directory in the source
repository or Python package to see the playbooks.

The service does not accept any incoming connections and only depends on the
RabbitMQ message broker and the relevant cloud provider's APIs.

It requires a few gigabytes of temporary space to download the images before
uploading them to the cloud provider. It is heavily I/O bound and the most
computationally expensive thing it does is decompress the images.
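
The message filtering described in this section can be sketched in a few lines of Python. This is only an illustration, not the consumer's actual code: the function names are hypothetical, and it assumes the compose state is carried in a `status` key of the message body. Only the `*` wildcard (exactly one topic word) is handled.

```python
# Illustrative sketch only; names and the "status" key are assumptions.
TOPIC_PATTERN = "org.fedoraproject.*.pungi.compose.status.change"
READY_STATES = {"FINISHED", "FINISHED_INCOMPLETE"}


def topic_matches(pattern: str, topic: str) -> bool:
    """Match an AMQP-style topic where '*' stands for exactly one word."""
    pattern_words = pattern.split(".")
    topic_words = topic.split(".")
    if len(pattern_words) != len(topic_words):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_words, topic_words))


def should_upload(topic: str, body: dict) -> bool:
    """Return True when a compose message should trigger an upload."""
    if not topic_matches(TOPIC_PATTERN, topic):
        return False
    return body.get("status") in READY_STATES
```

For example, `should_upload("org.fedoraproject.prod.pungi.compose.status.change", {"status": "FINISHED"})` returns `True`, while the same topic with any other status is ignored.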

== General Configuration

The Fedora Ansible repository contains the
https://pagure.io/fedora-infra/ansible/blob/main/f/roles/openshift-apps/cloud-image-uploader[OpenShift
application definition]. The playbook to create the OpenShift application is
located at `playbooks/openshift-apps/cloud-image-uploader.yml`.

Within the container image, configuration is provided via
`/etc/fedora-messaging/config.toml`. Additionally, secrets may be provided via
environment variables and are noted in the relevant cloud sections.

== Deploying

The service contains a single image and one pod in its deployment configuration.

=== Staging

The staging BuildConfig builds a container from
https://pagure.io/cloud-image-uploader/tree/main[the main branch]. You need to
trigger a build manually, either from the web UI or the CLI.

=== Production

The production BuildConfig builds a container from
https://pagure.io/cloud-image-uploader/tree/prod[the prod branch]. Just like
staging, you need to trigger a build manually. After deploying to staging, the
main branch can be merged into the production branch to "promote" it:

....
$ git checkout prod && git merge --ff-only main
....

=== Azure

Images are uploaded whenever a compose contains `vhd-compressed` images.
They are first uploaded to a container in the storage account and then
imported into an Image Gallery.

Credentials for Azure are provided using environment variables. The credentials
are used by the
https://docs.ansible.com/ansible/latest/collections/azure/azcollection/index.html[Azure
Ansible collection].

==== Image Cleanup

Image clean-up is automated.

The storage account is configured to delete any blob in the container older
than 1 week and should require no manual attention. Nothing in the container is
required after the VHD is imported to the Image Gallery.

Images in the Gallery are cleaned up by the image uploader after a new image
has been uploaded. For complete details on the image cleanup policy refer to
the consumer code, but at the time of this writing the policy is as follows:

- Any image that has an end-of-life field that is in the past is removed.

- Only the latest 7 images that are marked as "excluded from latest = True"
  within an image definition are retained. When an image is marked as "excluded
  from latest = False", new virtual machines that don't reference an explicit
  image version will boot using the newest image (following semver). All images
  are uploaded with "excluded from latest = True" and are only marked as
  "excluded from latest = False" after testing.

- Only the latest 7 images in the Rawhide image definitions are retained,
  regardless of whether they are marked "excluded from latest = False".

At the moment, testing and promotion to "excluded from latest = False" are
manual, but in the future they will be automated to happen regularly
(weekly, perhaps).
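
The retention rules listed in this section can be expressed as a small function. The sketch below is not taken from the consumer code; the `ImageVersion` type, the tuple-based version ordering, and the `versions_to_remove` name are all hypothetical and exist only to make the policy concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

KEEP = 7  # retention count stated in the policy


@dataclass(frozen=True)
class ImageVersion:
    """Hypothetical stand-in for one image version in an image definition."""
    version: tuple  # e.g. (40, 1, 3); compared element by element
    excluded_from_latest: bool
    end_of_life: Optional[datetime] = None


def versions_to_remove(versions, now, is_rawhide=False):
    """Return the versions the stated policy would delete."""
    expired, live = [], []
    for v in versions:
        if v.end_of_life is not None and v.end_of_life < now:
            expired.append(v)  # rule 1: end-of-life is in the past
        else:
            live.append(v)
    if is_rawhide:
        candidates = live  # rule 3: the flag is ignored for Rawhide
    else:
        # rule 2: only unpromoted images count toward the limit
        candidates = [v for v in live if v.excluded_from_latest]
    newest_first = sorted(candidates, key=lambda v: v.version, reverse=True)
    return expired + newest_first[KEEP:]
```

Under this reading, a promoted image ("excluded from latest = False") in a non-Rawhide definition is only ever removed by the end-of-life rule.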

==== Authentication

The following environment variables are used:

....
AZURE_SUBSCRIPTION_ID - Identifies the subscription within an Azure tenant (our tenant only has 1)
AZURE_CLIENT_ID - The application ID used during authentication.
AZURE_SECRET - The application secret used during authentication.
AZURE_TENANT - Identifies the Azure tenant.
....

If you have access to the Fedora Project tenant, these values are available in
the https://portal.azure.com[web portal] under the Microsoft Entra ID service
in the "App registrations" tab. To manage things via the CLI you can do `dnf
install azure-cli`. All commands below assume you've logged in with `az login`.

There are two app registrations, `fedora-cloud-image-uploader` and
`fedora-cloud-image-uploader-staging`. These were created by running:

....
$ az ad app create --display-name fedora-cloud-image-uploader
....
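
When debugging a pod that fails to authenticate, a quick first check is whether all four variables listed in this section are present. The helper below is a hypothetical sketch for that purpose and is not part of cloud-image-uploader.

```python
import os

# The variable names come from the list in this section.
REQUIRED_VARIABLES = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_SECRET",
    "AZURE_TENANT",
)


def missing_azure_variables(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_azure_variables()
    if missing:
        raise SystemExit("Missing Azure credentials: " + ", ".join(missing))
```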

==== Authorization

Images are placed in two resource groups (containers for arbitrary resources).
`fedora-cloud-staging` is used for the staging deployment, and `fedora-cloud`
is used for the production deployment.

The app registrations are granted access to their respective resource group by
assigning them a role on the resource group. The role definition can be seen with:

....
$ az role definition list --name "Image Uploader"
....

This role is then assigned to the app registration with:

....
$ az role assignment create --assignee "fedora-cloud-image-uploader" \
    --role "Image Uploader" \
    --scope "/subscriptions/{subscription_id}/resourceGroups/fedora-cloud"
....

If additional permissions are required, they can be added to the role
definition.

==== Credential rotation

At the moment, credentials are set to expire and will need to be periodically
rotated. To do so via the CLI:

....
$ az ad app list -o table
# Find the application to issue new secrets for and set CLIENT_ID to its "Id" field
$ touch azure_secret
$ chmod 600 azure_secret
$ SECRET_NAME="Some useful name for the secret"
$ az ad app credential reset --id "$CLIENT_ID" --append --display-name "$SECRET_NAME" --years 1 --query password --output tsv > azure_secret
....
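
To notice an upcoming expiry before uploads start failing, the credential list for an app registration can be inspected. The sketch below is an assumption-laden illustration: it expects JSON shaped like the output of `az ad app credential list --id "$CLIENT_ID"`, with `displayName`, `keyId`, and `endDateTime` fields, and the function name and 30-day default are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def expiring_credentials(credentials_json, now, within_days=30):
    """Return names of credentials that expire within `within_days` of `now`.

    Assumes each entry has an ISO 8601 `endDateTime` (e.g. 2025-01-15T00:00:00Z).
    """
    deadline = now + timedelta(days=within_days)
    expiring = []
    for cred in json.loads(credentials_json):
        # Older Python versions do not parse a trailing "Z", so normalise it.
        end = datetime.fromisoformat(cred["endDateTime"].replace("Z", "+00:00"))
        if end <= deadline:
            # Fall back to the key ID when no display name was set.
            expiring.append(cred.get("displayName") or cred["keyId"])
    return expiring
```

Credentials that have already expired are reported too, since their end date is also before the deadline.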
@ -82,6 +82,7 @ xref:developer_guide:sops.adoc[Developing Standard Operating Procedures].
* xref:bodhi.adoc[Bodhi Infrastructure - Releng]
* xref:bugzilla2fedmsg.adoc[Bugzilla 2 Fedmsg]
* xref:bugzilla2fedmsg.adoc[bugzilla2fedmsg]
* xref:cloud-image-uploader.adoc[Cloud Image Uploader]
* xref:collectd.adoc[Collectd]
* xref:compose-tracker.adoc[Compose Tracker]
* xref:contenthosting.adoc[Content Hosting Infrastructure]