fedora-image-uploader: deploy as multiple containers #2592

Merged
ryanlerch merged 1 commit from fiu-multi-container into main 2025-04-29 18:28:29 +00:00
Contributor

In the beginning, this just handled Azure images. Now it does Azure,
AWS, GCP, and containers. Currently, it processes images serially, which
is mostly okay. However, it does mean that whatever service is handled
last has to wait for all the others to succeed before it starts, and it
also means that if the handler for any one platform fails, it
retries all the images again. For most things this is a no-op (or a
few inexpensive calls), but it does have to re-download the image from
Koji to checksum it.
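
A rough sketch of the retry behaviour described above, assuming a failed handler simply raises and the message is redelivered until it succeeds. The handler names, the `deliver` helper, and the processing order are hypothetical stand-ins, not the real uploader code:

```python
from collections import Counter

# Platform order is an assumption made for this illustration.
PLATFORMS = ["azure", "aws", "container", "gcp"]


def make_handlers(failures, calls):
    """Build one hypothetical upload handler per platform.

    `failures` maps a platform name to how many times it fails before
    succeeding; `calls` counts how often each handler is invoked.
    """
    remaining = dict(failures)

    def make(name):
        def handler(image):
            calls[name] += 1
            if remaining.get(name, 0) > 0:
                remaining[name] -= 1
                raise RuntimeError(f"{name} upload of {image} failed")
        return handler

    return {name: make(name) for name in PLATFORMS}


def deliver(consumer, image, max_attempts=10):
    """Redeliver the message until the consumer stops raising."""
    for _ in range(max_attempts):
        try:
            consumer(image)
            return
        except RuntimeError:
            continue
    raise RuntimeError("message was never processed successfully")


def serial_calls(failures, image="example.qcow2"):
    """One consumer runs every handler in order for each delivery."""
    calls = Counter()
    handlers = make_handlers(failures, calls)

    def consumer(img):
        for handler in handlers.values():
            handler(img)

    deliver(consumer, image)
    return dict(calls)


def split_calls(failures, image="example.qcow2"):
    """Each platform has its own consumer, so retries stay local."""
    calls = Counter()
    handlers = make_handlers(failures, calls)
    for handler in handlers.values():
        deliver(handler, image)
    return dict(calls)
```

With `{"gcp": 2}` as the failure map, `serial_calls` invokes every handler three times, while `split_calls` invokes only the `gcp` handler three times and the others once each.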

This adds an AMQP message queue for each content type we handle, and
produces a fedora-messaging config for each content type. The deployment
is now made up of 4 containers: azure-image-uploader,
aws-image-uploader, container-image-uploader, and
google-cloud-image-uploader. They only differ in the secrets injected
into them and the fedora-messaging config file they use. The end result
is that images should be available faster and it's more resilient to
remote services being down.
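
For illustration, the per-content-type layout could be sketched like this. It is not the actual Ansible template: the base queue name, the queue suffixes, and the routing key are assumptions. Only the four container names come from the description above, and the `queues` and `bindings` keys follow the usual fedora-messaging config layout:

```python
# Content types handled by the uploader; the suffixes are assumed to
# match the container name prefixes listed in the description.
CONTENT_TYPES = ["azure", "aws", "container", "google-cloud"]


def container_name(content_type):
    """Name of the container that consumes this content type's queue."""
    return f"{content_type}-image-uploader"


def build_config(content_type, base_queue="fedora-image-uploader"):
    """Return a dict shaped like a fedora-messaging config for one queue.

    `base_queue` and the routing key below are hypothetical values.
    """
    queue = f"{base_queue}-{content_type}"
    return {
        "queues": {
            queue: {
                "durable": True,
                "auto_delete": False,
                "exclusive": False,
                "arguments": {},
            }
        },
        "bindings": [
            {
                "queue": queue,
                "exchange": "amq.topic",
                # Assumed topic for compose status changes.
                "routing_keys": [
                    "org.fedoraproject.*.pungi.compose.status.change"
                ],
            }
        ],
    }


# One config per container; every queue receives its own copy of each
# matching message, so the handlers no longer block one another.
CONFIGS = {container_name(ct): build_config(ct) for ct in CONTENT_TYPES}
```

Because every queue is bound to the same routing key, the broker hands each consumer its own copy of the message, which is what lets one platform fail and retry without touching the others.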

Finally, it's worth noting that this bumps the warning threshold for
queue sizes. It can take some services (Azure and AWS) upwards of 30
minutes to replicate the images around the world, and since we subscribe
to any compose status changes, it's not unreasonable for 5-10 messages
to stack up when we hit a compose change that is "FINISHED" with images.
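
A toy model of why the backlog grows, assuming a single consumer per queue that processes messages in arrival order. The arrival spacing and the cost of the cheap messages are made-up numbers; only the 30-minute figure comes from the text above:

```python
def peak_backlog(arrivals, durations):
    """Peak number of messages that have arrived but not yet started.

    `arrivals` are arrival times in minutes (ascending) and `durations`
    are processing times in minutes for a single FIFO consumer.
    """
    start_times = []
    free_at = 0.0
    for arrival, duration in zip(arrivals, durations):
        start = max(arrival, free_at)
        start_times.append(start)
        free_at = start + duration

    peak = 0
    for now in arrivals:
        waiting = sum(
            1 for a, s in zip(arrivals, start_times) if a <= now < s
        )
        peak = max(peak, waiting)
    return peak


# One "FINISHED" compose taking 30 minutes, followed by seven cheap
# status changes arriving three minutes apart (hypothetical spacing).
arrivals = [0, 3, 6, 9, 12, 15, 18, 21]
durations = [30] + [0.1] * 7
backlog = peak_backlog(arrivals, durations)
```

In this made-up scenario `backlog` is 7, which falls inside the 5-10 range mentioned above.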

Also note: the templating in here feels a little unpleasant. I wasn't sure how to loop on a role for queue suffixes, and those are disconnected from the queue suffix definitions in the config template. If there's a better way to handle it I'm all ears.

First-time contributor

Build failed. More information on how to proceed and troubleshoot errors available at https://fedoraproject.org/wiki/Zuul-based-ci
https://fedora.softwarefactory-project.io/zuul/buildset/badd086f2f464e14bc9fd4e92dedcb3f

- fi-ansible-lint-diff (https://fedora.softwarefactory-project.io/zuul/build/d616810f353442659d40f0ddefb58fa9): FAILURE in 2m 57s
- fi-yamllint-diff (https://fedora.softwarefactory-project.io/zuul/build/0e4939373c5e46dba676ca888b754a1e): SUCCESS in 2m 20s
Contributor

+1 for this as it makes the deployment much cleaner.

Contributor

Seems reasonable to me.

Would you like me to merge/deploy this? Or would you like to?

Author
Contributor

> Seems reasonable to me.
>
> Would you like me to merge/deploy this? Or would you like to?

I have thus far avoided needing privileges to run Ansible stuff (although perhaps that is becoming increasingly unreasonable), so please do merge + deploy. I'm around to fix anything I've messed up.
Contributor

rebased onto 240aa7b8e04c1da9fc7a34ef04f4658b199300f1
Contributor

rebased onto 240aa7b8e04c1da9fc7a34ef04f4658b199300f1
Contributor

ok. No problem.

Contributor

Pull-Request has been merged by kevin

First-time contributor

Build failed. More information on how to proceed and troubleshoot errors available at https://fedoraproject.org/wiki/Zuul-based-ci
https://fedora.softwarefactory-project.io/zuul/buildset/5dd31710249049afb1321dc1d64887ce

- fi-ansible-lint-diff (https://fedora.softwarefactory-project.io/zuul/build/7e6e8a021b484ad4b61269f37d588b35): FAILURE in 3m 15s
- fi-yamllint-diff (https://fedora.softwarefactory-project.io/zuul/build/0c0f8f8a90f54b46b33fae7f8a42c65f): SUCCESS in 2m 31s
Reference: Infrastructure/ansible#2592