There were folks on site this week to rack new machines and pull old
machines, and unfortunately we don't have much control over when this
happens relative to our freeze, so I am just pushing this as part of
'do what's required to handle an outage'.
We did the following changes:
- removed the old autosign01 (it has been out of service since we moved
to autosign02 a while ago)
- removed vmhost-x86-08/09. We also want to migrate off 07 soon and
remove it on the next visit. A new vmhost-x86-08 has been installed to
replace these three.
- removed vmhost-x86-03/04.stg. Added new vmhost-x86-01.stg to replace
them both.
- added a new kernel02 to replace kernel01 on the next onsite trip.
This machine still needs its switch ports configured.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This will ensure that people don't "accidentally" export their staging
badges to their official backpack.
Signed-off-by: Aurélien Bompard <aurelien@bompard.org>
env renders to "production", which is not what messages are published
under ("prod"). Match what other apps are doing and just use a wildcard
so it will match either. Since prod and stage are separate brokers, this
is fine.
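For illustration, the binding ends up looking something like this (the
app segment and layout in this sketch are placeholders, not the actual
config):

    # Hedged sketch: "*" matches exactly one topic segment, so it covers
    # both "prod" and "production" in the env position; "#" matches the rest.
    routing_keys:
      - "org.fedoraproject.*.myapp.#"   # app segment is a placeholder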
The image needs to be replicated to a region to be usable in that
region. It's likely we'll want to expand this list, and potentially add
logic to the uploader to not replicate nightly images until they are
promoted to the latest image in the stream, so I've templated the list
in the configuration.
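As a sketch, the templated list might look like this (the variable name
and regions are illustrative only):

    # Hedged sketch: target regions for image replication.
    image_replication_regions:
      - eastus
      - westeurope
      - westus2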
Storage account names need to be globally unique. It seems fedoraimages
was already taken, so I've adjusted it to a name that's available. It's
only used to import the images, so the name doesn't really matter.
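Concretely it's a one-line config change, something like this (both
names here are placeholders):

    # Hedged sketch: Azure storage account names must be 3-24 lowercase
    # alphanumeric characters and globally unique across all of Azure.
    azure_storage_account: fedoraimagesupload   # "fedoraimages" was taken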
This machine has been replaced, so we need to update its MAC address.
This is technically breaking the freeze, but this machine isn't frozen
and shouldn't affect anything else.
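Roughly, this is a one-line host_vars change (the variable name and MAC
are placeholders):

    # Hedged sketch: new hardware means a new NIC MAC for DHCP/PXE.
    mac_address: "aa:bb:cc:dd:ee:ff"   # illustrative value only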
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This should be used to determine which Copr repository (staging or
production) to use when running a playbook.
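For example, a playbook might consume it along these lines (the variable
names and URLs in this sketch are assumptions):

    # Hedged sketch: choose the Copr frontend based on the environment.
    copr_frontend_url: "{{ 'https://copr.stg.fedoraproject.org' if env == 'staging' else 'https://copr.fedorainfracloud.org' }}"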
Signed-off-by: Siteshwar Vashisht <svashisht@redhat.com>
Sometime in the past we manually bumped the memory on these, but when I
reinstalled koji02 it got the lower limit set here in ansible.
So, move both of them to 56GB and hopefully fix koji02 falling over
under load.
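In host_vars terms that's roughly the following (variable name assumed;
guest memory here is expressed in MiB):

    # Hedged sketch: 56GB for both koji01 and koji02.
    mem_size: 57344   # 56 * 1024 MiB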
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
The previous staging deployment used the production FAS instance for
authentication, and it seems to create new accounts when pointing to the
staging FAS. Let's redirect that and see if the accounts are loaded
correctly.
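That is, point the auth config at the staging instance, roughly like
this (the setting name is a placeholder and the URL is assumed):

    # Hedged sketch: authenticate against staging FAS, not production.
    fas_base_url: https://admin.stg.fedoraproject.org/accounts/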
Signed-off-by: Michal Konecny <mkonecny@redhat.com>
I assumed gallery names were unique per resource group, but this is not
the case. They're unique per subscription, oddly, so we need to use a
different name in staging.
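So staging gets its own gallery name, along these lines (both names and
the variable are placeholders):

    # Hedged sketch: gallery names are unique per subscription, so staging
    # and production cannot share one even in separate resource groups.
    image_gallery_name: "{{ 'fedoraimagegallerystg' if env == 'staging' else 'fedoraimagegallery' }}"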
The client certificate contains "cloud-image-uploader.stg" for the CN,
so our RabbitMQ name needs to match. Additionally, the queue name needs
to start with the username, so we need to adjust that as well.
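Roughly, in the messaging config (the queue suffix and layout here are
assumptions; the username must equal the cert CN):

    # Hedged sketch: broker policy only lets a user declare queues whose
    # names start with its username.
    username: cloud-image-uploader.stg
    queue: cloud-image-uploader.stg_uploads   # suffix illustrative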
For an unknown reason on staging, the datanommer pod overloads the
node's memory and takes down all the running workload with it.
This sets a memory limit of 1Gi (the pod takes ~200Mi on prod) to avoid
crashing the compute node (and other workloads with it) when that happens.
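In the pod spec this is a standard resources limit, e.g.:

    # Hedged sketch: cap the container at 1Gi so the pod gets OOM-killed
    # instead of the whole node running out of memory.
    resources:
      limits:
        memory: 1Gi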