Time to retire ODCS. ELN is moved off and that was the last thing using
it. Thanks for all the service ODCS!
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This reverts commit cef30714a2.
Turns out the problem was just that asset cleanup wasn't running
because the testresults/images minimization job was blocking it.
prod is running out of disk space. Not sure why this has only
just started to happen, but...the numbers do seem to more or
less add up...
Signed-off-by: Adam Williamson <awilliam@redhat.com>
nirik and I went around and around a bit today and ended up back
where we started, but with a clearer understanding of where that
this. This explains it a bit better, and makes what's actually
going on in various places clearer with the use of appropriate
shared variables. This should not actually *change* anything at
all when deployed.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
so, this was working before somehow, but it was pretty clearly
wrong. We were using queues owned by openqa.stg on the prod
rabbitmq instance for the cases where the openQA "stg" consumers
need to listen to prod queues. This can only have been working
with an openqa.stg user on prod, which seems wrong. Instead,
these three consumers should do it the way the relval and
relvalami consumers do - use a queue owned by the "openqa" user,
but with a suffix so they have a different queue from the actual
prod queue. The upshot of this is that in the configs, we should
go from:
amqp_url = "amqps://openqa:@rabbitmq.fedoraproject.org/%2Fpubsub"
...
queue = "openqa.stg_scheduler"
- which is weird and I have no idea how it ever worked - to:
amqp_url = "amqps://openqa:@rabbitmq.fedoraproject.org/%2Fpubsub"
...
queue = "openqa_scheduler_stg"
- which seems much more sensible.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is triggered by
https://pagure.io/fedora-infrastructure/issue/11375 , but the
changes are rather extensive. Unfortunately, some of the
relevant files got messed up by the alphabetical sort thing that
got run on several group variable files a while ago, so that
confuses the diff a bit - I had to unwind those changes to make
the files readable again in order to make these changes.
Ultimately the goal here is to make the config more consistent
and more functional - the variables used and their names should
be more consistently related to what they're actually *for*,
which I didn't entirely understand when setting this up. So
we have variables for the username being used in each case and
we use that variable where we're referring to the username, for
instance. This should also make the whole thing about the cases
where listeners on the openQA stg/lab instance need to listen
to prod messages clearer, too. It also makes the user creation
clearer by doing it explicitly, just once per user, instead of
haphazardly doing it implicitly through the queue definitions.
And finally it should also actually fix 11375, by giving the
appropriate write permissions to each user.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
300G doesn't seem like enough when we have candidate composes
alongside Rawhide and Branched, we're getting incomplete tests
because assets are being removed.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This unifies prod and stg onto the ways of doing things for the
latest packages, and rejigs the swtpm stuff a bit to tear down
more (we shouldn't need the custom SELinux policy any more).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
sigh, needs to be here too as it's used from outside of the role
where the default is set. Not sure if there's a better fix for
this.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This sets us up for scheduling FCOS tests from messages, not
using a cron job. Also reduces some duplication of variables
between openqa-servers-common and the dispatcher role defaults.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We just updated the scheduler code to handle job re-trigger
requests, we need to configure the listener to listen for the
appropriate messages.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This was done using yq (
https://mikefarah.gitbook.io/yq/operators/sort-keys )
Doing things this way makes it much easier to see if a variable is set
in a file or if two hosts differ in what variables they set. Hopefully
we can keep things sorted moving forward.
Basically this means just sort a-z anything you add to any host or group
vaiable and it will be in the right place.
Additionally, this enforces 'normal' intent rules for all the variable
files which we should also try and obey. 2 spaces for first level, 3 for
next, etc. When in doubt you can run yq on it.
This should cause NO actual vairable changes, it's all just readability
fixing for humans, ansible parses it exactly the same.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Almost global anyway, i.e. inside the VPN.
The ipa/client-based shell access and sudo rules are only effective for
staging right now, the respective playbook bits are masked out for prod.
- Assign Ansible host groups to IPA host groups, the latter don't care
about 'stg' in the name and use dashes rather than underscores.
- Distill shell access groups from fas_client_groups in group and host
vars.
- Let all `sysadmin-*` groups in the previous list run anything via sudo
in the host group (except bastion & batcave).
- Remove `fas_client_groups` from staging host and group vars.
- Remove sudoers from staging host and group vars if only `sysadmin-*`
groups have shell access.
- Set up `ipa_client_shell_groups` on bastion to be a super set of the
same on batcave.
Newly created IPA host groups:
- autosign
- badges
- basset
- bastion
- batcave
- blockerbugs
- bodhi
- bugzilla2fedmsg
- busgateway
- datagrepper
- dbserver
- dns
- fedimg
- github2fedmsg
- ipa
- kernel-qa
- kerneltest
- kojibuilder
- kojihub
- kojipkgs
- logging
- mailman
- memcached
- mirrormanager
- nagios
- notifs
- oci-registry
- odcs
- openqa
- openqa-workers
- osbs
- packages
- pdc-web
- pkgs
- proxies
- rabbitmq
- releng-compose
- resultsdb
- secondary
- sign-bridge
- sundries
- value
- wiki
Signed-off-by: Nils Philippsen <nils@redhat.com>
The main goal of these changes is to allow all workers in each
deployment NFS write access to the factory share. This is because
I want to try using os-autoinst's at-job-run-time decompression
of disk images instead of openQA's at-asset-download-time
decompression; it avoids some awkwardness with the asset file
name, and should also actually allow us to drop the decompression
code from openQA I think.
I also rejigged various other things at the same time as they
kinda logically go together. It's mostly cleanups and tweaks to
group variables. I tried to handle more things explicitly with
variables, as it's better for use of these plays outside of
Fedora infra.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-11-05 16:10:32 -08:00
Renamed from inventory/group_vars/openqa_common (Browse further)