AFAIK we use `services_disabled: true` only when upgrading our servers
from Fedora N to Fedora N+1, which is a big task, so we go through the
instances one by one. Maybe the easier ones can go in parallel, but
copr-backend requires full attention for sure.

If we set `services_disabled: true` for all of devel/production, we
can't test whether the services on each server start properly, nor can
we run at least the frontend until all servers are ready.

Let's configure the variable per instance, so whoever is upgrading it
can disable/enable the services when needed.
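
A minimal sketch of what the per-instance setting could look like,
assuming the variable is kept in per-host variable files. The file
locations named in the comments are hypothetical; only the variable
name and the host name come from this repository:

```yaml
# Hypothetical per-host variable file for the instance being upgraded,
# e.g. host_vars/copr-be-dev.aws.fedoraproject.org
services_disabled: true
---
# Hypothetical per-host variable file for an instance that is not being
# upgraded right now; its services keep running.
services_disabled: false
```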

This role should be redundant, and we observe strange things with it,
such as ssh keys being removed and added four times, having to manually
confirm "Are you sure you want to continue connecting
(yes/no/[fingerprint])?", and so on. Let's try to disable the role.

It seems that either the RHEL 8 system (batcave) or the Fedora 35
system (Fedora Copr Infra) prefers ed25519 keys over rsa ones, leading
to weird auth problems:
TASK [allow root ssh connections] ***************************************************************************************************************************
Monday 29 November 2021 13:06:43 +0000 (0:00:00.314) 0:00:03.632 *******
Monday 29 November 2021 13:06:43 +0000 (0:00:00.314) 0:00:03.632 *******
fatal: [copr-be-dev.aws.fedoraproject.org]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"copr-be-dev.aws.fedoraproject.org\". Make sure this host can be reached over ssh: Certificate invalid: name is not a listed principal\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the ED25519 key sent by the remote host is\nSHA256:Cgs/aoJl9OJheAtZZ2CDiYx9ZeFMwD6dUYUJpPDTl58.\r\nPlease contact your system administrator.\r\nAdd correct host key in /root/.ssh/known_hosts to get rid of this message.\r\nOffending RSA key in /root/.ssh/known_hosts:21\r\nED25519 host key for copr-be-dev.aws.fedoraproject.org has changed and you have requested strict checking.\r\nHost key verification failed.\r\n", "unreachable": true}

This lets us move forward with tomorrow's update. The previous
hack(s) were not OK.

We observed a situation where two keys were specified in known_hosts
for the same host, and only one of them was removed by the playbook.
At least we think this is what is actually happening.
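
For illustration only, and not necessarily what this change does: one
way to drop every stored key for a host, whatever its type, is the
`known_hosts` module with `state: absent` and no `key` given. The
`delegate_to` target is an assumption about where the stale entries
live; the host name and the known_hosts path are taken from the error
output seen earlier:

```yaml
# Sketch: remove all known_hosts entries (rsa, ed25519, ...) for one host.
- name: drop every stale host key for the rebuilt instance
  ansible.builtin.known_hosts:
    name: copr-be-dev.aws.fedoraproject.org
    path: /root/.ssh/known_hosts
    state: absent          # without "key", all keys for the host are removed
  delegate_to: localhost   # assumption: the file is on the control host
```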

We shouldn't install the `nrpe` package in the `copr/base` playbook,
because it creates the following user:

    nrpe:x:992:991:NRPE user for the NRPE service:/var/run/nrpe:/sbin/nologin

That UID collides with the user we create for keygen:

    - user: name="copr-signer" group=copr-signer groups=apache uid=992

The `nrpe` installation needs to be done later, in the `nagios_client`
role, which we call after the `copr/keygen` role.
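
The intended ordering can be sketched roughly as follows. The play
layout and the group name are assumptions made for illustration; only
the role names and the UID come from the text above:

```yaml
# Sketch of the role ordering for the keygen host.
- hosts: copr_keygen     # hypothetical group name
  roles:
    - copr/base          # no longer installs nrpe
    - copr/keygen        # creates copr-signer with the fixed uid=992
    - nagios_client      # installs nrpe only after uid 992 is taken
```

With this order, the `nrpe` system user is created when UID 992 is
already occupied, so it should be given a different free UID.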

This also tears down our swtpm systemd service setup, as
os-autoinst should now handle swtpm device setup for us.

Signed-off-by: Adam Williamson <awilliam@redhat.com>

sigh, needs to be here too as it's used from outside of the role
where the default is set. Not sure if there's a better fix for
this.

Signed-off-by: Adam Williamson <awilliam@redhat.com>

This sets us up for scheduling FCOS tests from messages, not
using a cron job. Also reduces some duplication of variables
between openqa-servers-common and the dispatcher role defaults.

Signed-off-by: Adam Williamson <awilliam@redhat.com>