infra-docs-fpo/modules/sysadmin_guide/pages/koji.adoc
Michal Konecny a00e7fc828 Add another option of disabling builder
Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2024-10-02 10:50:19 +00:00


= Koji Infrastructure SOP
== Contents
* <<_contact_information>>
* <<_description>>
* <<_add_packages_into_buildroot>>
* <<_troubleshooting_and_resolution>>
** <<_kojid_wont_start_or_some_builders_wont_connect>>
** <<_disk_space_issues>>
** <<_checking_builders_are_all_checking_in_correctly>>
** <<_mntkoji_is_not_accessible_on_s390x_builder>>
== Contact Information
Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, sysadmin-build group
Persons::
mbonnet, dgilmore, f13, notting, mmcgrath, SmootherFrOgZ
Servers::
* koji.fedoraproject.org
* buildsys.fedoraproject.org
* xenbuilder[1-4]
* hammer1, ppc[1-4]
Purpose::
Build packages for Fedora.
== Description
Users submit builds to _koji.fedoraproject.org_ or
_buildsys.fedoraproject.org_. From there, each build is passed on to the
builders.
== Add packages into Buildroot
Some contributors need to build packages against freshly built packages
that are not in the buildroot yet. Koji has override tags, which the build
tag inherits from, so that such packages can be included in the buildroot.
To tag a build into the override tag, run:
....
koji tag-pkg dist-$release-override <package_nvr>
....
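As a rough sketch, the override tag name can be derived from the release so that the full command can be reviewed before it is run. The `override_tag` helper, the release `f41`, and the example NVR are hypothetical; the tag name simply follows the `dist-$release-override` pattern shown above.

```shell
#!/bin/sh
# Hypothetical helper: build the override tag name for a release,
# following the dist-$release-override pattern used in this SOP.
override_tag() {
    printf 'dist-%s-override\n' "$1"
}

# Print (rather than run) the tagging command so it can be checked first.
# "f41" and the NVR below are placeholders, not real values.
release=f41
nvr=example-1.0-1.fc41
echo "koji tag-pkg $(override_tag "$release") $nvr"
```
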
== Troubleshooting and Resolution
=== kojid won't start or some builders won't connect
If some builders can connect to koji while others cannot, make sure
the database has not run out of connections. This is common if koji
crashes and the db connections aren't properly cleared: after the restart
most connection slots are still occupied, so koji cannot reconnect.
Clearing old connections is easy. Estimate how long the new koji has been
up, pick a number of minutes larger than that, and kill the queries that
are older. From _db-koji01_ as _postgres_ run:
....
# On PostgreSQL older than 9.2 the column is named procpid instead of pid.
echo "select pid from pg_stat_activity where usename='koji' and now() - query_start \
>= '00:40:00' order by query_start;" | psql koji | grep "^ " | xargs kill
....
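The age threshold is the only part of that command that changes between incidents, so it may help to generate the query from a number of minutes. The following is a minimal sketch, not the procedure used by the team: `stale_query` is a hypothetical helper, and it assumes PostgreSQL 9.2 or later, where the backend PID column in `pg_stat_activity` is named `pid`.

```shell
#!/bin/sh
# Hypothetical helper: emit a query selecting koji backends whose current
# query started at least N minutes ago.
# Assumes PostgreSQL 9.2+, where pg_stat_activity exposes "pid".
stale_query() {
    case $1 in
        ''|*[!0-9]*) echo "usage: stale_query <minutes>" >&2; return 1 ;;
    esac
    printf "select pid from pg_stat_activity where usename='koji' and now() - query_start >= interval '%s minutes' order by query_start;\n" "$1"
}

# Review the generated SQL first.
stale_query 40

# Then, as postgres on the database host, something like:
#   stale_query 40 | psql -At koji | xargs -r kill
# (-At prints bare rows only; xargs -r skips kill when nothing matches.)
```
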
=== Disk Space Issues
The builders use a lot of temporary storage. Failed builds and old mock buildroots should be cleaned up automatically. If they are not, remove /var/lib/mock/* and restart kojid on the affected builder:
....
systemctl restart kojid
....
[IMPORTANT]
====
On aarch64 buildhw machines, /var/log/libvirt/qemu/nvram/ takes up a lot of space. Everything in it can be deleted to free up space.
====
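Before removing everything under /var/lib/mock, it can be useful to see which buildroots are actually old. The sketch below only lists candidates and deletes nothing. `list_stale_buildroots` is a hypothetical helper, and the seven-day cutoff is an arbitrary assumption rather than a value taken from this SOP.

```shell
#!/bin/sh
# Hypothetical helper: list top-level directories under DIR whose
# modification time is more than DAYS days in the past.
list_stale_buildroots() {
    dir=$1
    days=$2
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print
}

# Dry run against the mock directory, if it exists on this machine.
if [ -d /var/lib/mock ]; then
    list_stale_buildroots /var/lib/mock 7
fi
```
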
=== Checking builders are all checking in correctly
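To check builders, list all enabled builders and filter on the time of last update. If a builder is checking in correctly, its time of last update should be close to the current date and time. The following example uses `grep -v` to hide every builder that checked in during the current ten-minute window, so only builders that have not checked in recently remain: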
....
koji list-hosts --enabled | grep -v '04 Dec 2022 12:1'
....
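The timestamp prefix in that example has to be retyped every time. As a sketch, it can be generated from the clock instead. `checkin_pattern` is a hypothetical helper; it relies on GNU `date`, and it assumes that `koji list-hosts` prints the last update in the `DD Mon YYYY HH:MM` form shown above and in the same time zone as the shell.

```shell
#!/bin/sh
# Hypothetical helper: print the "DD Mon YYYY HH:M" prefix for a given
# epoch time (default: now), i.e. the current ten-minute window.
# Uses GNU date; LC_ALL=C keeps the month abbreviation in English.
checkin_pattern() {
    when=${1:-$(date +%s)}
    LC_ALL=C date -d "@$when" '+%d %b %Y %H:%M' | cut -c1-16
}

# Print the command to run instead of running it.
pattern=$(checkin_pattern)
echo "koji list-hosts --enabled | grep -v '$pattern'"
```

Right after a ten-minute boundary every builder will look stale for a moment, so it is worth rerunning the check a minute later before acting on the result.
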
[IMPORTANT]
====
The kojira process should only run on koji02, never on koji01.
====
=== /mnt/koji is not accessible on s390x builder
After restarting any `s390x` machine in the `inventory/builders` `[runroot]` group, the sshfs mounts are not mounted automatically. They need to be mounted manually:
. `mount /mnt/koji`
. `mount /srv/odcs`
[NOTE]
====
You need to have access to Bitwarden Vault for the password prompt.
====
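To see which of the two mounts are still missing after a reboot, a check along these lines can be used. This is a sketch only: `missing_mounts` is a hypothetical helper, and it relies on `mountpoint` from util-linux being installed on the builder.

```shell
#!/bin/sh
# Hypothetical helper: print each given path that is not currently a
# mount point. Relies on mountpoint(1) from util-linux.
missing_mounts() {
    for path in "$@"; do
        mountpoint -q "$path" || printf '%s\n' "$path"
    done
}

# Dry run: show which of the two sshfs mounts still need "mount <path>".
missing_mounts /mnt/koji /srv/odcs
```
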
=== OSError: [Errno 30] Read-only file system: "/var/tmp/koji/tasks/xxx"
For more information about this issue see the link:https://bugzilla.redhat.com/show_bug.cgi?id=2312886[relevant bugzilla ticket].
The error is also reported on the parent task, so make sure you look at the exact child task that failed in order to find the failing host.
Currently the only way to deal with it is to disable the affected builder in koji and reinstall it.
==== Disabling builder in koji
. Generate a kerberos ticket
. `koji disable-host <builder_with_issue>`
[NOTE]
====
If you don't have permission to disable the builder in koji, ssh to the machine and stop
the kojid service by running `systemctl stop kojid`. This will prevent the machine from taking any more builds.
====
==== Reinstalling the builder
. Run `groups/buildvm.yml` ansible playbook with `--limit <builder_hostname>`
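
The disable and reinstall steps above can be sketched as one reviewable plan. Several things here are assumptions rather than part of this SOP: `reinstall_plan` and the host name are hypothetical, the koji subcommands are spelled out as `disable-host` and `enable-host`, the final re-enable step is added for completeness, and the playbook is invoked with plain `ansible-playbook` from the playbooks directory. The script only prints the commands and runs none of them.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for taking a builder out of
# rotation, reinstalling it, and putting it back. Nothing is executed.
reinstall_plan() {
    host=$1
    printf 'koji disable-host %s\n' "$host"
    printf 'ansible-playbook groups/buildvm.yml --limit %s\n' "$host"
    # Assumption: the builder is re-enabled once the playbook has finished.
    printf 'koji enable-host %s\n' "$host"
}

# Placeholder host name, not a real builder.
reinstall_plan buildvm-example-01
```
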