= Koschei SOP

Koschei is a continuous integration system for RPM packages. Koschei
runs package scratch builds after a dependency changes or after a
period of time has elapsed, and reports package buildability status
to interested parties.

Production instance::
https://koschei.fedoraproject.org/
Staging instance::
https://koschei.stg.fedoraproject.org/

== Contact Information
Owner::
mizdebsk
Contact::
#fedora-admin
Location::
Fedora infrastructure OpenShift
Purpose::
continuous integration system

== Description
Koschei consists of a frontend and a backend.

The frontend is a web application written in Python using the Flask
framework. It runs under Apache httpd with mod_wsgi as a WSGI
application. The frontend displays information to users and allows
editing package groups and changing priorities.

The backend consists of several loosely coupled microservices,
including:

* `watcher` - listens to events on the Fedora messaging bus for
completed builds and updates build states in the database.
* `repo-resolver` - resolves package dependencies in a given repo
using hawkey and compares them with the previous iteration to get a
dependency diff. It resolves all packages in the newest repo
available in Koji. The output is the basis for scheduling new builds.
* `build-resolver` - resolves completed builds in the repo in which
they were done in Koji. Produces the dependency differences visible
in the frontend.
* `scheduler` - schedules new builds based on multiple criteria:
** dependency priority - dependency changes since the last build,
weighted by their distance in the dependency graph
** manual and static priorities - set manually in the frontend.
Manual priority is reset after each build; static priority persists.
** time priority - time elapsed since the last build.
* `polling` - polls the same types of events as `watcher` without
relying on the messaging bus. Additionally takes care of package list
synchronization and other regularly executed tasks.

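
The scheduling criteria above can be illustrated with a small Python
model. This is a hypothetical sketch, not Koschei's actual
implementation: the `Package` class, the `weight / distance` weighting
of dependency changes, the linear time priority, and the scheduling
`threshold` are assumptions chosen for illustration. It shows a
dependency diff between two resolution runs (the kind of output
`repo-resolver` produces) feeding a dependency priority, which is then
summed with the manual, static, and time priorities.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    # Hypothetical model of a package's scheduling inputs.
    name: str
    manual_priority: int = 0      # reset after each build
    static_priority: int = 0      # persists across builds
    days_since_build: float = 0.0


def dependency_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Compare two resolutions mapping dependency name -> version.

    Returns {name: (old_version, new_version)} for every dependency that
    changed; None marks a dependency that was added or removed.
    """
    changes = {}
    for name in old.keys() | new.keys():
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def dependency_priority(changes: dict[str, tuple], distance: dict[str, int],
                        weight: float = 100.0) -> float:
    """Assumed weighting: a change at distance d contributes weight / d,
    so direct dependencies (d = 1) count more than transitive ones."""
    return sum(weight / distance.get(name, 1) for name in changes)


def total_priority(pkg: Package, dep_priority: float,
                   per_day: float = 1.0) -> float:
    # Assumed: time priority grows linearly with days since the last build.
    time_priority = per_day * pkg.days_since_build
    return dep_priority + pkg.manual_priority + pkg.static_priority + time_priority


def pick_next(priorities: dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-priority package at or above the threshold."""
    eligible = {name: p for name, p in priorities.items() if p >= threshold}
    return max(eligible, key=eligible.get) if eligible else None


if __name__ == "__main__":
    old = {"glibc": "2.38", "zlib": "1.3"}
    new = {"glibc": "2.39", "zlib": "1.3", "libfoo": "1.0"}
    changes = dependency_diff(old, new)   # glibc changed, libfoo added
    dep = dependency_priority(changes, {"glibc": 1, "libfoo": 2})
    pkg = Package("foo", manual_priority=10, days_since_build=5)
    print(total_priority(pkg, dep))       # 100 + 50 + 10 + 5 = 165.0
```

Keeping each criterion as a separate additive term makes it easy to
see why a package was picked: a direct dependency update, a raised
manual priority, or simply a long time since the last build.
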

== Deployment

Koschei deployment is managed by an Ansible playbook:

....
sudo rbac-playbook openshift-apps/koschei.yml
....

The above playbook is idempotent: running it has no effect when
everything is already configured as expected.

Koschei is fully containerized and deployed on OpenShift. Koschei is
stateless: it doesn't use any persistent storage. All non-volatile
information is stored in a PostgreSQL database, which is not part of
Koschei but an external service that Koschei depends on. There is one
common container image for all Koschei workloads: frontend and backend
containers all run from the same image.

Koschei images are built by the upstream project on Quay.io. Upstream
implements continuous delivery of container images to the Quay.io
registry: code pushed to the `fedora-prod` or `fedora-stage` branches
of the upstream GitHub repository is automatically built into
container images, which are pushed to the Quay.io registry with the
appropriate tags.

Pristine upstream Koschei images are then imported into the internal
OpenShift registry; Fedora OpenShift does not build any Koschei
container images itself. Image import into OpenShift is always done
manually by a Koschei sysadmin, usually by running a manual Ansible
playbook. This way we ensure that developers who can push code to the
GitHub repository have no control over the Fedora infrastructure
deployment process.

Upstream images don't contain any Fedora-specific configuration. Such
configuration is mounted into containers as read-only volumes backed
by Kubernetes Secrets.

The frontend runs as a Kubernetes Deployment with multiple replicas
for high availability. It supports rolling updates, which allow it to
be updated with no user-visible downtime.

Each backend service has its own Kubernetes Deployment with a single
replica. Because backend downtime is not user-visible, the backend
does not use rolling updates.

In addition to the frontend and backend, there is also an `admin`
Deployment, which runs a container that does nothing but wait for a
sysadmin to `rsh` into it to run manual admin commands.

Besides the aforementioned Kubernetes Deployments, some ad-hoc tasks
run as Kubernetes Jobs, created either on a time schedule by CronJobs
or by Koschei sysadmins running manual Ansible playbooks.

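
To confirm that the running Deployments match the layout described
above, a sysadmin could inspect them with `oc get deployments -o json`.
The following Python sketch parses that output and reports deviations.
It assumes the `koschei` project (as used for the admin shell), a
frontend Deployment named `frontend`, and backend Deployments named
after the backend services; these names are hypothetical and should be
adjusted to the real ones. A suspended `scheduler` (scaled to zero)
will be reported as a deviation.

```python
import json
import subprocess

# Hypothetical Deployment names, assumed to match the backend services.
BACKEND_DEPLOYMENTS = ("watcher", "repo-resolver", "build-resolver",
                       "scheduler", "polling")


def fetch_deployments(namespace: str = "koschei") -> dict:
    """Return `oc get deployments -o json` output as a dict."""
    result = subprocess.run(
        ["oc", "get", "deployments", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def check_layout(deployments: dict, frontend: str = "frontend") -> list:
    """Report deviations from the expected replica layout."""
    # spec.replicas defaults to 1 when omitted from a Deployment.
    replicas = {d["metadata"]["name"]: d["spec"].get("replicas", 1)
                for d in deployments.get("items", [])}
    problems = []
    if replicas.get(frontend, 0) < 2:
        problems.append(f"{frontend}: expected multiple replicas")
    for name in BACKEND_DEPLOYMENTS:
        if name not in replicas:
            problems.append(f"{name}: Deployment not found")
        elif replicas[name] != 1:
            problems.append(f"{name}: expected 1 replica, found {replicas[name]}")
    return problems


if __name__ == "__main__":
    for problem in check_layout(fetch_deployments()):
        print(problem)
```

Separating `fetch_deployments` from `check_layout` keeps the check
testable against saved JSON without cluster access.
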

== Upgrade

Upgrading Koschei to a new upstream version is done by running one of
two manual Ansible playbooks:

....
sudo rbac-playbook manual/upgrade/koschei-rolling.yml
sudo rbac-playbook manual/upgrade/koschei-full.yml
....

The first playbook, `koschei-rolling.yml`, performs a rolling update
and should be used when the update is known not to change the
database schema. In this case the new upstream image is simply
imported into the internal OpenShift registry and all Deployments are
restarted. OpenShift performs a rolling update of the frontend, so
users experience no downtime. Backend Pods are also recreated with
the new image.

The second playbook, `koschei-full.yml`, performs a full update and is
used when the update changes the database schema. This playbook pauses
all Deployments and terminates all Pods, so users experience frontend
downtime. Once everything is stopped, the playbook creates Kubernetes
Jobs to run database migrations and perform other maintenance tasks.
When the Jobs are done, new Deployments are rolled out.

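
The choice between the two playbooks comes down to one question: does
the update change the database schema? The hypothetical Python helper
below (`upgrade_plan` is an illustrative name) encodes that rule and
the resulting frontend availability. It only builds the command line
and does not run it.

```python
import shlex

ROLLING = "manual/upgrade/koschei-rolling.yml"
FULL = "manual/upgrade/koschei-full.yml"


def upgrade_plan(changes_db_schema: bool) -> tuple:
    """Return (command, frontend_downtime) for a Koschei upgrade.

    Rolling updates are only for updates known not to change the schema;
    schema changes go through the full playbook, which stops all Pods
    before running the migration Jobs.
    """
    playbook = FULL if changes_db_schema else ROLLING
    command = ["sudo", "rbac-playbook", playbook]
    # Only the full upgrade causes user-visible frontend downtime.
    return command, changes_db_schema


if __name__ == "__main__":
    command, downtime = upgrade_plan(changes_db_schema=True)
    print(shlex.join(command))
    print("frontend downtime expected" if downtime else "no user-visible downtime")
```
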

== Admin shell

Certain Koschei operational tasks are performed with the
`koschei-admin` CLI tool. The container where the tool is available
can be accessed with:

....
oc project koschei
oc rsh deploy/admin
....

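
For scripted use, the same `koschei-admin` commands can be run without
an interactive shell through `oc exec`; recent `oc` and `kubectl`
clients accept a `deploy/NAME` target for `exec`, as `oc rsh` does
above. The Python sketch below only builds such a command line. It
assumes the `koschei` project and that `koschei-admin` is on the
`PATH` inside the `admin` container; the helper names are hypothetical.

```python
import shlex
import subprocess


def koschei_admin_cmd(*args: str, namespace: str = "koschei") -> list:
    """Build an `oc exec` command running koschei-admin in the admin Deployment."""
    return ["oc", "exec", "-n", namespace, "deploy/admin", "--",
            "koschei-admin", *args]


def run(cmd: list, dry_run: bool = True) -> None:
    # Print the command by default; pass dry_run=False to execute it.
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(koschei_admin_cmd("set-notice", "Koschei operation is currently suspended"))
    run(koschei_admin_cmd("clear-notice"))
```

Passing arguments as a list avoids shell quoting problems with
announcement texts that contain spaces.
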

== Suspending Koschei operation

To stop builds from being scheduled, it is enough to scale the
`scheduler` Deployment down to zero replicas. For planned Koji
outages, stopping the scheduler is recommended. This is not strictly
necessary, as Koschei recovers from Koji and network errors
automatically, but stopped Koji builders may cause unexpected build
failures that would be reported to users. Other backend services can
be left running, as they restart themselves automatically on Koji and
network errors.

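
Suspending and resuming scheduling amounts to setting the replica
count of the `scheduler` Deployment with `oc scale`. The Python sketch
below builds those commands. It assumes the `koschei` project;
resuming with one replica follows from each backend Deployment running
a single replica.

```python
import shlex

SCHEDULER = "deployment/scheduler"


def scale_scheduler(running: bool, namespace: str = "koschei") -> list:
    """Build the `oc scale` command that stops (0) or resumes (1) scheduling."""
    # Backend Deployments run a single replica, so 1 restores normal operation.
    replicas = 1 if running else 0
    return ["oc", "scale", SCHEDULER, f"--replicas={replicas}", "-n", namespace]


if __name__ == "__main__":
    print(shlex.join(scale_scheduler(running=False)))  # before the Koji outage
    print(shlex.join(scale_scheduler(running=True)))   # after Koji is back
```
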

== Limiting Koji usage

By default, Koschei is limited to 30 concurrently running builds.
This limit can be changed in the configuration under the
`koji_config.max_builds` key. Koschei also monitors Koji load and does
not schedule builds while the load is higher than a certain
threshold. This should prevent builds from being scheduled during mass
rebuilds, so it is not necessary to stop scheduling during those.

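
The two safeguards can be expressed as a single admission check. This
Python function is a hypothetical sketch: only the default limit of 30
comes from the configuration described above, while the way Koji load
is measured and the `load_threshold` value are assumptions.

```python
def may_schedule(running_builds: int, koji_load: float,
                 max_builds: int = 30, load_threshold: float = 0.8) -> bool:
    """Return True if a new scratch build may be submitted.

    `max_builds` mirrors the `koji_config.max_builds` setting (default 30).
    `koji_load` is assumed to be the fraction of Koji builder capacity in
    use (0.0 to 1.0); the 0.8 threshold is hypothetical.
    """
    if running_builds >= max_builds:
        return False                    # concurrency limit reached
    return koji_load <= load_threshold  # back off while Koji is busy
```

During a mass rebuild, `koji_load` stays above the threshold, so
`may_schedule` returns `False` even when fewer than 30 builds are
running; this is why scheduling does not have to be stopped manually.
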

== Setting admin announcement

Koschei can display an announcement in the web UI. This is mostly
useful for informing users about outages or other problems.

To set an announcement, run:

....
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
....

or:

....
koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
....

To clear the announcement, run:

....
koschei-admin clear-notice
....


== Adding package groups

Packages can be added to one or more groups.

To add a new group named `mynewgroup`, run:

....
koschei-admin add-group mynewgroup
....

To add a new group named `mynewgroup` and populate it with some
packages, run:

....
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
....


== Set package static priority

Some packages are more important than others and can be given a higher
or lower priority. Any user can change the manual priority, which is
reset after the package is rebuilt. Admins can additionally set a
static priority, which is not affected by package rebuilds.

To set the static priority of package `foo` in the `f27` collection
to `100`, run:

....
koschei-admin --collection f27 set-priority --static foo 100
....


== Branching a new Fedora release

After branching occurs and Koji build targets have been created,
Koschei should be updated to reflect the new state. There is a special
admin command for this purpose, which takes care of copying the
configuration as well as the last builds from the history.

To branch the collection from Fedora 27 to Fedora 28, use the
following:

....
koschei-admin branch-collection f27 f28 -d 'Fedora 27' -t f28 --bugzilla-version 27
....

Then you can optionally verify that the collection configuration is
correct by visiting https://koschei.fedoraproject.org/collections and
examining the configuration of the newly branched collection.


== Edit Koschei group to make it global

To turn the group `mygroup` created by user `someuser` into a global
group named `thegroup`, run:

....
koschei-admin edit-group someuser/mygroup --make-global --new-name thegroup
....