Update Koschei SOP

Closes #78
Mikolaj Izdebski 2024-06-06 07:16:52 +02:00
parent d035653f65
commit fe02817e0e

runs package scratch builds after dependency change or after time elapse
and reports package buildability status to interested parties.

Production instance::
https://koschei.fedoraproject.org/
Staging instance::
https://koschei.stg.fedoraproject.org/

== Contact Information

Owner::
mizdebsk
Contact::
#fedora-admin
Location::
Fedora infrastructure OpenShift
Purpose::
continuous integration system

== Description

Koschei consists of frontend and backend.

Frontend is a web application written in Python using the Flask
framework. It is run under Apache httpd with mod_wsgi as a WSGI
application. Frontend displays information to users and allows editing
package groups and changing priorities.

Backend consists of a couple of loosely-coupled microservices,
including:

* `watcher` - listens to events on the Fedora messaging bus for
complete builds and changes build states in the database.
* `repo-resolver` - resolves package dependencies in a given repo using
hawkey and compares them with the previous iteration to get a
dependency diff. It resolves all packages in the newest repo available
in Koji. The output is a base for scheduling new builds.
* `build-resolver` - resolves complete builds in the repo in which
they were done in Koji. Produces the dependency differences visible in
the frontend.
* `scheduler` - schedules new builds based on multiple criteria:
** dependency priority - dependency changes since the last build,
valued by their distance in the dependency graph
** manual and static priorities - set manually in the frontend. Manual
priority is reset after each build, static priority persists
** time priority - time elapsed since the last build.
* `polling` - polls the same types of events as `watcher` without
reliance on the messaging bus. Additionally takes care of package list
synchronization and other regularly executed tasks.

== Deployment

Koschei deployment is managed by an Ansible playbook:

....
sudo rbac-playbook openshift-apps/koschei.yml
....

The above playbook is idempotent, which means that running it has no
effect when everything is already configured as expected.
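
For a dry run that only reports what would change, the standard
ansible-playbook check flags can be used; this assumes that the
rbac-playbook wrapper forwards extra arguments to ansible-playbook:

....
# Assumption: rbac-playbook passes extra flags through to ansible-playbook
sudo rbac-playbook openshift-apps/koschei.yml --check --diff
....
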

Koschei is fully-containerized. It is deployed on OpenShift.

Koschei is stateless. It doesn't use any persistent storage. All
non-volatile information is stored in a PostgreSQL database, which is
not part of Koschei, but an external service that Koschei depends on.

There is one common container image for different Koschei workloads --
frontend and backend containers are all run from the same image.

Koschei images are built by upstream on Quay.io. Upstream implements
continuous delivery of container images to the Quay.io registry. Code
pushed to the fedora-prod or fedora-stage git branches in the upstream
GitHub repository is automatically built into container images and
pushed to the Quay.io registry with appropriate tags.

Pristine upstream Koschei images are then imported into the internal
OpenShift registry -- Fedora OpenShift does not build any Koschei
container images by itself. Image import into OpenShift is always
done manually by a Koschei sysadmin, usually by running a manual
Ansible playbook. This way we ensure that developers who can push
code to the GitHub repository don't have any control over the Fedora
infrastructure deployment process.
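
If an import ever needs to be done by hand, it can be approximated
with `oc import-image`; the image stream name, tag and Quay.io
repository below are illustrative assumptions, not values taken from
the actual playbook:

....
# Sketch only: image stream, tag and source repository are assumptions
oc project koschei
oc import-image koschei:fedora-prod \
    --from=quay.io/koschei/koschei:fedora-prod --confirm
....
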
Upstream images don't contain any Fedora-specific configuration. Such
configuration is mounted into containers as read-only volumes backed
by Kubernetes Secrets.
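
As a sketch, such a Secret could be created or refreshed from a local
config file like this; the Secret and file names are hypothetical:

....
# Hypothetical names: the actual Secrets are managed by Ansible
oc create secret generic koschei-config --from-file=config.cfg \
    --dry-run=client -o yaml | oc apply -f -
....
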

Frontend is run as a Kubernetes Deployment with multiple replicas for
high availability. Frontend supports rolling update, which allows it
to be updated with no user-visible downtime.

Each of the backend services has its own Kubernetes Deployment with a
single replica. Because backend downtime is not user-visible, rolling
updates are not used by the backend.
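
The progress of a frontend rolling update can be followed with
`oc rollout status`, assuming the Deployment is named `frontend`:

....
# Deployment name is an assumption
oc rollout status deploy/frontend
....
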

In addition to frontend and backend, there is also an `admin`
Deployment, which runs a container that does nothing but wait for a
sysadmin to `rsh` into it for running manual admin commands.

Besides the aforementioned Kubernetes Deployments, some ad-hoc tasks
are run as Kubernetes Jobs, either created on a time schedule from
CronJobs or created by Koschei sysadmins running manual Ansible
playbooks.
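
Scheduled and ad-hoc task runs can be listed with:

....
oc get cronjobs,jobs -n koschei
....
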

== Upgrade

Upgrading Koschei to a new upstream version is done by running one of
the manual Ansible playbooks:
....
sudo rbac-playbook manual/upgrade/koschei-rolling.yml
sudo rbac-playbook manual/upgrade/koschei-full.yml
....

The first, rolling-update playbook should be used when a given update
is known not to change the database schema. In this case the new
upstream image is simply imported into the internal OpenShift registry
and all Deployments are restarted. OpenShift takes care of doing a
rolling update of the frontend, so that no downtime is experienced by
users. Backend Pods are also recreated with the new image.

The second, full-update playbook is used when a given update changes
the database schema. This playbook pauses all Deployments and
terminates all Pods. Users experience frontend downtime. When
everything is stopped, the playbook creates Kubernetes Jobs to run
database migrations and perform other maintenance tasks. Once the Jobs
are done, new Deployments are rolled out.
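
After either playbook finishes, a quick sanity check that all Pods
came back up might look like:

....
oc project koschei
oc get pods
....
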

== Admin shell

Certain Koschei operation tasks are done with the `koschei-admin` CLI
tool. The container where the tool is available can be accessed with:

....
oc project koschei
oc rsh deploy/admin
....
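
Once inside the container, the available subcommands can be listed
with the tool's built-in help, assuming the standard `--help` flag:

....
koschei-admin --help
....
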

== Suspending Koschei operation

To stop builds from being scheduled, scaling down the `scheduler`
Deployment to zero replicas is enough. For planned Koji outages, it's
recommended to stop the scheduler service. This is not strictly
necessary, as Koschei can recover from Koji and network errors
automatically, but when Koji builders are stopped, scheduling may
cause unexpected build failures that would be reported to users. Other
backend services can be left running, as they automatically restart
themselves on Koji and network errors.
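
A minimal sketch of suspending and later resuming scheduling, using
the `scheduler` Deployment described above:

....
# Suspend build scheduling
oc scale deploy/scheduler --replicas=0
# Resume build scheduling
oc scale deploy/scheduler --replicas=1
....
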

== Limiting Koji usage

scheduled when Koji load is higher than a certain threshold. That should
prevent scheduling builds during mass rebuilds, so it's not necessary to
stop scheduling during those.

== Setting admin announcement

Koschei can display an announcement in the web UI. This is mostly
useful to inform users about outages or other problems.

To set an announcement, run:
....
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage" koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
....

or:

....
koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
....

To clear the announcement, run:
....
koschei-admin clear-notice koschei-admin clear-notice
....

Packages can be added to one or more groups.

To add a new group named `mynewgroup`, run:
....
koschei-admin add-group mynewgroup
....

To add a new group named `mynewgroup` and populate it with some
packages, run:
....
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
....

priority. Any user can change manual priority, which is reset after
the package is rebuilt. Admins can additionally set static priority,
which is not affected by package rebuilds.

To set the static priority of package `foo` to value `100`, run:
....
koschei-admin --collection f27 set-priority --static foo 100
....

....
koschei-admin branch-collection f27 f28 -d 'Fedora 27' -t f28 --bugzilla-version
....

Then you can optionally verify that the collection configuration is
correct by visiting https://koschei.fedoraproject.org/collections
and examining the configuration of the newly branched collection.

== Edit Koschei group to make it global

To turn the group `mygroup` created by user `someuser` into a global
group `thegroup`, run:

....
koschei-admin edit-group someuser/mygroup --make-global --new-name thegroup
....