Koschei runs package scratch builds after dependency change or after
time elapse and reports package buildability status to interested
parties.

Production instance::
https://koschei.fedoraproject.org/
Staging instance::
https://koschei.stg.fedoraproject.org/

== Contact Information

Owner::
mizdebsk
Contact::
#fedora-admin
Location::
Fedora infrastructure OpenShift
Purpose::
continuous integration system

== Description

Koschei consists of a frontend and a backend.

The frontend is a web application written in Python using the Flask
framework. It runs under Apache httpd with mod_wsgi as a WSGI
application. The frontend displays information to users and allows
editing package groups and changing priorities.

The backend consists of several loosely coupled microservices,
including:

* `watcher` - listens to events on the Fedora messaging bus for
completed builds and changes build states in the database.
* `repo-resolver` - resolves package dependencies in a given repo using
hawkey and compares them with the previous iteration to get a
dependency diff. It resolves all packages in the newest repo available
in Koji. The output is the basis for scheduling new builds.
* `build-resolver` - resolves completed builds in the repo in which
they were done in Koji. Produces the dependency differences visible in
the frontend.
* `scheduler` - schedules new builds based on multiple criteria:
** dependency priority - dependency changes since the last build,
valued by their distance in the dependency graph
** manual and static priorities - set manually in the frontend. Manual
priority is reset after each build, static priority persists.
** time priority - time elapsed since the last build.
* `polling` - polls the same types of events as `watcher` without
reliance on the messaging bus. Additionally takes care of package list
synchronization and other regularly executed tasks.

== Deployment

Koschei deployment is managed by an Ansible playbook:

....
sudo rbac-playbook openshift-apps/koschei.yml
....

The above playbook is idempotent, which means that running it has no
effect when everything is already configured as expected.

Koschei is fully containerized. It is deployed on OpenShift.

Koschei is stateless. It doesn't use any persistent storage. All
non-volatile information is stored in a PostgreSQL database, which is
not part of Koschei, but an external service that Koschei depends on.

There is one common container image for the different Koschei
workloads -- frontend and backend containers are all run from the same
image.

Koschei images are built by upstream on Quay.io. Upstream implements
continuous delivery of container images to the Quay.io registry. Code
pushed to the fedora-prod or fedora-stage git branches in the upstream
GitHub repository is automatically built into container images and
pushed to the Quay.io registry with appropriate tags.
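
As an illustration, the tags published by upstream can be inspected with
`skopeo` before an import; the repository path below is a placeholder,
not the actual upstream location:

....
# <upstream-org> is a placeholder - check upstream for the real Quay.io repository path
skopeo list-tags docker://quay.io/<upstream-org>/koschei
....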

Pristine upstream Koschei images are then imported into the internal
OpenShift registry -- Fedora OpenShift does not build any Koschei
container images by itself. Image import into OpenShift is always done
manually by a Koschei sysadmin, usually by running a manual Ansible
playbook. This way we ensure that developers who can push code to the
GitHub repository don't have any control over the Fedora infrastructure
deployment process.
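
Under the hood such an import amounts to an `oc import-image` call along
the lines of the sketch below; the image stream name, tag and source
repository are assumptions here, as the real values are defined by the
Ansible playbook:

....
# assumed image stream name and tag; <upstream-org> is a placeholder
oc -n koschei import-image koschei:fedora-prod \
    --from=quay.io/<upstream-org>/koschei:fedora-prod --confirm
....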

Upstream images don't contain any Fedora-specific configuration. Such
configuration is mounted into containers as read-only volumes backed by
Kubernetes Secrets.
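
The mounted configuration can be inspected with standard `oc` commands;
the Deployment name used below is an assumption:

....
oc -n koschei get secrets                # list the Secrets backing the config volumes
oc -n koschei describe deploy/frontend   # volume mounts are shown in the output (assumed name)
....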

The frontend runs as a Kubernetes Deployment with multiple replicas for
high availability. The frontend supports rolling updates, which allows
it to be updated with no user-visible downtime.
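
For example, the progress of a frontend rollout can be followed with
`oc`; the Deployment name `frontend` is an assumption, the actual names
can be listed with `oc get deploy`:

....
oc -n koschei get deploy                      # list Deployments and their replica counts
oc -n koschei rollout status deploy/frontend  # wait for the rolling update to finish (assumed name)
....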

Each of the backend services has its own Kubernetes Deployment with a
single replica. Because backend downtime is not user-visible, rolling
updates are not used by the backend.

In addition to the frontend and backend, there is also an `admin`
Deployment, which runs a container that does nothing but wait for a
sysadmin to `rsh` into it to run manual admin commands.

Besides the aforementioned Kubernetes Deployments, some ad-hoc tasks are
run as Kubernetes Jobs, either created on a time schedule from CronJobs
or created by Koschei sysadmins running manual Ansible playbooks.
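
What is currently scheduled or running can be checked with the usual
`oc` queries, for example:

....
oc -n koschei get cronjobs   # recurring maintenance tasks
oc -n koschei get jobs       # ad-hoc and scheduled Job runs
....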

== Upgrade

Upgrading Koschei to a new upstream version is done by running one of
the manual Ansible playbooks:

....
sudo rbac-playbook manual/upgrade/koschei-rolling.yml
sudo rbac-playbook manual/upgrade/koschei-full.yml
....

The first, rolling update playbook should be used when a given update is
known not to change the database schema. In this case the new upstream
image is simply imported into the internal OpenShift registry and all
Deployments are restarted. OpenShift takes care of doing a rolling
update of the frontend, so that no downtime is experienced by users.
Backend Pods are also recreated with the new image.

The second, full update playbook is used when a given update changes the
database schema. This playbook pauses all Deployments and terminates all
Pods. Users experience frontend downtime. When everything is stopped,
the playbook creates Kubernetes Jobs to run database migrations and
perform other maintenance tasks. Once the Jobs are done, new Deployments
are rolled out.
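
The migration Jobs created by the full update playbook can be monitored
like any other Kubernetes Jobs; the Job name below is hypothetical:

....
oc -n koschei get jobs -w                # watch the Jobs until they complete
oc -n koschei logs job/<migration-job>   # inspect the output of a Job (hypothetical name)
....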

== Admin shell

Certain Koschei operation tasks are done with the `koschei-admin` CLI
tool. The container where the tool is available can be accessed with:

....
oc project koschei
oc rsh deploy/admin
....
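
Once inside the admin container, the available subcommands can be listed
with the tool's built-in help:

....
koschei-admin --help
....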

== Suspending Koschei operation

To stop builds from being scheduled, scaling down the `scheduler`
Deployment to zero replicas is enough. For planned Koji outages, it's
recommended to stop the scheduler service. This is not strictly
necessary, as Koschei can recover from Koji errors and network errors
automatically, but when Koji builders are stopped, it may cause
unexpected build failures that would be reported to users. Other backend
services can be left running as they automatically restart themselves on
Koji and network errors.
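
A minimal sketch of suspending and resuming scheduling with `oc`,
assuming the Deployment is named `scheduler` as described above:

....
oc -n koschei scale deploy/scheduler --replicas=0   # stop scheduling new builds
oc -n koschei scale deploy/scheduler --replicas=1   # resume scheduling
....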

== Limiting Koji usage

Builds are not scheduled when Koji load is higher than a certain
threshold. That should prevent scheduling builds during mass rebuilds,
so it's not necessary to stop scheduling during those.

== Setting admin announcement

Koschei can display an announcement in the web UI. This is mostly useful
to inform users about outages or other problems.

To set an announcement, run:

....
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
....

or:

....
koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
....

To clear the announcement, run:

....
koschei-admin clear-notice
....

Packages can be added to one or more groups.

To add a new group named `mynewgroup`, run:

....
koschei-admin add-group mynewgroup
....

To add a new group named `mynewgroup` and populate it with some
packages, run:

....
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
....

Any user can change manual priority, which is reset after the package is
rebuilt. Admins can additionally set static priority, which is not
affected by package rebuilds.

To set the static priority of package `foo` to value `100`, run:

....
koschei-admin --collection f27 set-priority --static foo 100
....

Then you can optionally verify that the collection configuration is
correct by visiting https://koschei.fedoraproject.org/collections
and examining the configuration of the newly branched collection.

== Edit Koschei group to make it global

To turn the `mygroup` group created by user `someuser` into a global
group `thegroup`, run:

....
koschei-admin edit-group someuser/mygroup --make-global --new-name thegroup
....