parent
d035653f65
commit
fe02817e0e
1 changed files with 128 additions and 108 deletions
|
@ -5,121 +5,151 @@ runs package scratch builds after dependency change or after time elapse
|
||||||
and reports package buildability status to interested parties.
|
and reports package buildability status to interested parties.
|
||||||
|
|
||||||
Production instance::
|
Production instance::
|
||||||
https://apps.fedoraproject.org/koschei
|
https://koschei.fedoraproject.org/
|
||||||
Staginginstance::
|
Staging instance::
|
||||||
https://apps.stg.fedoraproject.org/koschei
|
https://koschei.stg.fedoraproject.org/
|
||||||
|
|
||||||
== Contact Information
|
== Contact Information
|
||||||
|
|
||||||
Owner::
|
Owner::
|
||||||
mizdebsk, msimacek
|
mizdebsk
|
||||||
Contact::
|
Contact::
|
||||||
#fedora-admin
|
#fedora-admin
|
||||||
Location::
|
Location::
|
||||||
Fedora Cloud
|
Fedora infrastructure OpenShift
|
||||||
Purpose::
|
Purpose::
|
||||||
continuous integration system
|
continuous integration system
|
||||||
|
|
||||||
== Deployment
|
|
||||||
|
|
||||||
Koschei deployment is managed by two Ansible playbooks:
|
|
||||||
|
|
||||||
....
|
|
||||||
sudo rbac-playbook groups/koschei-backend.yml
|
|
||||||
sudo rbac-playbook groups/koschei-web.yml
|
|
||||||
....
|
|
||||||
|
|
||||||
== Description
|
== Description
|
||||||
|
|
||||||
Koschei is deployed on two separate machines - `koschei-backend` and
|
Koschei consists of frontend and backend.
|
||||||
`koschei-web`
|
|
||||||
|
|
||||||
Frontend (`koschei-web`) is a Flask WSGi application running with httpd.
|
Frontend is a web application written in Python using Flask framework.
|
||||||
It displays information to users and allows editing package groups and
|
It is run under Apache httpd with mod_wsgi as a WSGI application.
|
||||||
changing priorities.
|
Frontend displays information to users and allows editing package
|
||||||
|
groups and changing priorities.
|
||||||
|
|
||||||
Backend (`koschei-backend`) consists of multiple services:
|
Backend consists of a couple of loosely-coupled microservices,
|
||||||
|
including:
|
||||||
|
|
||||||
* `koschei-watcher` - listens to fedmsg events for complete builds and
|
* `watcher` - listens to events on Fedora messaging bus for complete
|
||||||
changes build states in the database
|
builds and changes build states in the database.
|
||||||
* `koschei-repo-resolver` - resolves package dependencies in given repo
|
* `repo-resolver` - resolves package dependencies in given repo using
|
||||||
using hawkey and compares them with previous iteration to get a
|
hawkey and compares them with previous iteration to get a dependency
|
||||||
dependency diff. It resolves all packages in the newest repo available
|
diff. It resolves all packages in the newest repo available in
|
||||||
in Koji. The output is a base for scheduling new builds
|
Koji. The output is a base for scheduling new builds.
|
||||||
* `koschei-build-resolver` - resolves complete builds in the repo in
|
* `build-resolver` - resolves complete builds in the repo in which
|
||||||
which they were done in Koji. Produces the dependency differences
|
they were done in Koji. Produces the dependency differences visible in
|
||||||
visible in the frontend
|
the frontend.
|
||||||
* `koschei-scheduler` - schedules new builds based on multiple criteria:
|
* `scheduler` - schedules new builds based on multiple criteria:
|
||||||
** dependency priority - dependency changes since last build valued by
|
** dependency priority - dependency changes since last build valued by
|
||||||
their distance in the dependency graph
|
their distance in the dependency graph
|
||||||
** manual and static priorities - set manually in the frontend. Manual
|
** manual and static priorities - set manually in the frontend. Manual
|
||||||
priority is reset after each build, static priority persists
|
priority is reset after each build, static priority persists
|
||||||
** time priority - time elapsed since the last build
|
** time priority - time elapsed since the last build.
|
||||||
* `koschei-polling` - polls the same types of events as koschei-watcher
|
* `polling` - polls the same types of events as `watcher` without
|
||||||
without reliance on fedmsg. Additionaly takes care of package list
|
reliance on the messaging bus. Additionally takes care of package list
|
||||||
synchronization and other regularly executed tasks
|
synchronization and other regularly executed tasks.
|
||||||
|
|
||||||
== Configuration
|
== Deployment
|
||||||
|
|
||||||
Koschei configuration is in `/etc/koschei/config-backend.cfg` and
|
Koschei deployment is managed by an Ansible playbook:
|
||||||
`/etc/koschei/config-frontend.cfg`, and is merged with the default
|
|
||||||
configuration in `/usr/share/koschei/config.cfg` (the ones in `/etc`
|
|
||||||
overrides the defaults in `/usr`). Note the merge is recursive. The
|
|
||||||
configuration contains all configurable items for all Koschei services
|
|
||||||
and the frontend. The alterations to configuration that aren't temporary
|
|
||||||
should be done through ansible playbook. Configuration changes have no
|
|
||||||
effect on already running services -- they need to be restarted, which
|
|
||||||
happens automatically when using the playbook.
|
|
||||||
|
|
||||||
== Disk usage
|
|
||||||
|
|
||||||
Koschei doesn't keep on disk anything that couldn't be recreated easily -
|
|
||||||
all important data is stored in PostgreSQL database, configuration is
|
|
||||||
managed by Ansible, code installed by RPM and so on.
|
|
||||||
|
|
||||||
To speed up operation and reduce load on external servers, Koschei
|
|
||||||
caches some data obtained from services it integrates with. Most
|
|
||||||
notably, YUM repositories downloaded from Koji are kept in
|
|
||||||
`/var/cache/koschei/repodata`. Each repository takes about 100 MB of
|
|
||||||
disk space. Maximal number of repositories kept at time is controlled by
|
|
||||||
`cache_l2_capacity` parameter in `config-backend.cfg`
|
|
||||||
(`config-backend.cfg.j2` in Ansible). If repodata cache starts to
|
|
||||||
consume too much disk space, that value can be decreased - after
|
|
||||||
restart, `koschei-*-resolver` will remove least recently used cache
|
|
||||||
entries to respect configured cache capacity.
|
|
||||||
|
|
||||||
== Database
|
|
||||||
|
|
||||||
Koschei needs to connect to a PostgreSQL database, other database
|
|
||||||
systems are not supported. Database connection is specified in the
|
|
||||||
configuration under the `database_config` key that can contain the
|
|
||||||
following keys: `username, password, host, port, database`.
|
|
||||||
|
|
||||||
After an update of koschei, the database needs to be migrated to new
|
|
||||||
schema. This happens automatically when using the upgrade playbook.
|
|
||||||
Alternatively, it can be executed manulally using:
|
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin alembic upgrade head
|
sudo rbac-playbook openshift-apps/koschei.yml
|
||||||
....
|
....
|
||||||
|
|
||||||
The backend services need to be stopped during the migration.
|
The above playbook is idempotent, which means that running it has no
|
||||||
|
effect when everything is already configured as expected.
|
||||||
|
|
||||||
== Managing koschei services
|
Koschei is fully-containerized. It is deployed on OpenShift.
|
||||||
|
|
||||||
Koschei services are systemd units managed through `systemctl`. They can
|
Koschei is stateless. It doesn't use any persistent storage. All
|
||||||
be started and stopped independently in any order. The frontend is run
|
non-volatile information is stored in PostgreSQL database, which is
|
||||||
using httpd.
|
not part of Koschei, but an external service that Koschei depends on.
|
||||||
|
|
||||||
== Suspending koschei operation
|
There is one common container image for different Koschei workloads --
|
||||||
|
frontend and backend containers are all run from the same image.
|
||||||
|
|
||||||
For stopping builds from being scheduled, stopping the
|
Koschei images are built by upstream on Quay.io. Upstream implements
|
||||||
`koschei-scheduler` service is enough. For planned Koji outages, it's
|
continuous delivery of container images to Quay.io registry. Code
|
||||||
recommended to stop `koschei-scheduler`. It is not necessary, as koschei
|
pushed to fedora-prod or fedora-stage git branches in upstream GitHub
|
||||||
can recover from Koji errors and network errors automatically, but when
|
repository are automatically built as container images and pushed to
|
||||||
Koji builders are stopped, it may cause unexpected build failures that
|
Quay.io registry with appropriate tags.
|
||||||
would be reported to users. Other services can be left running as they
|
|
||||||
automatically restart themselves on Koji and network errors.
|
Pristine upstream Koschei images are then imported into internal
|
||||||
|
OpenShift registry -- Fedora OpenShift does not build any Koschei
|
||||||
|
container images by itself. Image import into OpenShift is always
|
||||||
|
done manually by a Koschei sysadmin, usually by running a manual
|
||||||
|
Ansible playbook. This way we ensure that developers who can push
|
||||||
|
code to GitHub repository don't have any control over Fedora
|
||||||
|
infrastructure deployment process.
|
||||||
|
|
||||||
|
Upstream images don't contain any Fedora-specific configuration. Such
|
||||||
|
configuration is mounted into containers as read-only volumes backed
|
||||||
|
by Kubernetes Secrets.
|
||||||
|
|
||||||
|
Frontend is run as a Kubernetes Deployment with multiple replicas for
|
||||||
|
high availability. Frontend supports rolling update, which allows it
|
||||||
|
to be updated with no user-visible downtime.
|
||||||
|
|
||||||
|
Each of backend services has its own Kubernetes Deployment with a
|
||||||
|
single replica. Because backend downtime is not user-visible, rolling
|
||||||
|
updates are not used by backend.
|
||||||
|
|
||||||
|
In addition to frontend and backend, there is also `admin` Deployment,
|
||||||
|
which runs a container that does nothing but waits for sysadmin to
|
||||||
|
`rsh` into it for running manual admin commands.
|
||||||
|
|
||||||
|
Besides the aforementioned Kubernetes Deployments, some ad-hoc tasks
|
||||||
|
are run as Kubernetes Jobs, either created on a time schedule from
|
||||||
|
CronJobs or created by running manual Ansible playbooks by Koschei
|
||||||
|
sysadmins.
|
||||||
|
|
||||||
|
== Upgrade
|
||||||
|
|
||||||
|
Upgrading Koschei to a new upstream version is done by running one of
|
||||||
|
manual Ansible playbooks:
|
||||||
|
|
||||||
|
....
|
||||||
|
sudo rbac-playbook manual/upgrade/koschei-rolling.yml
|
||||||
|
sudo rbac-playbook manual/upgrade/koschei-full.yml
|
||||||
|
....
|
||||||
|
|
||||||
|
The first rolling update playbook should be used when given update is
|
||||||
|
known not to change database schema. In this case new upstream image
|
||||||
|
is simply imported into internal OpenShift registry and all
|
||||||
|
Deployments are restarted. OpenShift takes care of doing rolling
|
||||||
|
update of frontend, so that no downtime is experienced by
|
||||||
|
users. Backend Pods are also recreated with the new image.
|
||||||
|
|
||||||
|
The second full update playbook is used when given update changes
|
||||||
|
database schema. This playbook pauses all Deployments and terminates
|
||||||
|
all Pods. Users experience frontend downtime. When everything is
|
||||||
|
stopped, the playbook creates Kubernetes Jobs to run database
|
||||||
|
migrations and perform other maintenance tasks. Once the Jobs are
|
||||||
|
done, new Deployments are rolled.
|
||||||
|
|
||||||
|
== Admin shell
|
||||||
|
|
||||||
|
Certain Koschei operation tasks are done with the `koschei-admin` CLI
|
||||||
|
tool. The container where the tool is available can be accessed with:
|
||||||
|
|
||||||
|
....
|
||||||
|
oc project koschei
|
||||||
|
oc rsh deploy/admin
|
||||||
|
....
|
||||||
|
|
||||||
|
== Suspending Koschei operation
|
||||||
|
|
||||||
|
For stopping builds from being scheduled, scaling down the `scheduler`
|
||||||
|
Deployment to zero replicas is enough. For planned Koji outages, it's
|
||||||
|
recommended to stop the scheduler service. It is not necessary, as
|
||||||
|
Koschei can recover from Koji errors and network errors automatically,
|
||||||
|
but when Koji builders are stopped, it may cause unexpected build
|
||||||
|
failures that would be reported to users. Other backend services can
|
||||||
|
be left running as they automatically restart themselves on Koji and
|
||||||
|
network errors.
|
||||||
|
|
||||||
== Limiting Koji usage
|
== Limiting Koji usage
|
||||||
|
|
||||||
|
@ -130,20 +160,12 @@ scheduled when Koji load is higher that certain threshold. That should
|
||||||
prevent scheduling builds during mass rebuilds, so it's not necessary to
|
prevent scheduling builds during mass rebuilds, so it's not necessary to
|
||||||
stop scheduling during those.
|
stop scheduling during those.
|
||||||
|
|
||||||
== Fedmsg notifications
|
|
||||||
|
|
||||||
Koschei optionally supports sending fedmsg notifications for package
|
|
||||||
state changes. The fedmsg dispatch can be turned on and off in the
|
|
||||||
configuration (key `fedmsg-publisher.enabled`). Koschei doesn't supply
|
|
||||||
configuration for fedmsg, it lets the library to load it's own (in
|
|
||||||
`/etc/fedmsg.d/`).
|
|
||||||
|
|
||||||
== Setting admin announcement
|
== Setting admin announcement
|
||||||
|
|
||||||
Koschei can display announcement in web UI. This is mostly useful to
|
Koschei can display announcement in web UI. This is mostly useful to
|
||||||
inform users about outages or other problems.
|
inform users about outages or other problems.
|
||||||
|
|
||||||
To set announcement, run as koschei user:
|
To set announcement, run:
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
|
koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
|
||||||
|
@ -152,10 +174,10 @@ koschei-admin set-notice "Koschei operation is currently suspended due to schedu
|
||||||
or:
|
or:
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin set-notice "Sumbitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
|
koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
|
||||||
....
|
....
|
||||||
|
|
||||||
To clear announcement, run as koschei user:
|
To clear announcement, run:
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin clear-notice
|
koschei-admin clear-notice
|
||||||
|
@ -165,14 +187,14 @@ koschei-admin clear-notice
|
||||||
|
|
||||||
Packages can be added to one or more group.
|
Packages can be added to one or more group.
|
||||||
|
|
||||||
To add new group named `mynewgroup`, run as `koschei` user:
|
To add new group named `mynewgroup`, run:
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin add-group mynewgroup
|
koschei-admin add-group mynewgroup
|
||||||
....
|
....
|
||||||
|
|
||||||
To add new group named `mynewgroup` and populate it with some packages,
|
To add new group named `mynewgroup` and populate it with some
|
||||||
run as `koschei` user:
|
packages, run:
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
|
koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
|
||||||
|
@ -185,8 +207,7 @@ priority. Any user can change manual priority, which is reset after
|
||||||
package is rebuilt. Admins can additionally set static priority, which
|
package is rebuilt. Admins can additionally set static priority, which
|
||||||
is not affected by package rebuilds.
|
is not affected by package rebuilds.
|
||||||
|
|
||||||
To set static priority of package `foo` to value `100`, run as `koschei`
|
To set static priority of package `foo` to value `100`, run:
|
||||||
user:
|
|
||||||
|
|
||||||
....
|
....
|
||||||
koschei-admin --collection f27 set-priority --static foo 100
|
koschei-admin --collection f27 set-priority --static foo 100
|
||||||
|
@ -206,15 +227,14 @@ koschei-admin branch-collection f27 f28 -d 'Fedora 27' -t f28 --bugzilla-version
|
||||||
....
|
....
|
||||||
|
|
||||||
Then you can optionally verify that the collection configuration is
|
Then you can optionally verify that the collection configuration is
|
||||||
correct by visiting https://apps.fedoraproject.org/koschei/collections
|
correct by visiting https://koschei.fedoraproject.org/collections
|
||||||
and examining the configuration of the newly branched collection.
|
and examining the configuration of the newly branched collection.
|
||||||
|
|
||||||
== Edit Koschei group to make it global
|
== Edit Koschei group to make it global
|
||||||
|
|
||||||
Koschei runs in an openshift instance. Connect to the openshift control vm using `ssh` and run the following commands:
|
To turn `mygroup` group created by user `someuser` into a global group
|
||||||
|
`thegroup`, run:
|
||||||
|
|
||||||
....
|
....
|
||||||
oc project koschei
|
koschei-admin edit-group someuser/mygroup --make-global --new-name thegroup
|
||||||
oc rsh <admin pod in the koschei project>
|
|
||||||
koschei-admin edit-group myuser/mygroup --make-global --new-name mygroup
|
|
||||||
....
|
....
|
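The "Suspending Koschei operation" section added by this commit says that scaling the `scheduler` Deployment down to zero replicas is enough to stop new builds from being scheduled, but it gives no command. A minimal sketch of how that might look with the `oc` client follows. The project name `koschei` and the Deployment name `scheduler` come from the text; the helper function names and the `OC` dry-run variable are hypothetical, and resuming with one replica is an assumption based on the statement that each backend service runs as a single replica.

```shell
#!/bin/sh
# Sketch only. By default the commands are printed rather than executed;
# set OC=oc in the environment to run them against the cluster.
# The OC variable and the function names are illustrative, not part of Koschei.
OC="${OC:-echo oc}"

# Stop new scratch builds from being scheduled,
# for example before a planned Koji outage.
suspend_scheduler() {
    $OC project koschei
    $OC scale deploy/scheduler --replicas=0
}

# Resume scheduling. One replica is assumed here because the text
# describes each backend Deployment as having a single replica.
resume_scheduler() {
    $OC project koschei
    $OC scale deploy/scheduler --replicas=1
}
```

Calling `suspend_scheduler` without `OC=oc` prints the two commands it would run, which makes it easy to review them before touching the production project.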