diff --git a/modules/sysadmin_guide/pages/koschei.adoc b/modules/sysadmin_guide/pages/koschei.adoc
index 8ebedb7..f52b2bc 100644
--- a/modules/sysadmin_guide/pages/koschei.adoc
+++ b/modules/sysadmin_guide/pages/koschei.adoc
@@ -5,121 +5,151 @@ runs package scratch builds after dependency change or after time
 elapse and reports package buildability status to interested parties.
 
 Production instance::
-  https://apps.fedoraproject.org/koschei
-Staginginstance::
-  https://apps.stg.fedoraproject.org/koschei
+  https://koschei.fedoraproject.org/
+Staging instance::
+  https://koschei.stg.fedoraproject.org/
 
 == Contact Information
 
 Owner::
-  mizdebsk, msimacek
+  mizdebsk
 Contact::
   #fedora-admin
 Location::
-  Fedora Cloud
+  Fedora infrastructure OpenShift
 Purpose::
   continuous integration system
 
-== Deployment
-
-Koschei deployment is managed by two Ansible playbooks:
-
-....
-sudo rbac-playbook groups/koschei-backend.yml
-sudo rbac-playbook groups/koschei-web.yml
-....
-
 == Description
 
-Koschei is deployed on two separate machines - `koschei-backend` and
-`koschei-web`
+Koschei consists of a frontend and a backend.
 
-Frontend (`koschei-web`) is a Flask WSGi application running with httpd.
-It displays information to users and allows editing package groups and
-changing priorities.
+The frontend is a web application written in Python using the Flask
+framework. It is run under Apache httpd with mod_wsgi as a WSGI
+application. The frontend displays information to users and allows
+editing package groups and changing priorities.
 
-Backend (`koschei-backend`) consists of multiple services:
+The backend consists of several loosely-coupled microservices,
+including:
 
-* `koschei-watcher` - listens to fedmsg events for complete builds and
-changes build states in the database
-* `koschei-repo-resolver` - resolves package dependencies in given repo
-using hawkey and compares them with previous iteration to get a
-dependency diff. It resolves all packages in the newest repo available
-in Koji. 
The output is a base for scheduling new builds
-* `koschei-build-resolver` - resolves complete builds in the repo in
-which they were done in Koji. Produces the dependency differences
-visible in the frontend
-* `koschei-scheduler` - schedules new builds based on multiple criteria:
+* `watcher` - listens to events on the Fedora messaging bus for
+complete builds and changes build states in the database.
+* `repo-resolver` - resolves package dependencies in a given repo
+using hawkey and compares them with the previous iteration to get a
+dependency diff. It resolves all packages in the newest repo available
+in Koji. The output is the basis for scheduling new builds.
+* `build-resolver` - resolves complete builds in the repo in which
+they were done in Koji. Produces the dependency differences visible in
+the frontend.
+* `scheduler` - schedules new builds based on multiple criteria:
 ** dependency priority - dependency changes since last build valued by
 their distance in the dependency graph
 ** manual and static priorities - set manually in the frontend. Manual
 priority is reset after each build, static priority persists
-** time priority - time elapsed since the last build
-* `koschei-polling` - polls the same types of events as koschei-watcher
-without reliance on fedmsg. Additionaly takes care of package list
-synchronization and other regularly executed tasks
+** time priority - time elapsed since the last build.
+* `polling` - polls the same types of events as `watcher` without
+reliance on the messaging bus. It additionally takes care of package
+list synchronization and other regularly executed tasks.
 
-== Configuration
+== Deployment
 
-Koschei configuration is in `/etc/koschei/config-backend.cfg` and
-`/etc/koschei/config-frontend.cfg`, and is merged with the default
-configuration in `/usr/share/koschei/config.cfg` (the ones in `/etc`
-overrides the defaults in `/usr`). Note the merge is recursive. 
The -configuration contains all configurable items for all Koschei services -and the frontend. The alterations to configuration that aren't temporary -should be done through ansible playbook. Configuration changes have no -effect on already running services -- they need to be restarted, which -happens automatically when using the playbook. - -== Disk usage - -Koschei doesn't keep on disk anything that couldn't be recreated easily - -all important data is stored in PostgreSQL database, configuration is -managed by Ansible, code installed by RPM and so on. - -To speed up operation and reduce load on external servers, Koschei -caches some data obtained from services it integrates with. Most -notably, YUM repositories downloaded from Koji are kept in -`/var/cache/koschei/repodata`. Each repository takes about 100 MB of -disk space. Maximal number of repositories kept at time is controlled by -`cache_l2_capacity` parameter in `config-backend.cfg` -(`config-backend.cfg.j2` in Ansible). If repodata cache starts to -consume too much disk space, that value can be decreased - after -restart, `koschei-*-resolver` will remove least recently used cache -entries to respect configured cache capacity. - -== Database - -Koschei needs to connect to a PostgreSQL database, other database -systems are not supported. Database connection is specified in the -configuration under the `database_config` key that can contain the -following keys: `username, password, host, port, database`. - -After an update of koschei, the database needs to be migrated to new -schema. This happens automatically when using the upgrade playbook. -Alternatively, it can be executed manulally using: +Koschei deployment is managed by an Ansible playbook: .... -koschei-admin alembic upgrade head +sudo rbac-playbook openshift-apps/koschei.yml .... -The backend services need to be stopped during the migration. 
+The above playbook is idempotent, which means that running it has no
+effect when everything is already configured as expected.
 
-== Managing koschei services
+Koschei is fully containerized. It is deployed on OpenShift.
 
-Koschei services are systemd units managed through `systemctl`. They can
-be started and stopped independently in any order. The frontend is run
-using httpd.
+Koschei is stateless. It doesn't use any persistent storage. All
+non-volatile information is stored in a PostgreSQL database, which is
+not part of Koschei, but an external service that Koschei depends on.
 
-== Suspending koschei operation
+There is one common container image for different Koschei workloads --
+frontend and backend containers are all run from the same image.
 
-For stopping builds from being scheduled, stopping the
-`koschei-scheduler` service is enough. For planned Koji outages, it's
-recommended to stop `koschei-scheduler`. It is not necessary, as koschei
-can recover from Koji errors and network errors automatically, but when
-Koji builders are stopped, it may cause unexpected build failures that
-would be reported to users. Other services can be left running as they
-automatically restart themselves on Koji and network errors.
+Koschei images are built by upstream on Quay.io. Upstream implements
+continuous delivery of container images to the Quay.io registry. Code
+pushed to the fedora-prod or fedora-stage git branches in the upstream
+GitHub repository is automatically built into container images and
+pushed to the Quay.io registry with appropriate tags.
+
+Pristine upstream Koschei images are then imported into the internal
+OpenShift registry -- Fedora OpenShift does not build any Koschei
+container images by itself. Image import into OpenShift is always
+done manually by a Koschei sysadmin, usually by running a manual
+Ansible playbook. This way we ensure that developers who can push
+code to the GitHub repository don't have any control over the Fedora
+infrastructure deployment process. 
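+
+As an illustration only (the import is normally done via the manual
+Ansible playbook; the image stream and Quay.io repository names below
+are assumptions, not the actual configuration), such an import boils
+down to something like:
+
+....
+oc -n koschei import-image koschei:fedora-prod \
+    --from=quay.io/koschei/koschei:fedora-prod --confirm
+....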
+
+Upstream images don't contain any Fedora-specific configuration. Such
+configuration is mounted into containers as read-only volumes backed
+by Kubernetes Secrets.
+
+The frontend is run as a Kubernetes Deployment with multiple replicas
+for high availability. The frontend supports rolling updates, which
+allows it to be updated with no user-visible downtime.
+
+Each of the backend services has its own Kubernetes Deployment with a
+single replica. Because backend downtime is not user-visible, rolling
+updates are not used for the backend.
+
+In addition to the frontend and backend, there is also an `admin`
+Deployment, which runs a container that does nothing but wait for a
+sysadmin to `rsh` into it for running manual admin commands.
+
+Besides the aforementioned Kubernetes Deployments, some ad-hoc tasks
+are run as Kubernetes Jobs, either created on a time schedule from
+CronJobs or created by Koschei sysadmins running manual Ansible
+playbooks.
+
+== Upgrade
+
+Upgrading Koschei to a new upstream version is done by running one of
+the manual Ansible playbooks:
+
+....
+sudo rbac-playbook manual/upgrade/koschei-rolling.yml
+sudo rbac-playbook manual/upgrade/koschei-full.yml
+....
+
+The first, rolling update playbook should be used when a given update
+is known not to change the database schema. In this case the new
+upstream image is simply imported into the internal OpenShift registry
+and all Deployments are restarted. OpenShift takes care of doing a
+rolling update of the frontend, so that no downtime is experienced by
+users. Backend Pods are also recreated with the new image.
+
+The second, full update playbook is used when a given update changes
+the database schema. This playbook pauses all Deployments and
+terminates all Pods. Users experience frontend downtime. When
+everything is stopped, the playbook creates Kubernetes Jobs to run
+database migrations and perform other maintenance tasks. Once the
+Jobs are done, new Deployments are rolled out. 
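+
+As a sketch of how an upgrade can be monitored afterwards (the
+Deployment names are assumptions based on the service names above,
+not verified against the actual manifests):
+
+....
+oc -n koschei rollout status deploy/frontend
+oc -n koschei get pods
+....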
+
+== Admin shell
+
+Certain administrative tasks are done with the `koschei-admin` CLI
+tool. The container where the tool is available can be accessed with:
+
+....
+oc project koschei
+oc rsh deploy/admin
+....
+
+== Suspending Koschei operation
+
+For stopping builds from being scheduled, scaling down the `scheduler`
+Deployment to zero replicas is enough. For planned Koji outages, it's
+recommended to stop the scheduler service. It is not necessary, as
+Koschei can recover from Koji errors and network errors automatically,
+but when Koji builders are stopped, it may cause unexpected build
+failures that would be reported to users. Other backend services can
+be left running as they automatically restart themselves on Koji and
+network errors.
 
 == Limiting Koji usage
 
@@ -130,20 +160,12 @@ scheduled when Koji load is higher that certain threshold. That should
 prevent scheduling builds during mass rebuilds, so it's not necessary
 to stop scheduling during those.
 
-== Fedmsg notifications
-
-Koschei optionally supports sending fedmsg notifications for package
-state changes. The fedmsg dispatch can be turned on and off in the
-configuration (key `fedmsg-publisher.enabled`). Koschei doesn't supply
-configuration for fedmsg, it lets the library to load it's own (in
-`/etc/fedmsg.d/`).
-
 == Setting admin announcement
 
 Koschei can display announcement in web UI. This is mostly useful to
 inform users about outages or other problems.
 
-To set announcement, run as koschei user:
+To set announcement, run:
 
 ....
 koschei-admin set-notice "Koschei operation is currently suspended due to scheduled Koji outage"
@@ -152,10 +174,10 @@ koschei-admin set-notice "Koschei operation is currently suspended due to schedu
 
 or:
 
 ....
-koschei-admin set-notice "Sumbitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
+koschei-admin set-notice "Submitting scratch builds by Koschei is currently disabled due to Fedora 23 mass rebuild"
 ....
 
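+
+The `koschei-admin` invocations above are executed from the admin
+shell described in the Admin shell section; a typical session might
+look like:
+
+....
+oc project koschei
+oc rsh deploy/admin
+koschei-admin set-notice "Koschei operation is currently suspended"
+....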
-To clear announcement, run as koschei user:
+To clear announcement, run:
 
 ....
 koschei-admin clear-notice
@@ -165,14 +187,14 @@ koschei-admin clear-notice
 
 Packages can be added to one or more group.
 
-To add new group named `mynewgroup`, run as `koschei` user:
+To add a new group named `mynewgroup`, run:
 
 ....
 koschei-admin add-group mynewgroup
 ....
 
-To add new group named `mynewgroup` and populate it with some packages,
-run as `koschei` user:
+To add a new group named `mynewgroup` and populate it with some
+packages, run:
 
 ....
 koschei-admin add-group mynewgroup pkg1 pkg2 pkg3
@@ -185,8 +207,7 @@ priority.
 Any user can change manual priority, which is reset after
 package is rebuilt. Admins can additionally set static priority, which
 is not affected by package rebuilds.
 
-To set static priority of package `foo` to value `100`, run as `koschei`
-user:
+To set static priority of package `foo` to value `100`, run:
 
 ....
 koschei-admin --collection f27 set-priority --static foo 100
@@ -206,15 +227,14 @@ koschei-admin branch-collection f27 f28 -d 'Fedora 27' -t f28 --bugzilla-version
 ....
 
 Then you can optionally verify that the collection configuration is
-correct by visiting https://apps.fedoraproject.org/koschei/collections
+correct by visiting https://koschei.fedoraproject.org/collections
 and examining the configuration of the newly branched collection.
 
 == Edit Koschei group to make it global
 
-Koschei runs in an openshift instance. Connect to the openshift control vm using `ssh` and run the following commands:
+To turn the `mygroup` group created by user `someuser` into the global
+group `thegroup`, run:
 
 ....
-oc project koschei
-oc rsh
-koschei-admin edit-group myuser/mygroup --make-global --new-name mygroup
-....
\ No newline at end of file
+koschei-admin edit-group someuser/mygroup --make-global --new-name thegroup
+....