= oraculum Infrastructure SOP

https://pagure.io/fedora-qa/oraculum[oraculum] is an app developed
by Fedora QA to aid packagers with maintenance and quality
in Fedora and EPEL releases.
As such, it serves as the backend for Packager Dashboard,
testcloud, Fedora Easy Karma, and Pagure dist-git (versions table).

== Contents

* <<_contact_information>>
* <<_file_locations>>
* <<_configuration>>
* <<_building_for_infra>>
* <<_upgrading>>
* <<_deployment_watchdog>>
* <<_cache_clearing>>
* <<_components_of_deployment>>

== Contact Information

Owner::
Fedora QA Devel
Contact::
#fedora-qa
Persons::
jskladan, lbrabec
Servers::
* In OpenShift.
Purpose::
Hosting the https://pagure.io/fedora-qa/oraculum[oraculum] app for packagers

== File Locations

`oraculum/cli.py` - CLI for the app
`oraculum/cli.py debug` - interactive debug interface for the app

== Configuration

Configuration is loaded from the environment in the pod. The default configuration is
set in the playbook: `roles/openshift-apps/oraculum/templates/deploymentconfig.yml`. Remember that the configuration needs
to be changed for each of the various pods (described later).

The possible values to set can be found in `oraculum/config.py` inside
the `openshift_config` function. Apart from that, secrets, tokens, and API keys
are set in the secrets Ansible repository.
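
For orientation, the pattern amounts to reading each setting from the pod
environment and falling back to a default. This is only a sketch; the real
keys and logic live in the `openshift_config` function in `oraculum/config.py`,
and the variable names below are hypothetical placeholders.

[source,python]
----
# Sketch only -- not the actual oraculum code. The real keys are defined in
# oraculum/config.py (openshift_config); the names here are placeholders.
import os

def load_config_from_env(config):
    # Values come from the pod environment, populated by the
    # deploymentconfig.yml template and the secrets Ansible repository.
    config["DATABASE_URL"] = os.getenv("DATABASE_URL", "sqlite://")
    config["REDIS_URL"] = os.getenv("REDIS_URL", "redis://localhost:6379")
    return config
----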
== Building for Infra

The application leverages s2i containers. Both the production
and staging instances track the `master` branch of the oraculum
repository. Builds don't happen automatically; they need
to be triggered manually from the OpenShift web console.

== Upgrading

Oraculum is currently configured through Ansible and all
configuration changes need to be made through Ansible.

The pod initialization is set up so that all database upgrades
happen automatically on startup. That means extra care is needed,
and all deployments that make database changes need to happen on stg first.
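
Should a schema upgrade ever need to be run by hand (for example while
debugging a failed rollout on stg), the same step the pod runs on startup
can be invoked from the pod terminal. A minimal sketch, using the
`upgrade_db` command mentioned in the Deployment WatchDog section:

[source,python]
----
# Run in the pod terminal; this is the same step that the pod
# initialization performs automatically on startup.
python oraculum/cli.py upgrade_db
----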
== Deployment WatchDog

The deployment is configured to perform automatic liveness testing.
The first phase runs `cli.py upgrade_db`, and in the second
phase the cluster tries to get an HTTP response
from the container on port `8080` of the `oraculum-api-endpoint` pod.
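
The second phase amounts to a plain HTTP request against the pod. A rough
manual equivalent that can be run from the pod terminal is sketched below;
the path the cluster actually probes is configured in the DeploymentConfig,
so `/` here is an assumption.

[source,python]
----
# Rough manual equivalent of the HTTP liveness check; the probed path
# is an assumption -- the real one is set in the DeploymentConfig.
import urllib.request

resp = urllib.request.urlopen("http://localhost:8080/", timeout=10)
print(resp.status)  # a non-2xx response (or an exception) means the check fails
----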
If either of these fails, the cluster automatically reverts
to the previous build, and such a failure can be seen on the `Events` tab
in the DeploymentConfig details.

Apart from that, the cluster regularly polls the `oraculum-api-endpoint`
for liveness testing. If that fails or times out, a pod restart occurs.
Such an event can be seen in the `Events` tab of the DeploymentConfig.
== Cache clearing

oraculum doesn't handle any garbage collection in the cache. In some
situations, such as stale data in the cache (for example when Bugzilla
data doesn't refresh due to bugs or optimization choices) or a cache
that has grown too large, it can be beneficial or even necessary to
clear the cache completely. That can be done by deleting all rows in
the `cached_data` table:

`DELETE FROM cached_data;`
After that, to minimize downtime, it's recommended to manually re-sync
the generic providers via `CACHE._refresh` in the following order
(in the pod terminal via debug):

[source,python]
----
# Run from the pod terminal: the first line opens the interactive debug
# shell, the CACHE._refresh() calls are then entered at its prompt.
python oraculum/cli.py debug
CACHE._refresh("fedora_releases")
CACHE._refresh("bodhi_updates")
CACHE._refresh("bodhi_overrides")
CACHE._refresh("package_versions_generic")
CACHE._refresh("pagure_groups")
CACHE._refresh("koschei_data")
CACHE._refresh("packages_owners_json")
----
and finally build up the static cache manually via:
`oraculum.utils.celery_utils.celery_sync_static_package_caches()`
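
Assuming the debug shell allows importing application modules (the function
path above is the one given in this SOP), that step can be performed from
the same debug session, roughly like this:

[source,python]
----
# From the same debug session opened with `python oraculum/cli.py debug`;
# assumes application modules are importable there.
from oraculum.utils.celery_utils import celery_sync_static_package_caches
celery_sync_static_package_caches()
----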
To do a more lightweight cleanup, removing just the PR, bug,
and abrt caches can do the trick:

`DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard__all_package_bugs%';`
`DELETE FROM cached_data WHERE provider LIKE 'packager_dashboard_package_prs%';`

`DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard_abrt_issues%';`
== Components of Deployment

The oraculum deployment consists of various pods that run together.

=== oraculum-api-endpoint

Provides the API endpoint that renders responses.
Runs via gunicorn in multiple threads.

=== oraculum-worker

Periodic and ad-hoc sync requests are processed by these pods,
which are managed via celery. The pods are replicated, and each pod spawns 4 workers.
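
For orientation, a worker pod boils down to a celery worker started with a
concurrency of 4. The actual command line is defined in the DeploymentConfig;
the application module path below is an assumption, not the real entry point.

[source,python]
----
# Illustrative only -- the real command is set in the DeploymentConfig,
# and the module path is an assumed placeholder.
celery -A oraculum.celery_app worker --concurrency=4
----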
=== oraculum-beat

Sends periodic sync requests to the workers.

=== oraculum-flower

Provides an overview of the celery/worker queues via HTTP.
The current state of the worker load can be seen in https://packager-dashboard.fedoraproject.org/_flower/[Flower].
=== oraculum-redis

Provides a deployment-local redis instance.