= oraculum Infrastructure SOP

https://pagure.io/fedora-qa/oraculum[oraculum] is an app developed
by Fedora QA to aid packagers with maintenance and quality
in Fedora and EPEL releases.
As such, it serves as the backend for Packager Dashboard,
testcloud, Fedora Easy Karma, and Pagure dist-git (versions table).

== Contents

* <<_contact_information>>
* <<_file_locations>>
* <<_configuration>>
* <<_building_for_infra>>
* <<_upgrading>>
* <<_deployment_watchdog>>
* <<_cache_clearing>>
* <<_components_of_deployment>>

== Contact Information
Owner::
Fedora QA Devel
Contact::
#fedora-qa
Persons::
jskladan, lbrabec
Servers::
* In OpenShift.
Purpose::
Hosting the https://pagure.io/fedora-qa/oraculum[oraculum] for packagers

== File Locations

* `oraculum/cli.py` - CLI for the app
* `oraculum/cli.py debug` - interactive debug interface for the app

== Configuration

Configuration is loaded from the environment in the pod. The default configuration is
set in the playbook template `roles/openshift-apps/oraculum/templates/deploymentconfig.yml`.
Remember that the configuration needs to be changed for each of the various pods
(described later).

The possible configuration values can be found in `oraculum/config.py` inside
the `openshift_config` function. Apart from that, secrets, tokens, and API keys
are set in the Ansible secrets repository.
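
As a rough illustration of how this kind of environment-driven configuration typically
works, the sketch below reads pod environment variables and falls back to defaults.
The option and variable names are purely illustrative assumptions, not oraculum's actual keys:

[source,python]
----
# Illustrative sketch only: the option and environment variable names
# below are hypothetical, not taken from oraculum/config.py.
import os

def load_config_from_env():
    """Build a configuration dict from the pod environment, with fallbacks."""
    return {
        "DATABASE_URL": os.getenv("DATABASE_URL", "sqlite:///oraculum.db"),
        "REDIS_URL": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
        "SYNC_INTERVAL_MINUTES": int(os.getenv("SYNC_INTERVAL_MINUTES", "30")),
    }
----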

== Building for Infra

The application leverages s2i containers. Both the production
and staging instances track the `master` branch of the oraculum
repository. Builds don't happen automatically; they need
to be triggered manually from the OpenShift web console.

== Upgrading

Oraculum is currently configured through Ansible, and all
configuration changes need to be done through Ansible.
The pod initialization is set up so that all database upgrades
happen automatically on startup. That means extra care is needed,
and all deployments that make database changes need to happen on staging first.

== Deployment WatchDog

The deployment is configured to perform automatic liveness testing.
The first phase runs `cli.py upgrade_db`, and the second
phase consists of the cluster trying to get an HTTP response
from the container on port `8080` of the `oraculum-api-endpoint` pod.
If either of these fails, the cluster automatically reverts
to the previous build, and such a failure can be seen in the `Events` tab
of the DeploymentConfig details.
Apart from that, the cluster regularly polls `oraculum-api-endpoint`
for liveness testing. If that fails or times out, the pod is restarted.
Such events can be seen in the `Events` tab of the DeploymentConfig.
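
For manual troubleshooting, a similar check can be reproduced by hand. This is only
a sketch, assuming the API endpoint answers plain HTTP on port `8080` and that you run
it from a terminal inside the pod (adjust the host if you use port-forwarding instead):

[source,python]
----
# Minimal manual liveness probe against the API endpoint; assumes plain
# HTTP on port 8080, reachable as localhost from inside the pod.
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8080/", timeout=10) as resp:
        print("liveness check returned HTTP", resp.status)
except Exception as exc:
    print("liveness check failed:", exc)
----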

== Cache clearing

oraculum doesn't handle any garbage collection in the cache. In some
situations, such as stale data in the cache (for example when Bugzilla
data wouldn't refresh due to bugs or optimization choices) or a database
cache that has grown too large, it can be beneficial or even necessary
to clear the cache completely. That can be done by deleting all rows
in the `cached_data` table: `DELETE FROM cached_data;`

After that, to minimize downtime, it is recommended to manually re-sync
the generic providers via `CACHE._refresh` in the following order, using
the interactive debug interface in the pod terminal (`python oraculum/cli.py debug`):

[source,python]
----
CACHE._refresh("fedora_releases")
CACHE._refresh("bodhi_updates")
CACHE._refresh("bodhi_overrides")
CACHE._refresh("package_versions_generic")
CACHE._refresh("pagure_groups")
CACHE._refresh("koschei_data")
CACHE._refresh("packages_owners_json")
----

and finally building up the static cache manually via
`oraculum.utils.celery_utils.celery_sync_static_package_caches()`, as sketched below.
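
In the same debug session, that call might look as follows, assuming the function is
importable at the path quoted above:

[source,python]
----
# Run inside the `python oraculum/cli.py debug` interpreter; rebuilds the
# static package caches after the generic providers have been re-synced.
from oraculum.utils.celery_utils import celery_sync_static_package_caches

celery_sync_static_package_caches()
----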

To do a more lightweight cleanup, removing just the PR, bug,
and ABRT caches can do the trick:

* `DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard__all_package_bugs%';`
* `DELETE FROM cached_data WHERE provider LIKE 'packager_dashboard_package_prs%';`
* `DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard_abrt_issues%';`

== Components of Deployment

The oraculum deployment consists of several pods that run together.

=== oraculum-api-endpoint

Provides the API endpoint that renders responses.
Runs via gunicorn in multiple threads.

=== oraculum-worker

Periodic and ad-hoc sync requests are processed by these workers, managed via celery.
The pods are replicated, and each pod spawns 4 workers.

=== oraculum-beat
Sends periodic sync requests to the workers.

=== oraculum-flower

Provides an overview of the celery/worker queues via HTTP.
The current state of the worker load can be seen in https://packager-dashboard.fedoraproject.org/_flower/[Flower].

=== oraculum-redis
Provides a deployment-local redis instance.
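
If you need to verify that the local Redis instance is reachable, a quick check from a
pod terminal could look like the sketch below. It assumes the instance is exposed to the
other pods under the service name `oraculum-redis` on the default port and that the
`redis` Python package is available; both are assumptions, not something this SOP guarantees:

[source,python]
----
# Hypothetical connectivity check for the deployment-local Redis;
# host name, port, and availability of the `redis` package are assumptions.
import redis

r = redis.Redis(host="oraculum-redis", port=6379)
print("redis ping:", r.ping())
----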