diff --git a/modules/sysadmin_guide/pages/blockerbugs.adoc b/modules/sysadmin_guide/pages/blockerbugs.adoc
index b8b1507..8d93af0 100644
--- a/modules/sysadmin_guide/pages/blockerbugs.adoc
+++ b/modules/sysadmin_guide/pages/blockerbugs.adoc
@@ -10,9 +10,8 @@ freeze exception bugs in branched Fedora releases.
 * <<_file_locations>>
 * <<_building_for_infra>>
 * <<_upgrading>>
-** <<_upgrade_preparation_all_upgrades>>
-** <<_minor_upgrades_no_database_changes>>
-** <<_major_upgrades_with_database_changes>>
+* <<_deployment_watchdog>>
+* <<_periodic_sync>>
 
 == Contact Information
 
 Owner::
     Fedora QA Devel
 Contact::
     #fedora-qa
-Location::
-    iad2
+Persons::
+    jskladan, kparal
 Servers::
-    blockerbugs01.iad2, blockerbugs02.iad2, blockerbugs01.stg.iad2
+    * In OpenShift.
 Purpose::
     Hosting the https://pagure.io/fedora-qa/blockerbugs[blocker bug tracking application] for QA
 
 == File Locations
 
-`/etc/blockerbugs/settings.py` - configuration for the app
+`blockerbugs/cli.py` - CLI for the app
 
-=== Node Roles
+== Configuration
 
-blockerbugs01.stg.iad2::
-    the staging instance, it is not load balanced
-blockerbugs01.iad2::
-    one of the load balanced production nodes, it is responsible for
-    running bugzilla/bodhi/koji sync
-blockerbugs02.iad2::
-    the other load balanced production node. It does not do any sync
-    operations
+Configuration is loaded from environment variables in the pod. The default
+configuration is set in the playbook:
+`roles/openshift-apps/blockerbugs/templates/deploymentconfig.yml`.
+
+The available configuration values can be found in `blockerbugs/config.py`
+inside the `openshift_config` function. Apart from that, secrets, tokens,
+and API keys are set in the secrets Ansible repository.
 
 == Building for Infra
 
-=== Do not use mock
-
-For whatever reason, the `epel7-infra` koji tag rejects SRPMs with the
-`el7.centos` dist tag. Make sure that you build SRPMs with:
-
-....
-rpmbuild -bs --define='dist .el7' blockerbugs.spec
-....
-
-Also note that this expects the release tarball to be in
-`~/rpmbuild/SOURCES/`.
-
-=== Building with Koji
-
-You'll need to ask someone who has rights to build into `epel7-infra`
-tag to make the build for you:
-
-....
-koji build epel7-infra blockerbugs-0.4.4.11-1.el7.src.rpm
-....
-
-[NOTE]
-====
-The fun bit of this is that `python-flask` is only available on `x86_64`
-builders. If your build is routed to one of the non-x86_64, it will
-fail. The only solution available to us is to keep submitting the build
-until it's routed to one of the x86_64 builders and doesn't fail.
-====
-
-Once the build is complete, it should be automatically tagged into
-`epel7-infra-stg` (after a ~15 min delay), so that you can test it on
-blockerbugs staging instance. Once you've verified it's working well,
-ask someone with infra rights to move it to `epel7-infra` tag so that
-you can update it in production.
+The application leverages s2i containers. The production instance tracks
+the `master` branch of the blockerbugs repository, while the staging
+instance tracks the `develop` branch. Builds do not happen automatically;
+they need to be triggered manually from the OpenShift web console.
 
 == Upgrading
 
 Blockerbugs is currently configured through ansible and all
 configuration changes need to be done through ansible.
 
-=== Upgrade Preparation (all upgrades)
+Pod initialization is set up so that all database upgrades happen
+automatically on startup. That means extra care is needed, and all
+deployments that include database changes need to happen on stg first.
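+
+For illustration, the startup behaviour described above is conceptually
+equivalent to the following sketch. This is not the actual image
+entrypoint; the `cli.py` path and the `upgrade_db` command are taken from
+the Deployment WatchDog section below:
+
+[source,python]
+----
+# Conceptual sketch only -- not the real entrypoint of the image.
+# Run the schema upgrade first and refuse to go live if it fails.
+import subprocess
+import sys
+
+upgrade = subprocess.run([sys.executable, "blockerbugs/cli.py", "upgrade_db"])
+if upgrade.returncode != 0:
+    sys.exit(1)  # a failed upgrade must not result in a live pod
+# ... the web process would be started here ...
+----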
-Blockerbugs is not packaged in epel, so the new build needs to exist in
-the infrastructure stg repo for deployment to stg or the infrastructure
-repo for deployments to production.
+== Deployment WatchDog
 
-See the blockerbugs documentation for instructions on building a
-blockerbugs RPM.
+The deployment is configured to perform automatic liveness testing.
+The first phase runs `cli.py upgrade_db`, and in the second phase the
+cluster tries to get an HTTP response from the container on port `8080`
+of the pod.
 
-=== Minor Upgrades (no database changes)
+If either of these fails, the cluster automatically reverts to the
+previous build; such a failure can be seen in the `Events` tab of the
+DeploymentConfig details.
 
-Run the following on *both* `blockerbugs01.iad2` and
-`blockerbugs02.iad2` if updating in production.
+Apart from that, the cluster regularly polls the pod for liveness.
+If that check fails or times out, the pod is restarted. Such an event
+can be seen in the `Events` tab of the DeploymentConfig.
 
-[arabic]
-. Update ansible with config changes, push changes to the ansible repo:
-+
-....
-roles/blockerbugs/templates/blockerbugs-settings.py.j2
-....
-. Clear yum cache and update the blockerbugs RPM:
-+
-....
-yum clean expire-cache && yum update blockerbugs
-....
-. Restart httpd to reload the application:
-+
-....
-service httpd restart
-....
+== Periodic sync
 
-=== Major Upgrades (with database changes)
+The Blockerbugs deployment consists of two pods. One serves as both the
+backend and the frontend; the other is spawned every 30 minutes to run
+`cli.py sync`, which synchronizes data from Bugzilla and Pagure into the
+blockerbugs database.
 
-Run the following on *both* `blockerbugs01.phx2` and
-`blockerbugs02.phx2` if updating in production.
-
-[arabic]
-. Update ansible with config changes, push changes to the ansible repo:
-+
-....
-roles/blockerbugs/templates/blockerbugs-settings.py.j2
-....
-. Stop httpd on *all* relevant instances (if load balanced):
-+
-....
-service httpd stop
-....
-. Clear yum cache and update the blockerbugs RPM on all relevant
-instances:
-+
-....
-yum clean expire-cache && yum update blockerbugs
-....
-. Upgrade the database schema:
-+
-....
-blockerbugs upgrade_db
-....
-. Check the upgrade by running a manual sync to make sure that nothing
-unexpected went wrong:
-+
-....
-blockerbugs sync
-....
-. Start httpd back up:
-+
-....
-service httpd start
-....
diff --git a/modules/sysadmin_guide/pages/oraculum.adoc b/modules/sysadmin_guide/pages/oraculum.adoc
new file mode 100644
index 0000000..689dbb0
--- /dev/null
+++ b/modules/sysadmin_guide/pages/oraculum.adoc
@@ -0,0 +1,139 @@
+= oraculum Infrastructure SOP
+
+https://pagure.io/fedora-qa/oraculum[oraculum] is an app developed
+by Fedora QA to aid packagers with maintenance and quality
+in Fedora and EPEL releases.
+As such, it serves as the backend for Packager Dashboard,
+testcloud, Fedora Easy Karma, and Pagure dist-git (versions table).
+
+== Contents
+
+* <<_contact_information>>
+* <<_file_locations>>
+* <<_building_for_infra>>
+* <<_upgrading>>
+* <<_deployment_watchdog>>
+* <<_components_of_deployment>>
+
+== Contact Information
+
+Owner::
+    Fedora QA Devel
+Contact::
+    #fedora-qa
+Persons::
+    jskladan, lbrabec
+Servers::
+    * In OpenShift.
+Purpose::
+    Hosting the https://pagure.io/fedora-qa/oraculum[oraculum] for packagers
+
+== File Locations
+
+`oraculum/cli.py` - CLI for the app
+
+`oraculum/cli.py debug` - interactive debug interface for the app
+
+== Configuration
+
+Configuration is loaded from environment variables in the pod. The default
+configuration is set in the playbook:
+`roles/openshift-apps/oraculum/templates/deploymentconfig.yml`.
+Remember that the configuration needs to be changed for each of the
+various pods (described later).
+
+The available configuration values can be found in `oraculum/config.py`
+inside the `openshift_config` function. Apart from that, secrets, tokens,
+and API keys are set in the secrets Ansible repository.
+
+== Building for Infra
+
+The application leverages s2i containers. Both the production and staging
+instances track the `master` branch of the oraculum repository. Builds do
+not happen automatically; they need to be triggered manually from the
+OpenShift web console.
+
+== Upgrading
+
+Oraculum is currently configured through ansible and all
+configuration changes need to be done through ansible.
+
+Pod initialization is set up so that all database upgrades happen
+automatically on startup. That means extra care is needed, and all
+deployments that include database changes need to happen on stg first.
+
+== Deployment WatchDog
+
+The deployment is configured to perform automatic liveness testing.
+The first phase runs `cli.py upgrade_db`, and in the second phase the
+cluster tries to get an HTTP response from the container on port `8080`
+of the `oraculum-api-endpoint` pod.
+
+If either of these fails, the cluster automatically reverts to the
+previous build; such a failure can be seen in the `Events` tab of the
+DeploymentConfig details.
+
+Apart from that, the cluster regularly polls the `oraculum-api-endpoint`
+pod for liveness. If that check fails or times out, the pod is restarted.
+Such an event can be seen in the `Events` tab of the DeploymentConfig.
+
+== Cache clearing
+
+oraculum doesn't do any garbage collection in its cache. In some
+situations, such as stale data in the cache (for example, Bugzilla data
+that won't refresh due to bugs or optimization choices) or a cache that
+has grown too large, it can be beneficial or even necessary to clear the
+cache completely. That can be done by deleting all rows in the
+`cached_data` table:
+
+`DELETE FROM cached_data;`
+
+After that, to minimize downtime, it's recommended to manually re-sync
+the generic providers via `CACHE._refresh` (in the pod terminal, via the
+debug interface), in the following order:
+
+[source,python]
+----
+# start the interactive debug shell in the pod terminal:
+#   python oraculum/cli.py debug
+# then, inside the debug shell, refresh the generic providers in order:
+CACHE._refresh("fedora_releases")
+CACHE._refresh("bodhi_updates")
+CACHE._refresh("bodhi_overrides")
+CACHE._refresh("package_versions_generic")
+CACHE._refresh("pagure_groups")
+CACHE._refresh("koschei_data")
+CACHE._refresh("packages_owners_json")
+----
+
+and finally rebuilding the static cache manually via:
+`oraculum.utils.celery_utils.celery_sync_static_package_caches()`
+
+For a more lightweight cleanup, removing just the PR, bug, and ABRT
+caches can do the trick:
+
+`DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard__all_package_bugs%';`
+
+`DELETE FROM cached_data WHERE provider LIKE 'packager_dashboard_package_prs%';`
+
+`DELETE FROM cached_data WHERE provider LIKE 'packager-dashboard_abrt_issues%';`
+
+== Components of Deployment
+
+The oraculum deployment consists of several pods that run together.
+
+=== oraculum-api-endpoint
+
+Provides the API endpoint that renders responses.
+Runs under gunicorn in multiple threads.
+
+=== oraculum-worker
+
+Managed via celery; periodic and ad-hoc sync requests are processed by
+these pods. The pods are replicated, and each pod spawns 4 workers.
+
+=== oraculum-beat
+
+Sends periodic sync requests to the workers.
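+
+The oraculum-beat pod follows the standard celery beat pattern: a beat
+schedule fires tasks that the workers then pick up. As a rough
+illustration only (the broker URL, task name, and schedule below are
+hypothetical, not oraculum's actual configuration):
+
+[source,python]
+----
+# Generic celery beat illustration -- names and values are hypothetical.
+from celery import Celery
+from celery.schedules import crontab
+
+app = Celery("example", broker="redis://localhost:6379/0")
+
+app.conf.beat_schedule = {
+    "periodic-sync": {
+        "task": "tasks.sync",                # hypothetical task name
+        "schedule": crontab(minute="*/30"),  # example schedule only
+    },
+}
+----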
+
+=== oraculum-flower
+
+Provides an overview of the celery worker queues via HTTP.
+The current state of the worker load can be seen in https://packager-dashboard.fedoraproject.org/_flower/[Flower].
+
+=== oraculum-redis
+
+Provides a deployment-local redis instance.
diff --git a/modules/sysadmin_guide/pages/testdays.adoc b/modules/sysadmin_guide/pages/testdays.adoc
new file mode 100644
index 0000000..12e5b05
--- /dev/null
+++ b/modules/sysadmin_guide/pages/testdays.adoc
@@ -0,0 +1,99 @@
+= testdays Infrastructure SOP
+
+https://pagure.io/fedora-qa/testdays-web/[testdays] is an app developed
+by Fedora QA to aid with managing testday events for the community.
+
+== Contents
+
+* <<_contact_information>>
+* <<_file_locations>>
+* <<_building_for_infra>>
+* <<_upgrading>>
+* <<_deployment_watchdog>>
+* <<_components_of_deployment>>
+
+== Contact Information
+
+Owner::
+    Fedora QA Devel
+Contact::
+    #fedora-qa
+Persons::
+    jskladan, smukher
+Servers::
+    * In OpenShift.
+Purpose::
+    Hosting the https://pagure.io/fedora-qa/testdays-web/[testdays] for QA and the community
+
+== File Locations
+
+`testdays/cli.py` - CLI for the app
+
+`resultsdb/cli.py` - CLI for ResultsDB
+
+== Configuration
+
+Configuration is loaded from environment variables in the pod. The default
+configuration is set in the playbook:
+`roles/openshift-apps/testdays/templates/deploymentconfig.yml`.
+Remember that the configuration needs to be changed for both pods
+(testdays and resultsdb).
+
+The available configuration values can be found in `testdays/config.py`
+and `resultsdb/config.py` inside the `openshift_config` function.
+Apart from that, secrets, tokens, and API keys are set
+in the secrets Ansible repository.
+
+== Building for Infra
+
+The application leverages s2i containers. Both the production and staging
+instances of testdays track the `master` branch of the testdays-web
+repository; the resultsdb instance tracks the `legacy_testdays` branch on
+both prod and stg. Builds do not happen automatically; they need to be
+triggered manually from the OpenShift web console.
+
+== Upgrading
+
+Testdays is currently configured through ansible and all
+configuration changes need to be done through ansible.
+
+Pod initialization is set up so that all database upgrades happen
+automatically on startup. That means extra care is needed, and all
+deployments that include database changes need to happen on stg first.
+
+== Deployment WatchDog
+
+The deployment is configured to perform automatic liveness testing.
+The first phase runs `cli.py upgrade_db`, and in the second phase the
+cluster tries to get an HTTP response from the container on port `8080`
+of the `testdays` and `resultsdb` pods.
+
+If either of these fails, the cluster automatically reverts to the
+previous build; such a failure can be seen in the `Events` tab of the
+DeploymentConfig details.
+
+Apart from that, the cluster regularly polls the `testdays` and
+`resultsdb` pods for liveness. If that check fails or times out, the pod
+is restarted. Such an event can be seen in the `Events` tab of the
+DeploymentConfig.
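+
+For reference, the HTTP part of the probe is conceptually equivalent to
+the following check run inside a pod. This is a sketch only, assuming the
+`requests` library is available; the actual probe parameters live in the
+DeploymentConfig:
+
+[source,python]
+----
+# Conceptual equivalent of the HTTP liveness check -- not the real probe.
+# Exits 0 when the app answers on port 8080, non-zero otherwise.
+import sys
+
+import requests
+
+try:
+    response = requests.get("http://localhost:8080/", timeout=5)
+    sys.exit(0 if response.ok else 1)
+except requests.RequestException:
+    sys.exit(1)
+----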
+
+== Components of Deployment
+
+=== Testdays
+
+The base testdays app, which provides both the backend and the frontend
+inside a single deployment.
+
+=== ResultsDB
+
+A fork of the upstream ResultsDB with OpenShift changes applied on top of
+it, without pulling in any other changes from the upstream branch.
+Available on https://pagure.io/taskotron/resultsdb/tree/legacy_testdays[Pagure].