From 8881c92d7d8d47aa2ce41fb33fe577d8d74e787b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20Kone=C4=8Dn=C3=BD?=
Date: Wed, 18 Aug 2021 12:49:35 +0200
Subject: [PATCH] Review copr SOP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michal Konečný
---
 modules/sysadmin_guide/nav.adoc        |  2 +-
 modules/sysadmin_guide/pages/copr.adoc | 48 ++++++++++++++++----------
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/modules/sysadmin_guide/nav.adoc b/modules/sysadmin_guide/nav.adoc
index e0c3135..4a49993 100644
--- a/modules/sysadmin_guide/nav.adoc
+++ b/modules/sysadmin_guide/nav.adoc
@@ -16,7 +16,7 @@
 ** xref:collectd.adoc[Collectd - SOP]
 ** xref:compose-tracker.adoc[Compose Tracker - SOP]
 ** xref:contenthosting.adoc[Content Hosting Infrastructure - SOP]
-** xref:copr.adoc[copr - SOP in review ]
+** xref:copr.adoc[Copr - SOP]
 ** xref:database.adoc[database - SOP in review ]
 ** xref:datanommer.adoc[datanommer - SOP in review ]
 ** xref:debuginfod.adoc[debuginfod - SOP in review ]
diff --git a/modules/sysadmin_guide/pages/copr.adoc b/modules/sysadmin_guide/pages/copr.adoc
index c193a9f..8796caf 100644
--- a/modules/sysadmin_guide/pages/copr.adoc
+++ b/modules/sysadmin_guide/pages/copr.adoc
@@ -67,7 +67,7 @@ $ systemctl start copr-backend
 ....

 Sometimes OpenStack can not handle spawning too much VMs at the same
-time. So it is safer to edit on copr-be.cloud.fedoraproject.org:
+time. So it is safer to edit on _copr-be.cloud.fedoraproject.org_:

 ....
 vi /etc/copr/copr-be.conf
@@ -165,7 +165,7 @@ $ rm -rf ./appdata

 === Backend action queue issues

-First check the link:[number of not-yet-processed actions]. If that
+First check the _number of not-yet-processed actions_. If that
 number isn't equal to zero, and is not decrementing relatively fast (say
 single action takes longer than 30s) -- there might be some problem.
 Logs for the action dispatcher can be found in:
@@ -188,12 +188,13 @@ $ sudo rbac-playbook groups/copr-keygen.yml
 $ sudo rbac-playbook groups/copr-dist-git.yml
 ....

-https://pagure.io/copr/copr/blob/master/f/copr-setup.txt The
-[.title-ref]#copr-setup.txt# manual is severely outdated, but there is
+The
+https://pagure.io/copr/copr/blob/main/f/copr-setup.txt[copr-setup.txt]
+manual is severely outdated, but there is
 no up-to-date alternative. We should extract useful information from it
 and put it here in the SOP or into
 https://docs.pagure.org/copr.copr/maintenance_documentation.html and
-then throw the [.title-ref]#copr-setup.txt# away.
+then throw the _copr-setup.txt_ away.

 On backend should run copr-backend service (which spawns several
 processes). Backend spawns VM from Fedora Cloud. You could not login to
@@ -265,9 +266,9 @@ shouldn't be worried with.
 * redis
 * lighttpd

-All the [.title-ref]#copr-backend-*.service# are configured to be a part
-of the [.title-ref]#copr-backend.service# so e.g. in case of restarting
-all of them, just restart the [.title-ref]#copr-backend.service#.
+All the _copr-backend-*.service_ are configured to be a part
+of the _copr-backend.service_ so e.g. in case of restarting
+all of them, just restart the _copr-backend.service_.

 === Frontend

@@ -289,18 +290,27 @@ Builders for PPC64 are located at rh-power2.fit.vutbr.cz and anyone with
 access to buildsys ssh key can get there using keys as::
 msuchy@rh-power2.fit.vutbr.cz

-There are commands: $ ls bin/ destroy-all.sh reinit-vm26.sh
+There are commands:
+....
+$ ls bin/
+destroy-all.sh reinit-vm26.sh
 reinit-vm28.sh virsh-destroy-vm26.sh virsh-destroy-vm28.sh
 virsh-start-vm26.sh virsh-start-vm28.sh get-one-vm.sh reinit-vm27.sh
 reinit-vm29.sh virsh-destroy-vm27.sh virsh-destroy-vm29.sh
 virsh-start-vm27.sh virsh-start-vm29.sh
+....

-bin/destroy-all.sh destroy all VM and reinit them reinit-vmXX.sh copy VM
-image from template virsh-destroy-vmXX.sh destroys VM
-virsh-start-vmXX.sh starts VM get-one-vm.sh start one VM and return its
-IP - this is used in Copr playbooks.
+`destroy-all.sh` destroy all VM and reinit them

-In case of big queue of PPC64 tasks simply call bin/destroy-all.sh and
+`reinit-vmXX.sh` copy VM image from template
+
+`virsh-destroy-vmXX.sh` destroys VM
+
+`virsh-start-vmXX.sh` starts VM
+
+`get-one-vm.sh` start one VM and return its IP - this is used in Copr playbooks.
+
+In case of big queue of PPC64 tasks simply call `bin/destroy-all.sh` and
 it will destroy stuck VM and copr backend will spawn new VM.

 == Ports opened for public
@@ -360,7 +370,7 @@ I don't think we can settle down with any instance that provides less
 than (2G RAM, obviously), but ideally, we need 3G+. 2-core CPU is good
 enough.

-* Disk space: 17G for system and 8G for [.title-ref]#pgsqldb# directory
+* Disk space: 17G for system and 8G for _pgsqldb_ directory

 If needed, we are able to clean-up the database directory of old dumps
 and backups and get down to around 4G disk space.
@@ -371,7 +381,7 @@ and backups and get down to around 4G disk space.
 * CPU: 8 cores (3400MHz) with load 4.09, 4.55, 4.24

 Backend takes care of spinning-up builders and running ansible playbooks
-on them, running [.title-ref]#createrepo_c# (on big repositories) and so
+on them, running _createrepo_c_ (on big repositories) and so
 on. Copr utilizes two queues, one for builds, which are delegated to
 OpenStack builders, and action queue. Actions, however, are processed
 directly by the backend, so it can spike our load up. We would ideally
@@ -406,9 +416,9 @@ distgit data, so we can't go any lower than what we have.
 * RAM: ~150M (out of 2G)
 * CPU: 1 core (3400MHz) with load 0.10, 0.31, 0.25

-We are basically running just [.title-ref]#signd# and
-[.title-ref]#httpd# here, both with minimal resource requirements. The
-memory usage is topped by [.title-ref]#systemd-journald#.
+We are basically running just _signd_ and
+_httpd_ here, both with minimal resource requirements. The
+memory usage is topped by _systemd-journald_.

 * Disk space: 7G for system and ~500M (out of ~700M) for GPG keys