= Copr
Copr is a build system for third-party packages.
Frontend:::
* http://copr.fedorainfracloud.org/
Backend:::
* http://copr-be.cloud.fedoraproject.org/
Package signer:::
* copr-keygen.cloud.fedoraproject.org
Dist-git:::
* copr-dist-git.fedorainfracloud.org
Devel instances (NO NEED TO CARE ABOUT THEM, JUST THOSE ABOVE):::
* http://copr-fe-dev.cloud.fedoraproject.org/
* http://copr-be-dev.cloud.fedoraproject.org/
* copr-keygen-dev.cloud.fedoraproject.org
* copr-dist-git-dev.fedorainfracloud.org
== Contact Information
Owner::
msuchy (mirek)
Contact::
#fedora-admin, #fedora-buildsys
Location::
Fedora Cloud
Purpose::
Build system
== This document
This document provides condensed information allowing you to keep Copr
alive and working. For more sophisticated business processes, please see
https://docs.pagure.org/copr.copr/maintenance_documentation.html
== TROUBLESHOOTING
Almost every problem with Copr is caused by a problem with spawning builder VMs,
or with processing the action queue on the backend.
=== VM spawning/termination problems
Try to restart the copr-backend service:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl restart copr-backend
....
If this doesn't solve the problem, follow the logs for clues:
....
$ tail -f /var/log/copr-backend/{vmm,spawner,terminator}.log
....
As the last resort, you can terminate all builders and let
copr-backend throw away all information about them. This will
obviously interrupt all running builds and reschedule them:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl stop copr-backend
$ cleanup_vm_nova.py
$ redis-cli
> FLUSHALL
$ systemctl start copr-backend
....
Sometimes OpenStack cannot handle spawning too many VMs at the same
time, so it is safer to edit the following on _copr-be.cloud.fedoraproject.org_:
....
vi /etc/copr/copr-be.conf
....
and change:
....
group0_max_workers=12
....
to "6". Start copr-backend service and some time later increase it to
original value. Copr automaticaly detect change in script and increase
number of workers.
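A condensed sketch of the procedure (the `sed` invocation is only illustrative;
editing the file by hand works just as well):
....
$ ssh root@copr-be.cloud.fedoraproject.org
# temporarily lower the number of workers
$ sed -i 's/^group0_max_workers=12$/group0_max_workers=6/' /etc/copr/copr-be.conf
$ systemctl start copr-backend
# some time later, raise the value back; the running backend picks the change up automatically
$ sed -i 's/^group0_max_workers=6$/group0_max_workers=12/' /etc/copr/copr-be.conf
....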
The set of aarch64 VMs isn't maintained by OpenStack, but by Copr's
backend itself. Steps to diagnose:
....
$ ssh root@copr-be.cloud.fedoraproject.org
[root@copr-be ~][PROD]# systemctl status resalloc
● resalloc.service - Resource allocator server
...
[root@copr-be ~][PROD]# less /var/log/resallocserver/main.log
[root@copr-be ~][PROD]# su - resalloc
[resalloc@copr-be ~][PROD]$ resalloc-maint resource-list
13569 - aarch64_01_prod_00013569_20190613_151319 pool=aarch64_01_prod tags=aarch64 status=UP
13597 - aarch64_01_prod_00013597_20190614_083418 pool=aarch64_01_prod tags=aarch64 status=UP
13594 - aarch64_02_prod_00013594_20190614_082303 pool=aarch64_02_prod tags=aarch64 status=STARTING
...
[resalloc@copr-be ~][PROD]$ resalloc-maint ticket-list
879 - state=OPEN tags=aarch64 resource=aarch64_01_prod_00013569_20190613_151319
918 - state=OPEN tags=aarch64 resource=aarch64_01_prod_00013608_20190614_135536
904 - state=OPEN tags=aarch64 resource=aarch64_02_prod_00013594_20190614_082303
919 - state=OPEN tags=aarch64
...
....
Be careful when some resource is in the `STARTING` state. If that's the case,
check
`/usr/bin/tail -F -n +0 /var/log/resallocserver/hooks/013594_alloc`.
Copr takes tickets from the resalloc server; if the resources fail to
spawn, the ticket numbers are not assigned an appropriately tagged
resource for a long time.
If that happens (it shouldn't) and there's some inconsistency between
resalloc's database and the actual status on the aarch64 hypervisors
(`ssh copr@virthost-aarch64-os0{1,2}.fedorainfracloud.org`, and use
`virsh` there to inspect their statuses), use the
`resalloc-maint resource-delete`, `resalloc ticket-close` or `psql`
commands to fix up resalloc's DB.
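For example, assuming both commands accept the numeric IDs printed by the
listing commands above (the IDs here are only illustrative):
....
[resalloc@copr-be ~][PROD]$ resalloc-maint resource-delete 13594   # resource ID from 'resource-list'
[resalloc@copr-be ~][PROD]$ resalloc ticket-close 919              # ticket ID from 'ticket-list'
....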
=== Backend Troubleshooting
Information about the status of the Copr backend services:
....
systemctl status copr-backend*.service
....
Utilization of workers:
....
ps axf
....
Worker processes change their $0 (the process title shown by `ps`) to show
which task they are working on and on which builder.
To list which VM builders are tracked by copr-vmm service:
....
/usr/bin/copr_get_vm_info.py
....
=== Appstream builder troubleshooting
The appstream builder is painfully slow when running on a repository with a
huge number of packages. See
https://github.com/hughsie/appstream-glib/issues/301 . You might need to
disable it for some projects:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ cd /var/lib/copr/public_html/results/<owner>/<project>/
$ touch .disable-appstream
# You should probably also delete existing appstream data because
# they might be obsolete
$ rm -rf ./appdata
....
=== Backend action queue issues
First check the _number of not-yet-processed actions_. If that
number isn't zero, and is not decreasing relatively fast (say a
single action takes longer than 30s), there might be some problem.
Logs for the action dispatcher can be found in:
....
/var/log/copr-backend/action_dispatcher.log
....
Check that there's no stuck process under the `Action dispatch` parent
process in the `pstree -a copr` output.
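A quick check, using the log path and the `pstree` command mentioned above,
might look like:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ tail -n 50 /var/log/copr-backend/action_dispatcher.log
$ pstree -a copr    # look for a stuck child under the 'Action dispatch' process
....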
== Deploy information
Using playbooks and rbac:
....
$ sudo rbac-playbook groups/copr-backend.yml
$ sudo rbac-playbook groups/copr-frontend-cloud.yml
$ sudo rbac-playbook groups/copr-keygen.yml
$ sudo rbac-playbook groups/copr-dist-git.yml
....
The
https://pagure.io/copr/copr/blob/main/f/copr-setup.txt[copr-setup.txt]
manual is severely outdated, but there is
no up-to-date alternative. We should extract useful information from it
and put it here in the SOP or into
https://docs.pagure.org/copr.copr/maintenance_documentation.html and
then throw the _copr-setup.txt_ away.
The copr-backend service (which spawns several processes) should run on the
backend. The backend spawns VMs in Fedora Cloud. You cannot log in to
those machines directly; you have to:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ su - copr
$ copr_get_vm_info.py
# find IP address of the VM that you want
$ ssh root@172.16.3.3
....
Instances can be easily terminated in
https://fedorainfracloud.org/dashboard
=== Order of start up
When reprovisioning, you should start the copr-keygen and copr-dist-git
machines first (in any order). Then you can start copr-be. You can start
it sooner, but make sure that the copr-* services are stopped.
The copr-fe machine is completely independent and can be started at any time.
If the backend is stopped, it will just queue jobs.
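A hedged sketch of that ordering (hostnames are taken from the top of this
document; the `systemctl is-active` checks are only a convenience):
....
# 1) make sure keygen and dist-git are up first
$ ssh root@copr-keygen.cloud.fedoraproject.org systemctl is-active signd
$ ssh root@copr-dist-git.fedorainfracloud.org systemctl is-active copr-dist-git httpd
# 2) only then start the backend services
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl start copr-backend
....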
== Logs
=== Backend
* /var/log/copr-backend/action_dispatcher.log
* /var/log/copr-backend/actions.log
* /var/log/copr-backend/backend.log
* /var/log/copr-backend/build_dispatcher.log
* /var/log/copr-backend/logger.log
* /var/log/copr-backend/spawner.log
* /var/log/copr-backend/terminator.log
* /var/log/copr-backend/vmm.log
* /var/log/copr-backend/worker.log
And several logs for non-essential features, such as
copr_prune_results.log, hitcounter.log, and cleanup_vms.log, that you
don't need to worry about.
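When debugging, it is usually enough to follow a few of the most relevant
ones, for example:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ tail -F /var/log/copr-backend/{backend,build_dispatcher,action_dispatcher,worker}.log
....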
=== Frontend
* /var/log/copr-frontend/frontend.log
* /var/log/httpd/access_log
* /var/log/httpd/error_log
=== Keygen
* /var/log/copr-keygen/main.log
=== Dist-git
* /var/log/copr-dist-git/main.log
* /var/log/httpd/access_log
* /var/log/httpd/error_log
== Services
=== Backend
* copr-backend
** copr-backend-action
** copr-backend-build
** copr-backend-log
** copr-backend-vmm
* redis
* lighttpd
All the _copr-backend-*_ services are configured to be a part
of _copr-backend.service_, so e.g. to restart
all of them, just restart _copr-backend.service_.
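For example:
....
$ ssh root@copr-be.cloud.fedoraproject.org
$ systemctl restart copr-backend   # also restarts copr-backend-{action,build,log,vmm}
....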
=== Frontend
* httpd
* postgresql
=== Keygen
* signd
=== Dist-git
* httpd
* copr-dist-git
== PPC64LE Builders
Builders for PPC64LE are located at rh-power2.fit.vutbr.cz, and anyone with
access to the buildsys ssh key can get there as::
msuchy@rh-power2.fit.vutbr.cz
The following commands are available:
....
$ ls bin/
destroy-all.sh         get-one-vm.sh
reinit-vm26.sh         reinit-vm27.sh         reinit-vm28.sh         reinit-vm29.sh
virsh-destroy-vm26.sh  virsh-destroy-vm27.sh  virsh-destroy-vm28.sh  virsh-destroy-vm29.sh
virsh-start-vm26.sh    virsh-start-vm27.sh    virsh-start-vm28.sh    virsh-start-vm29.sh
....
`destroy-all.sh`:: destroys all VMs and re-initializes them
`reinit-vmXX.sh`:: copies the VM image from the template
`virsh-destroy-vmXX.sh`:: destroys the VM
`virsh-start-vmXX.sh`:: starts the VM
`get-one-vm.sh`:: starts one VM and returns its IP; this is used in the Copr playbooks
In case of a big queue of PPC64LE tasks, simply call `bin/destroy-all.sh`;
it will destroy the stuck VMs and the Copr backend will spawn new ones, as
sketched below.
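A minimal sketch (the account is the example from above; use whichever
account holds the buildsys key):
....
$ ssh msuchy@rh-power2.fit.vutbr.cz   # or any account with the buildsys key
$ bin/destroy-all.sh                  # destroys and re-initializes all PPC64LE VMs
....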
== Ports opened for public
Frontend:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving Copr frontend website
|443 |TCP |https |^^
|===
Backend:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving build results and repos
|443 |TCP |https |^^
|===
Distgit:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|80 |TCP |http |Serving cgit interface
|443 |TCP |https |^^
|===
Keygen:
[width="86%",cols="13%,17%,16%,54%",options="header",]
|===
|Port |Protocol |Service |Reason
|22 |TCP |ssh |Remote control
|===
== Resources justification
Copr currently uses the following resources.
=== Frontend
* RAM: 2G (out of 4G) and some swap
* CPU: 2 cores (3400 MHz) with load 0.92, 0.68, 0.65
Most of the memory is eaten by PostgreSQL, followed by Apache. The CPU
is also used mainly by those two services, but in the reverse order.
I don't think we can settle for any instance that provides less than
2G RAM, but ideally we need 3G+. A 2-core CPU is good enough.
* Disk space: 17G for system and 8G for _pgsqldb_ directory
If needed, we can clean up old dumps and backups in the database directory
and get down to around 4G of disk space.
=== Backend
* RAM: 5G (out of 16G)
* CPU: 8 cores (3400MHz) with load 4.09, 4.55, 4.24
The backend takes care of spinning up builders, running ansible playbooks
on them, running _createrepo_c_ (on big repositories), and so
on. Copr uses two queues, one for builds, which are delegated to
OpenStack builders, and one for actions. Actions, however, are processed
directly by the backend, so they can spike our load. We would ideally
like to keep the computing power that we have now. Maybe we can go
lower than 16G RAM, possibly down to 12G.
* Disk space: 30G for the system, 5.6T (out of 6.8T) for build results
Currently, we have 1.3T of backup data that is going to be deleted
soon, but we still cannot go any lower on storage. Disk space is
a long-term issue for us and we have to make a lot of compromises
just to survive our daily increase (which is around 10G of
new data). Many features are blocked by not having enough storage. We
cannot go any lower, and we also cannot go much longer with the current
storage.
=== Distgit
* RAM: ~270M (out of 4G), but climbs to ~1G when busy
* CPU: 2 cores (3400MHz) with load 1.35, 1.00, 0.53
Personally, I wouldn't downgrade the machine too much. Possibly we can
live with 3G RAM, but I wouldn't go any lower.
* Disk space: 7G for system, 1.3T dist-git data
We currently employ a lot of aggressive cleaning strategies on our
distgit data, so we can't go any lower than what we have.
=== Keygen
* RAM: ~150M (out of 2G)
* CPU: 1 core (3400MHz) with load 0.10, 0.31, 0.25
We are basically running just _signd_ and
_httpd_ here, both with minimal resource requirements. The
memory usage is topped by _systemd-journald_.
* Disk space: 7G for system and ~500M (out of ~700M) for GPG keys
We are slowly pushing the GPG key storage to its limit, so in case
of migrating copr-keygen somewhere, we would like to scale it up to at
least 1G.