infra-docs-fpo/modules/sysadmin_guide/pages/odcs.adoc
Patrik Polakovič a107944a1f Remove bits about modularity from the docs
Signed-off-by: Patrik Polakovič <patrik@alphamail.org>
2023-11-20 19:51:59 +01:00


= On Demand Compose Service SOP
[NOTE]
====
The ODCS is very new and changing rapidly. We'll try to keep this up to
date as best we can.
====
ODCS is a service that generates temporary composes from Koji tag(s)
using Pungi.
== Contact Information
Owner::
Factory2 Team, Release Engineering Team, Infrastructure Team
Contact::
fedora-admin, #fedora-releng
Persons::
jkaluza, cqi, qwan, threebean
Public addresses::
* odcs.fedoraproject.org
Servers::
* odcs-frontend0[1-2].iad2.fedoraproject.org
* odcs-backend01.iad2.fedoraproject.org
Purpose::
Generate temporary composes from Koji tag(s) using Pungi.
== Description
ODCS clients submit a request for a compose to _odcs.fedoraproject.org_.
Requests are submitted using the `python2-odcs-client` Python module or
as plain JSON.
The request contains all the information needed to build a compose:
* *source type*: Type of the compose source, for example "tag" or "module".
* *source*: Name of a Koji tag, or a list of modules defined by
name-stream-version.
* *packages*: List of packages to include in the compose.
* *seconds to live*: Number of seconds after which the compose is removed
from the filesystem and marked as "removed".
* *flags*: Flags that further define the compose, for example the
"no_deps" flag, which says that dependencies of the *packages*
should not be included in the compose.
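The fields listed above could be assembled into a JSON request body roughly as follows. This is a minimal sketch: the key names (`source`, `type`, `packages`, `seconds_to_live`, `flags`), the nesting, and the example tag `f39` are assumptions made for illustration, so check the ODCS API documentation for the exact schema before relying on it.

```python
import json


def build_compose_request(source_type, source, packages=None,
                          seconds_to_live=None, flags=None):
    """Assemble a compose request body.

    The key names and nesting used here are assumptions, not the
    confirmed ODCS schema.
    """
    body = {"source": {"type": source_type, "source": source}}
    if packages:
        body["source"]["packages"] = list(packages)
    if seconds_to_live is not None:
        body["seconds_to_live"] = int(seconds_to_live)
    if flags:
        body["flags"] = list(flags)
    return body


# Hypothetical example: one package from a Koji tag, without dependencies,
# kept for one hour.
payload = build_compose_request(
    "tag", "f39", packages=["httpd"], seconds_to_live=3600, flags=["no_deps"]
)
print(json.dumps(payload, indent=2, sort_keys=True))
```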
The request is received by the ODCS Flask app running on the odcs-frontend
nodes. The frontend validates the request, adds the compose request to
the database in the "wait" state, and sends a fedmsg message about this
event. Each compose request gets a unique id, which a client can use to
query its status through the frontend REST API.
The odcs-backend node then picks up compose requests in the "wait" state
and starts generating the compose using the Pungi tool. It does so by
generating all the configuration files for Pungi and running the "pungi"
executable. The backend also changes the compose request state to
"generating" and sends a fedmsg message about this event.
The number of concurrent pungi processes can be set using the
_num_concurrent_pungi_ variable in the ODCS configuration file.
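To illustrate what a limit such as _num_concurrent_pungi_ means in practice, the following sketch caps the number of simultaneously running jobs with a thread pool. It is not how ODCS itself is implemented; `run_composes`, `FakePungi`, and the value 2 are hypothetical names and numbers chosen for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_CONCURRENT_PUNGI = 2  # hypothetical value for the illustration


def run_composes(compose_ids, generate, limit=NUM_CONCURRENT_PUNGI):
    """Run generate(compose_id) for every id, at most `limit` at a time."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(generate, compose_ids))


class FakePungi:
    """Stand-in for a pungi run that records the peak concurrency."""

    def __init__(self):
        self.lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __call__(self, compose_id):
        with self.lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        time.sleep(0.05)  # pretend to do some work
        with self.lock:
            self.running -= 1
        return compose_id, "done"


fake = FakePungi()
results = run_composes([1, 2, 3, 4, 5], fake)
print(results, "peak concurrency:", fake.peak)
```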
The output directory for a compose is shared between the frontend and
backend nodes. Once the compose is generated, the backend changes the
state of the compose request to "done" and again sends a fedmsg message
about this event.
The shared directory with the composes is served by httpd on the
frontend nodes, so ODCS clients can access the generated compose. By
default this is at https://odcs.fedoraproject.org/composes/.
If compose generation fails, the backend changes the state of
the compose request to "failed" and again sends a fedmsg message about
this event. The "failed" compose remains available in the shared
directory for the *seconds to live* time so that the pungi logs can be
examined if needed.
After the *seconds to live* time has elapsed, the backend node removes
the compose from the filesystem and changes the state of the compose
request to "removed".
If a compose request asks for exactly the same compose as an existing
one, ODCS reuses the older compose instead of generating a new one and
points the new compose to the older one.
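The reuse behavior can be pictured with a small in-memory sketch. What ODCS actually treats as "the very same" compose is not specified here, so the key used below (source type, source, packages, flags) and the names `ComposeStore`, `compose_key`, and `reused_id` are assumptions for illustration only.

```python
def compose_key(request):
    """Build a hashable key from the parts of a request (assumed criteria)."""
    return (
        request["source_type"],
        request["source"],
        tuple(sorted(request.get("packages", []))),
        tuple(sorted(request.get("flags", []))),
    )


class ComposeStore:
    """Toy store that points duplicate requests at the first compose."""

    def __init__(self):
        self._first_by_key = {}
        self._next_id = 1
        self.composes = {}

    def submit(self, request):
        compose_id = self._next_id
        self._next_id += 1
        key = compose_key(request)
        original = self._first_by_key.get(key)
        if original is None:
            # Nothing to reuse: this compose would really be generated.
            self._first_by_key[key] = compose_id
        self.composes[compose_id] = {"id": compose_id, "reused_id": original}
        return self.composes[compose_id]


store = ComposeStore()
first = store.submit({"source_type": "tag", "source": "f39", "packages": ["httpd"]})
second = store.submit({"source_type": "tag", "source": "f39", "packages": ["httpd"]})
print(first, second)
```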
A "removed" compose can be renewed by a client to generate the same
compose again. A client can also extend the *seconds to live* attribute
of a compose when needed.
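The compose states described in this section can be summarized as a small transition table. This is a reading of the text above rather than the ODCS source code: the state names come from this document, while `TRANSITIONS` and `advance` are hypothetical helpers. Renewal is left out because the text does not say how a renewed compose is represented.

```python
# Transitions as described in this section. Renewing a "removed" compose is
# not modelled here.
TRANSITIONS = {
    "wait": {"generating"},
    "generating": {"done", "failed"},
    "done": {"removed"},
    "failed": {"removed"},
    "removed": set(),
}


def advance(current, new):
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {new!r}")
    return new


state = "wait"
for step in ("generating", "done", "removed"):
    state = advance(state, step)
    print(state)
```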
== Observing ODCS Behavior
There is currently no command line tool to query ODCS, but ODCS provides
a REST API that can be used to observe its behavior. It is
available at https://odcs.fedoraproject.org/api/1/composes.
The API results can be filtered by the following keys, passed as HTTP GET parameters:
* owner
* source_type
* source
* state
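A filtered query URL can be built from these keys as sketched below. Only the endpoint and the four key names come from this document; the helper `composes_url` and the example values are hypothetical, and the accepted value format for `source_type` and `state` is not covered here.

```python
from urllib.parse import urlencode

API_URL = "https://odcs.fedoraproject.org/api/1/composes"
ALLOWED_FILTERS = ("owner", "source_type", "source", "state")


def composes_url(**filters):
    """Return the composes URL with the given filters as GET parameters."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    query = urlencode(sorted(filters.items()))
    return f"{API_URL}?{query}" if query else API_URL


# Hypothetical owner and tag name.
print(composes_url(owner="someuser", source="f39"))
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen`.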
It is also possible to see all the current composes in the compose
output directory, which is served by the frontend at
https://odcs.fedoraproject.org/composes.
== Removing a compose before its expiration time
Members of the FAS group defined in the _admins_ section of the ODCS
configuration can remove any compose by sending a DELETE request to
the following URL:
https://odcs.fedoraproject.org/api/1/composes/$compose_id
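Such a DELETE request could be prepared with the Python standard library as sketched below. The authentication scheme is an assumption: the example passes an OIDC token as a bearer token, and `build_delete_request`, the compose id `123`, and `EXAMPLE-TOKEN` are placeholders. The request is only constructed here; sending it is left commented out.

```python
import urllib.request


def build_delete_request(compose_id, token):
    """Prepare (but do not send) a DELETE request for one compose."""
    url = f"https://odcs.fedoraproject.org/api/1/composes/{int(compose_id)}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumption: the API accepts an OIDC bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_delete_request(123, "EXAMPLE-TOKEN")
print(req.get_method(), req.full_url)

# To actually send it (requires a valid token and admin rights):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```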
== Logs
The frontend logs are on odcs-frontend0[1-2] in
`/var/log/httpd/error_log` or `/var/log/httpd/ssl_error_log`.
The backend logs are on odcs-backend01. Look in the journal for the
_odcs-backend_ service.
== Upgrading
The package in question is _odcs-server_. Please use the
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/manual/upgrade/odcs.yml[playbooks/manual/upgrade/odcs.yml]
playbook.
== Things that could go wrong
=== Not enough space on shared volume
If there are too many composes, a member of the FAS group defined in the
_admins_ section of the ODCS configuration file should:
* Remove the oldest composes to free up some space immediately. A list
of such composes can be found at
https://odcs.fedoraproject.org/composes/ by sorting by the Last modified
field.
* Decrease *max_seconds_to_live* in the ODCS configuration
file.
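On a host where the shared volume is mounted, the oldest composes could also be listed by modification time, which mirrors sorting the web index by Last modified. This is a sketch under assumptions: the helper `oldest_composes` is hypothetical, the real compose directory path is not given in this document, and the demo below runs against a temporary directory instead.

```python
import os
import tempfile


def oldest_composes(compose_dir, count=10):
    """Return the names of the `count` least recently modified subdirectories."""
    with os.scandir(compose_dir) as it:
        entries = [(e.stat().st_mtime, e.name) for e in it if e.is_dir()]
    entries.sort()
    return [name for _, name in entries[:count]]


# Demo on a throwaway directory with made-up compose names and timestamps.
with tempfile.TemporaryDirectory() as demo_dir:
    for age, name in enumerate(["compose-3", "compose-1", "compose-2"]):
        path = os.path.join(demo_dir, name)
        os.mkdir(path)
        os.utime(path, (1000 + age, 1000 + age))
    demo_result = oldest_composes(demo_dir, count=2)

print(demo_result)
```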
=== The OIDC token expires
This will cause the cron job to fail on the backend. Tokens have a lifetime of one year and should therefore be regenerated periodically.
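To plan the regeneration ahead of time, the expiry date can be derived from the day the token was generated. The sketch below assumes a lifetime of 365 days and a 30-day warning window; both numbers and the helper names `token_expiry` and `should_regenerate` are illustrative.

```python
from datetime import date, timedelta


def token_expiry(generated_on, lifetime_days=365):
    """Date on which a token generated on `generated_on` expires."""
    return generated_on + timedelta(days=lifetime_days)


def should_regenerate(generated_on, today, lifetime_days=365, warn_days=30):
    """True once `today` falls within `warn_days` of expiry, or after it."""
    deadline = token_expiry(generated_on, lifetime_days) - timedelta(days=warn_days)
    return today >= deadline


# Hypothetical generation date.
generated = date(2023, 11, 20)
print(token_expiry(generated))
print(should_regenerate(generated, today=date(2024, 11, 1)))
```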
To regenerate the token, run the following command in the ansible repo:
....
scripts/generate-oidc-token odcs-prod -e 365 -u releng-odcs@service -s https://id.fedoraproject.org/scope/groups -s https://pagure.io/odcs/new-compose -s https://pagure.io/odcs/renew-compose -s https://pagure.io/odcs/delete-compose
....
Follow the instructions given by the script: run the SQL command on the Ipsilon database server:
....
ssh db-fas01.iad2.fedoraproject.org
sudo -u postgres -i ipsilon
ipsilon=# BEGIN;
[...]
ipsilon=# COMMIT;
....
Save the value of the token generated by the script in the ansible-private repo under `files/releng/production/releng-odcs-oidc-token`.
Deploy the change by running the `playbooks/groups/odcs.yml` playbook.