arc/docs/fcas/creation_workflow.rst
Akashdeep Dhar 511e6323a8 Correct minor header mistake on creation_workflow.rst
Signed-off-by: Akashdeep Dhar <akashdeep.dhar@gmail.com>
2023-05-23 12:44:42 +05:30

.. _creation_workflow:

Existing solution
=================
The existing solution was built to address an earlier ticket,
`Fedora Infrastructure issue #11149 <https://pagure.io/fedora-infrastructure/issue/11149>`_.
The project repository is located at
`Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------
The project consists of two main functional units: the ``namelist`` unit and
the ``actvlist`` unit. The ``namelist`` unit lets the service runner retrieve
the list of usernames from the FASJSON service and store it in a file, while
the ``actvlist`` unit checks the activity status of the names in that file
through Datagrepper. Both units run as automated cronjobs scheduled at fixed
intervals, so the service maintains an up-to-date list of usernames and a
count of active users. The service's behavior is controlled by a configuration
file, which administrators can adjust to suit their needs.

**Usage** ::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
------------------
A sample configuration file is available in the project repository as
`fuas/conf.py <https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`_.
Users can copy and edit it to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected
to customize.

1. ``daysqant`` (Default - ``90``) - Number of days for which activity records are requested
2. ``pagerows`` (Default - ``1``) - Number of rows per page when requesting data from Datagrepper
3. ``minactqt`` (Default - ``5``) - Minimum number of activities required for a user to count as "active"
4. ``services`` (Default - ``["Pagure"]``) - Services to probe for activity records pertaining to the users
5. ``jsonloca`` (Default - ``"https://fasjson.fedoraproject.org"``) - Location where the FASJSON service is hosted
6. ``dgprlink`` (Default - ``"https://apps.fedoraproject.org/datagrepper/v2/search"``) - Location where the Datagrepper service is hosted
7. ``useriden`` (Default - ``"t0xic0der@FEDORAPROJECT.ORG"``) - User to masquerade as when probing the FASJSON records
8. ``listlink`` (Default - ``"https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"``) - Location where the list of available users is published
9. ``namefile`` (Default - ``"/var/tmp/namefile"``) - Location where the list of available users is to be stored locally
10. ``actvfile`` (Default - ``"/var/tmp/actvfile"``) - Location where the list of active users is to be stored locally
11. ``acqtfile`` (Default - ``"/var/tmp/acqtfile"``) - Location where the count of active users is to be stored locally

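
As a rough illustration, a local copy of the configuration holding the
documented defaults might look like the sketch below. It assumes the file is
a plain Python module with one module-level assignment per variable; the
actual layout of ``conf.py`` may differ.

.. code-block:: python

    # Hypothetical local copy of the configuration file.
    # Assumption: each setting is a module-level assignment.
    # The values are the documented defaults.

    daysqant = 90          # days of activity history to request
    pagerows = 1           # rows per page requested from Datagrepper
    minactqt = 5           # minimum activities for a user to count as "active"
    services = ["Pagure"]  # services probed for activity records

    jsonloca = "https://fasjson.fedoraproject.org"
    dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden = "t0xic0der@FEDORAPROJECT.ORG"
    listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"

    namefile = "/var/tmp/namefile"  # local list of all usernames
    actvfile = "/var/tmp/actvfile"  # local list of active usernames
    acqtfile = "/var/tmp/acqtfile"  # local count of active usernames
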
The configuration file also contains global-scope variables used internally
for computation. These variables are intended only for developers.

1. ``dfltsize`` (Default - ``1000``) - Size of iterable pages for all entities present in the FASJSON service

The ``namelist`` unit
---------------------
This service unit reads the configuration variables ``username`` and
``password`` for the user to masquerade as while probing the FASJSON service,
``jsonloca`` for the location where the FASJSON service is hosted, and
``namefile`` for the file in which the retrieved usernames are stored. Using a
session created as the masquerading user, the unit queries the FASJSON service
for the list of all available users and stores them in the file named by
``namefile``.

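
The retrieval described above can be sketched as a page-by-page walk followed
by a single write. This is a minimal sketch, not the project's code:
``fetch_page`` is a hypothetical callable standing in for the authenticated
FASJSON request, and the response shape (a ``result`` list of user records
plus a ``page`` object carrying ``total_pages``) and the one-name-per-line
file format are assumptions made for illustration.

.. code-block:: python

    from typing import Callable, Dict, List


    def collect_usernames(
        fetch_page: Callable[[int, int], Dict], page_size: int = 1000
    ) -> List[str]:
        """Gather every username by walking the pages one by one.

        ``fetch_page(page_number, page_size)`` is a hypothetical callable.
        Assumed response shape:
        {"result": [{"username": ...}, ...], "page": {"total_pages": N}}
        """
        usernames: List[str] = []
        page_number = 1
        while True:
            payload = fetch_page(page_number, page_size)
            usernames.extend(item["username"] for item in payload["result"])
            if page_number >= payload["page"]["total_pages"]:
                break
            page_number += 1
        return usernames


    def write_namefile(usernames: List[str], path: str) -> None:
        """Write the whole list at once, only after the fetch has finished."""
        with open(path, "w") as handle:
            handle.write("\n".join(usernames) + "\n")


    # Stand-in for the real service: three users split over two pages.
    def fake_fetch(page_number: int, page_size: int) -> Dict:
        users = ["alice", "bob", "carol"]
        start = (page_number - 1) * page_size
        total_pages = -(-len(users) // page_size)  # ceiling division
        return {
            "result": [{"username": name} for name in users[start:start + page_size]],
            "page": {"total_pages": total_pages},
        }


    names = collect_usernames(fake_fetch, page_size=2)

Passing the request function in as an argument keeps the paging logic
testable without network access or Kerberos credentials.
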
The session is created using the ``krb5`` packages, and the ``username`` and
``password`` are passed through the standard input of the console. While this
works for small-scale runs in which the service unit runs in ephemeral
containers, the approach is strongly discouraged; creating the session from a
keytab file is recommended instead. In addition, a set of workarounds must be
added to the default ``krb5`` configuration file to allow for seamless
authentication.

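
For illustration only, the keytab-based alternative could look like the sketch
below, which obtains a ticket through ``kinit -k -t`` so that no password is
ever typed or piped in. The keytab path is hypothetical, and this is not how
the project currently authenticates.

.. code-block:: python

    import subprocess
    from typing import List


    def build_kinit_command(keytab_path: str, principal: str) -> List[str]:
        """Build a kinit call that reads the key from a keytab file (-k -t)."""
        return ["kinit", "-k", "-t", keytab_path, principal]


    def obtain_ticket(keytab_path: str, principal: str) -> None:
        """Run kinit; raises CalledProcessError if authentication fails."""
        subprocess.run(build_kinit_command(keytab_path, principal), check=True)


    # Hypothetical keytab path; the principal is the documented ``useriden`` default.
    command = build_kinit_command(
        "/etc/fuas/fuas.keytab", "t0xic0der@FEDORAPROJECT.ORG"
    )
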
Because this unit runs for a long time and makes performance-intensive
queries, it is strongly recommended to run it no more than once or twice in
any 24-hour period. It is also essential to ensure that the internet
connection is reliable and that the host is not turned off while the unit is
running, because the unit cannot resume an interrupted run and writes to disk
only when the fetch is complete.

To keep the service unit running without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file, available at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml>`_,
sets up the environment for the service unit, fetches the list of usernames,
and then commits it back to the same repository, making the list publicly
available for consumption. However, GitHub Actions limits each job to 6 hours
of execution time, which might, in some cases, lead to timeouts and incomplete
runs.

The ``actvlist`` unit
---------------------
This service unit reads the configuration variables ``listlink`` for locating
the file containing the list of all users registered on the Fedora Account
System, ``daysqant`` for limiting the activity queries to the given number of
days, ``minactqt`` for the minimum number of activities required for a user to
be counted as "active", ``services`` for the services whose records are
searched for activities, ``dgprlink`` for the location where the Datagrepper
service is hosted, and ``actvfile`` and ``acqtfile`` for storing the names and
the count of the active users respectively.

The service unit fetches the list of users from the location given by
``listlink`` and iterates through it, looking up the activities of each user
in turn. The queries are limited to the configured period, and if the number
of activities within that period is greater than or equal to the configured
minimum, the user is considered "active". Their username is added to the list
of active users and the count of active users is incremented accordingly. Both
are stored in the files specified by ``actvfile`` and ``acqtfile``.

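
The decision rule above can be sketched as follows. This is a minimal sketch
under stated assumptions rather than the project's implementation:
``count_activities`` is a hypothetical callable standing in for the HTTP
request to Datagrepper, and the query parameter names (``user``, ``delta``,
``rows_per_page``, ``category``) are assumptions about the search endpoint.

.. code-block:: python

    from typing import Callable, Dict, Iterable, List, Sequence, Tuple

    SECONDS_PER_DAY = 86400


    def build_query(
        username: str, daysqant: int, pagerows: int, services: Sequence[str]
    ) -> Dict:
        """Assemble search parameters for one user (names are assumptions)."""
        return {
            "user": username,
            "delta": daysqant * SECONDS_PER_DAY,  # look-back window in seconds
            "rows_per_page": pagerows,
            "category": list(services),
        }


    def find_active_users(
        usernames: Iterable[str],
        count_activities: Callable[[Dict], int],
        daysqant: int = 90,
        pagerows: int = 1,
        minactqt: int = 5,
        services: Sequence[str] = ("Pagure",),
    ) -> Tuple[List[str], int]:
        """Return the active usernames and how many of them there are."""
        active: List[str] = []
        for name in usernames:
            query = build_query(name, daysqant, pagerows, services)
            # A user is active when the count meets or exceeds the minimum.
            if count_activities(query) >= minactqt:
                active.append(name)
        return active, len(active)


    # Stand-in for the real service: a fixed activity total per user.
    fake_totals = {"alice": 12, "bob": 5, "carol": 2}
    active_names, active_count = find_active_users(
        ["alice", "bob", "carol"],
        count_activities=lambda query: fake_totals[query["user"]],
    )

With the default threshold of 5, ``alice`` and ``bob`` are counted as active
in this stand-in data, while ``carol`` is not.
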
Because this unit runs for a long time and makes performance-intensive
queries, it is strongly recommended to run it no more than once or twice in
any 24-hour period. It is also essential to ensure that the internet
connection is reliable and that the host is not turned off while the unit is
running, because the unit cannot resume an interrupted run and writes to disk
only when the fetch is complete. On average, this service unit takes about
four to six times as long as the ``namelist`` unit.

To keep the service unit running without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file, available at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml>`_,
sets up the environment for the service unit, fetches the list of active
usernames as well as the count, and then commits them back to the repository,
making both publicly available for consumption. However, GitHub Actions limits
each job to 6 hours of execution time, which might, in some cases, lead to
timeouts and incomplete runs.