arc/docs/fcas/creation_workflow.rst
Ryan Lerch ba720c3d77 fix parsing errors and sphinx warnings
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2023-11-20 13:04:34 +00:00


.. _creation_workflow:

Existing solution
=================

The existing solution was built to address an earlier ticket, which can be found `here
<https://pagure.io/fedora-infrastructure/issue/11149>`__. The project repository is
located at `Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

The project consists of two main functional units: the ``namelist`` unit and the
``actvlist`` unit. The ``namelist`` unit retrieves usernames from the FASJSON service
and stores them in a file, while the ``actvlist`` unit checks the activity status of the
names in that file through Datagrepper. Both units are executed as automated cronjobs,
scheduled to run at specific intervals. This ensures that the service maintains an
up-to-date list of usernames and a count of active users. The service's behavior is
controlled by a configuration file, allowing administrators to tailor it to their
specific needs.

**Usage**

.. code-block::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
------------------

A sample configuration file can be found `here
<https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`__. Users can copy it and
edit the copy to fit the service to their requirements.

The following is an exhaustive list of the variables that users are expected to
customize.

1. ``daysqant`` (Default - 90) - Number of days for which the activity record is
   requested
2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting
   data from Datagrepper
3. ``minactqt`` (Default - 5) - Minimum number of activities required for a user to
   count as "active"
4. ``services`` (Default - ["Pagure"]) - Services to probe for activity records
   pertaining to the users
5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the
   FASJSON service is hosted
6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") -
   Location where the Datagrepper service is hosted
7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as when
   probing the FASJSON records
8. ``listlink`` (Default -
   "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location
   where the list of available users is present
9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available
   users is to be stored locally
10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active
    users is to be stored locally
11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active
    users is to be stored locally

The configuration file also contains global variables used for internal computation.
These variables are intended only for developers.

1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in
   the FASJSON service
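
As an illustration, the variables listed above could be laid out as a plain Python
module along the following lines. This is only a sketch built from the documented names
and defaults; the actual ``conf.py`` in the repository may be organised differently.

```python
# Sketch of a configuration module using the documented names and defaults.
# The real conf.py in the repository may be structured differently.

# Variables intended to be customized by users
daysqant = 90  # days of activity history to request
pagerows = 1  # rows per page when requesting data from Datagrepper
minactqt = 5  # minimum number of activities for a user to count as "active"
services = ["Pagure"]  # services probed for activity records
jsonloca = "https://fasjson.fedoraproject.org"
dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
useriden = "t0xic0der@FEDORAPROJECT.ORG"
listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
namefile = "/var/tmp/namefile"  # local list of available users
actvfile = "/var/tmp/actvfile"  # local list of active users
acqtfile = "/var/tmp/acqtfile"  # local count of active users

# Variable intended only for developers
dfltsize = 1000  # page size when iterating over FASJSON entities
```

A deployment would typically change only the first group, for example raising
``daysqant`` or adding entries to ``services``.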

The ``namelist`` unit
---------------------

This service unit reads the following configuration variables: ``username`` and
``password`` for the user to masquerade as while probing the FASJSON service,
``jsonloca`` for the location where the FASJSON service is hosted, and ``namefile`` for
the file in which the received usernames are stored. Using a session created as the
masquerading user, the unit queries the FASJSON service for the list of all available
users and stores them in the file specified by ``namefile``.
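
The page-by-page retrieval described above might be sketched as follows. The function
names and the ``fetch_page`` callable are hypothetical stand-ins for illustration: the
authenticated FASJSON request itself is deliberately left out, and this is not the
project's actual implementation.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def collect_usernames(
    fetch_page: Callable[[int, int], List[str]], page_size: int = 1000
) -> List[str]:
    """Collect usernames page by page until a short or empty page arrives.

    ``fetch_page(page_number, page_size)`` is a hypothetical callable that
    returns the usernames on one page of results.
    """
    names: List[str] = []
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        names.extend(page)
        if len(page) < page_size:
            break  # a page that is not full means there is nothing left
        page_number += 1
    return names


def write_namefile(names: Iterable[str], path: str) -> None:
    """Write one username per line, in a single write after the fetch is done."""
    Path(path).write_text("".join(f"{name}\n" for name in names), encoding="utf-8")
```

Here ``page_size`` plays the role of ``dfltsize``, and ``write_namefile`` would be
called with the value of ``namefile`` once ``collect_usernames`` has returned.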

The session is created using the ``krb5`` packages, with the ``username`` and
``password`` passed on the standard input of the console. While this works for
small-scale runs where the service unit runs in ephemeral containers, the approach is
highly discouraged and a session created using a keytab file is recommended instead.
A set of workarounds must also be placed in the default ``krb5`` configuration file to
allow for seamless authentication.
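
A keytab-based session, as recommended above, could be obtained roughly like this. The
sketch assumes the MIT Kerberos ``kinit`` client is installed and uses its ``-k`` and
``-t`` options; the function names are hypothetical, and the project itself does not
necessarily authenticate this way.

```python
import subprocess
from typing import List


def kinit_command(principal: str, keytab_path: str) -> List[str]:
    """Build a kinit invocation that reads the key from a keytab file.

    With ``-k -t <keytab>`` no password has to be typed or piped through
    standard input.
    """
    return ["kinit", "-k", "-t", keytab_path, principal]


def obtain_ticket(principal: str, keytab_path: str) -> None:
    """Run kinit and raise CalledProcessError if authentication fails."""
    subprocess.run(kinit_command(principal, keytab_path), check=True)
```

The principal would be the value of ``useriden``; the keytab path depends on the
deployment and is not defined anywhere in the configuration described here.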

Because this unit runs for a long time and makes performance-intensive queries, it is
strongly recommended to run it no more than once or twice in any 24-hour period. It is
also essential that the internet connection is reliable and that the host is not turned
off while the unit is running, because the unit is non-resumable and writes to disk only
when the fetch is complete.

To avoid such interruptions, the service unit is run as a workflow on `GitHub Actions
<https://github.com/features/actions>`_. The workflow file, available at
https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml, sets up the
environment for the service unit, fetches the list of usernames and then commits it back
to the same repository, making the list publicly available for consumption. A workflow
on GitHub Actions can, however, run for at most 6 hours, which might in some cases lead
to timeouts and incomplete runs.

The ``actvlist`` unit
---------------------

This service unit reads the following configuration variables: ``listlink`` for locating
the file containing the list of all users registered on Fedora Accounts System,
``daysqant`` for limiting the activity queries to the given number of days,
``minactqt`` for the minimum number of activities required for a user to be counted as
"active", ``services`` for the services whose records are searched for activities,
``dgprlink`` for the location where the Datagrepper service is hosted, and ``actvfile``
and ``acqtfile`` for storing the names and the count of the active users respectively.

The service unit fetches the list of users from the location given by ``listlink`` and
iterates through it, looking up the activities of each user in turn. The query period is
limited to ``daysqant`` days, and if the number of activities within that period is
greater than or equal to ``minactqt``, the user is considered to be "active". Their
username is added to the list of active users and the count of active users is
incremented accordingly. Both are stored in the files specified by ``actvfile`` and
``acqtfile``.
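
The threshold check described above can be sketched as below. The ``count_activities``
callable is a hypothetical stand-in for the Datagrepper lookup, which is not shown, and
the function name is chosen for illustration only.

```python
from typing import Callable, Iterable, List, Tuple


def find_active_users(
    usernames: Iterable[str],
    count_activities: Callable[[str], int],
    minactqt: int = 5,
) -> Tuple[List[str], int]:
    """Return the active usernames together with their count.

    ``count_activities(name)`` is a hypothetical callable that returns how
    many activities were recorded for ``name`` within the query period.
    """
    active: List[str] = []
    for name in usernames:
        # A user is "active" at or above the threshold, not only above it.
        if count_activities(name) >= minactqt:
            active.append(name)
    return active, len(active)
```

The returned list and count correspond to what the unit stores in ``actvfile`` and
``acqtfile``.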

Because this unit runs for a long time and makes performance-intensive queries, it is
strongly recommended to run it no more than once or twice in any 24-hour period. It is
also essential that the internet connection is reliable and that the host is not turned
off while the unit is running, because the unit is non-resumable and writes to disk only
when the fetch is complete. On average, this service unit takes at least 4-6 times as
long as the ``namelist`` unit.

To avoid such interruptions, the service unit is run as a workflow on `GitHub Actions
<https://github.com/features/actions>`_. The workflow file, available at
https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml, sets up the
environment for the service unit, fetches the list of active usernames as well as the
count and then commits them back to the repository, making both publicly available for
consumption. A workflow on GitHub Actions can, however, run for at most 6 hours, which
might in some cases lead to timeouts and incomplete runs.