Add working details about the prototype created

Signed-off-by: Akashdeep Dhar <akashdeep.dhar@gmail.com>
Akashdeep Dhar 2023-05-11 11:38:48 +05:30 committed by t0xic0der
parent 3ea2c45745
commit 54bd8c8eea

.. _creation_workflow:

Existing solution
=================

The existing solution was built to address an earlier ticket, which can be
found `here <https://pagure.io/fedora-infrastructure/issue/11149>`_. The
project repository is located at
`Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

The project has two functional units. The ``namelist`` unit lets the service
runner fetch usernames from the FASJSON service into a file, and the
``actvlist`` unit checks Datagrepper for the activity of the names present in
that file. Both are run as automated cronjobs at regular intervals to obtain
an updated list of usernames as well as an updated count of active users. The
service is governed by a configuration file that can be edited to suit the
requirements of those running it.

**Usage** ::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
^^^^^^^^^^^^^^^^^^

A sample configuration file can be found
`here <https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`_. Users can
copy and edit it to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected
to customize.

1. ``daysqant`` (Default - 90) - Number of days for which the activity record is requested
2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper
3. ``minactqt`` (Default - 5) - Minimum number of activities for a user to be counted as "active"
4. ``services`` (Default - ["Pagure"]) - Services to probe into for activity records pertaining to the users
5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the FASJSON service is hosted
6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") - Location where the Datagrepper service is hosted
7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as for probing into the FASJSON records
8. ``listlink`` (Default - "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location where the list of available users is present
9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available users is to be stored locally
10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active users is to be stored locally
11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active users is to be stored locally

The configuration file also contains computation variables of global scope.
These are intended only for developers.

1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service
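
The variables listed in this section could be laid out in a copy of the
configuration file roughly as follows. This is only a sketch assembled from the
names and defaults documented here; the actual layout of ``conf.py`` in the
repository may differ.

.. code-block:: python

    # Hypothetical copy of the configuration file.
    # Every value below is the documented default.

    # User-customizable variables
    daysqant = 90
    pagerows = 1
    minactqt = 5
    services = ["Pagure"]
    jsonloca = "https://fasjson.fedoraproject.org"
    dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden = "t0xic0der@FEDORAPROJECT.ORG"
    listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile = "/var/tmp/namefile"
    actvfile = "/var/tmp/actvfile"
    acqtfile = "/var/tmp/acqtfile"

    # Developer-only variable
    dfltsize = 1000
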
The ``namelist`` unit
^^^^
This service unit reads the configuration variables ``username`` and
``password`` for the user to masquerade as while probing the FASJSON service,
``jsonloca`` for the location where the FASJSON service is hosted, and
``namefile`` for the file in which the received usernames are stored. Using a
session created as the masquerading user, the unit queries the FASJSON service
for the list of all available users and stores them in the file specified by
``namefile``.

The session is created using the ``krb5`` packages, with the ``username`` and
``password`` passed through the standard input of the console. While this
works for small-scale runs where the service unit runs in ephemeral
containers, the approach is strongly discouraged; a session created using a
keytab file is recommended instead. A set of workarounds must also be placed
in the default ``krb5`` configuration file to allow seamless authentication.
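
A keytab-based session, as recommended here, might be obtained along the
following lines. This is a minimal sketch and not the project's code: the
helper names, the principal and the keytab path are hypothetical, and it
assumes the MIT Kerberos ``kinit`` client is installed on the machine.

.. code-block:: python

    import subprocess


    def build_kinit_command(principal, keytab_path):
        """Build a kinit invocation that authenticates from a keytab.

        -k asks kinit to use a key from a keytab instead of prompting for a
        password, and -t names the keytab file to read.
        """
        return ["kinit", "-k", "-t", keytab_path, principal]


    def obtain_ticket(principal, keytab_path):
        """Run kinit and raise CalledProcessError if authentication fails."""
        subprocess.run(build_kinit_command(principal, keytab_path), check=True)

Calling ``obtain_ticket("someuser@FEDORAPROJECT.ORG", "/path/to/fuas.keytab")``
before the fetch would avoid passing a password through standard input. Both
arguments in that call are placeholders.
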

As this unit runs for a long time and makes performance-intensive queries, it
is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential that the internet connection stays reliable and
that the device is not turned off while the unit is running, because the unit
is non-resumable and writes to disk only when the fetch is complete.
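
The fetch-everything-then-write behaviour described here can be sketched as
follows. This is an illustration under stated assumptions rather than the
project's implementation: ``fetch_page`` is a hypothetical callable standing in
for the FASJSON query, and the one-name-per-line file format is assumed.

.. code-block:: python

    def collect_usernames(fetch_page, page_size):
        """Gather usernames page by page, holding everything in memory.

        fetch_page(page_number, page_size) is a hypothetical callable that
        returns the list of usernames on the given 1-indexed page.
        """
        names = []
        page_number = 1
        while True:
            page = fetch_page(page_number, page_size)
            names.extend(page)
            # A short (or empty) page means there is nothing left to fetch
            if len(page) < page_size:
                break
            page_number += 1
        return names


    def write_namefile(names, namefile):
        """Write the complete list in one go, one username per line."""
        with open(namefile, "w") as handle:
            for name in names:
                handle.write(name + "\n")

Because nothing reaches the disk until ``collect_usernames`` returns, an
interruption partway through loses all progress, which is why the run cannot be
resumed.
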

To keep the service unit running without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file, found at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml>`_,
sets up the environment for the service unit, fetches the list of usernames
and then commits it back to the same repository, making the list publicly
available for consumption. However, the time limit for running a workflow on
GitHub Actions is 6 hours, which might in some cases lead to timeouts and
incomplete runs.
The ``actvlist`` unit
^^^^
This service unit reads the configuration variables ``listlink`` for locating
the file containing the list of all users registered on the Fedora Account
System, ``daysqant`` for limiting the activity queries to the given number of
days, ``minactqt`` for the minimum number of activities required for a user to
be counted as "active", ``services`` for the services whose records are
searched for activities, ``dgprlink`` for the location where the Datagrepper
service is hosted, and ``actvfile`` and ``acqtfile`` for storing the names and
the count of the active users respectively.

The service unit fetches the list of users from the location given in
``listlink`` and iterates through them to find the activities pertaining to
each user. The query period is limited to ``daysqant`` days, and if the count
of activities within that period is greater than or equal to ``minactqt``, the
user is considered "active". Their username is added to the list of active
users and the count of active users is incremented accordingly. Both are
stored in the files specified by ``actvfile`` and ``acqtfile``.
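
The threshold check described here can be sketched as follows. This is a
minimal illustration, not the project's code: ``count_activities`` is a
hypothetical callable standing in for the Datagrepper query, and summing the
counts across all configured services is an assumption.

.. code-block:: python

    def find_active_users(usernames, services, daysqant, minactqt, count_activities):
        """Return the list of active usernames and how many there are.

        count_activities(username, service, days) is a hypothetical callable
        that returns the number of activities recorded for the user on the
        given service within the last `days` days.
        """
        active = []
        for name in usernames:
            # Assumption: activity on all configured services is added together
            total = sum(
                count_activities(name, service, daysqant) for service in services
            )
            if total >= minactqt:
                active.append(name)
        return active, len(active)

With the documented defaults, a user with exactly 5 recorded activities in the
last 90 days is counted as active, while a user with 4 is not.
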

As this unit runs for a long time and makes performance-intensive queries, it
is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential that the internet connection stays reliable and
that the device is not turned off while the unit is running, because the unit
is non-resumable and writes to disk only when the fetch is complete. On
average, this service unit takes at least 4-6 times as long as the
``namelist`` unit.

To keep the service unit running without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file, found at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml>`_,
sets up the environment for the service unit, fetches the list of active
usernames as well as the count and then commits them back to the repository,
making both the list and the count publicly available for consumption.
However, the time limit for running a workflow on GitHub Actions is 6 hours,
which might in some cases lead to timeouts and incomplete runs.