Add working details about the prototype created
Signed-off-by: Akashdeep Dhar <akashdeep.dhar@gmail.com>
This commit is contained in:
parent
3ea2c45745
commit
54bd8c8eea
1 changed files with 143 additions and 0 deletions
143
docs/fcas/creation_workflow.rst
Normal file
@ -0,0 +1,143 @@
.. _creation_workflow:

Existing solution
=================

The existing solution was created to address an earlier ticket,
`Fedora Infrastructure issue #11149 <https://pagure.io/fedora-infrastructure/issue/11149>`_.
The project repository is located at
`Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

The project has two functional units: ``namelist``, which lets the service
runner fetch the usernames from the FASJSON service into a file, and
``actvlist``, which checks Datagrepper for the activity of the names present in
that file. Both are run as automated cron jobs at regular intervals to obtain
an updated list of usernames as well as an updated count of active users. The
service is governed by a configuration file that can be edited to suit the
requirements of those running it.

**Usage** ::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
^^^^^^^^^^^^^^^^^^

A sample configuration file is available at
`fuas/conf.py <https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`_.
Users can copy and edit it to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected to
customize.

1. ``daysqant`` (Default - ``90``) - Number of days for which the activity record is requested
2. ``pagerows`` (Default - ``1``) - Number of rows per page when requesting data from Datagrepper
3. ``minactqt`` (Default - ``5``) - Minimum number of activities for a user to be counted as "active"
4. ``services`` (Default - ``["Pagure"]``) - Services to probe for activity records pertaining to the users
5. ``jsonloca`` (Default - ``"https://fasjson.fedoraproject.org"``) - Location where the FASJSON service is hosted
6. ``dgprlink`` (Default - ``"https://apps.fedoraproject.org/datagrepper/v2/search"``) - Location where the Datagrepper service is hosted
7. ``useriden`` (Default - ``"t0xic0der@FEDORAPROJECT.ORG"``) - User to masquerade as for probing the FASJSON records
8. ``listlink`` (Default - ``"https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"``) - Location where the list of available users is present
9. ``namefile`` (Default - ``"/var/tmp/namefile"``) - Location where the list of available users is to be stored locally
10. ``actvfile`` (Default - ``"/var/tmp/actvfile"``) - Location where the list of active users is to be stored locally
11. ``acqtfile`` (Default - ``"/var/tmp/acqtfile"``) - Location where the count of active users is to be stored locally

The configuration file also contains a list of global computation variables.
These variables are intended only for developers.

1. ``dfltsize`` (Default - ``1000``) - Page size used when iterating over the entities present in the FASJSON service
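
As an illustration, a customized copy of the configuration might look like the
following sketch. It assumes that the file is a plain Python module that
assigns the variables listed above at module level. The names and values are
the documented defaults, but the layout is an assumption and not a copy of the
actual ``conf.py``.

.. code-block:: python

    # Hypothetical layout of a configuration module (assumption, not the
    # project's actual file). Names and defaults come from the lists above.

    # Variables intended to be customized by users
    daysqant = 90
    pagerows = 1
    minactqt = 5
    services = ["Pagure"]
    jsonloca = "https://fasjson.fedoraproject.org"
    dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden = "t0xic0der@FEDORAPROJECT.ORG"
    listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile = "/var/tmp/namefile"
    actvfile = "/var/tmp/actvfile"
    acqtfile = "/var/tmp/acqtfile"

    # Variable intended only for developers
    dfltsize = 1000

A deployment that wants a shorter activity window would, under this
assumption, only need to change ``daysqant`` in its own copy.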

The ``namelist`` unit
^^^^^^^^^^^^^^^^^^^^^

This service unit uses configuration variables such as ``username`` and
``password`` for the user to masquerade as while probing the FASJSON service,
``jsonloca`` for the location where the FASJSON service is hosted, and
``namefile`` for the file in which the received usernames are stored. Using a
session created as the masquerading user, the unit queries the FASJSON service
for the list of all available users and stores it in the file specified in the
configuration variable.
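
The page-by-page collection described above can be sketched roughly as
follows. This is not the project's code: ``fetch_page`` is a hypothetical
callable standing in for the authenticated FASJSON request, and the
one-username-per-line file format is an assumption. The default page size
mirrors ``dfltsize``.

.. code-block:: python

    from typing import Callable, List


    def collect_usernames(
        fetch_page: Callable[[int, int], List[str]],
        page_size: int = 1000,
    ) -> List[str]:
        """Collect usernames page by page until an empty page is returned."""
        usernames: List[str] = []
        page_number = 1
        while True:
            # fetch_page is a stand-in for the real, authenticated request
            page = fetch_page(page_number, page_size)
            if not page:
                break
            usernames.extend(page)
            page_number += 1
        return usernames


    def write_namefile(usernames: List[str], namefile: str) -> None:
        """Write one username per line, only after the whole fetch is done."""
        with open(namefile, "w") as handle:
            handle.write("\n".join(usernames) + "\n")

Keeping the request behind a callable makes it possible to try the loop with
canned pages before pointing it at a real service.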

The aforementioned session is created using the ``krb5`` packages, with the
``username`` and ``password`` passed through the standard input of the console.
While this works for smaller-scale runs where the service unit is run in
ephemeral containers, this approach is highly discouraged and a session created
using a keytab file is recommended instead. A set of workarounds must also be
placed in the default ``krb5`` configuration file to allow for seamless
authentication.
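
A keytab-based session could be obtained along the lines of the sketch below.
It assumes that the MIT Kerberos ``kinit`` command is installed and that a
keytab has already been issued for the principal. The function names are
illustrative and are not part of the project.

.. code-block:: python

    import subprocess
    from typing import List


    def build_kinit_command(keytab: str, principal: str) -> List[str]:
        """Build a kinit invocation that authenticates from a keytab file."""
        # -k requests keytab-based authentication, -t names the keytab file
        return ["kinit", "-k", "-t", keytab, principal]


    def obtain_ticket(keytab: str, principal: str) -> None:
        """Run kinit and raise CalledProcessError if authentication fails."""
        subprocess.run(build_kinit_command(keytab, principal), check=True)

Because no password is typed or piped in, this form is better suited to
unattended runs than passing credentials through the standard input.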

Because this unit runs for a long time and makes performance-intensive queries,
it is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential to ensure that the internet connection is reliable
and that the machine is not turned off while the unit is running, because the
unit cannot resume an interrupted run and writes to disk only when the fetch is
complete.

To ensure that the service unit runs without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
`main.yml workflow file <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml>`_
sets up the environment for the service unit, fetches the list of usernames and
then commits it back to the same repository, making the list publicly available
for consumption. The time limit for running a workflow on GitHub Actions is,
however, 6 hours, which might in some cases lead to timeouts and incomplete
runs.

The ``actvlist`` unit
^^^^^^^^^^^^^^^^^^^^^

This service unit uses the configuration variables ``listlink`` for locating
the file containing the list of all users registered on the Fedora Account
System, ``daysqant`` for limiting the activity queries to the given number of
days, ``minactqt`` for the minimum number of activities required for a user to
be counted as "active", ``services`` for the services whose records are
searched for activities, ``dgprlink`` for the location where the Datagrepper
service is hosted, and ``actvfile`` and ``acqtfile`` for storing the names and
the count of the active users respectively.

The service unit fetches the list of users from the location given by
``listlink`` and iterates through it to find the activities pertaining to each
user. The period limit is set from ``daysqant``, and if the count of activities
within that period is greater than or equal to the configured minimum, the user
is considered to be "active". Their username is added to the list of active
users and the count of active users is incremented accordingly. Both are stored
in the files specified in the configuration variables.
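
The threshold check described above can be sketched as follows. This is a
minimal illustration rather than the project's implementation:
``count_activities`` is a hypothetical callable standing in for the Datagrepper
query over the configured period, and the output file formats are assumptions.

.. code-block:: python

    from typing import Callable, Iterable, List, Tuple


    def find_active_users(
        usernames: Iterable[str],
        count_activities: Callable[[str], int],
        minactqt: int = 5,
    ) -> Tuple[List[str], int]:
        """Return the active usernames and how many of them there are."""
        # A user is "active" when the activity count reaches the minimum
        active = [name for name in usernames if count_activities(name) >= minactqt]
        return active, len(active)


    def write_results(
        active: List[str], count: int, actvfile: str, acqtfile: str
    ) -> None:
        """Store the active usernames and their count in separate files."""
        with open(actvfile, "w") as handle:
            handle.write("\n".join(active) + "\n")
        with open(acqtfile, "w") as handle:
            handle.write(f"{count}\n")

With activity counts of 7, 5 and 4 for three users and the default minimum of
5, the first two users are reported as active and the count is 2, since the
comparison is inclusive.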

Because this unit runs for a long time and makes performance-intensive queries,
it is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential to ensure that the internet connection is reliable
and that the machine is not turned off while the unit is running, because the
unit cannot resume an interrupted run and writes to disk only when the fetch is
complete. On average, this service unit takes at least 4-6 times as long as the
``namelist`` unit.

To ensure that the service unit runs without hiccups, it is run as a workflow
on `GitHub Actions <https://github.com/features/actions>`_. The
`actv.yml workflow file <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml>`_
sets up the environment for the service unit, fetches the list of active
usernames as well as the count and then commits them back to the repository,
making both the list and the count publicly available for consumption. The time
limit for running a workflow on GitHub Actions is, however, 6 hours, which
might in some cases lead to timeouts and incomplete runs.