Add working details about the prototype created
Signed-off-by: Akashdeep Dhar <akashdeep.dhar@gmail.com>
This commit is contained in:
parent
3ea2c45745
commit
54bd8c8eea
1 changed files with 143 additions and 0 deletions
143
docs/fcas/creation_workflow.rst
Normal file
@ -0,0 +1,143 @@
.. _creation_workflow:

Existing solution
=================

The existing solution was created in response to an earlier ticket, which can
be found `here <https://pagure.io/fedora-infrastructure/issue/11149>`_. The
project repository is located at
`Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

There are essentially two functional units in the project: the ``namelist``
unit, which lets the service runner fetch the usernames from the FASJSON
service into a file, and the ``actvlist`` unit, which checks Datagrepper for
the activity of the names present in that file. These are run as automated
cronjobs at regular intervals to obtain both an updated list of usernames and
an updated count of active users. The functioning of the service is governed by
a configuration file that can be edited to suit the requirements of those
running the service.

**Usage** ::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
^^^^^^^^^^^^^^^^^^

The sample configuration file can be found
`here <https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`_. Users can
copy it and edit the copy to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected to
customize.

1. ``daysqant`` (Default - 90) - Number of days for which the activity record is requested
2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper
3. ``minactqt`` (Default - 5) - Minimum number of activities required for a user to count as "active"
4. ``services`` (Default - ["Pagure"]) - Services to probe into for activity records pertaining to the users
5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the FASJSON service is hosted
6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") - Location where the Datagrepper service is hosted
7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as for probing into the FASJSON records
8. ``listlink`` (Default - "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location where the list of available users is present
9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available users is to be stored locally
10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active users is to be stored locally
11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active users is to be stored locally

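For orientation, the defaults listed above could be written out as plain
Python assignments. This is only a sketch that assumes the configuration is a
flat Python module, as the ``conf.py`` filename suggests; it is not a copy of
the actual file. The ``check_config`` helper is hypothetical and is included
only to show the kind of sanity check a customized copy might want.

```python
# Sketch of a customized configuration, using the defaults listed above.
# Assumption: the configuration is a flat Python module of assignments.

daysqant = 90
pagerows = 1
minactqt = 5
services = ["Pagure"]
jsonloca = "https://fasjson.fedoraproject.org"
dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
useriden = "t0xic0der@FEDORAPROJECT.ORG"
listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
namefile = "/var/tmp/namefile"
actvfile = "/var/tmp/actvfile"
acqtfile = "/var/tmp/acqtfile"


def check_config(days, rows, minimum, service_names):
    """Hypothetical helper: return a list of problems found in the values."""
    problems = []
    numeric = (("daysqant", days), ("pagerows", rows), ("minactqt", minimum))
    for name, value in numeric:
        # Each numeric setting is expected to be a positive whole number.
        if not isinstance(value, int) or value < 1:
            problems.append(f"{name} must be a positive integer")
    if not service_names:
        problems.append("services must name at least one service")
    return problems
```

Calling ``check_config(daysqant, pagerows, minactqt, services)`` on the
defaults returns an empty list, while a value such as ``daysqant = 0`` would be
reported as a problem.
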
The configuration file also contains a list of global computing variables.
These variables are intended only for developers.

1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service

The ``namelist`` unit
|
||||||
|
^^^^
|
||||||
|
|
||||||
|
The service unit takes configuration variables such as ``username`` and
``password`` for the user to masquerade as while probing the FASJSON service,
``jsonloca`` for the location where the FASJSON service is hosted, and
``namefile`` for the file in which to store the list of usernames received.
Using a session created with the masquerading user, the unit queries the
FASJSON service for the list of all available users and stores them in the file
specified in the configuration variable.

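The fetch-then-store behaviour described above can be sketched roughly as
follows. The ``fetch_page`` callable is a hypothetical stand-in for the
authenticated FASJSON request, since the actual client calls are not shown
here; the page size plays the role of ``dfltsize``, and the one-username-per-line
file layout is an assumption.

```python
from typing import Callable, List


def collect_usernames(
    fetch_page: Callable[[int, int], List[str]], page_size: int = 1000
) -> List[str]:
    """Gather usernames page by page until a short page signals the end."""
    usernames: List[str] = []
    page_number = 1
    while True:
        # fetch_page is hypothetical: it returns the usernames on one page.
        page = fetch_page(page_number, page_size)
        usernames.extend(page)
        if len(page) < page_size:
            break
        page_number += 1
    return usernames


def write_namefile(usernames: List[str], namefile: str) -> None:
    """Write one username per line, only after the whole fetch has finished."""
    with open(namefile, "w", encoding="utf-8") as handle:
        handle.write("\n".join(usernames) + "\n")
```

Keeping the write in a separate step mirrors the description: nothing reaches
the disk until every page has been collected.
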
The aforementioned session is created using the ``krb5`` packages, and the
``username`` and ``password`` are passed through the standard input of the
console. While this works for smaller-scale runs where the service unit runs in
ephemeral containers, this approach is highly discouraged; a session created
using a keytab file is recommended instead. Also, a set of workarounds must be
placed in the default ``krb5`` configuration file to allow for seamless
authentication.

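As a rough illustration of the recommended keytab approach, the sketch below
builds a ``kinit`` invocation that reads credentials from a keytab instead of
the standard input. It assumes the MIT Kerberos ``kinit`` tool is installed;
the function names and the keytab path in the usage note are hypothetical, and
this is not how the project currently authenticates.

```python
import subprocess
from typing import List


def build_kinit_command(keytab_path: str, principal: str) -> List[str]:
    """Build a kinit call that authenticates from a keytab, not a password."""
    # -k requests a ticket using a key from a keytab; -t names the keytab file.
    return ["kinit", "-k", "-t", keytab_path, principal]


def obtain_ticket(keytab_path: str, principal: str) -> None:
    """Run kinit; raises CalledProcessError if authentication fails."""
    subprocess.run(build_kinit_command(keytab_path, principal), check=True)
```

For example, ``obtain_ticket("/etc/fuas.keytab", "t0xic0der@FEDORAPROJECT.ORG")``
would request a ticket without prompting for a password, provided that the
keytab exists and is valid for that principal.
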
As this unit runs for a long time and makes performance-intensive queries, it
is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential to ensure that the internet connection is reliable
and that the device is not turned off while the long-running service unit is in
progress, because the service unit is non-resumable and writes to disk only
when the fetch is complete.

To ensure that the service unit runs without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file, which can be found at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml>`_,
sets up the environment for the service unit to run, fetches the list of
usernames and then commits it back to the same repository, making that list
publicly available for consumption. The time limit for running a workflow on
GitHub Actions is, however, 6 hours, and that might, in some cases, lead to
timeouts and incomplete runs.

The ``actvlist`` unit
|
||||||
|
^^^^
|
||||||
|
|
||||||
|
The service unit takes configuration variables such as ``listlink`` for
locating the file containing the list of all users registered on the Fedora
Account System, ``daysqant`` for limiting the activity queries to a given
number of days, ``minactqt`` for the minimum number of activities required for
a user to be counted as "active", ``services`` for the services whose records
are searched for activities, ``dgprlink`` for the location where the
Datagrepper service is hosted, and ``actvfile`` and ``acqtfile`` for storing
the names and the count of the active users respectively.

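To make the role of these variables more concrete, the sketch below assembles
a Datagrepper search URL for one user. The query parameter names (``user``,
``delta``, ``rows_per_page``, ``category``) follow Datagrepper's query
interface as far as we know, but how the project itself builds its requests is
not shown here, so treat the function and the lowercasing of service names
into categories as assumptions.

```python
from urllib.parse import urlencode


def build_activity_query(dgprlink, username, daysqant=90, pagerows=1,
                         services=("Pagure",)):
    """Return a search URL covering the last `daysqant` days for one user."""
    params = [
        ("user", username),
        # Datagrepper expresses the look-back window in seconds.
        ("delta", daysqant * 24 * 60 * 60),
        ("rows_per_page", pagerows),
    ]
    # Assumption: each service name maps to a lowercase message category.
    params.extend(("category", service.lower()) for service in services)
    return f"{dgprlink}?{urlencode(params)}"
```

A small ``pagerows`` value keeps each response light when only the overall
count of matching messages matters rather than the messages themselves.
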
The service unit fetches the list of users from the location given in the
aforementioned configuration variables and iterates through it to find the
activities pertaining to each user. The period limit is set according to
``daysqant``, and if the count of activities within that period is greater than
or equal to the minimum number of activities (``minactqt``), the user is
considered to be "active". Their username is added to the list of active users
and the count of active users is incremented accordingly. Both of these are
stored in the files specified in the configuration variables.

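The selection rule described above can be sketched as follows. The
``count_activities`` callable is a hypothetical stand-in for the Datagrepper
lookup, and the output file layout (one username per line, a bare number for
the count) is an assumption rather than the project's confirmed format.

```python
from typing import Callable, Iterable, List, Tuple


def find_active_users(
    usernames: Iterable[str],
    count_activities: Callable[[str], int],
    minactqt: int = 5,
) -> Tuple[List[str], int]:
    """Return the active usernames and how many of them there are."""
    active: List[str] = []
    for username in usernames:
        # A user is "active" when the count reaches the minimum threshold.
        if count_activities(username) >= minactqt:
            active.append(username)
    return active, len(active)


def write_results(active: List[str], count: int,
                  actvfile: str, acqtfile: str) -> None:
    """Store the names and the count once the whole run has finished."""
    with open(actvfile, "w", encoding="utf-8") as handle:
        handle.write("\n".join(active) + "\n")
    with open(acqtfile, "w", encoding="utf-8") as handle:
        handle.write(f"{count}\n")
```

With made-up counts of 7, 5 and 4 activities for three users and the default
threshold of 5, the first two users are reported as active and the count is 2.
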
As this unit runs for a long time and makes performance-intensive queries, it
is strongly recommended to run it no more than once or twice in a span of 24
hours. It is also essential to ensure that the internet connection is reliable
and that the device is not turned off while the long-running service unit is in
progress, because the service unit is non-resumable and writes to disk only
when the fetch is complete. On average, this service unit takes at least 4-6
times longer than the former service unit.

To ensure that the service unit runs without hiccups, it is run as a workflow
on `GitHub Actions <https://github.com/features/actions>`_. The workflow file,
which can be found at
`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml>`_,
sets up the environment for the service unit to run, fetches the list of active
usernames as well as the count and then commits them back to the repository,
making that list as well as the count publicly available for consumption. The
time limit for running a workflow on GitHub Actions is, however, 6 hours, and
that might, in some cases, lead to timeouts and incomplete runs.