.. _creation_workflow:

Existing solution
=================

The existing solution was built to address an earlier ticket,
`fedora-infrastructure issue #11149 <https://pagure.io/fedora-infrastructure/issue/11149>`_.
The project repository is located at
`Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

The project consists of two main functional units: the ``namelist`` unit and
the ``actvlist`` unit. The ``namelist`` unit retrieves usernames from the
FASJSON service, while the ``actvlist`` unit checks the activity status of
those usernames through Datagrepper. Both units run as automated cronjobs at
scheduled intervals, so the service maintains an up-to-date list of usernames
and a count of active users. The service's behavior is controlled by a
configuration file that administrators can tailor to their specific needs.

**Usage** ::

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
------------------

A sample configuration file is available
`in the project repository <https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`_.
Users can copy and edit it to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected
to customize.

1. ``daysqant`` (Default - 90) - Number of days for which activity records are requested
2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper
3. ``minactqt`` (Default - 5) - Minimum number of activities for a user to be counted as "active"
4. ``services`` (Default - ["Pagure"]) - Services to probe for activity records pertaining to the users
5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the FASJSON service is hosted
6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") - Location where the Datagrepper service is hosted
7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as for probing into the FASJSON records
8. ``listlink`` (Default - "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location where the list of available users is present
9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available users is to be stored locally
10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active users is to be stored locally
11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active users is to be stored locally

The configuration file also contains computing variables of global scope.
These variables are intended only for developers.

1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service

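As an illustration, a customized copy of the configuration file might look
like the following sketch. The variable names and values are the defaults
listed above. The flat, module-level layout is an assumption and may differ
from the actual ``conf.py`` in the repository.

.. code-block:: python

    # Hypothetical copy of the configuration file, using the documented defaults.
    # The flat, module-level layout is an assumption made for illustration.

    # Variables intended to be customized by users
    daysqant = 90
    pagerows = 1
    minactqt = 5
    services = ["Pagure"]
    jsonloca = "https://fasjson.fedoraproject.org"
    dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden = "t0xic0der@FEDORAPROJECT.ORG"
    listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile = "/var/tmp/namefile"
    actvfile = "/var/tmp/actvfile"
    acqtfile = "/var/tmp/acqtfile"

    # Variable intended only for developers
    dfltsize = 1000
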
The ``namelist`` unit
---------------------

The service unit uses the configuration variables ``username`` and
``password`` to identify the user to masquerade as while probing the FASJSON
service, ``jsonloca`` to locate the FASJSON service and ``namefile`` to decide
where the received usernames are stored. Using a session created as the
masquerading user, the unit queries the FASJSON service for the list of all
available users and stores it in the file specified by ``namefile``.

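The fetch-then-store flow described above can be sketched as follows. This is
a minimal illustration and not the project's actual code: ``fetch_page`` is a
hypothetical callable standing in for an authenticated FASJSON query, and both
the page shape (a list of usernames, empty once exhausted) and the
one-username-per-line file format are assumptions.

.. code-block:: python

    from pathlib import Path
    from typing import Callable, List


    def collect_usernames(fetch_page: Callable[[int], List[str]]) -> List[str]:
        """Gather usernames page by page until an empty page is returned."""
        usernames: List[str] = []
        page_number = 1
        while True:
            page = fetch_page(page_number)  # hypothetical FASJSON page fetch
            if not page:
                break
            usernames.extend(page)
            page_number += 1
        return usernames


    def store_usernames(usernames: List[str], namefile: str) -> None:
        """Write one username per line, only after the whole list is collected."""
        Path(namefile).write_text("\n".join(usernames) + "\n", encoding="utf-8")

Because nothing is written until ``collect_usernames`` returns, an interrupted
run leaves no partial file behind, which matches the write-on-completion
behavior of the unit.
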
The session is created using the ``krb5`` packages, with the ``username`` and
``password`` passed through the standard input of the console. While this
works for small-scale runs where the service unit is run in ephemeral
containers, the approach is strongly discouraged and a session created from a
keytab file is recommended instead. In addition, a set of workarounds must be
placed in the default ``krb5`` configuration file to allow for seamless
authentication.

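A keytab-based session could be obtained roughly as sketched below. The
``kinit -k -t`` invocation is standard MIT Kerberos usage, but the helper
names and the keytab path in the example are hypothetical and not part of the
project. Nothing is executed unless ``obtain_ticket`` is called explicitly.

.. code-block:: python

    import subprocess
    from typing import List


    def build_kinit_command(principal: str, keytab_path: str) -> List[str]:
        """Return the kinit arguments for a non-interactive, keytab-based login."""
        # -k requests a ticket using a key from a keytab; -t names the keytab file.
        return ["kinit", "-k", "-t", keytab_path, principal]


    def obtain_ticket(principal: str, keytab_path: str) -> None:
        """Run kinit; raises CalledProcessError if it exits with a non-zero status."""
        subprocess.run(build_kinit_command(principal, keytab_path), check=True)


    # Example with a hypothetical keytab location:
    command = build_kinit_command("t0xic0der@FEDORAPROJECT.ORG", "/etc/fuas/fuas.keytab")
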
Because this unit runs for a long time and makes performance-intensive
queries, it is strongly recommended to run it no more than once or twice in
any 24-hour period. It is also essential that the internet connection is
reliable and that the device is not turned off while the unit is running,
because the unit is non-resumable and writes to disk only when the fetch is
complete.

To keep the service unit running without interruptions, it is run as a
workflow on `GitHub Actions <https://github.com/features/actions>`_. The
workflow file,
`main.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml>`_,
sets up the environment for the service unit, fetches the list of usernames
and commits it back to the same repository, making the list publicly available
for consumption. However, the time limit for running a workflow on GitHub
Actions is 6 hours, which might, in some cases, lead to timeouts and
incomplete runs.

The ``actvlist`` unit
---------------------

The service unit uses the following configuration variables: ``listlink`` to
locate the file containing the list of all users registered on the Fedora
Account System, ``daysqant`` to limit the activity queries to the given number
of days, ``minactqt`` for the minimum number of activities for a user to be
counted as "active", ``services`` for the services whose records are searched
for activities, ``dgprlink`` to locate the Datagrepper service, and
``actvfile`` and ``acqtfile`` to store the names and the count of the active
users respectively.

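How these variables might translate into a single Datagrepper query is
sketched below. The helper is hypothetical, and the parameter names (``user``,
``category``, ``delta`` and ``rows_per_page``) are assumed to match the search
endpoint configured in ``dgprlink``. Passing a configured service name
directly as the category is also an assumption, so both should be checked
against the Datagrepper documentation.

.. code-block:: python

    from typing import Dict, Union


    def build_activity_query(
        username: str, service: str, daysqant: int = 90, pagerows: int = 1
    ) -> Dict[str, Union[str, int]]:
        """Build query parameters for one user and one service (assumed names)."""
        return {
            "user": username,
            "category": service,
            # Look back over the configured number of days, expressed in seconds.
            "delta": daysqant * 24 * 60 * 60,
            "rows_per_page": pagerows,
        }


    params = build_activity_query("t0xic0der", "Pagure")
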
The service unit fetches the list of users from the location given in
``listlink`` and iterates through it to find the activities pertaining to each
user. The query period is set to ``daysqant`` days and, if the count of
activities within that period is greater than or equal to ``minactqt``, the
user is considered to be "active". Their username is added to the list of
active users and the count of active users is incremented accordingly. Both
are stored in the files specified by ``actvfile`` and ``acqtfile``.

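The threshold check described above can be sketched as follows. This is an
illustration rather than the project's implementation: ``count_activities``
is a hypothetical callable standing in for the Datagrepper lookup, and the
sample counts are made up for the example.

.. code-block:: python

    from typing import Callable, Iterable, List, Tuple


    def find_active_users(
        usernames: Iterable[str],
        count_activities: Callable[[str], int],
        minactqt: int = 5,
    ) -> Tuple[List[str], int]:
        """Return the active usernames and how many of them there are."""
        # A user is active when the activity count reaches the minimum threshold.
        active = [name for name in usernames if count_activities(name) >= minactqt]
        return active, len(active)


    # Made-up activity counts standing in for Datagrepper results
    counts = {"alice": 12, "bob": 5, "carol": 2}
    active, total = find_active_users(counts, lambda name: counts.get(name, 0))
    # active == ["alice", "bob"] and total == 2

Note that a user with exactly ``minactqt`` activities counts as active, since
the comparison is "greater than or equal to".
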
Because this unit runs for a long time and makes performance-intensive
queries, it is strongly recommended to run it no more than once or twice in
any 24-hour period. It is also essential that the internet connection is
reliable and that the device is not turned off while the unit is running,
because the unit is non-resumable and writes to disk only when the fetch is
complete. On average, this service unit takes at least 4 to 6 times longer
than the ``namelist`` unit.

To keep the service unit running without hiccups, it is run as a workflow on
`GitHub Actions <https://github.com/features/actions>`_. The workflow file,
`actv.yml <https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml>`_,
sets up the environment for the service unit, fetches the list of active
usernames as well as the count and commits them back to the repository, making
both publicly available for consumption. However, the time limit for running a
workflow on GitHub Actions is 6 hours, which might, in some cases, lead to
timeouts and incomplete runs.