.. _creation_workflow:

Existing solution
=================

The existing solution was built to address an earlier ticket, which can be found
`here <https://pagure.io/fedora-infrastructure/issue/11149>`__. The project repository
is located at `Fedora User Activity Statistics <https://github.com/t0xic0der/fuas>`_.

How does it work?
-----------------

The project consists of two main functional units: the ``namelist`` unit and the
``actvlist`` unit. The ``namelist`` unit retrieves usernames from the FASJSON service
and stores them in a file, while the ``actvlist`` unit checks the activity status of the
names listed in that file through Datagrepper. Both units are executed as automated
cronjobs, scheduled to run at specific intervals. This ensures that the service
maintains an up-to-date list of usernames and a count of active users. The service's
behavior is controlled by a configuration file, allowing administrators to tailor it to
their specific needs.

**Usage**

.. code-block:: text

    Usage: fuas [OPTIONS] COMMAND [ARGS]...

    Options:
      --version  Show the version and exit.
      --help     Show this message and exit.

    Commands:
      activity  Fetch a list of active usernames from Datagrepper
      namelist  Fetch a list of usernames on the Fedora Account System

Configuration file
------------------

A sample configuration file can be found `here
<https://github.com/t0xic0der/fuas/blob/main/fuas/conf.py>`__. Users can copy and edit
it to tailor the service to their requirements.

The following is an exhaustive list of the variables that are intended to be customized
by users.

1. ``daysqant`` (Default - 90) - Number of days for which the activity record is
   requested
2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting
   data from Datagrepper
3. ``minactqt`` (Default - 5) - Minimum number of activities required to count a user as
   "active"
4. ``services`` (Default - ["Pagure"]) - Services to probe into for activity records
   pertaining to the users
5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the
   FASJSON service is hosted
6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") -
   Location where the Datagrepper service is hosted
7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as for
   probing into the FASJSON records
8. ``listlink`` (Default -
   "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location
   where the list of available users is present
9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available
   users is to be stored locally
10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active
    users is to be stored locally
11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active
    users is to be stored locally

The configuration file also contains a list of global computing variables. These
variables are intended only for developers.

1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in
   the FASJSON service

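
As a rough illustration, a copy of the configuration file carrying the defaults listed
above might look like the following. This is a sketch assembled from the lists in this
document and not a verbatim copy of ``conf.py``; the actual file may order, comment or
type these variables differently.

.. code-block:: python

    # User-facing variables, set to the defaults listed above.
    daysqant = 90
    pagerows = 1
    minactqt = 5
    services = ["Pagure"]
    jsonloca = "https://fasjson.fedoraproject.org"
    dgprlink = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden = "t0xic0der@FEDORAPROJECT.ORG"
    listlink = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile = "/var/tmp/namefile"
    actvfile = "/var/tmp/actvfile"
    acqtfile = "/var/tmp/acqtfile"

    # Developer-facing variable, not meant to be changed by users.
    dfltsize = 1000
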
The ``namelist`` unit
---------------------

The service unit reads configuration variables such as ``username`` and ``password`` for
the user to masquerade as while probing the FASJSON service, ``jsonloca`` for the
location where the FASJSON service is hosted and ``namefile`` for the file in which the
list of usernames received is stored. Using a session created with the masquerading
user, the unit queries the FASJSON service for the list of all available users and
stores it in the file specified in the configuration variable.

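
The fetch-then-store flow described above can be sketched roughly as follows. This is a
minimal illustration and not the project's actual code: ``fetch_page`` is a hypothetical
callable standing in for the authenticated FASJSON query, the assumption that an empty
page marks the end of the listing is ours, and the demonstration data and file path are
made up.

.. code-block:: python

    import os
    import tempfile
    from typing import Callable, List


    def collect_usernames(
        fetch_page: Callable[[int, int], List[str]], page_size: int
    ) -> List[str]:
        """Walk through pages until an empty page signals the end (assumption)."""
        usernames: List[str] = []
        page_number = 1
        while True:
            page = fetch_page(page_number, page_size)
            if not page:
                break
            usernames.extend(page)
            page_number += 1
        return usernames


    def store_usernames(usernames: List[str], namefile: str) -> None:
        """Write one username per line, only after the whole listing is fetched."""
        with open(namefile, "w", encoding="utf-8") as handle:
            handle.write("\n".join(usernames) + "\n")


    # Hypothetical stand-in for the FASJSON listing, used purely for demonstration.
    _DEMO_USERS = ["alice", "bob", "carol", "dave", "erin"]


    def demo_fetch_page(page_number: int, page_size: int) -> List[str]:
        start = (page_number - 1) * page_size
        return _DEMO_USERS[start:start + page_size]


    demo_names = collect_usernames(demo_fetch_page, page_size=2)
    demo_path = os.path.join(tempfile.gettempdir(), "namefile-demo")
    store_usernames(demo_names, demo_path)

Because nothing is written until ``collect_usernames`` returns, an interruption midway
leaves no partial file behind, which matches the non-resumable behaviour of the unit.
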
The aforementioned session is created using the ``krb5`` packages, with the ``username``
and ``password`` passed on the standard input of the console. While this works for
smaller-scale runs where the service unit is run in ephemeral containers, this approach
is highly discouraged and a session created using a keytab file is recommended instead.
Also, a set of workarounds must be placed in the default ``krb5`` configuration file to
allow for seamless authentication.

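
For the recommended keytab-based session, one possible approach is to call ``kinit``
with a keytab before the unit starts. The sketch below is an assumption about how this
could be wired up and is not taken from the project; the helper names and the keytab
path ``/etc/fuas.keytab`` are hypothetical.

.. code-block:: python

    import subprocess
    from typing import List


    def build_kinit_command(principal: str, keytab_path: str) -> List[str]:
        # ``-k`` asks kinit to use a keytab and ``-t`` names the keytab file.
        return ["kinit", "-k", "-t", keytab_path, principal]


    def obtain_ticket(principal: str, keytab_path: str) -> None:
        # Raises CalledProcessError if kinit exits with a non-zero status.
        subprocess.run(build_kinit_command(principal, keytab_path), check=True)


    # Hypothetical keytab location; the principal is the documented ``useriden``.
    demo_command = build_kinit_command(
        "t0xic0der@FEDORAPROJECT.ORG", "/etc/fuas.keytab"
    )

This keeps the password out of the standard input of the console, since the keytab file
takes its place.
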
As this unit runs for a long time and makes performance-intensive queries, it is
strongly recommended to run it no more than once or twice in a span of 24 hours. It is
also essential to ensure that the internet connection is reliable and that the devices
are not turned off while the unit is in progress, because the unit is non-resumable and
writes to disk only when the fetch is complete.

To keep the service unit running without interruptions, it is run as a workflow on
`GitHub Actions <https://github.com/features/actions>`_. The workflow file, found at
https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml, sets up the
environment for the service unit, fetches the list of usernames and then commits it back
to the same repository, making that list publicly available for consumption. The time
limit for running a workflow on GitHub Actions is, however, 6 hours, which might in some
cases lead to timeouts and incomplete runs.

The ``actvlist`` unit
---------------------

The service unit reads configuration variables such as ``listlink`` for locating the
file containing the list of all users registered on the Fedora Account System,
``daysqant`` for limiting the activity queries to a given number of days, ``minactqt``
for the minimum number of activities required for a user to be counted as "active",
``services`` for the services whose records are searched for activities, ``dgprlink``
for the location where the Datagrepper service is hosted, and ``actvfile`` and
``acqtfile`` for storing the names and the count of the active users respectively.

The service unit fetches the list of users from the location given in the aforementioned
configuration variables and iterates through it to find the activities pertaining to
each user. The period limit is set accordingly and, if the count of activities within
that period is greater than or equal to the configured minimum number of activities, the
user is considered to be "active". Their username is added to the list of active users
and the count of active users is incremented accordingly. Both are stored in the files
specified in the configuration variables.

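
The threshold check described above can be sketched as follows. This is an illustrative
outline and not the project's implementation: ``count_activities`` is a hypothetical
callable standing in for the Datagrepper query, summing the counts across all configured
services is an assumption, and the demonstration data is made up.

.. code-block:: python

    from typing import Callable, Dict, Iterable, List, Tuple


    def find_active_users(
        usernames: Iterable[str],
        services: List[str],
        daysqant: int,
        minactqt: int,
        count_activities: Callable[[str, str, int], int],
    ) -> Tuple[List[str], int]:
        """Return the active usernames and how many of them there are."""
        active: List[str] = []
        for username in usernames:
            # Assumption: activities are summed over every configured service.
            total = sum(
                count_activities(username, service, daysqant) for service in services
            )
            # A user is "active" when the count reaches the configured minimum.
            if total >= minactqt:
                active.append(username)
        return active, len(active)


    # Hypothetical activity counts, used purely for demonstration.
    _DEMO_COUNTS: Dict[Tuple[str, str], int] = {
        ("alice", "Pagure"): 7,
        ("bob", "Pagure"): 5,
        ("carol", "Pagure"): 4,
    }


    def demo_count(username: str, service: str, days: int) -> int:
        return _DEMO_COUNTS.get((username, service), 0)


    demo_active, demo_total = find_active_users(
        ["alice", "bob", "carol"], ["Pagure"], 90, 5, demo_count
    )

In this demonstration ``bob`` sits exactly on the threshold and is still counted as
active, reflecting the "greater than or equal to" comparison.
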
As this unit runs for a long time and makes performance-intensive queries, it is
strongly recommended to run it no more than once or twice in a span of 24 hours. It is
also essential to ensure that the internet connection is reliable and that the devices
are not turned off while the unit is in progress, because the unit is non-resumable and
writes to disk only when the fetch is complete. On average, this service unit takes at
least 4-6 times longer than the former service unit.

To keep the service unit running without hiccups, it is run as a workflow on
`GitHub Actions <https://github.com/features/actions>`_. The workflow file, found at
https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml, sets up the
environment for the service unit, fetches the list of active usernames as well as the
count and then commits them back to the repository, making both publicly available for
consumption. The time limit for running a workflow on GitHub Actions is, however, 6
hours, which might in some cases lead to timeouts and incomplete runs.