diff --git a/docs/fcas/creation_workflow.rst b/docs/fcas/creation_workflow.rst
new file mode 100644
index 0000000..f526451
--- /dev/null
+++ b/docs/fcas/creation_workflow.rst
@@ -0,0 +1,143 @@
+.. _creation_workflow:
+
+Existing solution
+=================
+
+The existing solution to the problem statement was built to address the previous
+ticket, which can be found
+`here `_. The project
+repository is located at
+`Fedora User Activity Statistics `_.
+
+How does it work?
+-----------------
+
+The project has two functional units: ``namelist``, which lets the service
+runner fetch the usernames from the FASJSON service into a file, and
+``actvlist``, which checks the activity of the names present in that file
+on Datagrepper. These are run as automated cronjobs at a regular interval
+to obtain both an updated list of usernames and an updated count of active
+users. The functioning of the service is governed by a configuration file
+that can be edited to suit the requirements of those running the
+service.
+
+**Usage** ::
+
+    Usage: fuas [OPTIONS] COMMAND [ARGS]...
+
+    Options:
+      --version  Show the version and exit.
+      --help     Show this message and exit.
+
+    Commands:
+      activity  Fetch a list of active usernames from Datagrepper
+      namelist  Fetch a list of usernames on the Fedora Account System
+
+Configuration file
+^^^^^^^^^^^^^^^^^^
+
+The sample configuration file can be found
+`here `_. Users can
+copy and edit it to tailor the service to
+their requirements.
+
+The following is an exhaustive list of the variables that users are
+intended to customize.
+
+1. ``daysqant`` (Default - 90) - Number of days for which the activity record is requested
+2. ``pagerows`` (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper
+3. ``minactqt`` (Default - 5) - Minimum number of activities for a user to count as "active"
+4. ``services`` (Default - ["Pagure"]) - Services to probe for activity records pertaining to the users
+5. ``jsonloca`` (Default - "https://fasjson.fedoraproject.org") - Location where the FASJSON service is hosted
+6. ``dgprlink`` (Default - "https://apps.fedoraproject.org/datagrepper/v2/search") - Location where the Datagrepper service is hosted
+7. ``useriden`` (Default - "t0xic0der@FEDORAPROJECT.ORG") - User to masquerade as when probing the FASJSON records
+8. ``listlink`` (Default - "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile") - Location where the list of available users is present
+9. ``namefile`` (Default - "/var/tmp/namefile") - Location where the list of available users is to be stored locally
+10. ``actvfile`` (Default - "/var/tmp/actvfile") - Location where the list of active users is to be stored locally
+11. ``acqtfile`` (Default - "/var/tmp/acqtfile") - Location where the count of active users is to be stored locally
+
+The config file also contains a list of global computing variables. These
+variables are intended only for developers.
+
+1. ``dfltsize`` (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service
+
+The ``namelist`` unit
+^^^^^^^^^^^^^^^^^^^^^
+
+The service unit reads configuration variables such as ``username`` and
+``password`` for the user to masquerade as while probing the FASJSON
+service, ``jsonloca`` for the location where the FASJSON service is
+hosted, and ``namefile`` for storing the list of usernames received. Using a
+session created with the masquerading user, the unit queries the FASJSON
+service for the list of all available users and stores them in the file
+specified in the configuration variable.
+
+This session is created using the ``krb5`` packages, with the
+``username`` and ``password`` passed on the standard input of the console.
+While this works for small-scale runs where the service unit is run in
+ephemeral containers, this approach is strongly discouraged; a
+session created using a keytab file is recommended instead. Also, a set of
+workarounds must be placed in the default ``krb5`` configuration file to allow
+for seamless authentication.
+
+Because this unit runs for a long period of time and makes queries that are
+performance intensive, it is strongly recommended to run this unit no more
+than once or twice in a span of 24 hours. It is also essential to ensure
+that the internet connection is reliable and that the devices are not turned
+off while the long-running service unit is in progress, because the service
+unit is non-resumable and writes to disk only when the fetch is
+complete.
+
+To ensure that the service unit runs without
+interruptions, it is run as a workflow on
+`GitHub Actions `_. The workflow file can
+be found at
+`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml `_.
+It sets up the environment for the service unit to run, fetches the
+list of usernames and then commits them back to the same repository, making
+that list publicly available for consumption. However, the time limit for
+running a workflow on GitHub Actions is 6 hours, which might, in some cases,
+lead to timeouts and incomplete runs.
+
+The ``actvlist`` unit
+^^^^^^^^^^^^^^^^^^^^^
+
+The service unit reads configuration variables such as ``listlink`` for
+locating the file containing the list of all users registered on the Fedora
+Account System, ``daysqant`` for limiting the activity queries to a given
+number of days, ``minactqt`` for the minimum number of activities
+for a user to be counted as "active", ``services`` for the services whose
+records are searched for activities, ``dgprlink`` for the location where the
+Datagrepper service is hosted, and ``actvfile`` and ``acqtfile`` for storing
+the names and the count of the active users respectively.
+
+The service unit fetches the list of users from the location given in the
+configuration variables and iterates through it to find the activities
+pertaining to each user. The period limit is set accordingly and
+if the count of activities within that period is greater than
+or equal to the configured minimum number of activities, that user is considered
+to be "active". Their username gets added to the list of all active users and
+the count of active users is incremented accordingly. Both of these are stored
+in the files specified in the configuration variables.
+
+Because this unit runs for a long period of time and makes queries that are
+performance intensive, it is strongly recommended to run this unit no more
+than once or twice in a span of 24 hours. It is also essential to ensure
+that the internet connection is reliable and that the devices are not turned
+off while the long-running service unit is in progress, because the service
+unit is non-resumable and writes to disk only when the fetch is complete.
+On average, this service unit takes at least 4-6 times longer than the
+``namelist`` service unit.
+
+To ensure that the service unit runs without interruptions,
+it is run as a workflow on
+`GitHub Actions `_. The workflow file can
+be found at
+`https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml `_.
+It sets up the environment for the service unit to run, fetches the
+list of active usernames as well as the count and then commits them back to the
+repository, making that list as well as the count publicly available for
+consumption. However, the time limit for running a workflow on GitHub Actions
+is 6 hours, which might, in some cases, lead to timeouts and incomplete
+runs.
+
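The activity check described for the ``actvlist`` unit can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name ``find_active_users`` and the ``count_activities`` callable are hypothetical, and the callable stands in for the Datagrepper query limited to the last ``daysqant`` days. Only the rule itself comes from the text above: a user is "active" when their activity count is greater than or equal to ``minactqt``.

```python
from typing import Callable, Iterable


def find_active_users(
    usernames: Iterable[str],
    count_activities: Callable[[str], int],
    minactqt: int = 5,
) -> tuple[list[str], int]:
    """Return the active usernames and how many of them there are.

    count_activities is a hypothetical stand-in for the Datagrepper lookup
    restricted to the configured number of days.
    """
    # A user counts as "active" when the count is >= the configured minimum
    active = [name for name in usernames if count_activities(name) >= minactqt]
    return active, len(active)


# Made-up activity counts used in place of a real Datagrepper lookup
sample_counts = {"alice": 7, "bob": 5, "carol": 2}
active_users, active_count = find_active_users(
    sample_counts, lambda name: sample_counts.get(name, 0)
)
print(active_users, active_count)
```

In the described service, the two returned values would correspond to the contents written to ``actvfile`` and ``acqtfile`` respectively.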