fix parsing errors and sphinx warnings
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
This commit is contained in:
parent
8fb9b2fdf0
commit
ba720c3d77
98 changed files with 4799 additions and 4788 deletions
@@ -1,69 +1,72 @@
Improving reliability of mirrors-countme scripts
================================================

Notes on current deployment
---------------------------

For investigating and deploying, you need to be a member of sysadmin-analysis.

The repo that has the code is at https://pagure.io/mirrors-countme/

The deployment configuration is stored in the ansible repo and run through the playbook
playbooks/groups/logserver.yml, mostly in the role roles/web-data-analysis.

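A minimal sketch, assuming a local checkout of that ansible repo (Fedora Infrastructure
normally runs playbooks through its own tooling, so the exact invocation is an
assumption)::

    # Re-apply the logserver configuration, which pulls in the web-data-analysis role.
    ansible-playbook playbooks/groups/logserver.yml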
|
The scripts are running on log01.iad2.fedoraproject.org. If you are a member of
sysadmin-analysis, you should be able to ssh in and have root there.

There are several cron jobs responsible for running the scripts (a rough sketch of
the cron entries follows the list):

- syncHttpLogs - in /etc/cron.daily/, rsyncs logs to
  /var/log/hosts/$HOST/$YEAR/$MONTH/$DAY/http
- combineHttp - in /etc/cron.d/, every day at 6, runs /usr/local/bin/combineHttpLogs.sh,
  which combines logs from /var/log/hosts into /mnt/fedora_stats/combined-http based on
  the project. We are using /usr/share/awstats/tools/logresolvemerge.pl and I am not
  sure we are using it correctly.
- condense-mirrorlogs - in /etc/cron.d/, every day at 6, does some sort of analysis,
  possibly one of the older scripts. It seems to attempt to sort the logs again.
- countme-update - in /etc/cron.d/, every day at 9, runs two scripts:
  countme-update-rawdb.sh, which parses the logs and fills in the raw database, and
  countme-update-totals.sh, which uses the rawdb to calculate the statistics. The
  results of countme-update-totals.sh are then copied to a web folder to make them
  available at https://data-analysis.fedoraproject.org/csv-reports/countme/

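As a rough illustration only, the cron entries are approximately of this shape (the
cron user and the countme script paths are assumptions; check /etc/cron.d on log01 for
the real files)::

    # Hypothetical approximation of the /etc/cron.d entries.
    0 6 * * * root /usr/local/bin/combineHttpLogs.sh
    0 9 * * * root /usr/local/bin/countme-update-rawdb.sh && /usr/local/bin/countme-update-totals.sh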
|
Notes on avenues of improvement
-------------------------------

We have several areas we need to improve:

- downloading and syncing the logs, which sometimes can fail or hang
- problems when combining them
- installation of the scripts, as there have been problems with updates, and currently
  we are doing just a pull of the git repo and running the pip install (roughly as
  sketched below)

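A minimal sketch of that manual update flow, assuming a checkout on log01 (the checkout
path and the exact pip invocation are assumptions, not taken from the deployment)::

    # Update the mirrors-countme scripts by hand (illustrative only).
    cd /path/to/mirrors-countme    # checkout of https://pagure.io/mirrors-countme/
    git pull
    pip install --upgrade .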
|
Notes on replacing with off-the-shelf solutions
-----------------------------------------------

As the raw data we are basing our statistics on are just the access logs from our
proxy servers, we should be able to find an off-the-shelf solution that could replace
our brittle scripts.
|
There are two solutions that present themselves: the ELK stack, and Loki and Promtail
by Grafana.

We are already running an ELK stack on our OpenShift, but our experience so far is
that Elasticsearch has an even more brittle deployment.
|
We did some experiments with Loki. The technology seems promising, as it is much
simpler than the ELK stack, with storage size looking comparable to the raw logs.

Moreover, promtail, which does the parsing and uploading of logs, has facilities both
to add labels to loglines that will then be indexed and queryable in the database, and
to collect statistics from the loglines directly that can be gathered by Prometheus.

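A minimal sketch of the kind of promtail scrape configuration we experimented with; the
job name, log path, regex and metric name below are placeholders for illustration, not
our actual configuration::

    scrape_configs:
      - job_name: httpd-access
        static_configs:
          - targets: [localhost]
            labels:
              job: httpd
              __path__: /var/log/hosts/*/*/*/*/http/*access.log
        pipeline_stages:
          # Pull fields out of each access-log line.
          - regex:
              expression: '^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+)'
          # Attach an indexed, queryable label to the log line.
          - labels:
              status:
          # Expose a counter that Prometheus can scrape from promtail.
          - metrics:
              requests_total:
                type: Counter
                description: "access-log lines promtail managed to parse"
                source: status
                config:
                  action: inc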
|
You can query the logs with a query language (LogQL) similar to PromQL.

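For example, a hypothetical query of that kind, run through logcli (the label name and
the countme match string are assumptions)::

    # Count access-log lines carrying a countme parameter over the last day.
    logcli query 'count_over_time({job="httpd"} |= "countme=" [24h])'
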
We are not going to use it because:

- it doesn't deal well with historical data, so any attempt at an initial import of
  logs is a pain
- using promtail-enabled metrics wouldn't help us with double-counting of people
  hitting different proxy servers
- configuration is fiddly and tricky to test
- changing a batch process to soft-realtime sounds like a headache
|