Improving reliability of mirrors-countme scripts
================================================

Notes on current deployment
---------------------------

For investigating and deploying, you need to be a member of the sysadmin-analysis
group. The repository with the code is at https://pagure.io/mirrors-countme/.

The deployment configuration is stored in the ansible repository and is applied by the
playbook playbooks/groups/logserver.yml, mostly through the role roles/web-data-analysis.

The scripts run on log01.iad2.fedoraproject.org. If you are a member of
sysadmin-analysis, you should be able to ssh in and have root there.
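
For example, getting onto the host looks roughly like this (a minimal sketch, assuming
your Fedora infrastructure SSH configuration is already in place):

.. code-block:: bash

   # Minimal sketch, assuming SSH access to Fedora infrastructure is already set up
   # and you are in the sysadmin-analysis group.
   ssh log01.iad2.fedoraproject.org
   sudo -i    # members of sysadmin-analysis can become root here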

There are several cron jobs responsible for running the scripts:

- syncHttpLogs - in /etc/cron.daily/, rsyncs the logs to
  /var/log/hosts/$HOST/$YEAR/$MONTH/$DAY/http
- combineHttp - in /etc/cron.d/, every day at 6, runs /usr/local/bin/combineHttpLogs.sh,
  which combines the logs from /var/log/hosts into /mnt/fedora_stats/combined-http based
  on the project. We are using /usr/share/awstats/tools/logresolvemerge.pl and it is not
  clear we are using it correctly.
- condense-mirrorlogs - in /etc/cron.d/, every day at 6, does some sort of analysis,
  possibly one of the older scripts. It seems to attempt to sort the logs again.
- countme-update - in /etc/cron.d/, every day at 9, runs two scripts:
  countme-update-rawdb.sh, which parses the logs and fills in the raw database, and
  countme-update-totals.sh, which uses the raw database to calculate the statistics.
  The results of countme-update-totals.sh are then copied to a web folder to make them
  available at https://data-analysis.fedoraproject.org/csv-reports/countme/
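
Put together, the daily flow looks roughly like the sketch below. This is only an
illustrative outline: the host name, dates, file names, option flags and intermediate
paths are made up, and the real commands live in the cron jobs and scripts deployed by
the ansible role.

.. code-block:: bash

   # 1. syncHttpLogs (cron.daily): pull the raw access logs from each host
   #    (host name and date below are illustrative)
   rsync -a proxy01.fedoraproject.org:/var/log/httpd/ \
       /var/log/hosts/proxy01.fedoraproject.org/2023/11/20/http/

   # 2. combineHttpLogs.sh (cron.d, daily at 6): merge the per-host logs per project
   /usr/share/awstats/tools/logresolvemerge.pl \
       /var/log/hosts/*/2023/11/20/http/*access.log \
       > /mnt/fedora_stats/combined-http/fedora-project/access.log

   # 3. countme-update (cron.d, daily at 9): parse the combined logs into the raw
   #    database, then aggregate it into the totals
   countme-update-rawdb.sh
   countme-update-totals.sh

   # 4. copy the resulting totals to the web folder serving
   #    https://data-analysis.fedoraproject.org/csv-reports/countme/
   cp totals.csv /path/to/csv-reports/countme/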

Notes on avenues of improvement
-------------------------------

We have several areas we need to improve:

- downloading and syncing the logs can sometimes fail or hang
- problems when combining them
- installation of the scripts, as there have been problems with updates; currently we
  just do a pull of the git repository and run pip install (roughly the flow sketched
  below)
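
For reference, the current install flow is roughly the following. The checkout path is
made up for illustration; the actual commands live in the ansible role.

.. code-block:: bash

   # Roughly how the scripts are deployed today (illustrative only; the checkout
   # path is hypothetical and nothing pins a specific release).
   git clone https://pagure.io/mirrors-countme.git /opt/mirrors-countme   # first time
   git -C /opt/mirrors-countme pull                                       # on updates
   pip install /opt/mirrors-countme

Because nothing is pinned, an update pulls in whatever is currently on the default
branch, which is likely part of why updates have been fragile.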

Notes on replacing with off-the-shelf solutions
-----------------------------------------------

As the raw data we base our statistics on are just the access logs from our
proxy servers, we might be able to find an off-the-shelf solution that could replace
our brittle scripts.

Two solutions present themselves: the ELK stack, and Loki with Promtail by Grafana.

We are already running the ELK stack on our OpenShift, but our experience so far is
that the Elasticsearch deployment is even more brittle.

We did some experiments with Loki. The technology seems promising, as it is much
simpler than the ELK stack, with the storage size looking comparable to the raw logs.
Moreover, Promtail, which does the parsing and uploading of the logs, has facilities
both to add labels to log lines that are then indexed and queryable in the database,
and to collect statistics directly from the log lines that can be scraped by
Prometheus. You can query the logs with LogQL, a language similar to PromQL.
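
As an illustration, querying for countme hits could look something like the sketch
below, using Grafana's logcli client. The label names and the match string are
hypothetical, not taken from any real configuration of ours.

.. code-block:: bash

   # Hypothetical LogQL queries via logcli; the labels ("job", "host") and the
   # "countme=" match string are illustrative only.

   # all access-log lines that contain a countme hit
   logcli query '{job="httpd-access"} |= "countme="'

   # per-host count of countme hits over the last 24 hours
   logcli query 'sum by (host) (count_over_time({job="httpd-access"} |= "countme=" [24h]))'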

We are not going to use it because:

- it doesn't deal well with historical data, so any attempt at an initial import of
  old logs is painful
- using the Promtail-generated metrics wouldn't help us with the double-counting of
  people hitting different proxy servers
- the configuration is fiddly and tricky to test
- changing a batch process into soft real-time sounds like a headache