Improving reliability of mirrors-countme scripts
================================================

Notes on current deployment
---------------------------

For investigation and deployment, you need to be a member of sysadmin-analysis.

The repository with the code is at https://pagure.io/mirrors-countme/

The deployment configuration is stored in the ansible repo and is applied by the
playbook playbooks/groups/logserver.yml, mostly in the role roles/web-data-analysis.

The scripts run on log01.iad2.fedoraproject.org. If you are a member of
sysadmin-analysis, you should be able to ssh in and have root there.

There are several cron jobs responsible for running the scripts:

- syncHttpLogs in /etc/cron.daily/ rsyncs logs to
  /var/log/hosts/$HOST/$YEAR/$MONTH/$DAY/http
- combineHttp in /etc/cron.d/, every day at 6, runs /usr/local/bin/combineHttpLogs.sh,
  which combines logs from /var/log/hosts into /mnt/fedora_stats/combined-http based on
  the project. We are using /usr/share/awstats/tools/logresolvemerge.pl and I am not
  sure we are using it correctly (see the first sketch after this list for what the
  merge step is meant to do).
- condense-mirrorlogs in /etc/cron.d/, every day at 6, does some sort of analysis,
  possibly one of the older scripts. It seems to attempt to sort the logs again.
- countme-update in /etc/cron.d/, every day at 9, runs two scripts:
  countme-update-rawdb.sh, which parses the logs and fills in the raw database, and
  countme-update-totals.sh, which uses the rawdb to calculate the statistics. The
  results of countme-update-totals.sh are then copied to a web folder to make them
  available at https://data-analysis.fedoraproject.org/csv-reports/countme/ (see the
  second sketch after this list for a rough picture of the rawdb step).
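
What the combine step is meant to achieve is merging several per-host access logs,
each already in chronological order, into one chronologically ordered stream; that is
what logresolvemerge.pl is used for. A minimal Python sketch of that idea, with
made-up paths and helper names rather than the actual script, looks like this::

    import heapq
    import re
    from datetime import datetime
    from pathlib import Path

    # Apache access logs carry the request timestamp in brackets,
    # e.g. [14/Mar/2023:06:25:01 +0000].
    TS_RE = re.compile(r"\[([^\]]+)\]")

    def parse_ts(line):
        """Return the timestamp of an access-log line, or None."""
        m = TS_RE.search(line)
        if not m:
            return None
        return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")

    def timestamped_lines(path):
        """Yield (timestamp, line) pairs from one already-sorted log file."""
        with open(path, errors="replace") as f:
            for line in f:
                ts = parse_ts(line)
                if ts is not None:
                    yield ts, line

    def merge_logs(paths, out_path):
        """Merge individually sorted logs into one chronological file."""
        streams = [timestamped_lines(p) for p in paths]
        with open(out_path, "w") as out:
            for _, line in heapq.merge(*streams, key=lambda pair: pair[0]):
                out.write(line)

    if __name__ == "__main__":
        # Hypothetical layout; the real cron job walks /var/log/hosts instead.
        merge_logs(sorted(Path("logs").glob("proxy*/access.log")),
                   "combined-access.log")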
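
For orientation, this is roughly what the rawdb step does conceptually: DNF appends a
``countme=N`` counter to its metalink/mirrorlist requests, and countme-update-rawdb.sh
scans the proxy access logs for those hits and stores one row per hit in a SQLite
database (countme-update-totals.sh then aggregates that into the published CSVs). The
paths, table layout and regular expression below are illustrative guesses, not the
real mirrors-countme schema::

    import re
    import sqlite3
    import sys

    # Match access-log lines for requests carrying a countme value; the real
    # tool also extracts repo, arch and OS details from the request.
    LOG_RE = re.compile(
        r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+[?&]countme=(?P<countme>\d)\S*)'
    )

    def iter_hits(log_lines):
        """Yield (timestamp, path, countme) for every countme hit."""
        for line in log_lines:
            m = LOG_RE.search(line)
            if m:
                yield m.group("ts"), m.group("path"), int(m.group("countme"))

    def update_rawdb(log_path, db_path="raw.db"):
        """Append all countme hits from one day's log to the raw database."""
        db = sqlite3.connect(db_path)
        db.execute(
            "CREATE TABLE IF NOT EXISTS hits (ts TEXT, path TEXT, countme INTEGER)"
        )
        with open(log_path, errors="replace") as f:
            db.executemany("INSERT INTO hits VALUES (?, ?, ?)", iter_hits(f))
        db.commit()
        db.close()

    if __name__ == "__main__":
        update_rawdb(sys.argv[1])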

Notes on avenues of improvement
-------------------------------

We have several areas we need to improve:

- downloading and syncing the logs can sometimes fail or hang (see the sketch after
  this list)
- problems when combining them
- installation of the scripts, as there have been problems with updates; currently we
  just pull the git repo and run pip install
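
As a sketch of the first point, the syncing could be made more robust by wrapping the
rsync call with a timeout and a few retries. This is only an illustration of the idea,
not the current syncHttpLogs script; the host and paths are placeholders::

    import subprocess
    import time

    def rsync_with_retry(src, dest, attempts=3, timeout=3600):
        """Run rsync, giving up on a hang after `timeout` seconds and
        retrying a failed or hung transfer a few times."""
        for attempt in range(1, attempts + 1):
            try:
                subprocess.run(
                    ["rsync", "-a", "--partial", src, dest],
                    check=True,
                    timeout=timeout,
                )
                return True
            except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
                print(f"rsync attempt {attempt} failed: {exc}")
                time.sleep(60 * attempt)  # simple backoff before retrying
        return False

    if __name__ == "__main__":
        # Placeholder host and paths, not the real proxy list.
        ok = rsync_with_retry("proxy01.example.org:/var/log/httpd/",
                              "/var/log/hosts/proxy01/")
        raise SystemExit(0 if ok else 1)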

Notes on replacing with off-the-shelf solutions
-----------------------------------------------

As the raw data we base our statistics on is just the access logs from our
proxy servers, we might be able to find an off-the-shelf solution that could replace
our brittle scripts.

Two solutions present themselves: the ELK stack, and Loki with Promtail from
Grafana.

We are already running an ELK stack on our OpenShift, but our experience so far is that
Elasticsearch is even more brittle to deploy.

We did some experiments with Loki. The technology seems promising, as it is much
simpler than the ELK stack, and the storage size looks comparable to the raw logs.

Moreover, Promtail, which does the parsing and uploading of logs, has facilities both to
add labels to log lines, which are then indexed and queryable in the database, and to
collect statistics from the log lines directly, which can be scraped by Prometheus.

You can query the logs with LogQL, a language similar to PromQL.

We are not going to use it because:

- it doesn't deal well with historical data, so any attempt at an initial import of
  logs is painful
- using Promtail-generated metrics wouldn't help us with double-counting people who hit
  different proxy servers
- configuration is fiddly and tricky to test
- changing a batch process to soft realtime sounds like a headache