This also ensures that the previously cloned git repository and local
installation of the Python package and associated scripts are removed.
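For reference, a minimal Python sketch of the equivalent cleanup; the
paths match those used elsewhere in the role, but this is illustrative
rather than the actual tasks:

```python
import shutil
import subprocess

# Remove the previously cloned git repository.
shutil.rmtree("/srv/countme/mirrors-countme", ignore_errors=True)

# Uninstall the previously pip-installed package; -y skips the prompt,
# and check=False tolerates the package already being absent.
subprocess.run(["pip", "uninstall", "-y", "mirrors-countme"], check=False)
```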
Signed-off-by: Nils Philippsen <nils@redhat.com>
This should prevent race conditions where combining the logs is
attempted while syncing those of individual hosts hasn't finished yet.
Signed-off-by: Nils Philippsen <nils@redhat.com>
This is to enable running the syncing and combining scripts in
series rather than from independently scheduled cron jobs.
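A minimal sketch of the idea, assuming hypothetical script paths, with
a single cron job driving both steps:

```python
import subprocess

# Hypothetical paths; the real scripts live in the web-data-analysis
# role.
SCRIPTS = [
    "/usr/local/bin/syncHttpLogs.sh",
    "/usr/local/bin/combineHttpLogs.sh",
]

for script in SCRIPTS:
    # check=True aborts the series if a step fails, so combining never
    # runs against a partially synced set of logs.
    subprocess.run([script], check=True)
```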
Signed-off-by: Nils Philippsen <nils@redhat.com>
Importing the role rather than listing it in the playbook lets its
tasks inherit the tags used in the importing role, i.e. it should
ensure they are run when the things that need simple_message_to_bus
are installed.
Additionally, don't attempt to install it manually from
web-data-analysis (it isn't found because it lives in a different role).
Signed-off-by: Nils Philippsen <nils@redhat.com>
The previous script synced all hosts serially and ran rsync once per
log file. This reimplements the shell script in Python, with these
changes:
- Run rsync on whole directories of log files, with much reduced
overhead.
- Use a pool of five workers which process hosts in parallel.
Additionally, remove download-rdu01.vpn.fedoraproject.org from the list
of synced hosts.
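A condensed sketch of the approach (the host list and paths here are
illustrative, not the actual script):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["proxy01.fedoraproject.org", "proxy02.fedoraproject.org"]
DEST = "/var/log/hosts"

def sync_host(host):
    # One rsync over a host's whole log directory, instead of one
    # rsync invocation per log file.
    subprocess.run(
        ["rsync", "-a", f"{host}:/var/log/httpd/", f"{DEST}/{host}/"],
        check=True,
    )

# A pool of five workers processes hosts in parallel; list() consumes
# the results so any rsync failure is re-raised here.
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(sync_host, HOSTS))
```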
Signed-off-by: Nils Philippsen <nils@redhat.com>
This gives the web-data-analysis `countme` user a .gitconfig file so the
commits it makes in its local git repo have a proper user name and
email address. (It also makes git stop complaining.)
The email address might not actually be valid, but this repo doesn't
currently go anywhere public, so it shouldn't really matter.
This should automate running the "countme" scripts every day to parse
new log data and publish updated totals.
Here's what I've added to the ansible role:
* install package deps for `mirrors-countme`
* make "countme" user with home /srv/countme
* clone 'prod' branch of https://pagure.io/mirrors-countme to /srv/countme
* if changed: pip install /srv/countme/mirrors-countme
* make web subdir /var/www/html/csv-reports/countme
* make local data dir /var/lib/countme
* install `countme-update.sh` to /usr/local/bin
* install `countme-update.cron` to /etc/cron.d
  * runs /usr/local/bin/countme-update.sh daily, as user `countme`
That should make sure `countme-update.sh` runs every day.
That script works like this:
1. Run `countme-update-rawdb.sh`
   * parse new mirrors.fp.o logs in /var/log/hosts/proxy*
   * write data to /var/lib/countme/raw.db
2. Run `countme-update-totals.sh`
   * parse raw data from /var/lib/countme/raw.db
   * write updated totals to /var/lib/countme/totals.{db,csv}
3. Track changes in updated totals
   * set up /var/lib/countme as git repo (if needed)
   * commit new `totals.csv` (if changed)
4. Make updated totals public
   * copy totals.{db,csv} to /var/www/html/csv-reports/countme
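The real implementation is the `countme-update.sh` shell script
installed above; as a rough Python rendering of the same flow (step
numbers match the list):

```python
import shutil
import subprocess
from pathlib import Path

DATA = Path("/var/lib/countme")
WEB = Path("/var/www/html/csv-reports/countme")

# 1. Parse new logs into the raw database.
subprocess.run(["countme-update-rawdb.sh"], check=True)

# 2. Recompute totals from the raw database.
subprocess.run(["countme-update-totals.sh"], check=True)

# 3. Track changes to totals.csv in a local git repo.
if not (DATA / ".git").exists():
    subprocess.run(["git", "init"], cwd=DATA, check=True)
subprocess.run(["git", "add", "totals.csv"], cwd=DATA, check=True)
# "git diff --cached --quiet" exits non-zero when staged changes exist,
# so this commits only if totals.csv actually changed.
staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=DATA)
if staged.returncode != 0:
    subprocess.run(["git", "commit", "-m", "Update totals"], cwd=DATA,
                   check=True)

# 4. Publish the updated totals.
for name in ("totals.db", "totals.csv"):
    shutil.copy2(DATA / name, WEB / name)
```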
For safety's sake, I've tried to set up everything so it runs as the
`countme` user rather than running everything as `root`. This might be
an unnecessary complication but it seemed like the right thing to do.
Similarly, keeping totals.csv in a git repo isn't _required_, but it
seemed like a good idea to keep historical records in case we want/need
to change the counting algorithm or something.
I checked the YAML with ansible-lint and tested that all the scripts
work as expected when run as `wwoods`, so unless I've missed
something, this should do the trick.