Matthew Miller found that running the awk script over multiple days
caused the numbers for releases newer than F33 to go up forever. This
fix should zero out all the new variables.
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
When a corrupt date is found in the log file, I have the program
default to 1970-01-01, but because a lookup is used, the default needed
to be 1970-Jan-01, which then gets replaced by 1970-01-01.
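A minimal sketch of the fallback described above, written in Python
rather than the awk of the real script. The `MONTHS` table and the
`normalize_date` name are hypothetical; only the 1970-Jan-01 default and
its replacement by 1970-01-01 come from this message.

```python
# Hypothetical month lookup, standing in for the one in the awk script.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

# The default must carry a month *name* so that the lookup below succeeds.
DEFAULT_DATE = "1970-Jan-01"


def normalize_date(raw):
    """Turn YYYY-Mon-DD into YYYY-MM-DD, using the default for bad input."""
    parts = raw.split("-")
    if len(parts) != 3 or parts[1] not in MONTHS:
        parts = DEFAULT_DATE.split("-")
    year, month, day = parts
    return f"{year}-{MONTHS[month]}-{day}"
```

Because the default goes through the same lookup as every other date,
a corrupt entry ends up as 1970-01-01 in the output.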
Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
This prevents sending out unnecessary mails when run from the related
cron job:
    condense-mirrorlogs.cron
    -> condense-mirrorlogs.sh > /dev/null
       -> mirrorlist.py
Additionally, report the failing file name in the case of an error.
Signed-off-by: Nils Philippsen <nils@redhat.com>
Previously, the script was very talkative by default. Make it silent by
default for log levels < WARNING and allow logging (at a different
level) to syslog. Additionally, configure the cron job to log everything
at levels >= INFO to syslog.
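Roughly, the logging setup could look like the sketch below. It is not
the script's actual code: the `setup_logging` function, the logger name
and the `/dev/log` socket path are assumptions for illustration.

```python
import logging
import logging.handlers


def setup_logging(syslog_level=None, syslog_address="/dev/log"):
    """Stay quiet below WARNING on the console, optionally log to syslog."""
    logger = logging.getLogger("mirrorlist-sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()

    console = logging.StreamHandler()
    console.setLevel(logging.WARNING)
    logger.addHandler(console)

    if syslog_level is not None:
        # Assumes a local syslog socket at syslog_address.
        syslog = logging.handlers.SysLogHandler(address=syslog_address)
        syslog.setLevel(syslog_level)
        logger.addHandler(syslog)

    return logger
```

The cron job would then correspond to something like
`setup_logging(syslog_level=logging.INFO)`, while an interactive run
with no arguments only prints warnings and errors.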
Signed-off-by: Nils Philippsen <nils@redhat.com>
This also ensures that the previously cloned git repository and local
installation of the Python package and associated scripts are removed.
Signed-off-by: Nils Philippsen <nils@redhat.com>
This lets users of simple_message_to_bus predefine items which should be
present in all message bodies this way:
export MSGBODY_PRESET="key1=value1 key2=value2"
This doesn't work with spaces in either keys or values; any quotation
marks will be used verbatim.
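To illustrate the limitation, here is a sketch of this kind of parsing
in Python (the real helper is a shell script; `parse_preset` is a
made-up name, and skipping tokens without `=` is an assumption):

```python
import os


def parse_preset(preset):
    """Split 'key1=value1 key2=value2' into a dict, with no quote handling."""
    items = {}
    for token in preset.split():
        key, sep, value = token.partition("=")
        if sep:  # assumption: tokens without '=' are skipped
            items[key] = value
    return items


body = parse_preset(os.environ.get("MSGBODY_PRESET", ""))
```

Splitting on whitespace is why a value such as `"b c"` cannot survive:
the quote characters are kept as-is and the text after the space becomes
a separate token.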
Signed-off-by: Nils Philippsen <nils@redhat.com>
Callers of simple_message_to_bus need to set and export MSGTOPIC_PREFIX
explicitly.
This decouples the fedora-messaging-utils and web-data-analysis roles.
Additionally, don't assume /bin/sh is /bin/bash.
Signed-off-by: Nils Philippsen <nils@redhat.com>
This should prevent race conditions in which combining the logs is
attempted while syncing those of individual hosts hasn't finished.
Signed-off-by: Nils Philippsen <nils@redhat.com>
This is to enable running the syncing and combining scripts in
series rather than from independently scheduled cron jobs.
Signed-off-by: Nils Philippsen <nils@redhat.com>
Importing the role rather than listing it in the playbook lets its tasks
have the tags used in the importing role, i.e. it should ensure they are
run when the things that need simple_message_to_bus are installed.
Additionally, don't attempt to install it manually from
web-data-analysis (it isn't found because it lives in a different role).
Signed-off-by: Nils Philippsen <nils@redhat.com>
When renaming a file over another name that is a hard link to the same
file, the rename is a no-op. This left many temporary files in
/var/log/hosts because syncing of a file is attempted (and the file thus
hard-linked between dated and undated file names) over a couple of days.
The solution is to do what the `ln` command does: rename, then unlink
the temporary file.
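A small sketch of the rename-then-unlink pattern, assuming POSIX
rename() semantics. The function name is hypothetical and this is not
the code from the sync script itself.

```python
import os


def replace_hardlink_safe(tmp_path, dest_path):
    """Rename tmp_path over dest_path and make sure tmp_path is gone."""
    os.rename(tmp_path, dest_path)
    try:
        # If both names were hard links to the same file, rename() did
        # nothing and the temporary name is still there.
        os.unlink(tmp_path)
    except FileNotFoundError:
        pass  # the rename really moved the file
```

In the common case the unlink fails harmlessly because the temporary
name no longer exists; in the hard-link case it is what removes the
leftover.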
Signed-off-by: Nils Philippsen <nils@redhat.com>
The previous one synced all hosts serially and ran rsync for each log
file. This reimplements the shell script in Python, with these changes:
- Run rsync on whole directories of log files, with much reduced
overhead.
- Use a pool of five workers which process hosts in parallel.
Additionally, remove download-rdu01.vpn.fedoraproject.org from the list
of synced hosts.
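The overall shape is roughly as sketched below. The rsync source and
destination paths are made up, and whether the real script uses threads
or processes for its workers is not stated in this message.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sync_host(host):
    """Run one rsync for a whole directory of logs (paths are made up)."""
    cmd = ["rsync", "-a", f"{host}::log/", f"/var/log/hosts/{host}/"]
    return subprocess.run(cmd).returncode


def sync_all(hosts, sync=sync_host, workers=5):
    """Sync every host using a pool of workers, returning exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() yields results in the same order as `hosts`.
        return dict(zip(hosts, pool.map(sync, hosts)))
```

`hosts` is expected to be a list; the `sync` parameter only exists so
that the per-host step can be swapped out, e.g. for testing.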
Signed-off-by: Nils Philippsen <nils@redhat.com>
Right now countme-update.sh tries to `git commit -a` whether or not
anything has changed, which results in this output whenever there are no
new changes to commit:
    On branch master
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            raw.db
            totals.db
    nothing added to commit but untracked files present (use "git add" to track)
This commit tweaks `countme-update.sh` so that it only attempts `git commit`
if there are changes to be committed - i.e. when `git diff` returns 1.
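The check can be sketched in Python as follows. Note that plain
`git diff` exits 0 unless `--exit-code` or `--quiet` is given; which
flag countme-update.sh passes is not shown here, so `--quiet` is an
assumption, as are the function name and the `run` parameter.

```python
import subprocess


def commit_if_changed(repo_dir, message, run=subprocess.run):
    """Commit tracked changes in repo_dir only when `git diff` reports any."""
    # `git diff --quiet` exits with 1 when there are differences, else 0.
    diff = run(["git", "-C", repo_dir, "diff", "--quiet"])
    if diff.returncode != 1:
        return False
    run(["git", "-C", repo_dir, "commit", "-a", "-m", message], check=True)
    return True
```

Untracked files such as raw.db do not show up in `git diff`, so they no
longer trigger a commit attempt.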
Signed-off-by: Will Woods <wwoods@redhat.com>
So it turns out that pip3 installs scripts to /usr/local/bin and cron
jobs don't have /usr/local/bin in the path.
This commit adds /usr/local/bin to PATH in countme-update.sh.
For Maximum Correctness we should probably get pip to tell us where it
installed countme-update-{rawdb,totals}.sh but this'll work just fine
as long as pip keeps installing scripts to /usr/bin or /usr/local/bin.
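For illustration, the same idea in Python (the actual change is a PATH
assignment in the shell script; `ensure_on_path` is a hypothetical name,
and appending rather than prepending is an assumption):

```python
import os
import shutil


def ensure_on_path(directory, env=os.environ):
    """Append directory to PATH in env unless it is already listed."""
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
        env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


ensure_on_path("/usr/local/bin")
# shutil.which() returns None when the script is not installed.
rawdb_script = shutil.which("countme-update-rawdb.sh")
```

A lookup like `shutil.which()` finds the script wherever it is on PATH,
which is weaker than asking pip but enough for the two locations named
above.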
Signed-off-by: Will Woods <wwoods@redhat.com>
This gives the web-data-analysis `countme` user a .gitconfig file so the
commits it makes in its local git repo have a proper user name and
email address. (Also, it makes git stop complaining.)
The email address might not actually be valid, but this repo doesn't
currently go anywhere public so it shouldn't really matter.
This should automate running the "countme" scripts every day to parse
new log data and publish updated totals.
Here's what I've added to the ansible role:
* install package deps for `mirrors-countme`
* make "countme" user with home /srv/countme
* clone 'prod' branch of https://pagure.io/mirrors-countme to /srv/countme
* if changed: pip install /srv/countme/mirrors-countme
* make web subdir /var/www/html/csv-reports/countme
* make local data dir /var/lib/countme
* install `countme-update.sh` to /usr/local/bin
* install `countme-update.cron` to /etc/cron.d
  * runs /usr/local/bin/countme-update.sh daily, as user `countme`
That should make sure `countme-update.sh` runs every day.
That script works like this:
1. Run `countme-update-rawdb.sh`
   * parse new mirrors.fp.o logs in /var/log/hosts/proxy*
   * write data to /var/lib/countme/raw.db
2. Run `countme-update-totals.sh`
   * parse raw data from /var/lib/countme/raw.db
   * write updated totals to /var/lib/countme/totals.{db,csv}
3. Track changes in updated totals
   * set up /var/lib/countme as git repo (if needed)
   * commit new `totals.csv` (if changed)
4. Make updated totals public
   * Copy totals.{db,csv} to /var/www/html/csv-reports/countme
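The sequence above, sketched in Python for illustration only (the real
`countme-update.sh` is a shell script). The `update` function and its
`run` and `copy` parameters are hypothetical, and the git step is left
out to keep the sketch short.

```python
import os
import shutil
import subprocess

DATA_DIR = "/var/lib/countme"
WEB_DIR = "/var/www/html/csv-reports/countme"


def update(run=subprocess.run, copy=shutil.copy2):
    """Run the update steps in order, stopping at the first failure."""
    # Steps 1 and 2: parse new logs, then recompute the totals.
    run(["countme-update-rawdb.sh"], check=True)
    run(["countme-update-totals.sh"], check=True)
    # Step 3 (tracking totals.csv in git) is omitted from this sketch.
    # Step 4: publish the updated totals.
    for name in ("totals.db", "totals.csv"):
        copy(os.path.join(DATA_DIR, name), os.path.join(WEB_DIR, name))
```

With `check=True`, a failing step raises an exception, so stale totals
are not copied to the web directory.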
For safety's sake, I've tried to set up everything so it runs as the
`countme` user rather than running everything as `root`. This might be
an unnecessary complication but it seemed like the right thing to do.
Similarly, keeping totals.csv in a git repo isn't _required_, but it
seemed like a good idea to keep historical records in case we want/need
to change the counting algorithm or something.
I checked the YAML with ansible-lint and tested that all the scripts
work as expected when run as `wwoods`, so unless I've missed something
this should do the trick.