= DNF Counting

We use DNF Counting to get statistics about the number of Fedora
installations.

== Contact Information

Owner::
Fedora Infrastructure Team
Contact::
#fedora-admin, #fedora-noc,
admin@fedoraproject.org
Servers::
log01, proxy0*
Purpose::
Give interested parties information about the number of Fedora
installations.
Repositories::
* https://github.com/fedora-infra/mirrors-countme
* https://pagure.io/fedora-infra/ansible/blob/main/f/roles/web-data-analysis

== What it is

DNF Counting is a way for us to gather statistics about the number of Fedora
installations, differentiated by version, spin, etc. On the infrastructure
side this is implemented by a bunch of scripts and a Python package
(`mirrors-countme`).

== Scope

This SOP concerns itself with the infrastructure side of the equation. For any
issues with the various frontends logging in to be counted (DNF, PackageKit,
…), contact their respective maintainers or upstreams.

== How it works

Clients (DNF, PackageKit, …) have been modified so they add a `countme`
variable in their requests to `mirrors.fedoraproject.org` once a week. This
ends up in our webserver log data, which lets us generate usage statistics.

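For illustration, such a request shows up in the proxy logs roughly like this
(address, repository, version and user agent are made up; the `countme` value
buckets the system's age: 1 = up to a week old, 2 = up to a month, 3 = up to
six months, 4 = older):

....
192.0.2.10 - - [01/Jun/2023:12:00:00 +0000] "GET /metalink?repo=fedora-38&arch=x86_64&countme=1 HTTP/1.1" 200 15424 "-" "libdnf"
....
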
Cron jobs are set up on `log01` which collect http log files from the various
web proxies, combine them (accesses to different backend services including
`mirrors.fedoraproject.org` are scattered across the proxy logs), and produce
statistics from them. The various pieces live in a) the `mirrors-countme`
project (Python package and related scripts to generate statistics from the
log data) and b) shell scripts in the `web-data-analysis` role in Ansible:

* `sync-http-logs.py` (Ansible) syncs individual log files from various hosts
including proxies to `log01`.
* `combineHttpLogs.sh` (Ansible) combines the logs for the different web sites
which are scattered across the proxy hosts.
* `condense-mirrorlogs.sh` & `mirrorlist.py` (Ansible) extract hosts from the
combined log data.
* `countme-update.sh` (Ansible) drives `countme-update-rawdb.sh` &
`countme-update-totals.sh` (`mirrors-countme`), which generate statistics.
* `countme-trim-raw` (`mirrors-countme`) trims the intermediary database file
(`raw.db`).

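Put together, one cycle of the pipeline runs roughly in this order – a sketch
only, the actual paths and arguments live in the cron job definitions in the
`web-data-analysis` role:

....
sync-http-logs.py       # 1. pull the raw proxy logs onto log01
combineHttpLogs.sh      # 2. merge per-proxy logs into per-site logs
condense-mirrorlogs.sh  # 3. extract hosts from the combined logs
countme-update.sh       # 4. rawdb + totals: generate the statistics
....
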
== Changes implemented in the Q2/2023 DNF mirrors-countme initiative

* The “traditional” statistics which were computed before DNF
learned about the `countme` variable were reimplemented: count every
individual IP address sighted, with or without `countme`. This is
necessary to count systems which don’t have that feature in their DNF or
YUM, and – while yielding different numbers – gives us an idea of how things
develop when compared to the same numbers for more modern OSes.
* The `countme-trim-raw` tool was implemented to trim the intermediary
database `raw.db`, which contains the information gleaned from parsing the
merged log files (see the conceptual sketch after this list). This database
grows steadily and – with the reinstated counting of every individual IP
sighted – quickly, so once these data have been safely turned into the final
statistics, we wanted a way to remove them so that the local volume where it
is stored doesn’t fill up completely.
* The project repository was cleaned up, i.e. large data files used in
integration tests were removed because they made cloning the repository
unnecessarily slow: for a couple hundred KB of code, the repo was more than
300 MB in size. In this context, the repository was moved from Pagure to
GitHub.
* Unused code was removed, the remaining code was refactored and condensed
to remove redundancies, and comprehensive unit tests were added so that the
barrier to contributing is lower and changes are less risky.

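Conceptually, trimming boils down to deleting rows older than a cutoff from
`raw.db` once they are reflected in the totals. A sketch only – table name,
column name, path and retention window are all hypothetical; use the
`countme-trim-raw` tool itself in practice:

....
sqlite3 /path/to/raw.db \
    "DELETE FROM countme_raw WHERE timestamp < strftime('%s', 'now', '-13 weeks');"
....
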
== Changes implemented in the Q3/2021 DNF Counting Initiative

During the Q3/2021 DNF Counting Initiative, a number of changes were
implemented which improved the DNF Counting backend in the areas of
monitoring, debugging, performance and robustness.

* The involved scripts send messages about state changes and errors to the
fedora-messaging bus. State changes are e.g. the start and finish of a
complete script or of its individual steps.
* The shell script which syncs log files from various hosts to `log01`
(`syncHttpLogs.sh`) was reimplemented in Python (as `sync-http-logs.py`), with
several improvements which reduced the time it takes for syncing from 6-7
hours to little more than 30 minutes per day:
** All log files for one date of one host are synced in one call to `rsync`.
This greatly reduces overhead.
+
Previously, these files were synced one by one because `rsync` only allows
differing file names when syncing single files – and the names do differ: the
log files on the hosts contain their date in the name; on `log01` they don't,
but are stored in directories for each date.
+
To overcome this limitation, `sync-http-logs.py` maintains a shadow structure
of hard links with dates in their names, and `rsync` operates on this
structure instead; the files are linked back to "date-less" file names
afterwards for further processing (see the sketch after this list).
** Because syncing log files from some hosts is pretty slow, several hosts are
synced in parallel.
* Previously, `syncHttpLogs.sh` and `combineHttpLogs.sh` were run from
individual cron jobs which were set to run a couple of hours apart.
Sometimes, this caused problems because the former wasn't finished when the
latter started to run (i.e. a race condition). Now, `sync-http-logs.py` and
`combineHttpLogs.sh` are run from one cron job to avoid this.
* Previously, the scripts were scattered across the `web-data-analysis`,
`awstats` and `base` roles. All of the deployment has been consolidated into
the `web-data-analysis` role, and `awstats` has been removed.
* The `mirrors-countme` Python package and scripts are packaged as RPM
packages in Fedora; previously, they were deployed from a local clone of the
upstream git repository.

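A minimal sketch of the shadow-structure idea – host name, paths and file name
pattern are all hypothetical, the real logic lives in `sync-http-logs.py`:

....
host=proxy01 date=20230601
shadow=/var/log/hosts/$host/.shadow   # dated names, as on the remote host
dest=/var/log/hosts/$host/2023/06/01  # date-less names, date in the path
mkdir -p "$shadow" "$dest"

# one rsync call fetches all of the day's logs for this host
rsync -a "$host.fedoraproject.org:/var/log/httpd/*.log-$date" "$shadow/"

# hard-link each dated file back to its date-less name for processing
for f in "$shadow"/*.log-"$date"; do
    ln -f "$f" "$dest/$(basename "$f" "-$date")"
done
....
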
== Reboot me

Yes, just reboot. Or don't. There are no continuously running services;
everything is run regularly as cron jobs.

== Logs

The `sync-http-logs.py` script sends relatively verbose output to syslog.
Other than that, the closest things to logs are the mails sent when cron jobs
produce (error) output, and the messages sent to the bus.

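To skim that syslog output on `log01` – assuming the script logs under its own
name, the exact identifier may differ:

....
journalctl -t sync-http-logs.py --since yesterday
....
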
== First steps to debug

The scripts send messages with a topic prefix of `logging.stats` to the bus
at various stages of their operation. If anything doesn't work as it should,
check that every step that was started also finished, and compare run times
between days.

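One way to follow these messages live – a sketch assuming a working
fedora-messaging configuration pointing at the public broker:

....
fedora-messaging consume --routing-key "logging.stats.#"
....
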
If anything crashes, cron should have sent mails to the recipients configured
(at least `root@fedoraproject.org`), which could also contain valuable
information.

== Ephemeral data

Generated CSV reports and images are in `/var/www/html/csv-reports`, which is
exposed at https://data-analysis.fedoraproject.org/ – but they get regenerated
with every cycle of the scripts.

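For example, aggregated totals can be fetched directly (path illustrative –
browse the site for what is actually published):

....
curl -O https://data-analysis.fedoraproject.org/csv-reports/countme/totals.csv
....
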
== Persistent data

All combined http log data is kept on the `/fedora_stats` NFS share. Log
files from the proxy hosts are synced to `/var/log/hosts/<hostname>` locally,
but these are just copies of what exists elsewhere already.

== Other operational considerations

The scripts only process data from roughly the previous three days. If they
don't run for a longer time, there might be gaps in the generated statistics,
which can be plugged by temporarily adjusting the respective settings in the
scripts and re-running them.

== Where are the docs?

Here :) and at https://github.com/fedora-infra/mirrors-countme

== Is there data that needs to be backed up?

Yes, but it's on the `/fedora_stats` file share, so it's assumed to get backed
up regularly already.

== Upgrading

=== `mirrors-countme`

The `mirrors-countme` shell and Python scripts create statistics from the
already combined log data.

==== Making upstream changes available

Prerequisites: A change (bug fix or feature) is available in the `main`
branch of `mirrors-countme`.

. Publish an upstream release
+
From a clone of the upstream repository:
+
.. In `pyproject.toml`, bump `tool.poetry.version` (e.g. to `0.1.2`) and
commit the change, e.g.:
+
....
git commit -s -m "Version 0.1.2" -- pyproject.toml
....
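+
Alternatively – assuming the bump is just the patch level – Poetry can edit
`pyproject.toml` for you before you commit:
+
....
poetry version patch
....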
.. Tag the previous change with a GPG-signed tag:
+
....
git tag -s 0.1.2
....
.. Push both the change and the tag:
+
....
git push origin main 0.1.2
....
.. Create a source tarball (this will be created as e.g.
`dist/mirrors_countme-0.1.2.tar.gz`):
+
....
poetry build
....
.. From the https://github.com/fedora-infra/mirrors-countme/tags[list of tags],
select “Create release” in the menu for the respective tag, and attach the
created tarball and wheel files to the created release.
. Update and Build the `python-mirrors-countme` Fedora Package
+
From a clone of the Fedora package repository, in the `rawhide` branch:
+
.. Bump the version in `python-mirrors-countme.spec`. No other changes
are necessary; the package uses automatic release and changelog fields.
+
.. Download the source tarball, either manually or with one of:
+
....
spectool -g python-mirrors-countme.spec
....
+
....
rpmspectool get python-mirrors-countme.spec
....
.. Upload the source tarball to the lookaside cache:
+
....
fedpkg new-sources mirrors_countme-0.1.2.tar.gz
....
.. Commit the changes to the repository, e.g.:
+
....
git commit -s -m "Version 0.1.2" -- python-mirrors-countme.spec
....
.. Push the changes and build:
+
....
git push && fedpkg build
....
.. For all other active Fedora and EPEL branches, fast-forward them to the
state of the `rawhide` branch, push and build, e.g.:
+
....
git checkout epel8 \
&& git merge --ff-only rawhide \
&& git push \
&& fedpkg build
....
. Submit Fedora/EPEL Package Updates
+
Either submit the update via the
https://bodhi.fedoraproject.org/updates/new[Bodhi web interface], or
from the command line in the respective checked-out Fedora or EPEL
branch, e.g.:
+
....
fedpkg update --type bugfix --notes 'Put in some notes!'
....
. Tag with Infra-Tags in Koji
.. Tag the build into the respective infra candidate tag in Koji, e.g.:
+
....
koji tag-build epel8-infra-candidate python-mirrors-countme-0.1.2-1.el8
....
.. Check that the build was picked up and signed (this should take no
more than a few minutes), e.g.:
+
....
koji buildinfo python-mirrors-countme-0.1.2-1.el8
....
+
The build must be tagged with the corresponding `*-infra-stg` tag.
.. Tag the build into the respective infra production tag in Koji, e.g.:
+
....
koji tag-build epel8-infra python-mirrors-countme-0.1.2-1.el8
....

When the respective infra tag repository is updated, the new version
should be ready to be installed/updated in our infrastructure.

=== Other scripts

Scripts other than what is contained in `mirrors-countme` live in the
`web-data-analysis` role in Ansible. Simply "upgrade" them in place.

=== Deployment of updates

To deploy updated scripts, etc. from the Ansible repository, simply run the
`groups/logging.yaml` playbook.

To update `mirrors-countme`, run the `manual/update-packages.yml` playbook
with `--extra-vars="package='*mirrors-countme*'"` set.

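Assuming the usual playbook wrapper on the Ansible control host, that amounts
to something like:

....
sudo rbac-playbook groups/logging.yaml
sudo rbac-playbook manual/update-packages.yml \
    --extra-vars="package='*mirrors-countme*'"
....
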
== Related applications

The scripts send out status messages over `fedora-messaging` with a topic
prefix of `logging.stats`.

== How is it deployed?

All of this runs on `log01.rdu3.fedoraproject.org` and is deployed through the
`web-data-analysis` role and the `groups/logserver.yml` playbook.

The `mirrors-countme` upstream project publishes source tarballs to their
corresponding releases in the repository on GitHub:

https://github.com/fedora-infra/mirrors-countme/releases

These are packaged in Fedora as the `python-mirrors-countme` (SRPM) and
`python3-mirrors-countme` (RPM) packages.

Other scripts are located directly in the Fedora Infrastructure Ansible
repository, in the `web-data-analysis` role.

== Does it have any special requirements?

No.

== Are there any security requirements?

The same as anything else that deals with log data.

== Bug reports

Report bugs with `mirrors-countme` at its upstream project:

https://github.com/fedora-infra/mirrors-countme/issues/new

Anything concerning the cron jobs or other scripts should probably go into our
Infrastructure tracker:

https://pagure.io/fedora-infrastructure/new_issue

== Are there any GDPR related concerns? Mechanisms to deal with PII?

The same as anything else that deals with log data.