Monitoring / Metrics with Zabbix
================================

We are using Zabbix 5.0 (LTS) server with a PostgreSQL database. We started
with a manual configuration in a test VM and then automated it for deployment.
The Ansible roles `zabbix-server` and `zabbix-agent` are the results of this
PoC work. See the FAQ for how to access the staging deployment of Zabbix.

zabbix-server
-------------

This role covers the basics, but more work will be needed as the complexity of
the monitoring increases. Currently, it:

  * installs the packages needed for the server
  * writes the Zabbix, Apache, and PostgreSQL configuration files
  * configures the web UI
  * configures Kerberos authentication

While these basics are sufficient for a PoC, the role is not ready for
production until the following are done:

  * add inventory files for groups and users, and have zabbix-cli restore them
    after a fresh installation
  * audit the network configuration (see Common challenges)
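
As an illustration of the inventory idea, the sketch below works out which
groups from an inventory are missing on a fresh server and builds the matching
Zabbix JSON-RPC requests. It is not the role's implementation: it builds raw
API payloads rather than zabbix-cli commands, it assumes that "groups" means
user groups, and the inventory layout and the function names
``build_request`` and ``usergroup_requests`` are hypothetical.

```python
import json


def build_request(method, params, auth_token, request_id):
    """Build one Zabbix JSON-RPC 2.0 request body as a dict."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth_token,  # session token obtained from user.login
        "id": request_id,
    }


def usergroup_requests(inventory, existing_names, auth_token):
    """Return usergroup.create requests for groups missing on the server."""
    missing = [
        name for name in inventory["usergroups"] if name not in existing_names
    ]
    return [
        build_request("usergroup.create", {"name": name}, auth_token, i)
        for i, name in enumerate(missing, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical inventory content; in practice it would be loaded from a file.
    inventory = {"usergroups": ["sysadmins", "developers", "readonly"]}
    existing = {"sysadmins"}  # e.g. names returned by usergroup.get
    for request in usergroup_requests(inventory, existing, "<auth-token>"):
        print(json.dumps(request))
```

Skipping groups that already exist keeps the restore safe to re-run after a
partial installation.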

zabbix-agent
------------

This role is ready to use, and the existing templates are sufficient to gather
basic information. However, exactly which common data should be collected from
all agent nodes still needs to be discussed widely and then set in a template.
Beyond the common metrics, custom metrics can also be exported using
zabbix-sender (see FAQ).
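
A minimal sketch of pushing one custom metric by wrapping the ``zabbix_sender``
command-line tool could look like the following. The server address, host name,
and item key are hypothetical placeholders, and the sketch assumes a matching
item of type "Zabbix trapper" already exists on the host; the FAQ remains the
reference for our setup.

```python
import shlex
import subprocess


def build_sender_command(server, host, key, value, port=10051):
    """Return the zabbix_sender argument list for a single metric value."""
    return [
        "zabbix_sender",
        "-z", server,      # address of the Zabbix server or proxy
        "-p", str(port),   # trapper port (10051 is the Zabbix default)
        "-s", host,        # host name exactly as registered in Zabbix
        "-k", key,         # item key of the trapper item
        "-o", str(value),  # value to send
    ]


def send_metric(server, host, key, value, port=10051):
    """Run zabbix_sender and return True if it exited successfully."""
    cmd = build_sender_command(server, host, key, value, port)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical server, host, and key names; replace with real ones.
    cmd = build_sender_command(
        "zabbix.example.org", "web01", "custom.queue.length", 42
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes the command easy
to inspect or log before anything is sent.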


Common challenges
-----------------
We lack experience with SELinux policies and network configuration, so we are
not very confident in those parts of the setup. A veteran sysadmin would be
needed to audit them.