diff --git a/docs/monitoring_metrics/zabbix.rst b/docs/monitoring_metrics/zabbix.rst
new file mode 100644
index 0000000..944a1d6
--- /dev/null
+++ b/docs/monitoring_metrics/zabbix.rst
@@ -0,0 +1,38 @@
+Monitoring / Metrics with Zabbix
+================================
+
+We use a Zabbix 5.0 (LTS) server with a PostgreSQL database. We started with manual configuration in a test VM and then automated it for deployment; the Ansible roles `zabbix-server` and `zabbix-agent` are the results of this PoC work.
+Please follow the FAQ to see how to access the staging deployment of Zabbix.
+
+zabbix-server
+-------------
+
+This role is ready at a basic level, but as the complexity of the monitoring
+increases, more work will be needed. At the current level, it:
+
+ * installs the packages needed for the server
+ * configures the Zabbix, Apache and PostgreSQL configuration files
+ * configures the web UI
+ * configures Kerberos authentication
+
+While these basics are good enough for the PoC, the role is not ready for
+production until we have done the following:
+
+ * add inventory files for groups and users and have zabbix-cli restore those
+   in case of a fresh installation
+ * audit the network configuration (see Common challenges)
+
+zabbix-agent
+------------
+
+This role is ready to be used, and the existing templates are good enough to gather basic
+information. However, the specifics of which common data should be collected
+from all agent nodes need to be discussed widely and set in a template.
+Other than common metrics, one can also export custom metrics using
+zabbix-sender (see FAQ).
+
+
+Common challenges
+-----------------
+We lack experience with SELinux policies and network configuration, so we are not
+very confident in those parts. A veteran sysadmin would be needed to audit them.