Monitoring / Metrics
====================

As an ARC team initiative, we want to investigate Prometheus and Zabbix as our new
monitoring and metrics solutions, by:

- Installing a Zabbix server in a VM and hooking up the staging dist-git to it with
  an agent
- Installing Prometheus in our OpenShift and collecting metrics for a selected
  project in a self-service fashion

Prior POCs/deployments
----------------------

Fabian Arrotin deployed Zabbix and uses it in the CentOS infrastructure:

- https://github.com/CentOS/ansible-role-zabbix-server

Adam Saleh deployed a proof-of-concept Prometheus setup for the CoreOS team:

- https://pagure.io/centos-infra/issue/112

David Kirwan was part of the development team of
https://github.com/integr8ly/application-monitoring-operator/ and did some POC work around
the Prometheus Pushgateway in the CentOS OpenShift.

Investigation
-------------

In the process, we want to be able to answer the questions posed in the latest mailing
thread and, by the end, have a setup that can lead directly into migrating us away from
Nagios. The questions (mostly from Kevin):

- How can we provision both of them automatically from Ansible?
- Can we get Zabbix to pull from Prometheus?
- Can Zabbix handle our number of machines?
- How flexible is the alerting?

Main takeaway
-------------

We managed to create proof-of-concept monitoring solutions with both Prometheus and
Zabbix.

The initial configuration proved to have more pitfalls than expected: with Prometheus,
especially in the integration with OpenShift and its auxiliary services; with Zabbix,
especially in correctly setting up iptables and network permissions, and in configuring
a reasonable setup for user access and user-account management.

Despite these setbacks, we still feel this would be an improvement over our current
setup based on Nagios.

To get a basic overview of Prometheus, you can watch this short tech talk by Adam Saleh
(accessible only within Red Hat):
https://drive.google.com/file/d/1-uEIkS2jaJ2b8V_4y-AKW1J6sdZzzlc9/view or read the
more in-depth report in the relevant sections of this documentation.

.. toctree::
   :maxdepth: 1

   prometheus_for_ops
   prometheus_for_dev
   zabbix
   faq