Notes on Prometheus research.

Adam Saleh 2021-04-14 11:46:27 +02:00
parent 7efca65b49
commit ed508e7b9b
3 changed files with 148 additions and 34 deletions

@ -28,10 +28,26 @@ In process we want to be able to answer the questions posed in the latest mailin
- Can Zabbix handle our number of machines?
- How flexible is the alerting?
Main takeaway
-------------
We managed to create proof-of-concept monitoring solutions with both Prometheus and Zabbix.
The initial configuration proved to have more pitfalls than expected:
with Prometheus, especially in the integration with OpenShift and its other auxiliary services,
and with Zabbix, especially in correctly setting up iptables and network permissions
and in configuring a reasonable setup for user access and user-account management.
Despite these setbacks, we still feel either would be an improvement over our current setup based on Nagios.

To get a basic overview of Prometheus, you can watch this short tech talk by Adam Saleh
(accessible only within Red Hat): https://drive.google.com/file/d/1-uEIkS2jaJ2b8V_4y-AKW1J6sdZzzlc9/view
or read the more in-depth report in the relevant sections of this documentation.

.. toctree::
   :maxdepth: 1

   prometheus
   prometheus_for_ops
   prometheus_for_dev
   faq