fix parsing errors and sphinx warnings
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
parent
8fb9b2fdf0
commit
ba720c3d77
98 changed files with 4799 additions and 4788 deletions
|
@ -1,34 +1,41 @@
|
|||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
Here is a list of questions and answers that should help you get started with
|
||||
monitoring with zabbix and prometheus.
|
||||
Here is a list of questions and answers that should help you get started with monitoring
|
||||
with zabbix and prometheus.
|
||||
|
||||
How do I access zabbix?
|
||||
-----------------------
|
||||
|
||||
1. First obtain Kerberos ticket with kinit:
|
||||
::
|
||||
|
||||
.. code-block::
|
||||
|
||||
$ kinit myusername@FEDORAPROJECT.ORG
|
||||
Password for myusername@FEDORAPROJECT.ORG:
|
||||
|
||||
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see the dashboard
|
||||
|
||||
3. If you need to be added to a special privilege group (to see specific systems' metrics), open a PR in <path-to-inventory> with your FAS ID in the list under the group and ask a sysadmin of that group to +1 it.
|
||||
|
||||
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see
|
||||
dashboard
|
||||
3. If you need to be added to a special privilege group (to see specific systems' metrics),
|
||||
open a PR in <path-to-inventory> with your FAS ID in the list under the group and ask
|
||||
a sysadmin of that group to +1 it.
|
||||
|
||||
How do I access zabbix when I'm a community member?
|
||||
---------------------------------------------------
|
||||
|
||||
1. First obtain Kerberos ticket with kinit:
|
||||
::
|
||||
|
||||
.. code-block::
|
||||
|
||||
$ kinit myusername@FEDORAPROJECT.ORG
|
||||
Password for myusername@FEDORAPROJECT.ORG:
|
||||
|
||||
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see the guest/public dashboard
|
||||
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see
|
||||
guest/public dashboard
|
||||
|
||||
How do I access Prometheus?
|
||||
---------------------------
|
||||
|
||||
Prometheus is running in the application monitoring namespace, standard routing applies,
|
||||
i.e.: https://prometheus-route-application-monitoring.app.os.stg.fedoraproject.org/graph
|
||||
|
||||
|
@ -36,7 +43,9 @@ To access it you need to have account in the openshift it is running in.
|
|||
|
||||
How do I access Prometheus when I'm a community member?
|
||||
-------------------------------------------------------
|
||||
You shouldn't access prometheus directly, unless you are maintaining an application in openshift.
|
||||
|
||||
You shouldn't access prometheus directly, unless you are maintaining an application in
|
||||
openshift.
|
||||
|
||||
Data from prometheus can be exported and viewed in Grafana or Zabbix, meaning we can
|
||||
give access to a more limited public view through dashboards in one of these.
|
||||
|
@ -44,114 +53,131 @@ give access to a more limited public view through dashboards in one of these.
|
|||
Do you have a 5 minutes guide on how to use prometheus?
|
||||
-------------------------------------------------------
|
||||
|
||||
In other words, do you have some how-tos/links I should read to understand/get
|
||||
started with prometheus?
|
||||
In other words, do you have some how-tos/links I should read to understand/get started
|
||||
with prometheus?
|
||||
|
||||
* quick introduction to the stack we are running: https://www.youtube.com/watch?v=-37OPXXhrTw
|
||||
* to get an idea of how to use it, look at sample queries: https://prometheus.io/docs/prometheus/latest/querying/examples/
|
||||
* for instrumentation, look at the libraries in https://github.com/prometheus/
|
||||
- quick introduction to the stack we are running:
|
||||
https://www.youtube.com/watch?v=-37OPXXhrTw
|
||||
- to get an idea of how to use it, look at sample queries:
|
||||
https://prometheus.io/docs/prometheus/latest/querying/examples/
|
||||
- for instrumentation, look at the libraries in https://github.com/prometheus/
|
||||
|
||||
How do I get basic HW (disk, cpu, memory, network...) monitoring for a host?
|
||||
----------------------------------------------------------------------------
|
||||
There are out-of-the-box templates for most basic monitoring requirements that
|
||||
can be seen on the web UI once you run the zabbix-agent-role against the node.
|
||||
If you want to send any custom metrics, we recommend zabbix-sender. Zabbix sender is a command-line utility that can be used to send performance data to the zabbix server for processing.
|
||||
Adding the zabbix sender command in crontab is one way of continuously sending
|
||||
data to the server that can be processed on the server side (in your web UI). See https://www.zabbix.com/documentation/current/manpages/zabbix_sender
|
||||
|
||||
There are out-of-the-box templates for most basic monitoring requirements that can be
|
||||
seen on the web UI once you run the zabbix-agent-role against the node. If you want to
|
||||
send any custom metrics, we recommend zabbix-sender. Zabbix sender is a command line
|
||||
utility that may be used to send performance data to zabbix server for processing.
|
||||
Adding the zabbix sender command in crontab is one way of continuously sending data to
|
||||
the server that can be processed on the server side (in your web UI). See
|
||||
https://www.zabbix.com/documentation/current/manpages/zabbix_sender
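
As a rough illustration of the zabbix-sender approach described above, the following Python sketch builds the command line for sending a single value. The server name, host name, and item key are hypothetical placeholders; the ``-z``, ``-s``, ``-k``, and ``-o`` options are the ones documented in the zabbix_sender man page linked above.

```python
import shlex


def build_sender_command(server, host, key, value):
    """Return the zabbix_sender argument list for sending one value."""
    return [
        "zabbix_sender",
        "-z", server,      # Zabbix server to send to
        "-s", host,        # host name as registered in Zabbix
        "-k", key,         # item key defined on the Zabbix side
        "-o", str(value),  # value to send
    ]


# Hypothetical server, host and key, for illustration only.
cmd = build_sender_command(
    "zabbix.example.org", "myhost.example.org", "custom.queue.length", 42
)
print(shlex.join(cmd))
```

The printed command could go into a crontab entry, or the list could be passed to ``subprocess.run(cmd, check=True)`` on a host where zabbix_sender is installed.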
|
||||
|
||||
How do I monitor a list of services?
|
||||
------------------------------------
|
||||
- pagure.io and src.fp.o have two different lists of services to monitor;
|
||||
they partly overlap but aren't exactly the same, how can I monitor them?
|
||||
|
||||
- For prometheus, metrics are usually exported through instrumentation,
|
||||
meaning that if, e.g., pagure was instrumented to export a /metrics endpoint,
|
||||
you just need to make sure you are collecting them, either because they run in openshift,
|
||||
and you configured appropriate ServiceMonitor or PodMonitor objects,
|
||||
or if outside of openshift, it is in additional scrape configuration of prometheus.
|
||||
Because collected metrics are labeled, it is simple to distinguish which belong where.
|
||||
- For Zabbix, if you want to send any custom metrics, we recommend zabbix-sender. Zabbix sender is a command-line utility that can be used to send performance data to the zabbix server for processing. Adding the zabbix sender command in crontab is one way of continuously sending data to the server that can be processed on the server side (in your web UI). See https://www.zabbix.com/documentation/current/manpages/zabbix_sender
|
||||
- pagure.io and src.fp.o have two different lists of services to monitor;
|
||||
they partly overlap but aren't exactly the same, how can I monitor them?
|
||||
- For prometheus, metrics are usually exported through instrumentation, meaning that if, e.g.,
|
||||
pagure was instrumented to export /metrics endpoint, you just need to make sure you
|
||||
are collecting them, either because they run in openshift, and you configured
|
||||
appropriate ServiceMonitor or PodMonitor objects, or if outside of openshift, it is in
|
||||
additional scrape configuration of prometheus. Because collected metrics are labeled,
|
||||
it is simple to distinguish which belong where.
|
||||
- For Zabbix, if you want to send any custom metrics, we recommend zabbix-sender. Zabbix
|
||||
sender is a command line utility that may be used to send performance data to zabbix
|
||||
server for processing. Adding the zabbix sender command in crontab is one way of
|
||||
continuously sending data to the server that can be processed on the server side (in your web
|
||||
UI). See https://www.zabbix.com/documentation/current/manpages/zabbix_sender
|
||||
|
||||
How do I get alerted for a service not running?
|
||||
-----------------------------------------------
|
||||
|
||||
- Prometheus supports configuring rules for alert-manager that can then notify through various services.
|
||||
You can learn about the configuration here: https://prometheus.io/docs/alerting/latest/configuration/#configuration-file
|
||||
The rules specifying when to alert are defined in prometheus itself: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
|
||||
You can specify them in CRDs in your project in a similar fashion as with ServiceMonitor.
|
||||
To use IRC, there needs to be a separate gateway installed in a sidecar: https://github.com/google/alertmanager-irc-relay
|
||||
|
||||
- In Zabbix, you can set custom alerting for yourself (or for groups) through
|
||||
the web UI. Follow https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger
|
||||
- Prometheus supports configuring rules for alert-manager that can then notify through
|
||||
various services. You can learn about the configuration here:
|
||||
https://prometheus.io/docs/alerting/latest/configuration/#configuration-file. The rules
|
||||
specifying when to alert are defined in prometheus itself:
|
||||
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/. You can
|
||||
specify them in CRDs in your project in a similar fashion as with ServiceMonitor. To
|
||||
use IRC, there needs to be a separate gateway installed in a sidecar:
|
||||
https://github.com/google/alertmanager-irc-relay
|
||||
- In Zabbix, you can set custom alerting for yourself (or for groups) through the web UI.
|
||||
Follow https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger
|
||||
|
||||
How can I tune the alerts?
|
||||
--------------------------
|
||||
|
||||
As in, who gets alerted? When? How?
|
||||
|
||||
- In Zabbix, we will have different groups with different configurations. When
|
||||
you are added in that group, you will receive notifications relevant to that
|
||||
group (you can change what alerting you want for the group once you have
|
||||
access to that). You can filter down the alerting even more for yourself in
|
||||
web UI. Follow this tutorial: https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger
|
||||
If you want to tweak how you receive your alerts, follow https://www.zabbix.com/documentation/5.0/manual/config/notifications/media
|
||||
- In Zabbix, we will have different groups with different configurations. When you are
|
||||
added in that group, you will receive notifications relevant to that group (you can
|
||||
change what alerting you want for the group once you have access to that). You can
|
||||
filter down the alerting even more for yourself in web UI. Follow this tutorial:
|
||||
https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger. If you want to
|
||||
tweak how you receive your alerts, follow
|
||||
https://www.zabbix.com/documentation/5.0/manual/config/notifications/media
|
||||
|
||||
How do I ask for the service to be restarted <X> times before being alerted?
|
||||
----------------------------------------------------------------------------
|
||||
|
||||
- In prometheus you can't. It is assumed you are using kubernetes that would manage something like this for you.
|
||||
- In zabbix, <TODO>, you can do events based on triggers and there are event
|
||||
correlation options but yet to figure out this customization
|
||||
- In prometheus you can't. It is assumed you are using kubernetes that would manage
|
||||
something like this for you.
|
||||
- In zabbix, <TODO>, you can do events based on triggers and there are event correlation
|
||||
options but yet to figure out this customization
|
||||
|
||||
How do I monitor rabbitmq queues?
|
||||
---------------------------------
|
||||
|
||||
- In prometheus, according to https://www.rabbitmq.com/prometheus.html#overview-prometheus
|
||||
you just need to make sure you are collecting the exported metrics.
|
||||
|
||||
- In Zabbix, according to https://www.zabbix.com/integrations/rabbitmq, there
|
||||
is a way to push data to zabbix that can be processed on the server side.
|
||||
- In prometheus, according to
|
||||
https://www.rabbitmq.com/prometheus.html#overview-prometheus you just need to make
|
||||
sure you are collecting the exported metrics.
|
||||
- In Zabbix, according to https://www.zabbix.com/integrations/rabbitmq, there is a way
|
||||
to push data to zabbix that can be processed on the server side.
|
||||
|
||||
How do we alert about checks not passing to people outside of our teams?
|
||||
------------------------------------------------------------------------
|
||||
-> the OSCI team is interested in having notifications/monitoring for the CI
|
||||
queues in rabbitmq
|
||||
|
||||
-> the OSCI team is interested in having notifications/monitoring for the CI
|
||||
queues in rabbitmq
|
||||
|
||||
How can we chain a prometheus instance to ours?
|
||||
-----------------------------------------------
|
||||
|
||||
This allows consolidating, in a single instance, the monitoring data coming from different
|
||||
instances. This can be done by configuring federation in additional scrape configs: https://prometheus.io/docs/prometheus/latest/federation/
|
||||
instances. This can be done by configuring federation in additional scrape configs:
|
||||
https://prometheus.io/docs/prometheus/latest/federation/
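
To make the federation mechanism a little more concrete: a federating Prometheus scrapes the ``/federate`` endpoint of another instance and selects series with ``match[]`` parameters. The sketch below only builds such a URL; the host name and the ``job="pagure"`` selector are made-up examples, and in a real setup the same values would go into the additional scrape configuration rather than a hand-built URL.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build the /federate URL that selects the given series."""
    # Each selector becomes one match[] query parameter.
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical upstream instance and selector, for illustration only.
url = federate_url("https://prometheus.example.org", ['{job="pagure"}'])
print(url)
```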
|
||||
|
||||
How can I monitor the performances of my application?
|
||||
-----------------------------------------------------
|
||||
|
||||
Number of requests served? Number of 500 errors? Number of DB connections?
|
||||
|
||||
With prometheus, you need to instrument your application and configure prometheus to collect its metrics.
|
||||
With prometheus, you need to instrument your application and configure prometheus to
|
||||
collect its metrics.
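
To give a feel for what instrumentation produces, here is a minimal standard-library sketch that renders a counter in the Prometheus text exposition format, which is what a /metrics endpoint serves. The metric name, label, and values are made up, and a real application would normally use one of the official client libraries instead of formatting this by hand.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts per HTTP status code.
requests_served = {
    (("status", "200"),): 1027,
    (("status", "500"),): 3,
}
print(render_counter("http_requests_total", "Requests served.", requests_served))
```

Because every sample carries labels, questions such as "number of 500 errors" become a query over the ``status`` label rather than a separate metric.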
|
||||
|
||||
How do I ack an alert so it stops alerting?
|
||||
-------------------------------------------
|
||||
|
||||
With prometheus and Alertmanager, there is no way to just ACK an alert,
|
||||
it is assumed that something more high-level like opsgenie would take care of actually
|
||||
interacting with regular human ops people.
|
||||
With prometheus and Alertmanager, there is no way to just ACK an alert, it is assumed
|
||||
that something more high-level like opsgenie would take care of actually interacting
|
||||
with regular human ops people.
|
||||
|
||||
For small enough teams, just using silence on alert in alertmanager could be enough.
|
||||
|
||||
There is a sidecar that serves to provide a little bit more features to the barebones alerting,
|
||||
like https://github.com/prymitive/kthxbye.
|
||||
There is a sidecar that serves to provide a little bit more features to the barebones
|
||||
alerting, like https://github.com/prymitive/kthxbye.
|
||||
|
||||
- In Zabbix, you can acknowledge the problem and it will stop alerting. Follow https://www.zabbix.com/documentation/current/manual/acknowledges
|
||||
- In Zabbix, you can acknowledge the problem and it will stop alerting. Follow
|
||||
https://www.zabbix.com/documentation/current/manual/acknowledges
|
||||
|
||||
How do I pre-emptively stop a check before I start working on an outage?
|
||||
------------------------------------------------------------------------
|
||||
|
||||
In other words: I know that I'll cause an outage while working on <service>, how
|
||||
do I turn off the checks for this service to avoid notifying admins while I'm
|
||||
working on it?
|
||||
In other words: I know that I'll cause an outage while working on <service>, how do I
|
||||
turn off the checks for this service to avoid notifying admins while I'm working on it?
|
||||
|
||||
In Prometheus and Alertmanager there are Silences, where you can set a time when certain alerts wouldn't
|
||||
be firing. You are able to create and remove these through the REST API.
|
||||
In Prometheus and Alertmanager there are Silences, where you can set a time when certain
|
||||
alerts wouldn't send notifications. You can create and remove these through the REST API.
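
As a sketch of what creating a silence through the REST API involves, the snippet below builds the JSON body that Alertmanager's v2 API accepts on ``POST /api/v2/silences``. It assumes, purely for illustration, that alerts carry a ``service`` label; the service name, author, and comment are placeholders, and the actual HTTP call is left out.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(service, hours, author, comment, now=None):
    """Build the JSON body for creating an Alertmanager silence."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            # Assumes alerts carry a "service" label; adjust to your labels.
            {"name": "service", "value": service, "isRegex": False},
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": author,
        "comment": comment,
    }


# Hypothetical two-hour maintenance window.
silence = build_silence("pagure", 2, "myusername", "planned maintenance")
print(json.dumps(silence, indent=2))
```

The silence expires on its own at ``endsAt``, so a forgotten maintenance window does not mute notifications indefinitely.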
|
||||
|
||||
- In Zabbix, the simplest way is to stop the zabbix agent (or custom sender) on the system and ack on
|
||||
server side that it's not reachable.
|
||||
- In Zabbix, the simplest way is to stop the zabbix agent (or custom sender) on the system and
|
||||
ack on server side that it's not reachable.
|
||||
|
|