fix parsing errors and sphinx warnings
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
parent 8fb9b2fdf0
commit ba720c3d77
98 changed files with 4799 additions and 4788 deletions
@ -1,96 +1,101 @@

Notes on application monitoring self-service
============================================

To have an application monitored in a given namespace, the namespace must have the
correct label applied, and the namespace must contain either a PodMonitor or a
ServiceMonitor resource that points to the service or pod exporting the metrics.

This way, the metrics will be scraped into the configured Prometheus and correctly
labeled.

As an example, let's look at the ServiceMonitor for Bodhi:

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        monitoring-key: cpe
      name: bodhi-service
      namespace: bodhi
    spec:
      endpoints:
        - path: /metrics
      selector:
        matchLabels:
          service: web

In this example, we are only targeting the service with the label ``service: web``, but
we have the entire matching machinery at our disposal; see `Matcher
<https://v1-17.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#labelselector-v1-meta>`_.

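
The same selector machinery applies to a PodMonitor. The following is a minimal sketch,
not taken from the Bodhi deployment: the resource name ``bodhi-pods``, the port name
``web`` and the second label value ``consumer`` are assumptions made for illustration.

.. code-block:: yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      labels:
        monitoring-key: cpe
      name: bodhi-pods          # hypothetical name
      namespace: bodhi
    spec:
      podMetricsEndpoints:
        - path: /metrics
          port: web             # hypothetical: name of the container port serving metrics
      selector:
        matchExpressions:
          # select pods whose "service" label is one of the listed values
          - key: service
            operator: In
            values:
              - web
              - consumer        # hypothetical second label value

``matchExpressions`` also accepts the operators ``NotIn``, ``Exists`` and
``DoesNotExist``, and can be combined with ``matchLabels`` in the same selector.
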
To manage alerting, you can create an alerting rule:

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        monitoring-key: cpe
      name: bodhi-rules
    spec:
      groups:
        - name: general.rules
          rules:
            - alert: DeadMansSwitch
              annotations:
                description: >-
                  This is a DeadMansSwitch meant to ensure that the entire Alerting
                  pipeline is functional.
                summary: Alerting DeadMansSwitch
              expr: vector(1)
              labels:
                severity: none

This would create an alert that always fires, to serve as a check that alerting works.
You should be able to see it in Alertmanager.

To have an alert that actually does something, set ``expr`` to something other than
``vector(1)``. For example, to alert when the rate of 500 responses of a service goes
over 5/s in the past 10 minutes::

    sum(rate(pyramid_request_count{job="bodhi-web", status="500"}[10m])) > 5

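
Put into a rule, the expression above might look like the following sketch. The group
name ``bodhi.rules``, the alert name ``BodhiHigh500Rate``, the ``for`` duration, the
severity value and the annotation text are assumptions for illustration, not part of
the existing configuration.

.. code-block:: yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        monitoring-key: cpe
      name: bodhi-rules
    spec:
      groups:
        - name: bodhi.rules               # hypothetical group name
          rules:
            - alert: BodhiHigh500Rate     # hypothetical alert name
              annotations:
                summary: Bodhi is returning too many 500 responses
              # per-second rate of 500 responses, averaged over 10 minutes
              expr: sum(rate(pyramid_request_count{job="bodhi-web", status="500"}[10m])) > 5
              for: 5m                     # assumed: condition must hold this long before firing
              labels:
                severity: warning         # assumed severity value
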
The alerts themselves would then be routed for further processing and notification
according to the rules in Alertmanager; these cannot be changed from the developers'
namespaces.

Alerts can be managed and acknowledged in Alertmanager in a rudimentary fashion.

Notes on instrumenting the application
======================================

Prometheus expects the applications it scrapes metrics from to be services with a
``/metrics`` endpoint that exposes metrics in the correct format.

There are libraries that help with this for many different languages, confusingly called
client libraries, even though they usually export metrics through an HTTP server
endpoint: https://prometheus.io/docs/instrumenting/clientlibs/

As part of the proof of concept, we have instrumented the Bodhi application to collect
data through the prometheus_client Python library:
https://github.com/fedora-infra/bodhi/pull/4079

Notes on alerting
=================

To be notified of alerts, you need to be subscribed to receivers that have been
configured in Alertmanager.

The rules you want to alert on can be configured in the namespace of your application.
For example:

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule