Added more prometheus documentation

Adam Saleh 2021-04-14 16:26:21 +02:00
parent ed508e7b9b
commit 646a390c9a
2 changed files with 124 additions and 1 deletions

@ -10,6 +10,7 @@ This way, the metrics will be scraped into the configured prometheus and correctl
As an example, let's look at the ServiceMonitor for bodhi:
::
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
@ -30,6 +31,7 @@ machinery at our disposal, see `Matcher <https://v1-17.docs.kubernetes.io/docs/r
To manage alerting, you can create an alerting rule:
::
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
@ -79,3 +81,32 @@ As part of the proof of concept we have instrumented Bodhi application,
to collect data through prometheus_client python library:
https://github.com/fedora-infra/bodhi/pull/4079
Notes on alerting
-----------------
To be notified of alerts, you need to be subscribed to receivers that
have been configured in alertmanager.
The rules you want to alert on can be configured in the namespace of your application.
For example:
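
As a rough illustration of what a receiver looks like, here is a minimal sketch of
an Alertmanager configuration fragment. The receiver names and the e-mail address
are hypothetical, the global SMTP settings that e-mail delivery needs are left out,
and the actual configuration of this deployment may look different:

::

    route:
      # Fallback receiver for alerts that match no child route.
      receiver: default
      routes:
        # Hypothetical route: send alerts labelled severity=high to a team mailbox.
        - matchers:
            - severity="high"
          receiver: team-email
    receivers:
      - name: default
      - name: team-email
        email_configs:
          # Placeholder address, not a real list.
          - to: team@example.org
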
::

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        monitoring-key: cpe
      name: prometheus-application-monitoring-rules
    spec:
      groups:
      - name: general.rules
        rules:
        - alert: AlertBodhi500Status
          annotations:
            summary: Alerting on too many server errors
          expr: (100*sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern=~".*[^healthz]", status="500"}[20m]))/sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern=~".*[^healthz]"}[20m])))>1
          labels:
            severity: high

This rule would alert if, measured over a 20-minute window, more than 1% of responses have a 500 status code.
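
One thing to watch in the expression: ``[^healthz]`` is a regular-expression
character class, so the matcher drops every path whose last character is one of
``h``, ``e``, ``a``, ``l``, ``t`` or ``z``, not only the health-check path.
Assuming the intent is simply to leave health-check requests out of the ratio, a
negative matcher is one possible alternative. The sketch below shows only the rule
entry; the ``for: 10m`` duration is an illustrative value that is not part of the
original rule:

::

    - alert: AlertBodhi500Status
      annotations:
        summary: Alerting on too many server errors
      # Percentage of non-health-check requests answered with status 500.
      expr: |
        100 * sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern!~".*healthz.*", status="500"}[20m]))
          / sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern!~".*healthz.*"}[20m]))
        > 1
      # Assumed value: only fire once the condition has held for 10 minutes.
      for: 10m
      labels:
        severity: high
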