Added more prometheus documentation

2021-04-14 16:26:21 +02:00 · 2021-04-14 16:26:21 +02:00 · 646a390c9a
commit 646a390c9a
parent ed508e7b9b
2 changed files with 124 additions and 1 deletions
--- a/docs/monitoring_metrics/prometheus_for_dev.rst
+++ b/docs/monitoring_metrics/prometheus_for_dev.rst
@@ -10,6 +10,7 @@ This way, the merics will be scraped into the configured prometheus and correctl
 As an example, lets look at ServiceMonitor for bodhi:

 ::
+
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
@@ -30,6 +31,7 @@ machinery at our disposal, see `Matcher <https://v1-17.docs.kubernetes.io/docs/r
 To manage alerting, you can create an alerting rule:

 ::
+
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
@@ -79,3 +81,32 @@ As part of the proof of concept we have instrumented Bodhi application,
 to collect data through prometheus_client python library:
 https://github.com/fedora-infra/bodhi/pull/4079

+Notes on alerting
+-----------------
+
+To be notified of alerts, you need to be subscribed to receivers that
+have been configured in Alertmanager.
+
+The rules you want to alert on can be configured in the namespace of your application.
+For example:
+
+::
+
+  apiVersion: monitoring.coreos.com/v1
+  kind: PrometheusRule
+  metadata:
+    labels:
+      monitoring-key: cpe
+    name: prometheus-application-monitoring-rules
+  spec:
+    groups:
+      - name: general.rules
+        rules:
+          - alert: AlertBodhi500Status
+            annotations:
+              summary: Alerting on too many server errors
+            expr: (100*sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern!~".*healthz.*", status="500"}[20m]))/sum(rate(pyramid_request_count{namespace="bodhi", path_info_pattern!~".*healthz.*"}[20m])))>1
+            labels:
+              severity: high
+
+This rule alerts when more than 1% of responses over the last 20 minutes have a 500 status code.
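
The arithmetic behind the ``expr`` in the rule above can be sketched in plain Python. This is only an illustration of the threshold check, not how Prometheus evaluates rules: the function names and the ``threshold_percent`` parameter are hypothetical, and the two inputs stand in for the per-second rates that ``sum(rate(...[20m]))`` would return for status-500 requests and for all requests.

```python
def error_percentage(error_rate: float, total_rate: float) -> float:
    """Return the share of requests answered with status 500, in percent.

    Both arguments are per-second request rates (hypothetical stand-ins
    for the two sum(rate(...)) terms in the alerting expression).
    """
    if total_rate == 0:
        # Simplification: with no traffic we report 0% so nothing fires.
        return 0.0
    return 100 * error_rate / total_rate


def should_alert(error_rate: float, total_rate: float,
                 threshold_percent: float = 1.0) -> bool:
    """Mirror the '> 1' comparison at the end of the expression."""
    return error_percentage(error_rate, total_rate) > threshold_percent


if __name__ == "__main__":
    # 0.5 errors/s out of 20 requests/s is 2.5%, above the 1% threshold.
    print(should_alert(0.5, 20.0))
    # 0.1 errors/s out of 20 requests/s is 0.5%, below the threshold.
    print(should_alert(0.1, 20.0))
```

Working through a couple of sample rates this way can help when choosing a threshold before writing it into a ``PrometheusRule``.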