fix parsing errors and sphinx warnings

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
This commit is contained in:
Ryan Lerch 2023-11-16 08:02:56 +10:00 committed by zlopez
parent 8fb9b2fdf0
commit ba720c3d77
98 changed files with 4799 additions and 4788 deletions

View file

@ -1,34 +1,41 @@
Frequently Asked Questions
==========================
Here is a list of questions and answers that should help you get started with monitoring
with zabbix and prometheus.
How do I access zabbix?
-----------------------
1. First obtain Kerberos ticket with kinit:
.. code-block::
$ kinit myusername@FEDORAPROJECT.ORG
Password for myusername@FEDORAPROJECT.ORG:
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see
the dashboard
3. If you need to be added to a special privilege group (to see specific systems' metrics),
open a PR in <path-to-inventory> with your FAS id in the list under the group and ask
the sysadmins of that group to +1.
How do I access zabbix when I'm a community member?
---------------------------------------------------
1. First obtain Kerberos ticket with kinit:
.. code-block::
$ kinit myusername@FEDORAPROJECT.ORG
Password for myusername@FEDORAPROJECT.ORG:
2. Log in to https://zabbix.stg.fedoraproject.org/zabbix.php?action=dashboard.view to see
the guest/public dashboard
How do I access Prometheus?
---------------------------
Prometheus is running in the application monitoring namespace; standard routing applies,
i.e.: https://prometheus-route-application-monitoring.app.os.stg.fedoraproject.org/graph
@ -36,7 +43,9 @@ To access it you need to have account in the openshift it is running in.
How do I access Prometheus when I'm a community member?
-------------------------------------------------------
You shouldn't access prometheus directly, unless you are maintaining an application in
openshift.
Data from prometheus can be exported and viewed in Grafana or Zabbix, meaning we can
give access to a more limited public view through dashboards in one of these.
@ -44,114 +53,131 @@ give access to a more limited public view through dashboards in one of these.
Do you have a 5 minutes guide on how to use prometheus?
-------------------------------------------------------
In other words, do you have some how-tos/links I should read to understand/get started
with prometheus?
- quick introduction to the stack we are running:
https://www.youtube.com/watch?v=-37OPXXhrTw
- to get an idea of how to use it, look at sample queries:
https://prometheus.io/docs/prometheus/latest/querying/examples/
- for instrumentation, look at the libraries in https://github.com/prometheus/
How do I get basic HW (disk, cpu, memory, network...) monitoring for a host?
----------------------------------------------------------------------------
There are out-of-the-box templates for most basic monitoring requirements; they can be
seen in the web UI once you run the zabbix-agent role against the node. If you want to
send any custom metrics, we recommend zabbix-sender. Zabbix sender is a command line
utility that may be used to send performance data to the zabbix server for processing.
Adding the zabbix sender command to crontab is one way of continuously sending data to
the server that can be processed on the server side (in your web UI). See
https://www.zabbix.com/documentation/current/manpages/zabbix_sender
How do I monitor a list of services?
------------------------------------
- pagure.io and src.fp.o have two different lists of services to monitor;
they partly overlap but aren't exactly the same, so how can I monitor them?
- For prometheus, metrics are usually exported by instrumentation, meaning that if, e.g.,
pagure was instrumented to export a /metrics endpoint, you just need to make sure you
are collecting them: either the services run in openshift and you configured
appropriate ServiceMonitor or PodMonitor objects, or, if outside of openshift, they are
listed in the additional scrape configuration of prometheus (see the sketch below).
Because collected metrics are labeled, it is simple to distinguish which belong where.
- For Zabbix, if you want to send any custom metrics, we recommend zabbix-sender. Zabbix
sender is a command line utility that may be used to send performance data to the zabbix
server for processing. Adding the zabbix sender command to crontab is one way of
continuously sending data to the server that can be processed on the server side (in
your web UI). See https://www.zabbix.com/documentation/current/manpages/zabbix_sender
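As a minimal sketch of the prometheus option for hosts outside of openshift, the
additional scrape configuration could list both groups of targets and tell them apart
with static labels. The job names, target hosts, and the ``site`` label here are
hypothetical, not our actual configuration:

.. code-block:: yaml

   # hypothetical additional scrape config entries; hosts and labels are placeholders
   - job_name: pagure-io
     scheme: https
     metrics_path: /metrics
     static_configs:
       - targets: ['pagure.io']
         labels:
           site: pagure.io
   - job_name: src-fedoraproject
     scheme: https
     metrics_path: /metrics
     static_configs:
       - targets: ['src.fedoraproject.org']
         labels:
           site: src.fp.o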
How do I get alerted for a service not running?
-----------------------------------------------
- Prometheus supports configuring rules for alertmanager, which can then notify through
various services. You can learn about the configuration here:
https://prometheus.io/docs/alerting/latest/configuration/#configuration-file. The rules
specifying when to alert are defined in prometheus itself:
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/. You can
specify them in CRDs in your project in a similar fashion as with a ServiceMonitor (see
the sketch below). To use IRC, there needs to be a separate gateway installed as a
sidecar: https://github.com/google/alertmanager-irc-relay
- In Zabbix, you can set custom alerting for yourself (or for groups) through the web UI.
Follow https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger
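As a rough sketch, the alertmanager side of the IRC integration boils down to a webhook
receiver pointing at the relay; the relay URL, route matcher, and channel path below are
hypothetical, not our actual configuration:

.. code-block:: yaml

   # hypothetical alertmanager route/receiver; relay URL and labels are placeholders
   route:
     receiver: default
     routes:
       - match:
           severity: critical
         receiver: irc
   receivers:
     - name: default
     - name: irc
       webhook_configs:
         - url: 'http://alertmanager-irc-relay:8000/fedora-noc'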
How can I tune the alerts?
--------------------------
As in, who gets alerted? When? How?
- In Zabbix, we will have different groups with different configurations. When you are
added to a group, you will receive notifications relevant to that group (you can
change what alerting you want for the group once you have access to it). You can
filter down the alerting even more for yourself in the web UI. Follow this tutorial:
https://www.zabbix.com/documentation/5.0/manual/config/triggers/trigger. If you want to
tweak how you receive your alerts, follow
https://www.zabbix.com/documentation/5.0/manual/config/notifications/media
How do I ask for the service to be restarted <X> times before being alerted?
----------------------------------------------------------------------------
- In prometheus you can't. It is assumed you are using kubernetes, which would manage
something like this for you.
- In zabbix, <TODO>, you can do events based on triggers and there are event correlation
options, but we have yet to figure out this customization.
How do I monitor rabbitmq queues?
---------------------------------
- In prometheus, according to
https://www.rabbitmq.com/prometheus.html#overview-prometheus, you just need to make
sure you are collecting the exported metrics (see the sketch below).
- In Zabbix, according to https://www.zabbix.com/integrations/rabbitmq, there is a way
to push data to zabbix that can be processed on the server side.
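For the prometheus side, a minimal sketch of a scrape entry for the rabbitmq_prometheus
plugin (which exposes /metrics on port 15692 by default) could look like this; the target
host is a placeholder:

.. code-block:: yaml

   # hypothetical scrape entry; the host is a placeholder, 15692 is the plugin default
   - job_name: rabbitmq
     metrics_path: /metrics
     static_configs:
       - targets: ['rabbitmq01.stg.example.org:15692']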
How do we alert about checks not passing to people outside of our teams?
------------------------------------------------------------------------
-> the OSCI team is interested in having notifications/monitoring for the CI
queues in rabbitmq
How can we chain a prometheus instance to ours?
-----------------------------------------------
This allows consolidating monitoring coming from different instances into a single
instance. This can be done by configuring federation in the additional scrape configs:
https://prometheus.io/docs/prometheus/latest/federation/
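A minimal sketch of such a federation scrape entry, following the linked documentation;
the source prometheus host and the match expression are placeholders:

.. code-block:: yaml

   # hypothetical federation scrape entry; target host and match[] are placeholders
   - job_name: federate
     honor_labels: true
     metrics_path: /federate
     params:
       'match[]':
         - '{job=~".+"}'
     static_configs:
       - targets: ['other-prometheus.example.org:9090']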
How can I monitor the performances of my application?
-----------------------------------------------------
Number of requests served? Number of 500 errors? Number of DB connections?
With prometheus, you need to instrument your application and configure prometheus to
collect its metrics.
How do I ack an alert so it stops alerting?
-------------------------------------------
With prometheus and Alertmanager, there is no way to just ACK an alert; it is assumed
that something more high-level like opsgenie would take care of actually interacting
with regular human ops people.
For small enough teams, just using a silence on an alert in alertmanager could be enough.
There is a sidecar that serves to provide a little bit more features than the barebones
alerting, like https://github.com/prymitive/kthxbye.
- In Zabbix, you can acknowledge the problem and it will stop alerting. Follow
https://www.zabbix.com/documentation/current/manual/acknowledges
How do I pre-emptively stop a check before I start working on an outage?
------------------------------------------------------------------------
In other words: I know that I'll cause an outage while working on <service>, how do I
turn off the checks for this service to avoid notifying admins while I'm working on it?
In Prometheus and Alertmanager there are Silences, where you can set a time period during
which certain alerts won't be firing. You are able to create and remove these through the
REST API.
- In Zabbix, the simplest way is to stop the zabbix agent (or custom sender) on the
system and ack on the server side that it's not reachable.

View file

@ -1,53 +1,62 @@
Monitoring / Metrics
====================
As an ARC team initiative we want to investigate Prometheus and Zabbix as our new
monitoring and metrics solutions, by:
- Installing Zabbix server in a VM, and hooking up the staging dist-git to it with
an agent
- Installing Prometheus in our OpenShift and collecting metrics for a selected
project in a self-service fashion
Prior POCs/deployments
----------------------
Fabian Arrotin deployed and utilizes zabbix in the CentOS infrastructure.
- https://github.com/CentOS/ansible-role-zabbix-server
Adam Saleh has deployed a POC prometheus deployment for the CoreOS team.
- https://pagure.io/centos-infra/issue/112
David Kirwan was part of the development team of
https://github.com/integr8ly/application-monitoring-operator/ and did some POC work
around the prometheus push-gateway in the CentOS OpenShift.
Investigation
-------------
In the process we want to be able to answer the questions posed in the latest mailing
thread and, by the end, have a setup that can lead directly into migrating us away from
nagios. The questions (mostly from Kevin):
- How can we provision both of them automatically from ansible?
- Can we get zabbix to pull from prometheus?
- Can zabbix handle our number of machines?
- How flexible is the alerting?
Main takeaway
-------------
We managed to create proof-of-concept monitoring solutions with both prometheus and
zabbix.
The initial configuration has proven to have more pitfalls than expected: with Prometheus
especially in the integration with openshift and its other auxiliary services, and with
Zabbix especially in correctly setting up the iptables and network permissions and in
configuring a reasonable setup for user-access and user-account management.
Despite these setbacks, we still feel this would be an improvement over our current
setup based on Nagios.
To get a basic overview of Prometheus, you can watch this short tech-talk by Adam Saleh:
(accessible only to Red Hat)
https://drive.google.com/file/d/1-uEIkS2jaJ2b8V_4y-AKW1J6sdZzzlc9/view or read the more
in-depth report in the relevant sections of this documentation.
.. toctree::
   :maxdepth: 1

   prometheus_for_ops
   prometheus_for_dev
   zabbix
   faq

View file

@ -1,96 +1,101 @@
Notes on application monitoring self-service
============================================
To get an application monitored in a given namespace, the namespace must have the
correct label applied, and in the namespace there needs to be either a PodMonitor or
ServiceMonitor object set up that points towards the service or pod that exports metrics.
This way, the metrics will be scraped into the configured prometheus and correctly
labeled.
As an example, let's look at the ServiceMonitor for bodhi:
.. code-block::
   apiVersion: monitoring.coreos.com/v1
   kind: ServiceMonitor
   metadata:
     labels:
       monitoring-key: cpe
     name: bodhi-service
     namespace: bodhi
   spec:
     endpoints:
       - path: /metrics
     selector:
       matchLabels:
         service: web
In this example, we are only targeting the service with the label service: web, but we
have the entire matching machinery at our disposal; see `Matcher
<https://v1-17.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#labelselector-v1-meta>`_.
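For instance, a selector is not limited to exact label matches; a hypothetical set-based
selector could look like this (the label values are made up for illustration):

.. code-block:: yaml

   # illustrative set-based selector; the label values are hypothetical
   selector:
     matchExpressions:
       - key: service
         operator: In
         values:
           - web
           - web-canary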
To manage alerting, you can create an alerting rule:
.. code-block::
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     labels:
       monitoring-key: cpe
     name: bodhi-rules
   spec:
     groups:
       - name: general.rules
         rules:
           - alert: DeadMansSwitch
             annotations:
               description: >-
                 This is a DeadMansSwitch meant to ensure that the entire Alerting
                 pipeline is functional.
               summary: Alerting DeadMansSwitch
             expr: vector(1)
             labels:
               severity: none
This would create an alert that will always fire, to serve as a check that the alerting
works. You should be able to see it in Alertmanager.
To have an alert that actually does something, you should set expr to something other
than vector(1). For example, to alert on the rate of 500 responses of a service going
over 5/s in the past 10 minutes:
sum(rate(pyramid_request_count{job="bodhi-web", status="500"}[10m])) > 5
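As a sketch, such an expression would slot into the rules section of a PrometheusRule
like the one above; the alert name, ``for`` duration, and labels are illustrative
choices, not an existing rule:

.. code-block:: yaml

   # illustrative rule entry; the alert name and 'for' duration are hypothetical
   - alert: BodhiHigh500Rate
     expr: sum(rate(pyramid_request_count{job="bodhi-web", status="500"}[10m])) > 5
     for: 5m
     labels:
       severity: warning
     annotations:
       summary: bodhi-web is returning 500s at more than 5 requests per second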
The alerts themselves would then be routed for further processing and notification
according to the rules in alertmanager; these are not available to change from the
developers' namespaces.
The managing and acknowledging of the alerts can be done in alertmanager in a rudimentary
fashion.
Notes on instrumenting the application
======================================
Prometheus expects the applications it scrapes metrics from to be services, with a
'/metrics' endpoint exposed and metrics in the correct format.
There are libraries that help with this for many different languages, confusingly called
client libraries, even though they usually export metrics as an HTTP server endpoint:
https://prometheus.io/docs/instrumenting/clientlibs/
As part of the proof of concept we have instrumented the Bodhi application to collect
data through the prometheus_client Python library:
https://github.com/fedora-infra/bodhi/pull/4079
Notes on alerting
=================
To be notified of alerts, you need to be subscribed to receivers that have been
configured in alertmanager.
The configuration of the rules you want to alert on can be done in the namespace of your
application. For example:
.. code-block::
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule

View file

@ -1,80 +1,97 @@
Monitoring / Metrics with Prometheus
====================================
For deployment, we used a combination of the prometheus operator and the
application-monitoring operator for configuration.
Beware, most of these deployment notes could become obsolete in a really short time. The
POC was done on OpenShift 3.11, which limited us to an older version of the prometheus
operator, as well as the no longer maintained application-monitoring operator.
In openshift 4.x, which we plan to use in the near future, there is a supported way
integrated into the openshift deployment:
- https://docs.openshift.com/container-platform/4.7/monitoring/understanding-the-monitoring-stack.html
- https://docs.openshift.com/container-platform/4.7/monitoring/configuring-the-monitoring-stack.html#configuring-the-monitoring-stack
- https://docs.openshift.com/container-platform/4.7/monitoring/enabling-monitoring-for-user-defined-projects.html
The supported stack is more limited, especially w.r.t. adding user-defined pod- and
service-monitors, but even if we wanted to run additional prometheus instances, we should
be able to skip the installation of the necessary operators, as all of them should
already be present.
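For reference, in the supported OpenShift 4.x stack the user-defined project monitoring
is switched on with a ConfigMap along these lines; this is a sketch based on the linked
documentation and should be double-checked against the version we end up deploying:

.. code-block:: yaml

   # per the OpenShift 4.x docs linked above; verify against the deployed version
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cluster-monitoring-config
     namespace: openshift-monitoring
   data:
     config.yaml: |
       enableUserWorkload: true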
Notes on operator deployment
----------------------------
The operator pattern is often used with kubernetes and openshift for more complex
deployments. Instead of applying all of the configuration to deploy your services, you
deploy a special, smaller service called an operator, which has the necessary permissions
to deploy and configure the complex service. Once the operator is running, instead of
configuring the service itself with service-specific config-maps, you create
operator-specific kubernetes objects, so-called CRDs.
The deployment of the operator in question was done by configuring the CRDs, roles and
rolebindings, and the operator setup.
The definitions are as follows:

- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator
Once the operator is correctly running, you just define a Prometheus CRD and it will
create a prometheus deployment for you (see the sketch below).
The POC lives in
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/application-monitoring.yml
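A minimal sketch of such a Prometheus object might look like the following; the name,
namespace, service account, and replica count are illustrative, and the selector re-uses
the monitoring-key label from the ServiceMonitor example in the developer notes:

.. code-block:: yaml

   # illustrative Prometheus CRD; names and replica count are hypothetical
   apiVersion: monitoring.coreos.com/v1
   kind: Prometheus
   metadata:
     name: application-monitoring
     namespace: application-monitoring
   spec:
     replicas: 1
     serviceAccountName: prometheus-application-monitoring
     retention: 24h
     serviceMonitorSelector:
       matchLabels:
         monitoring-key: cpe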
Notes on application monitoring operator deployment
---------------------------------------------------
The application-monitoring operator was created to solve the integration of Prometheus,
Alertmanager and Grafana. After you configure it, it configures the relevant operators
responsible for these services.
The most interesting difference between configuring this shared operator, compared to
configuring these operators individually, is that it configures some of the integrations,
and it integrates well with openshift's auth system through the oauth proxy.
The biggest drawback is that the application-monitoring operator is an orphaned project,
but because it mostly configures other operators, it is relatively simple to just
recreate the configuration for both prometheus and alertmanager to be deployed, and
deploy the prometheus and alertmanager operators without the help of the
application-monitoring operator.
Notes on persistence
--------------------
Prometheus by default expects to have a writable /prometheus folder that can serve as
persistent storage.
For the persistent volume to work for this purpose, it **needs to have a POSIX-compliant
filesystem**, and the NFS we currently have configured is not. This is discussed in the
`operational aspects
<https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects>`_ section of
the Prometheus documentation.
The easiest supported way to have a POSIX-compliant filesystem is to `set up
local storage
<https://docs.openshift.com/container-platform/3.11/install_config/configuring_local.html>`_
in the cluster.
In 4.x versions of OpenShift, `there is a local-storage operator
<https://docs.openshift.com/container-platform/4.7/storage/persistent_storage/persistent-storage-local.html>`_
for this purpose.
This is the simplest way to have working persistence, but it prevents us from having
multiple instances across openshift nodes, as the pod is using the underlying filesystem
on the node.
To ask the operator to create a persisted prometheus, you specify in its configuration,
e.g.:
.. code-block::
   storage:
     volumeClaimTemplate:
@ -87,27 +104,27 @@ To ask the operator to create persisted prometheus, you specify in its configura
By default, retention is set to 24 hours and can be overridden.
Notes on long term storage
--------------------------
Usually, prometheus itself is set up to store its metrics for a shorter amount of time,
and it is expected that for long-term storage and analysis there is some other storage
solution, such as influxdb or timescaledb.
We are currently running a POC that synchronizes Prometheus with Timescaledb (running on
Postgresql) through a middleware service called `promscale
<https://github.com/timescale/promscale>`_.
Promscale just needs access to an appropriate postgresql database, and can be configured
through PROMSCALE_DB_PASSWORD and PROMSCALE_DB_HOST.
By default it will ensure the database has timescaledb installed and will configure its
database automatically.
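As a sketch, in an openshift deployment those settings would typically be passed as
container environment variables, with the password coming from a secret; the secret name
and database host below are hypothetical:

.. code-block:: yaml

   # hypothetical container spec; secret name and host are placeholders
   containers:
     - name: promscale
       image: timescale/promscale:latest
       env:
         - name: PROMSCALE_DB_HOST
           value: db01.stg.example.org
         - name: PROMSCALE_DB_PASSWORD
           valueFrom:
             secretKeyRef:
               name: promscale-db
               key: password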
We set up prometheus with a directive to use the promscale service as a backend:
https://github.com/timescale/promscale
.. code-block::
   remote_write:
     - url: "http://promscale:9201/write"
@ -117,24 +134,27 @@ https://github.com/timescale/promscale
Notes on auxiliary services
----------------------------
As prometheus is primarily targeted at collecting metrics from services that have been
instrumented to expose them, if your service is not instrumented, or it is not a service,
e.g. a batch-job, you need an adapter to help you with the metrics collection.
There are two services that help with this.
- `blackbox exporter <https://github.com/prometheus/blackbox_exporter>`_ to monitor
services that have not been instrumented, by querying their public API
- `push gateway
<https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway>`_,
which helps collect information from batch-jobs
Maintaining the push-gateway can be relegated to the application developer, as it is
lightweight, and by collecting metrics from the namespace it is running in, the data
will be correctly labeled.
With the blackbox exporter, it can be beneficial to have it running as a prometheus
side-car, in a similar fashion as we configure the oauth-proxy, adding this to the
containers section of the prometheus definition:
.. code-block::
   - name: blackbox-exporter
     volumeMounts:
@ -149,58 +169,65 @@ of the prometheus definition:
   - containerPort: 9115
     name: blackbox
We can then instruct what is to be monitored through the configmap-blackbox; you can
find `relevant examples
<https://github.com/prometheus/blackbox_exporter/blob/master/example.yml>`_ in the
project repo. Because the blackbox exporter is in the same pod, we need to use the
additional-scrape-config to add it in (see the sketch below).
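A minimal sketch of that additional scrape config, following the standard blackbox
exporter relabeling pattern; the probed URL and module name are placeholders, and
127.0.0.1:9115 matches the sidecar port above:

.. code-block:: yaml

   # standard blackbox exporter pattern; the probed target is a placeholder
   - job_name: blackbox-http
     metrics_path: /probe
     params:
       module: [http_2xx]
     static_configs:
       - targets:
           - https://bodhi.stg.fedoraproject.org/
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
       - source_labels: [__param_target]
         target_label: instance
       - target_label: __address__
         replacement: 127.0.0.1:9115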
Notes on alerting
-----------------
Prometheus, as is, can have rules configured that trigger alerts once a specific query
evaluates to true. The definition of the rules is explained in the companion docs for
prometheus for developers, and the rules can be created in the namespace of the running
application.
Here, we need to focus on what happens with an alert after prometheus decides it should
fire, based on a rule.
In the prometheus CRD definition, there is a section about the alertmanager that is
supposed to manage the forwarding of these alerts.
.. code-block::
   alerting:
     alertmanagers:
       - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
         name: alertmanager-service
         namespace: application-monitoring
         port: web
         scheme: https
         tlsConfig:
           caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
           serverName: alertmanager-service.application-monitoring.svc
We already have alertmanager running and configured by the alertmanager-operator.
Alertmanager itself is really simplistic, with a simple UI and API that allow for
silencing an alert for a given amount of time.
It is expected that the actual user interaction happens elsewhere, either through
services like OpsGenie, or through e.g. an `integration with zabbix
<https://devopy.io/setting-up-zabbix-alertmanager-integration/>`_.
More of a build-it-yourself solution is to use e.g. https://karma-dashboard.io/, but we
haven't tried any of these as part of our POC.
To be able to be notified of the alert, you need to have the `correct receiver
configuration <https://prometheus.io/docs/alerting/latest/configuration/#email_config>`_
in the alertmanager's secret:
.. code-block::
   global:
     resolve_timeout: 5m
   route:
     group_by: ['job']
     group_wait: 10s
     group_interval: 10s
     repeat_interval: 30m
     receiver: 'email'
   receivers:
     - name: 'email'
       email_configs:
         - to: 'asaleh@redhat.com'

View file

@ -1,38 +1,39 @@
Monitoring / Metrics with Zabbix
================================
We are using a Zabbix 5.0 (LTS) server with a PostgreSQL database. Starting with manual
configuration in a test VM and then automating it for deployment, the Ansible roles
`zabbix-server` and `zabbix-agent` are the results of this PoC work. Please follow the
FAQ to see how to access the staging deployment of zabbix.
zabbix-server
-------------
This role is ready at the base level but as the complexity of the monitoring increases,
more work would be needed. At the current level, it
- Installs needed packages for the server
- Configures zabbix, apache and PostgreSQL configuration files
- Configures the web UI
- Configures kerberos authentication
While these basic things are good for a POC, they are not ready to be in production until
we have configured the following:
- Add inventory files for groups and users and have zabbix-cli restore those in case
of a fresh installation
- Network config audit (see common challenges)
zabbix-agent
------------
This role is ready to be used and the existing templates are good for gathering basic
information. However, the specifics of what kind of common data should be collected from
all agent nodes need to be discussed widely and set in the template. Other than common
metrics, one can also export custom metrics using zabbix-sender (see FAQ).
Common challenges
-----------------
Due to a lack of experience with selinux policies and network configuration, we are not
very confident about those. A veteran sysadmin would be needed to audit them.