fix parsing errors and sphinx warnings

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
Ryan Lerch 2023-11-16 08:02:56 +10:00, committed by zlopez
parent 8fb9b2fdf0
commit ba720c3d77
98 changed files with 4799 additions and 4788 deletions

Monitoring / Metrics with Prometheus
====================================

For deployment, we used a combination of the prometheus operator and the
application-monitoring operator.

Beware, most of these deployment notes may become obsolete in a really short time. The
POC was done on OpenShift 3.11, which limited us to an older version of the prometheus
operator, as well as the no longer maintained application-monitoring operator.

In OpenShift 4.x, which we plan to use in the near future, there is a supported way
integrated into the OpenShift deployment:

- https://docs.openshift.com/container-platform/4.7/monitoring/understanding-the-monitoring-stack.html
- https://docs.openshift.com/container-platform/4.7/monitoring/configuring-the-monitoring-stack.html#configuring-the-monitoring-stack
- https://docs.openshift.com/container-platform/4.7/monitoring/enabling-monitoring-for-user-defined-projects.html

The supported stack is more limited, especially w.r.t. adding user-defined pod- and
service-monitors, but even if we wanted to run additional prometheus instances, we
should be able to skip the installation of the necessary operators, as all of them
should already be present.
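
For illustration (the names below are hypothetical, not taken from the POC), a
user-defined service-monitor is a small CRD telling the operator-managed prometheus
which services to scrape:

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      endpoints:
        - port: metrics
          interval: 30s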

Notes on operator deployment
----------------------------

The operator pattern is often used with kubernetes and openshift for more complex
deployments. Instead of applying all of the configuration to deploy your services, you
deploy a special, smaller service called an operator, which has the necessary
permissions to deploy and configure the complex service. Once the operator is running,
instead of configuring the service itself with service-specific config-maps, you create
operator-specific kubernetes objects, so-called CRDs.

The deployment of the operator in question was done by configuring the CRDs, roles and
rolebindings, and the operator setup. The definitions are as follows:

- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator-crd
- https://github.com/prometheus-operator/prometheus-operator/tree/v0.38.3/example/rbac/prometheus-operator

Once the operator is correctly running, you just define a prometheus CRD and it will
create a prometheus deployment for you. The POC lives in
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/openshift-apps/application-monitoring.yml
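
A minimal sketch of such a prometheus CRD (the values are illustrative, not the exact
POC configuration):

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: application-monitoring
    spec:
      replicas: 1
      serviceAccountName: prometheus
      serviceMonitorSelector: {}
      ruleSelector: {}
      retention: 24h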

Notes on application monitoring operator deployment
---------------------------------------------------

The application-monitoring operator was created to solve the integration of Prometheus,
Alertmanager and Grafana. After you configure it, it configures the relevant operators
responsible for these services.

The most interesting difference between configuring this shared operator, compared to
configuring these operators individually, is that it configures some of the
integrations, and it integrates well with OpenShift's auth system through oauth proxy.

The biggest drawback is that the application-monitoring operator is an orphaned
project, but because it mostly configures other operators, it is relatively simple to
just recreate the configuration for both prometheus and alertmanager, and deploy the
prometheus and alertmanager operators without the help of the application-monitoring
operator.

Notes on persistence
--------------------

Prometheus by default expects to have a writable /prometheus folder that can serve as
persistent storage.

For the persistent volume to work for this purpose, it **needs to have a
POSIX-compliant filesystem**, and the NFS we currently have configured is not. This is
discussed in the `operational aspects
<https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects>`_ section of
the Prometheus documentation.

The easiest supported way to have a POSIX-compliant filesystem is to `set up
local-storage
<https://docs.openshift.com/container-platform/3.11/install_config/configuring_local.html>`_
in the cluster. In 4.x versions of OpenShift there is a `local-storage-operator
<https://docs.openshift.com/container-platform/4.7/storage/persistent_storage/persistent-storage-local.html>`_
for this purpose.

This is the simplest way to have working persistence, but it prevents us from having
multiple instances across openshift nodes, as the pod is using the underlying
filesystem on the node.
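
For reference, the local-storage approach boils down to a StorageClass with no dynamic
provisioner plus pre-created local persistent volumes; a minimal sketch of the storage
class (the name is illustrative):

.. code-block::

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer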

To ask the operator to create a persisted prometheus, you specify in its configuration
e.g.:

.. code-block::

    storage:
      volumeClaimTemplate:

By default retention is set to 24 hours and can be overridden.
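
For completeness, a fuller version of that storage stanza in the prometheus CRD spec
could look roughly like this (the storage class name and size are illustrative, not the
POC values):

.. code-block::

    spec:
      retention: 24h
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: local-storage
            resources:
              requests:
                storage: 10Gi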

Notes on long term storage
--------------------------

Usually, prometheus itself is set up to store its metrics for a shorter amount of time,
and it is expected that for long-term storage and analysis there is some other storage
solution, such as influxdb or timescaledb.

We are currently running a POC that synchronizes Prometheus with TimescaleDB (running
on PostgreSQL) through a middleware service called `promscale
<https://github.com/timescale/promscale>`_.

Promscale just needs access to an appropriate postgresql database and can be configured
through PROMSCALE_DB_PASSWORD and PROMSCALE_DB_HOST. By default it will ensure the
database has timescaledb installed and will configure its database automatically.
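
A minimal sketch of how those variables might be wired into the promscale container in
a deployment (the secret name and database host are hypothetical; port 9201 is the
promscale default used below):

.. code-block::

    containers:
      - name: promscale
        image: timescale/promscale:latest
        env:
          - name: PROMSCALE_DB_HOST
            value: postgresql.application-monitoring.svc
          - name: PROMSCALE_DB_PASSWORD
            valueFrom:
              secretKeyRef:
                name: promscale-db
                key: password
        ports:
          - containerPort: 9201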

We set up prometheus with a directive to use the promscale service
(https://github.com/timescale/promscale) as a backend:

.. code-block::

    remote_write:
      - url: "http://promscale:9201/write"

Notes on auxiliary services
---------------------------

As prometheus is primarily targeted at collecting metrics from services that have been
instrumented to expose them, if your service is not instrumented, or it is not a
service, i.e. a batch-job, you need an adapter to help you with the metrics collection.
There are two services that help with this.

- `blackbox exporter <https://github.com/prometheus/blackbox_exporter>`_ to monitor
  services that have not been instrumented, by querying their public API
- `push gateway
  <https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway>`_,
  which helps collect information from batch-jobs

Maintaining the push-gateway can be relegated to the application developer, as it is
lightweight, and by collecting metrics from the namespace it is running in, the data
will be correctly labeled.
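
When prometheus scrapes a push gateway, it is usually configured with honor_labels set
to true, so the job and instance labels pushed by the batch-jobs are kept; a sketch of
such a scrape config (the service name is hypothetical, 9091 is the push gateway
default port):

.. code-block::

    scrape_configs:
      - job_name: pushgateway
        honor_labels: true
        static_configs:
          - targets: ['pushgateway:9091']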

With the blackbox exporter, it can be beneficial to have it running as a prometheus
side-car, in a similar fashion to how we configure oauth-proxy, adding this to the
containers section of the prometheus definition:

.. code-block::

    - name: blackbox-exporter
      volumeMounts:
      ...
      - containerPort: 9115
        name: blackbox

We can then instruct what is to be monitored through the configmap-blackbox; you can
find `relevant examples
<https://github.com/prometheus/blackbox_exporter/blob/master/example.yml>`_ in the
project repo. Because the blackbox exporter is in the same pod, we need to use the
additional-scrape-config to add it in.
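
A sketch of what that additional scrape config could contain, following the standard
relabeling pattern from the blackbox_exporter examples (the probe target is
illustrative; 127.0.0.1:9115 assumes the side-car setup above):

.. code-block::

    - job_name: blackbox
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
            - https://example.fedoraproject.org
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 127.0.0.1:9115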

Notes on alerting
-----------------

Prometheus as-is can have rules configured that trigger alerts once a specific query
evaluates to true. The definition of the rules is explained in the companion docs for
prometheus for developers, and the rules can be created in the namespace of the running
application.
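
For illustration, such a rule is created as a PrometheusRule object in that namespace;
the alert name and expression below are hypothetical:

.. code-block::

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-app-alerts
      labels:
        role: alert-rules
    spec:
      groups:
        - name: my-app
          rules:
            - alert: MyAppDown
              expr: up{job="my-app"} == 0
              for: 5m
              labels:
                severity: critical
              annotations:
                summary: "my-app has been down for more than 5 minutes"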

Here, we need to focus on what happens with an alert after prometheus realizes it
should fire it, based on a rule.

In the prometheus CRD definition, there is a section about the alert-manager that is
supposed to manage the forwarding of these alerts:

.. code-block::

    alerting:
      alertmanagers:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          name: alertmanager-service
          namespace: application-monitoring
          port: web
          scheme: https
          tlsConfig:
            caFile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
            serverName: alertmanager-service.application-monitoring.svc

We already have alertmanager running and configured by the alertmanager-operator.

Alertmanager itself is really simplistic, with a simple UI and API that allow for
silencing an alert for a given amount of time.

It is expected that the actual user interaction happens elsewhere, either through
services like OpsGenie, or through e.g. `integration with zabbix
<https://devopy.io/setting-up-zabbix-alertmanager-integration/>`_.

More of a build-it-yourself solution is to use e.g. https://karma-dashboard.io/, but we
haven't tried any of these as part of our POC.

To be able to be notified of the alert, you need to have the `correct receiver
configuration <https://prometheus.io/docs/alerting/latest/configuration/#email_config>`_
in the alertmanager's secret:

.. code-block::

    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 30m
      receiver: 'email'
    receivers:
      - name: 'email'
        email_configs:
          - to: 'asaleh@redhat.com'