Few typos in prometheus_for_ops
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
parent 1727112869
commit 0788095004
1 changed file with 19 additions and 20 deletions
@@ -21,9 +21,9 @@ Notes on operator deployment
 -------------------
 Operator pattern is often used with kubernetes and openshift for more complex deployments.
-Instead of applying all of the configuration to dpeloy your services, you deploy a special,
+Instead of applying all of the configuration to deploy your services, you deploy a special,
 smaller service called operator, that has necessary permissions to deploy and configure the complex service.
-Once the operator is running, instead of configuring the service itself with servie-specific config-maps,
+Once the operator is running, instead of configuring the service itself with service-specific config-maps,
 you create operator specific kubernetes objects, so-alled CRDs.
 
 The deployment of the operator in question was done by configuring the CRDs, roles and rolebinding and operator setup:
 
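
For context on the "operator specific kubernetes objects" mentioned above: with the prometheus-operator, a Prometheus instance is requested by creating a custom resource rather than a plain deployment. A minimal sketch, with illustrative values that are not taken from this repo::

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus                     # illustrative name
      namespace: application-monitoring
    spec:
      replicas: 1
      serviceAccountName: prometheus-k8s   # assumes a service account bound to the roles/rolebindings above
      serviceMonitorSelector: {}           # pick up any ServiceMonitor objects the operator can see
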
@@ -49,7 +49,7 @@ compared to configuring these operators individually is that it configures some
 and it integrates well with openshifts auth system through oauth proxy.
 
 The biggest drawback is, that the application-monitoring operator is orphanned project,
-but because it mostly configures other operators, it is relatively simple to just recreate
+but because it mostly configures other operators, it is relatively simple to just recreate
 the configuration for both prometheus and alertmanager to be deployed,
 and deploy the prometheus and alertmanager operators without the help or the application-monitoring operator.
 
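
Recreating that configuration without the application-monitoring operator amounts to creating the equivalent custom resources yourself. For alertmanager the object is roughly as follows (a sketch; name and namespace are illustrative)::

    apiVersion: monitoring.coreos.com/v1
    kind: Alertmanager
    metadata:
      name: alertmanager                   # illustrative name
      namespace: application-monitoring
    spec:
      replicas: 1
      # the operator loads the alertmanager configuration from a secret
      # named alertmanager-<name>, here alertmanager-alertmanager
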
@@ -62,15 +62,15 @@ that can serve as persistent storage.
 For the persistent volume to work for this purpose, it has to
 **needs to have POSIX-compliant filesystem**, and NFS we currently have configured is not.
 This is discussed in the `operational aspects <https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects>`_
-of Prmetheus documentation
+of Prometheus documentation
 
-The easiest supported way to have a POSIX-compliant `filesystem is to setup local-storage <https://docs.openshift.com/container-platform/3.11/install_config/configuring_local.html>`_
+The easiest supported way to have a POSIX-compliant `filesystem is to setup local-storage <https://docs.openshift.com/container-platform/3.11/install_config/configuring_local.html>`_
 in the cluster.
 
 In 4.x versions of OpenShift `there is a local-storage-operator <https://docs.openshift.com/container-platform/4.7/storage/persistent_storage/persistent-storage-local.html>`_ for this purpose.
 
 This is the simplest way to have working persistence, but it prevents us to have multiple instanes
-across openshift nodes, as the pod is using the underlying gilesystem on the node.
+across openshift nodes, as the pod is using the underlying filesystem on the node.
 
 To ask the operator to create persisted prometheus, you specify in its configuration i.e.:
 
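
The next hunk only catches the tail of the referenced configuration block; in the prometheus-operator API the persisted setup it describes has roughly this shape (a sketch, not the file's exact contents)::

    spec:
      retention: 24h                         # the default mentioned below, can be overridden here
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: local-storage  # assumes the local-storage class discussed above
            resources:
              requests:
                storage: 10Gi
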
@@ -85,15 +85,15 @@ To ask the operator to create persisted prometheus, you specify in its configura
       requests:
         storage: 10Gi
 
-By default retention is set to 24 hours and can be over-ridden
+By default retention is set to 24 hours and can be over-ridden
 
 
 Notes on long term storage
 --------------------
 
-Usually, the prometheus itself is setup to store its metrics for shorter ammount of time,
+Usually, prometheus itself is setup to store its metrics for shorter ammount of time,
 and it is expected that for longterm storage and analysis, there is some other storage solution,
-such as influxdb, timescale.
+such as influxdb or timescaledb.
 
 We are currently running a POC that sychronizes Prometheus with Timescaledb (running on Postgresql)
 through a middleware service called `promscale <https://github.com/timescale/promscale>`_ .
@@ -101,10 +101,10 @@ through a middleware service called `promscale <https://github.com/timescale/pro
 Promscale just needs an access to a appropriate postgresql database:
 and can be configured through PROMSCALE_DB_PASSWORD, PROMSCALE_DB_HOST.
 
-By default it will ensure the database has timescale installed and cofigures its database
+By default it will ensure the database has timescaledb installed and configures its database
 automatically.
 
-We setup the prometheus with directive to use promscale service as a backend:
+We setup prometheus with directive to use promscale service as a backend:
 https://github.com/timescale/promscale
 
 ::
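
The literal block that follows in the file is elided by the diff; wiring the two pieces together generally looks like this (a sketch assuming promscale's default listen port 9201; host names and the secret are placeholders, not values from the repo)::

    # promscale side: database access through the env vars named above
    env:
      - name: PROMSCALE_DB_HOST
        value: timescaledb.example.svc       # placeholder host
      - name: PROMSCALE_DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: promscale-db               # hypothetical secret
            key: password

    # prometheus side (prometheus-operator CR): promscale as remote backend
    remoteWrite:
      - url: http://promscale:9201/write
    remoteRead:
      - url: http://promscale:9201/read
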
@@ -118,14 +118,13 @@ Notes on auxialiary services
 ----------------------------
 
 As prometheus is primarily targeted to collect metrics from
-services that have beein instrumented to expose them, if you don't
-your service is not instrumented, or it is not a service,
-i.e. a batch-job, you need an adapter to help you with the metrics collection.
+services that have beein instrumented to expose them, if your service is not instrumented,
+or it is not a service, i.e. a batch-job, you need an adapter to help you with the metrics collection.
 
 There are two services that help with this.
 
-* `blackbox exporter <https://github.com/prometheus/blackbox_exporter>`_ to monitor services that have not been instruented based on querying public a.p.i.
-* `push gateqay <https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway>`_ that helps collect information from batch-jobs
+* `blackbox exporter <https://github.com/prometheus/blackbox_exporter>`_ to monitor services that have not been instrumented based on querying public a.p.i.
+* `push gateway <https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway>`_ that helps collect information from batch-jobs
 
 Maintaining the push-gateway can be relegated to the application developer,
 as it is lightweight, and by colloecting metrics from the namespace it is running in,
@@ -151,7 +150,7 @@ of the prometheus definition:
       name: blackbox
 
 We can then instruct what is to be monitored through the configmap-blackbox, you can find `relevant examples <https://github.com/prometheus/blackbox_exporter/blob/master/example.yml>` in the project repo.
-Beause blackox exporter is in the sam epod, we need to use the additional-scrape-config to add it in.
+Beause blackox exporter is in the same pod, we need to use the additional-scrape-config to add it in.
 
 Notes on alerting
 -----------------
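
An additional-scrape-config entry for a blackbox exporter running in the same pod typically follows the standard probe/relabel pattern, and the push gateway from the previous section is scraped as a plain job. A sketch with placeholder module and targets::

    - job_name: blackbox-http
      metrics_path: /probe
      params:
        module: [http_2xx]                   # module defined in configmap-blackbox
      static_configs:
        - targets:
            - https://example.org            # placeholder target to probe
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 127.0.0.1:9115        # blackbox exporter in the same pod
    - job_name: pushgateway
      honor_labels: true                     # keep the labels pushed by batch jobs
      static_configs:
        - targets: ['pushgateway.example.svc:9091']   # placeholder service address
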
@@ -180,10 +179,10 @@ manage the forwarding of these alerts.
         serverName: alertmanager-service.application-monitoring.svc
 
 We already have alertmanager running and configured by the alertmanager-operator.
-Alertmanager itself is really simplistic with a simple ui and api, that alows for silencing an
+Alertmanager itself is really simplistic with a simple ui and api, that allows for silencing an
 alert for a given ammount of time.
 
-It it is expected that the actual user-interaction is happening elsewhere,
+It is expected that the actual user-interaction is happening elsewhere,
 either through services like OpsGenie, or through i.e. `integration with zabbix <https://devopy.io/setting-up-zabbix-alertmanager-integration/>`_
 
 More of a build-it yourself solution is to use i.e. https://karma-dashboard.io/,
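
The serverName line at the top of this hunk belongs to the alerting section of the prometheus definition; filled out, that section looks roughly like this in the prometheus-operator API (a sketch; port name and scheme are assumptions, not values from the repo)::

    spec:
      alerting:
        alertmanagers:
          - namespace: application-monitoring
            name: alertmanager-service
            port: web                        # assumed port name
            scheme: https
            tlsConfig:
              serverName: alertmanager-service.application-monitoring.svc
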
@@ -204,4 +203,4 @@ route:
 receivers:
 - name: 'email'
   email_configs:
-  - to: 'asaleh@redhat.com'
+  - to: 'asaleh@redhat.com'
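
The receiver fragment in this last hunk is part of a standard alertmanager configuration; a minimal complete version would look roughly like this (sender address and smarthost are placeholders, not values from the repo)::

    route:
      receiver: 'email'
      group_by: ['alertname']
    receivers:
    - name: 'email'
      email_configs:
      - to: 'asaleh@redhat.com'
        from: 'alertmanager@example.org'     # placeholder sender
        smarthost: 'smtp.example.org:25'     # placeholder mail relay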