497 commits

Author SHA1 Message Date
Kevin Fenzi
c88e89d96b retrace: fix ssl check
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-20 15:06:29 -08:00
Kevin Fenzi
b388a003b4 nagios: add checks for ssl certs on fcos and ocp4 endpoints, change to just checking proxy01
Add checks for ssl certs on fcos openshift endpoints.
Add checks for ocp4 wildcard certs.
Change check to only use proxy01/proxy01.stg instead of all proxies.
Ideally we really do want to check all proxies, but in practice this
results in like 70 alerts anytime the cert is going to expire.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-02 15:47:23 -08:00
Silvie Chlupova
5011e6a2dc copr: remove -f follow from nagios check 2022-01-31 11:51:31 +01:00
Silvie Chlupova
db6dc98940 copr: fix nagios service for checking Copr CDN
Fixes: https://pagure.io/fedora-infrastructure/issue/10508
2022-01-31 10:34:43 +01:00
Silvie Chlupova
ba86e27e79 copr: add nagios checks for copr servers 2022-01-21 14:18:05 +01:00
Silvie Chlupova
cb2f805c26 copr: don't check copr servers using nagios for now 2022-01-20 16:35:33 +01:00
Silvie Chlupova
debd3c5b7e copr: define new command for nagios
We need to use --ssl and also -f follow
2022-01-20 15:26:53 +01:00
Silvie Chlupova
6fa2999dbf copr: use already existing copr.cfg 2022-01-20 13:23:31 +01:00
Silvie Chlupova
b9fa39f0c8 copr: nagios check for Copr's CDN
Relates: https://pagure.io/fedora-infrastructure/issue/10456
2022-01-04 15:28:24 +01:00
Mikolaj Izdebski
26c38caafa nagios: Remove check for supybot fedmsg plugin
Zodbot no longer has fedmsg plugin installed - supybot-fedmsg package
is not installed on value02 (RHEL 8) and supybot-fedmsg upstream
project on GitHub has been archived.
2021-11-03 22:49:21 +00:00
Mikolaj Izdebski
a65fa4e1c0 nagios_server: Update hostname where zodbot is running
Zodbot is running on value02 now.
2021-11-03 16:38:34 +01:00
Kevin Fenzi
ec0d18a8b8 nagios: adjust where zodbot announces alerts, zodbot is on value02 now
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-08-22 10:10:10 -07:00
Pavel Raiskup
d2f9b772e9 nagios: move copr-ping to internal 2021-08-10 08:51:55 +02:00
Pavel Raiskup
ff215ea2b9 nagios: external: define copr_* hostgroups 2021-08-09 15:25:19 +02:00
Jakub Kadlcik
9a8acc79ae nagios: enable disk monitoring for copr instances
I think that / monitoring should work by default just by
setting `nrpe: true` because of

    define service {
      hostgroup_name        all, !mincheckgrp
      service_description   Disk_Space_/
      check_command         check_by_nrpe!check_disk_/
      use                   disktemplate
    }
2021-08-09 11:45:53 +00:00
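
Note on the service definition quoted in 9a8acc79ae: the `hostgroup_name all, !mincheckgrp` line relies on Nagios's `!` exclusion syntax, so the check applies to every host in `all` except those in `mincheckgrp`. The Python sketch below only illustrates that set arithmetic. It is not how Nagios resolves groups internally; the function name `resolve_hostgroups` and the group memberships are hypothetical and chosen for illustration.

```python
def resolve_hostgroups(expression, groups):
    """Return the hosts selected by a comma-separated hostgroup expression.

    Terms prefixed with '!' are excluded; all other terms are included.
    `groups` maps a hostgroup name to a set of host names (assumed data).
    """
    included, excluded = set(), set()
    for term in expression.split(","):
        term = term.strip()
        if not term:
            continue
        if term.startswith("!"):
            excluded |= groups[term[1:].strip()]
        else:
            included |= groups[term]
    return included - excluded


# Hypothetical memberships: builder01 is a "minimal check" host.
example_groups = {
    "all": {"proxy01", "copr-be", "builder01"},
    "mincheckgrp": {"builder01"},
}
selected = resolve_hostgroups("all, !mincheckgrp", example_groups)
```

With these made-up groups, `selected` contains `proxy01` and `copr-be` but not `builder01`, which is why a host outside `mincheckgrp` would pick up the disk check without further configuration.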
Pavel Raiskup
73ba7d25b1 copr-be: fixup copr-ping nagios mapping 2021-08-09 13:34:25 +02:00
Pavel Raiskup
0771b0e4ad copr-be: install ping nrpe task 2021-08-09 11:59:03 +02:00
Pavel Raiskup
44c172c56e copr-be: copr-ping 2021-08-05 14:48:20 +02:00
Pavel Raiskup
97e5861ac0 nagios: sync copr-be and copr-be-dev 2021-07-28 23:06:26 +02:00
Pavel Raiskup
eb66378f24 nagios: typo in copr_back => copr_back_aws 2021-07-28 16:20:45 +02:00
Pavel Raiskup
e433a17ffe nagios: add schlupov, and notify her in copr contactgroup 2021-07-28 14:49:50 +02:00
Pavel Raiskup
9eebd7387c nagios: add contact for 'praiskup' 2021-07-28 14:14:18 +02:00
Pavel Raiskup
9dd486fac8 Revert "nagios: add me and schlupov to copr contact group"
We need to define the contacts first.

This reverts commit 00b5afa1a9.
2021-07-28 14:08:45 +02:00
Pavel Raiskup
29fb33bbb7 copr-be: test remaining results storage space 2021-07-28 13:51:16 +02:00
Pavel Raiskup
00b5afa1a9 nagios: add me and schlupov to copr contact group 2021-07-28 13:41:30 +02:00
Michael Scherer
3b8504f293 Fix mention of Freenode 2021-07-02 11:17:20 +02:00
9006cf784e nagios: remove unused check_datanommer_faf 2021-05-21 18:57:09 +00:00
Kevin Fenzi
d890a9fbf4 bugzilla2fedmsg: drop checks against vm as it has moved to openshift
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-05-19 12:00:49 -07:00
Stephen Smoogen
7b43b8049a Update kevin's config so nagios will load since 16x7 no longer exists
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-04-28 16:07:43 +00:00
Stephen Smoogen
e5a3fb3a43 Add in a 12x7 versus 16x7 and make some timezones friendlier
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-04-28 16:07:43 +00:00
seddikalaouiismaili
890dd31cb0 script to monitor systemd units on pagure 2021-02-12 11:34:57 +00:00
Kevin Fenzi
25ace56df7 pagure.io / nagios: check only that cert is valid for 25 days
We renew letsencrypt certs at 30 days, so checking at 60 is pointless.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-02-02 14:24:07 -08:00
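
Note on 25ace56df7: the reasoning is that a certificate renewed at 30 days remaining will always trip a 60-day threshold, whereas a 25-day threshold fires only when renewal has failed. A minimal Python sketch of that threshold logic follows. It is not the actual Nagios check used in this repository; the function names and the sample `notAfter` value are assumptions, and only the standard library's `ssl.cert_time_to_seconds` is used to parse the date.

```python
import ssl
from datetime import datetime, timezone

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def days_remaining(not_after, now):
    """Days between `now` and a certificate's notAfter string (GMT)."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def check_cert(not_after, now, min_days=25):
    """Return OK if the certificate is valid for at least `min_days` more days."""
    return OK if days_remaining(not_after, now) >= min_days else CRITICAL


# Assumed example: a certificate with 27 days left on 2021-02-02.
now = datetime(2021, 2, 2, tzinfo=timezone.utc)
not_after = "Mar  1 00:00:00 2021 GMT"
status_25 = check_cert(not_after, now, min_days=25)
status_60 = check_cert(not_after, now, min_days=60)
```

In this example the certificate passes the 25-day check but would fail a 60-day one, which matches the commit's point that the larger threshold alerts on certificates that are still within their normal renewal window.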
Kevin Fenzi
a74b4015e7 nagios: contacts
Clean up a bunch of old contacts that no longer are around
or care about getting alerts from our nagios.

Add readme file that notes that this information is public and
people should use a filtered email address for this purpose and avoid
adding sensitive information like phone numbers.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-28 11:52:24 -07:00
Kevin Fenzi
71c650baff nagios / server: drop checking for fas fedmsgs, they likely won't be back
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-05 17:21:08 -07:00
Kevin Fenzi
f650eab7ee nagios_server / fedmsg: pkgs01 does not run any fedmsg-hub anymore.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-05 17:00:15 -07:00
Kevin Fenzi
bb61f0da99 nagios / server: don't try and check mincheck group rsyslog
We want to make sure rsyslog is running on hosts, but the mincheck
hosts are ones we don't do any nrpe checks on, so we should exclude
them from this. These are hosts such as builders or aws hosts.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-02 12:56:49 -07:00
seddikalaouiismaili
e785293064 add check for rsyslogd 2020-10-02 18:50:29 +00:00
Mark O'Brien
5fe015a90a nagios server plugins: port to py3 2020-10-02 18:46:32 +00:00
Pierre-Yves Chibon
9506631012 pagure: replace pagure01 by pagure02
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
2020-10-01 16:09:14 +02:00
Mark O'Brien
b2073703e5 [nagios] add back in strp accidentally removed 2020-09-25 14:11:10 +00:00
Mark O'Brien
95eb7c75d3 [nagios] port haproxy connections script to py3 2020-09-25 14:11:10 +00:00
Kevin Fenzi
eed8859c64 pdc-backend: clean up last bits of pdc-backend hosts.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-08-24 09:32:04 -07:00
Rick Elrod
dcc53bd63b add crl check to nagios + nrpe + facl perms for nrpe
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-08-06 15:32:09 -05:00
Francois Andrieu
e1b6248a4c nagios: add check_postfix_redhat to bastion01 2020-07-22 19:41:11 +00:00
Kevin Fenzi
349dec197c nagios_server / irc colorize: 2to3 run to move to python3
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-07-01 16:45:28 -07:00
Kevin Fenzi
9770bae604 nagios_server: use iad2-mgmt-http.cfg
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 16:13:26 -07:00
Kevin Fenzi
9d9d7f6c5c nagios_server: more adjustments, drop fas for now, fix gateway hosts harder
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 15:59:32 -07:00
Kevin Fenzi
88ab378bba nagios_server: drop phx2_internal stuff, fix mailman01 to use iad2
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 14:40:14 -07:00
Rick Elrod
d9a23d9930 nagios: nix basset checks for now
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 21:01:41 -05:00
Rick Elrod
f6c5bac836 nagios: comment out qahardware ref for now
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 21:01:41 -05:00