600 commits

Author SHA1 Message Date
Stephen Smoogen
6a6f5c0c75 Fix hostname in ping6-ipv6
A host_name in a nagios directive must match one which is defined
elsewhere in the hosts tree. For this case we needed to use the
host_name noc02-ipv6.fedoraproject.org to match what was in the ipv6
namespace.

Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
2022-11-18 15:11:38 -05:00
Kevin Fenzi
71cdddf55b nagios: move the ipv6 specific ping config to a ping-ipv6.cfg file
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-11-17 16:39:11 -08:00
Kevin Fenzi
b9b35a09ed nagios: move ping.cfg to a template so it works for both nagios servers
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-11-17 16:19:50 -08:00
Stephen Smoogen
4fe28d9291 do not put jinja2 template items into a static file 2022-11-17 16:00:22 -05:00
Stephen Smoogen
e6b3fb1904 Make it so that ipv6 is checked on hosts 2022-11-17 15:55:53 -05:00
Stephen Smoogen
e36f982263 This should allow for ansible to build correctly the templates for noc01/noc02. 2022-11-17 12:06:00 -05:00
Seddik Alaoui Ismaili
9af427e1bf add ipv6 check for fedorapeople 2022-11-17 01:40:25 +00:00
Kevin Fenzi
b97e20c3d8 nagios: add check for ocp api ssl cert
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-07-28 17:19:04 -07:00
Miroslav Suchý
b5c09240f1 remove schlupova
Silvie left the team and RH
2022-06-01 11:08:14 +02:00
Kevin Fenzi
c88e89d96b retrace: fix ssl check
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-20 15:06:29 -08:00
Kevin Fenzi
b388a003b4 nagios: add checks for ssl certs on fcos and ocp4 endpoints, change to just checking proxy01
Add checks for ssl certs on fcos openshift endpoints.
Add checks for ocp4 wildcard certs.
Change check to only use proxy01/proxy01.stg instead of all proxies.
Ideally we really do want to check all proxies, but in practice this
results in like 70 alerts anytime the cert is going to expire.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-02 15:47:23 -08:00
Silvie Chlupova
5011e6a2dc copr: remove -f follow from nagios check 2022-01-31 11:51:31 +01:00
Silvie Chlupova
db6dc98940 copr: fix nagios service for checking Copr CDN
Fixes: https://pagure.io/fedora-infrastructure/issue/10508
2022-01-31 10:34:43 +01:00
Silvie Chlupova
ba86e27e79 copr: add nagios checks for copr servers 2022-01-21 14:18:05 +01:00
Silvie Chlupova
cb2f805c26 copr: don't check copr servers using nagios for now 2022-01-20 16:35:33 +01:00
Silvie Chlupova
debd3c5b7e copr: define new command for nagios
We need to use --ssl and also -f follow
2022-01-20 15:26:53 +01:00
Silvie Chlupova
6fa2999dbf copr: use already existing copr.cfg 2022-01-20 13:23:31 +01:00
Silvie Chlupova
b9fa39f0c8 copr: nagios check for Copr's CDN
Relates: https://pagure.io/fedora-infrastructure/issue/10456
2022-01-04 15:28:24 +01:00
Mikolaj Izdebski
26c38caafa nagios: Remove check for supybot fedmsg plugin
Zodbot no longer has fedmsg plugin installed - supybot-fedmsg package
is not installed on value02 (RHEL 8) and supybot-fedmsg upstream
project on GitHub has been archived.
2021-11-03 22:49:21 +00:00
Mikolaj Izdebski
a65fa4e1c0 nagios_server: Update hostname where zodbot is running
Zodbot is running on value02 now.
2021-11-03 16:38:34 +01:00
Kevin Fenzi
ec0d18a8b8 nagios: adjust where zodbot announces alerts, zodbot is on value02 now
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-08-22 10:10:10 -07:00
Pavel Raiskup
d2f9b772e9 nagios: move copr-ping to internal 2021-08-10 08:51:55 +02:00
Pavel Raiskup
ff215ea2b9 nagios: external: define copr_* hostgroups 2021-08-09 15:25:19 +02:00
Jakub Kadlcik
9a8acc79ae nagios: enable disk monitoring for copr instances
I think that / monitoring should work by default just by
setting `nrpe: true` because of

    define service {
      hostgroup_name        all, !mincheckgrp
      service_description   Disk_Space_/
      check_command         check_by_nrpe!check_disk_/
      use                   disktemplate
    }
2021-08-09 11:45:53 +00:00
Pavel Raiskup
73ba7d25b1 copr-be: fixup copr-ping nagios mapping 2021-08-09 13:34:25 +02:00
Pavel Raiskup
0771b0e4ad copr-be: install ping nrpe task 2021-08-09 11:59:03 +02:00
Pavel Raiskup
44c172c56e copr-be: copr-ping 2021-08-05 14:48:20 +02:00
Pavel Raiskup
97e5861ac0 nagios: sync copr-be and copr-be-dev 2021-07-28 23:06:26 +02:00
Pavel Raiskup
eb66378f24 nagios: typo in copr_back => copr_back_aws 2021-07-28 16:20:45 +02:00
Pavel Raiskup
e433a17ffe nagios: add schlupov, and notify her in copr contactgroup 2021-07-28 14:49:50 +02:00
Pavel Raiskup
9eebd7387c nagios: add contact for 'praiskup' 2021-07-28 14:14:18 +02:00
Pavel Raiskup
9dd486fac8 Revert "nagios: add me and schlupov to copr contact group"
We need to define the contacts first.

This reverts commit 00b5afa1a9.
2021-07-28 14:08:45 +02:00
Pavel Raiskup
29fb33bbb7 copr-be: test remaining results storage space 2021-07-28 13:51:16 +02:00
Pavel Raiskup
00b5afa1a9 nagios: add me and schlupov to copr contact group 2021-07-28 13:41:30 +02:00
Michael Scherer
3b8504f293 Fix mention of Freenode 2021-07-02 11:17:20 +02:00
9006cf784e nagios: remove unused check_datanommer_faf 2021-05-21 18:57:09 +00:00
Kevin Fenzi
d890a9fbf4 bugzilla2fedmsg: drop checks against vm as it has moved to openshift
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-05-19 12:00:49 -07:00
Stephen Smoogen
7b43b8049a Update kevins config so nagios will load since 16x7 no longer exists
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-04-28 16:07:43 +00:00
Stephen Smoogen
e5a3fb3a43 Add in a 12x7 versus 16x7 and make some timezones friendlier
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-04-28 16:07:43 +00:00
seddikalaouiismaili
890dd31cb0 script to monitor systemd units on pagure 2021-02-12 11:34:57 +00:00
Kevin Fenzi
25ace56df7 pagure.io / nagios: check only that cert is valid for 25 days
We renew letsencrypt certs at 30 days, so checking at 60 is pointless.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-02-02 14:24:07 -08:00
Kevin Fenzi
a74b4015e7 nagios: contacts
Clean up a bunch of old contacts that no longer are around
or care about getting alerts from our nagios.

Add readme file that notes that this information is public and
people should use a filtered email address for this purpose and avoid
adding sensitive information like phone numbers.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-28 11:52:24 -07:00
Kevin Fenzi
71c650baff nagios / server: drop checking for fas fedmsgs, they likely wont be back
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-05 17:21:08 -07:00
Kevin Fenzi
f650eab7ee nagios_server / fedmsg: pkgs01 does not run any fedmsg-hub anymore.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-05 17:00:15 -07:00
Kevin Fenzi
bb61f0da99 nagios / server: don't try and check mincheck group rsyslog
We want to make sure rsyslog is running on hosts, but the mincheck
hosts are ones we don't do any nrpe checks on, so we should exclude
them from this. This is like builders or aws hosts.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-02 12:56:49 -07:00
seddikalaouiismaili
e785293064 add check for rsyslogd 2020-10-02 18:50:29 +00:00
Mark O'Brien
5fe015a90a nagios server plugins: port to py3 2020-10-02 18:46:32 +00:00
Pierre-Yves Chibon
9506631012 pagure: replace pagure01 by pagure02
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
2020-10-01 16:09:14 +02:00
Mark O'Brien
b2073703e5 [nagios] add back in strp accidentally removed 2020-09-25 14:11:10 +00:00
Mark O'Brien
95eb7c75d3 [nagios] port haproxy connections script to py3 2020-09-25 14:11:10 +00:00