1008 commits

Author SHA1 Message Date
Stephen Smoogen
e36f982263 This should allow ansible to correctly build the templates for noc01/noc02. 2022-11-17 12:06:00 -05:00
Seddik Alaoui Ismaili
9af427e1bf add ipv6 check for fedorapeople 2022-11-17 01:40:25 +00:00
Stephen Smoogen
b671e0e571 add phsmoura to the nagios system so they can acknowledge down systems and other events 2022-10-24 23:20:58 +00:00
Stephen Smoogen
7d31252ba0 FIX: nagios external was referencing phx2 ip addresses
The PHX2 colocation has been turned off. This meant that some configs
which had been accidentally working before due to referencing an ip
address there that no longer existed broke. The fix was to rewrite the
config so that it contained proper router ips and remove all mentions
of the PHX2 ip address.

Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
2022-07-29 09:46:49 -04:00
Stephen Smoogen
a34148440d FIX: nagios was using 66.187.228.248 which is not a usable ip address on Ibiblio networks currently 2022-07-29 09:40:57 -04:00
Kevin Fenzi
b97e20c3d8 nagios: add check for ocp api ssl cert
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-07-28 17:19:04 -07:00
Kevin Fenzi
75943dfe0e websites build moved to openshift
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-06-29 18:16:33 -07:00
Kevin Fenzi
771d72e12d resultsdb01: clean up last entries
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-06-27 15:14:12 -07:00
Mikolaj Izdebski
89f28097ce nagios_server: Update koschei internal website check for ocp4 2022-06-24 17:55:10 +02:00
Kevin Fenzi
0757ae95df greenwave: change nagios check for ocp4
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-06-15 16:01:01 -07:00
Kevin Fenzi
fcc9d984da waiverdb / nagios: fix url to ocp4
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-06-15 15:37:38 -07:00
Mark O Brien
91f3d3b0bc change nagios checks for http-bodhi to only run on ocp4 proxies
Signed-off-by: Mark O Brien <markobri@redhat.com>
2022-06-09 13:17:12 +01:00
Miroslav Suchý
b5c09240f1 remove schlupova
Silvie left the team and RH
2022-06-01 11:08:14 +02:00
Kevin Fenzi
d7a8c7aa57 nagios: only check mote on value01
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-05-25 13:25:00 -07:00
Mark O Brien
75aadffd63 rename proxies_ocp4 hostgroup
Signed-off-by: Mark O Brien <markobri@redhat.com>
2022-05-16 15:08:17 +01:00
Mark O Brien
28db0aa10f update nagios checks for http-accounts for ocp4 proxies only
Signed-off-by: Mark O Brien <markobri@redhat.com>
2022-05-16 13:59:32 +01:00
Andrew Heath
81aad830e6 Fix typo 2022-04-29 18:58:50 +00:00
Andrew Heath
8795bffd2c Adding Check for pagure.io per issue 10541 2022-04-29 18:58:50 +00:00
Pavel Raiskup
120acfb3e7 copr-be: really set up the copr-be storage warning to 12%
The templates got de-synced.
2022-04-23 23:54:23 +02:00
Pavel Raiskup
e3bee776ea nagios/copr: start warning us on 12% of backend storage
There's 15T (and we can enlarge the volume to 16T).  12% is still 1.8T.
2022-03-01 10:03:04 +01:00
Kevin Fenzi
c88e89d96b retrace: fix ssl check
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-20 15:06:29 -08:00
Kevin Fenzi
467498bb8b retrace fixes: fix dns to work, add nagios check for ssl cert
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-20 13:52:35 -08:00
Kevin Fenzi
2e548a91e6 nagios_server: update what variable nagios templates use for ipv4
We changed eth0_ip and eth0_ipv4 to eth0_ipv4_ip. Update the host
templates.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-09 16:03:01 -08:00
Kevin Fenzi
6cd9a57b0b nagios: adjust hostname for copr-be, it cannot use the alias
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-07 13:52:13 -08:00
Silvie Chlupova
dce5318cfc copr: add nagios check for copr backend 2022-02-07 20:22:45 +00:00
Kevin Fenzi
b388a003b4 nagios: add checks for ssl certs on fcos and ocp4 endpoints, change to just checking proxy01
Add checks for ssl certs on fcos openshift endpoints.
Add checks for ocp4 wildcard certs.
Change check to only use proxy01/proxy01.stg instead of all proxies.
Ideally we really do want to check all proxies, but in practice this
results in like 70 alerts anytime the cert is going to expire.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-02 15:47:23 -08:00
Kevin Fenzi
4dda088136 nagios: remove duplicate variable check
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-01-31 10:29:21 -08:00
Silvie Chlupova
194a5503f3 copr: comment out define service for copr backend, it doesn't work 2022-01-31 14:13:12 +01:00
Silvie Chlupova
5011e6a2dc copr: remove -f follow from nagios check 2022-01-31 11:51:31 +01:00
Silvie Chlupova
db6dc98940 copr: fix nagios service for checking Copr CDN
Fixes: https://pagure.io/fedora-infrastructure/issue/10508
2022-01-31 10:34:43 +01:00
Stephen Smoogen
9845cd08be fix nagios check on download.copr to use check_website_follow_ssl to remove alert 2022-01-21 11:16:55 -05:00
Pavel Raiskup
c9951efa8d nagios: disable download.copr.fedoraproject.org check again
We don't know what's wrong on that:
HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Index of /' not found on 'https://download.copr.fedorainfracloud.org:443/' - 3692 bytes in 0.631 second response time
2022-01-21 15:29:14 +01:00
Silvie Chlupova
ba86e27e79 copr: add nagios checks for copr servers 2022-01-21 14:18:05 +01:00
Silvie Chlupova
cb2f805c26 copr: don't check copr servers using nagios for now 2022-01-20 16:35:33 +01:00
Pavel Raiskup
f7edb31e43 noc: fixup noc.yaml playbook
Per report:
Error: Could not find any hostgroup matching 'datagrepper'
(config file '/etc/nagios/services/websites.cfg', starting on line 194)

Follow-up for: 726a788721
2022-01-20 15:34:41 +01:00
Silvie Chlupova
debd3c5b7e copr: define new command for nagios
We need to use --ssl and also -f follow
2022-01-20 15:26:53 +01:00
Silvie Chlupova
6fa2999dbf copr: use already existing copr.cfg 2022-01-20 13:23:31 +01:00
Silvie Chlupova
8c5dc50c7e copr: move copr nagios services into separate file 2022-01-20 12:14:48 +01:00
Silvie Chlupova
87e510f378 copr: nagios check for copr frontend, backend and distgit
Fixes: https://pagure.io/copr/copr/issue/2002
2022-01-20 11:47:14 +01:00
Silvie Chlupova
8d9f6e0c4c copr: nagios check for copr frontend, backend and distgit
Fixes: https://pagure.io/copr/copr/issue/2002
2022-01-20 08:33:23 +00:00
Silvie Chlupova
b9fa39f0c8 copr: nagios check for Copr's CDN
Relates: https://pagure.io/fedora-infrastructure/issue/10456
2022-01-04 15:28:24 +01:00
Kevin Fenzi
0f2ae88d63 nagios: add some copr team members
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-11-21 14:43:57 -08:00
Eddie Jennings, Jr
6ef496d56a Reconfigure IPv6
Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Configure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Reconfigure IPv6

Configure IPv6

Update IPv6 address for noc02 rule

Update IPv6 address in config for noc02 address change

Update IPv6 address for proxy04

Update IPv6 address for torrent02
2021-11-08 22:56:05 +00:00
Mikolaj Izdebski
26c38caafa nagios: Remove check for supybot fedmsg plugin
Zodbot no longer has fedmsg plugin installed - supybot-fedmsg package
is not installed on value02 (RHEL 8) and supybot-fedmsg upstream
project on GitHub has been archived.
2021-11-03 22:49:21 +00:00
Mikolaj Izdebski
a65fa4e1c0 nagios_server: Update hostname where zodbot is running
Zodbot is running on value02 now.
2021-11-03 16:38:34 +01:00
49c1616ca7 Update nagios check for accounts.fedoraproject.org 2021-09-29 19:04:41 +00:00
Kevin Fenzi
844177a0ae nagios: try and specify the additional groups another way
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-09-02 11:25:38 -07:00
Kevin Fenzi
d4ad74ae5e nagios / vpnclients: fix typo in previous commit
group was used, but ansible needs groups here.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-09-02 10:28:20 -07:00
Stephen Smoogen
2272ab1f6f Add in a test to make sure that the nagios templates try to add in groups
with no vpn.

Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
2021-08-27 11:05:40 -04:00
Kevin Fenzi
ec0d18a8b8 nagios: adjust where zodbot announces alerts, zodbot is on value02 now
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-08-22 10:10:10 -07:00