266 commits

Author SHA1 Message Date
Stephen Smoogen
7b4872c557 IAD2 this should work.. I bet it wont 2020-05-15 20:23:26 -04:00
Stephen Smoogen
2f02466f90 ok this should revert change 2020-05-15 20:15:41 -04:00
Stephen Smoogen
5327f28dfb nagios syntax is weirdly specific at times 2020-05-15 20:10:40 -04:00
Stephen Smoogen
cb8b68bc18 put more checks for able to connect 2020-05-15 19:54:12 -04:00
Stephen Smoogen
0163e646ad try a different way to get this working 2020-05-15 19:32:44 -04:00
Stephen Smoogen
d7080ffee1 add in an endif for nagios 2020-05-15 19:18:03 -04:00
Stephen Smoogen
3ba920f5bc put in tests to remove nagios from trying to talk to iad2 boxes 2020-05-15 19:06:56 -04:00
Kevin Fenzi
b606ddb322 iad2: try putting nagios_Can_Connect in
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 16:14:10 -07:00
Kevin Fenzi
7b424a3a49 iad2: nagios_server: tweak raid check to only add hosts where raid check is true
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 15:19:34 -07:00
Stephen Smoogen
328113f81a remove hosts from cloud and nagios 2020-04-24 21:34:29 +02:00
Stephen Smoogen
91bac1fc02 remove templates which no longer exist 2020-04-24 21:34:28 +02:00
Pierre-Yves Chibon
22153994ba nagios: change the string checked on the status page
Nagios warns us if status.fp.o isn't running, that's the goal.
But nagios was checking for the presence of the string:
"All systems go".
This is fine, until one system goes down. Nagios tells us about
this system, we go look at it, we (manually) update status.fp.o
so our users know that we know about the outage.
Then nagios tells us that status.fp.o isn't how it should be and
we need to go tell nagios that we know status isn't how it should
be since we updated it ourselves.

So instead of checking for "All systems go" we'll now check for
"Fedora Infrastructure Status" which is at the top of the page
and will remain there as long as status.fp.o is up and regardless
of the state of the rest of the infra.

Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
2020-04-24 21:34:27 +02:00
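The reasoning in the commit message above can be illustrated with a small sketch. It is not the actual Nagios check. The page bodies and the function name `page_matches` are hypothetical, and only the two marker strings come from the commit message.

```python
# Illustrative sketch only: shows why matching the page title is a better
# liveness signal than matching the "everything is fine" banner.
OLD_MARKER = "All systems go"
NEW_MARKER = "Fedora Infrastructure Status"


def page_matches(body: str, marker: str) -> bool:
    """Return True when the expected marker string appears in the page body."""
    return marker in body


# Hypothetical page bodies: one healthy, one with a manually posted outage.
healthy_page = "<h1>Fedora Infrastructure Status</h1><p>All systems go</p>"
outage_page = "<h1>Fedora Infrastructure Status</h1><p>Minor service disruption</p>"

# The old marker fails as soon as an outage is posted, although the status
# site itself is still up. The new marker keeps matching in both cases.
old_results = [page_matches(p, OLD_MARKER) for p in (healthy_page, outage_page)]
new_results = [page_matches(p, NEW_MARKER) for p in (healthy_page, outage_page)]
```

With the old marker the second result is False even though the site is serving pages, which is the spurious alert the commit removes.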
Stephen Smoogen
8b9c9a395b fix cloud nagios 2020-04-24 21:34:27 +02:00
Rick Elrod
0135fc1102 nagios: Add script and check for checking that a timestamp within a file is within a delta of now, and then use this for alerting when websites stop building
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:26 +02:00
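A minimal sketch of the kind of check this commit describes, not the script that was actually added. It assumes the file holds a single Unix epoch timestamp, which the commit message does not state. The function name `check_timestamp` and its parameters are hypothetical. The return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import time

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_timestamp(text, max_delta_seconds, now=None):
    """Compare a timestamp read from a file against the current time.

    Assumption: `text` is the file content and contains one Unix epoch
    timestamp in seconds. Returns an (exit_code, message) pair.
    """
    if now is None:
        now = time.time()
    try:
        stamp = float(text.strip())
    except ValueError:
        return UNKNOWN, "UNKNOWN: could not parse timestamp"
    # Distance from now in either direction, so a timestamp far in the
    # future is also reported as a problem.
    age = abs(now - stamp)
    if age > max_delta_seconds:
        return CRITICAL, f"CRITICAL: timestamp is {age:.0f}s from now (limit {max_delta_seconds}s)"
    return OK, f"OK: timestamp is {age:.0f}s from now"
```

A wrapper would read the file, call `check_timestamp`, print the message, and exit with the returned code. A website build that stops updating its timestamp file then turns CRITICAL once the delta is exceeded.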
Stephen Smoogen
ba1b6c933d ok ping doesnt need to be a template. all.cfg needs a group which says you cant ping it 2020-04-24 21:34:25 +02:00
Stephen Smoogen
5dafec9444 == != = 2020-04-24 21:34:25 +02:00
Stephen Smoogen
a2ba26c5f4 this is ugly but its been a 12 hour day 2020-04-24 21:34:25 +02:00
Stephen Smoogen
0b6f47fe3d fix jinja2 syntax. really I know how to code 2020-04-24 21:34:25 +02:00
Stephen Smoogen
fa91e4d065 fix broken endif in all.cfg 2020-04-24 21:34:25 +02:00
Stephen Smoogen
8ea853e92b add aarch64-test to cloud_aws group 2020-04-24 21:34:25 +02:00
Stephen Smoogen
f963dfa4d5 move the ping test to its own variable 2020-04-24 21:34:25 +02:00
Stephen Smoogen
8cf6069e6d test to see why we put this in 2020-04-24 21:34:25 +02:00
Stephen Smoogen
b7f9164fb9 Add a test to see if sshd:true will allow some hosts to work in nagios all.cfg.j2 2020-04-24 21:34:25 +02:00
Stephen Smoogen
83f0c19f07 and we try yet another cargo cult dance to make the nagios gods happy 2020-04-24 21:34:25 +02:00
Stephen Smoogen
dffa4d33ea try to set this for hosts in aws cleanly and in one spot 2020-04-24 21:34:25 +02:00
Stephen Smoogen
68c936375e clean up nagios cloud to use cloud_phx2 2020-04-24 21:34:25 +02:00
Stephen Smoogen
a26e0ab63e add a restart_httpd on the badges to see if it helps cut down manual restarts 2020-04-24 21:34:25 +02:00
Stephen Smoogen
6efa60181e [nagios] small change to cloud-hosts where I forgot to remove an endif 2020-04-24 21:34:25 +02:00
Stephen Smoogen
3a2cdba4bb copr is not in the cloud and this is messing up nagios complicated templates. 2020-04-24 21:34:25 +02:00
Kevin Fenzi
4fe85007f0 nagios: fix up aws group template
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:25 +02:00
Kevin Fenzi
05ca7b4ae8 nagios: also use aws template on noc01
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:24 +02:00
Kevin Fenzi
9a255e3c41 nagios_server: try and adjust for all the aws copr instances
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:24 +02:00
Rick Elrod
6cfe3f18f0 Add some more hostgroups to excludes and extract the list out to group_vars/nagios
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:23 +02:00
Rick Elrod
cdc6091079 Revert "Revert "just use the hostname as the address here, because aws is weird with networking""
This reverts commit cecefbe1206479326f3b579bafef971532ac92bf.
2020-04-24 21:34:22 +02:00
Rick Elrod
db023609f6 Revert "try to get nagios to talk to it over the vpn"
This reverts commit 6d717abd15c08418ebcb44bef39bf8df6ad4874a.
2020-04-24 21:34:22 +02:00
Rick Elrod
0cbad706fb try to get nagios to talk to it over the vpn
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Rick Elrod
a40e911e7b Revert "just use the hostname as the address here, because aws is weird with networking"
This reverts commit 7c1646d89358852faf5d6ef28c4e9232304af145.
2020-04-24 21:34:22 +02:00
Rick Elrod
bc95213c59 just use the hostname as the address here, because aws is weird with networking
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Rick Elrod
8802e2a98f try to fix noc playbook
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Stephen Smoogen
8cd7809278 aws does not have parents 2020-04-24 21:34:21 +02:00
Stephen Smoogen
32d31b1d9d add a minimum aws template for nagios 2020-04-24 21:34:21 +02:00
Stephen Smoogen
1961cb9cb7 turns out nixnagios is not what I wanted 2020-04-24 21:34:21 +02:00
Stephen Smoogen
a54e3ce9f6 remove ci-cc-rdu from files it was hidden in 2020-04-24 21:34:19 +02:00
Rick Elrod
ea96618bd4 Get rid of modernpaste everywhere, redirect it to paste.centos.org everywhere
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:18 +02:00
Kevin Fenzi
779fa01877 autocloud: fare well autocloud, you served long and well...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:17 +02:00
Stephen Smoogen
6174e107e0 time to remove the spanner in the works 2020-04-24 21:34:17 +02:00
Clement Verna
a75cc6f246 Bodhi: update the nagios string to check if the page is correctly loaded
Signed-off-by: Clement Verna <cverna@tutanota.com>
2020-04-24 21:34:15 +02:00
Rick Elrod
cdd8a99a92 nagios_server: s/repospanner-temp/repospanner_temp/
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:14 +02:00
Stephen Smoogen
011102c216 make sure repospanner-temp is not an empty group 2020-04-24 21:34:13 +02:00
Mikolaj Izdebski
42b2399faf nagios_server: Run internal Koschei check only against OS infra nodes 2020-04-24 21:34:11 +02:00