341 commits

Author SHA1 Message Date
Kevin Fenzi
88ab378bba nagios_server: drop phx2_internal stuff, fix mailman01 to use iad2
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 14:40:14 -07:00
Kevin Fenzi
6908fbf86a nagios_server: replace phx2_internal with iad2_internal.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 14:33:11 -07:00
Kevin Fenzi
b7a5fbcc7e nagios: need a newline here
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-30 14:18:36 -07:00
Rick Elrod
9b531316ae nagios: fix comment
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 21:01:41 -05:00
Rick Elrod
d990cc884a nagios: move comment out of block
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 21:01:41 -05:00
Rick Elrod
853975da7d nagios: nix phx2 mgmt
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 18:31:29 -05:00
Rick Elrod
7636206d12 nagios: nuke bastion02.iad2 from here for now
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-06-20 18:25:59 -05:00
Stephen Smoogen
6a411fae1b try to clean up 2020-06-09 11:27:52 -04:00
Stephen Smoogen
cd97509505 make an iad2-internal hosts for our systems 2020-06-09 11:19:07 -04:00
Stephen Smoogen
6057fec643 make changes to allow for noc01.iad2 to see hosts. ( this is how you break^wfix nagios.. one config at a time) 2020-06-09 10:54:33 -04:00
Stephen Smoogen
bcda2606fe try to make this work in iad2 2020-06-09 10:45:02 -04:00
Stephen Smoogen
9836838b94 as we empty groups of hosts in phx2, we need to add them to the exclude_phx2_internal 2020-06-08 16:19:49 -04:00
Stephen Smoogen
e26ead0f70 try and get nagios working on noc01.iad2 2020-06-08 11:17:20 -04:00
Stephen Smoogen
b72c46c607 according to https://docs.ansible.com/ansible/2.5/dev_guide/testing/sanity/no-dict-iteritems.html iteritems is not in python3. changing old items 2020-06-06 16:57:01 -04:00
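The Python 3 issue behind this commit can be sketched as follows. The `host_vars` dictionary is made up for illustration and is not taken from the repository. The point is only that `dict.iteritems()` exists in Python 2 alone, while `dict.items()` works in both.

```python
# Hypothetical data for illustration; not from the actual playbooks.
host_vars = {"noc01": "iad2", "mailman01": "iad2"}


def pairs(mapping):
    """Return sorted (key, value) pairs using items(), valid on Python 2 and 3."""
    return sorted(mapping.items())


# On Python 3, dict objects have no iteritems() method at all.
has_iteritems = hasattr(host_vars, "iteritems")

print(pairs(host_vars))
print("iteritems available:", has_iteritems)
```

In a Jinja2 template the same substitution applies: a loop written over `somedict.iteritems()` would be rewritten to use `somedict.items()`.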
Stephen Smoogen
fd67776bab Revert "have to add group by group to see where break is"
This reverts commit 553341ca7a.
2020-06-06 16:53:30 -04:00
Stephen Smoogen
553341ca7a have to add group by group to see where break is 2020-06-06 16:51:08 -04:00
Stephen Smoogen
7f5b96a8e4 Revert "have to add group by group to see where break is"
This reverts commit 59af0e88ab.
2020-06-06 16:50:44 -04:00
Stephen Smoogen
59af0e88ab have to add group by group to see where break is 2020-06-06 16:47:10 -04:00
Stephen Smoogen
192637532c set up things so nagios in iad2 is mostly ready. 2020-05-21 19:20:38 -04:00
Stephen Smoogen
794071b256 make mgmt interfaces faster to build 2020-05-21 16:46:41 -04:00
Stephen Smoogen
435095958d move more service groups to static files and use servicegroup definitions in services 2020-05-21 15:47:19 -04:00
Stephen Smoogen
d82e99371c use a different syntax for service groups to clean up phx2 ness 2020-05-21 15:22:48 -04:00
Stephen Smoogen
df9fcb477d move nagios ipa file to template to make less phx2 dependent 2020-05-21 14:57:41 -04:00
Stephen Smoogen
89f91a9642 Clean up nagios to deal with dropped services and that servicegroups can NOT end with a , while every other nagios group can. 2020-05-21 13:22:26 -04:00
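One way to avoid the trailing comma this commit mentions is to join the names instead of appending a comma after each one. This is a minimal sketch with hypothetical group names, not the approach the commit itself took.

```python
def format_group_list(names):
    """Build a comma-separated list with no trailing comma.

    Empty names are skipped so a dropped service cannot leave a stray ",".
    """
    return ",".join(name for name in names if name)


# Hypothetical servicegroup names, for illustration only.
print(format_group_list(["websites", "", "mail"]))
```

In a Jinja2 template, the built-in `join` filter gives the same result as `",".join(...)` here.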
Stephen Smoogen
211641ab19 test to see if these variables are working for iad2 2020-05-21 12:28:46 -04:00
Stephen Smoogen
7b4872c557 IAD2this should work.. I bet it wont 2020-05-15 20:23:26 -04:00
Stephen Smoogen
2f02466f90 ok this should revert change 2020-05-15 20:15:41 -04:00
Stephen Smoogen
5327f28dfb nagios syntax is weirdly specific at times 2020-05-15 20:10:40 -04:00
Stephen Smoogen
cb8b68bc18 put more checks for able to connect 2020-05-15 19:54:12 -04:00
Stephen Smoogen
0163e646ad try a different way to get this working 2020-05-15 19:32:44 -04:00
Stephen Smoogen
d7080ffee1 add in an endif for nagios 2020-05-15 19:18:03 -04:00
Stephen Smoogen
3ba920f5bc put in tests to remove nagios from trying to talk to iad2 boxes 2020-05-15 19:06:56 -04:00
Kevin Fenzi
b606ddb322 iad2: try putting nagios_Can_Connect in
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 16:14:10 -07:00
Kevin Fenzi
7b424a3a49 iad2: nagios_server: tweak raid check to only add hosts where raid check is true
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 15:19:34 -07:00
Stephen Smoogen
328113f81a remove hosts from cloud and nagios 2020-04-24 21:34:29 +02:00
Stephen Smoogen
91bac1fc02 remove templates which no longer exist 2020-04-24 21:34:28 +02:00
Pierre-Yves Chibon
22153994ba nagios: change the string checked on the status page
Nagios warns us if status.fp.o isn't running, that's the goal.
But nagios was checking for the presence of the string:
"All systems go".
This is fine, until one system goes down. Nagios tells us about
this system, we go look at it, we (manually) update status.fp.o
so our users know that we know about the outage.
Then nagios tells us that status.fp.o isn't how it should be and
we need to go tell nagios that we know status isn't how it should
be since we updated it ourselves.

So instead of checking for "All systems go" we'll now check for
"Fedora Infrastructure Status" which is at the top of the page
and will remain there as long as status.fp.o is up and regardless
of the state of the rest of the infra.

Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
2020-04-24 21:34:27 +02:00
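The reasoning in this commit message can be illustrated with a small sketch. The two page bodies below are invented stand-ins for status.fp.o, and `status_page_ok` is a hypothetical helper, not the actual Nagios check. Only the two marker strings come from the commit message.

```python
def status_page_ok(body, marker):
    """Return True if the expected marker string appears in the page body."""
    return marker in body


# Invented page bodies: one with everything up, one after a manual outage update.
page_all_up = "<h1>Fedora Infrastructure Status</h1><p>All systems go</p>"
page_outage = "<h1>Fedora Infrastructure Status</h1><p>Some systems are down</p>"

old_marker = "All systems go"
new_marker = "Fedora Infrastructure Status"

# The old marker fails once an outage is posted, although the site is up.
print(status_page_ok(page_outage, old_marker))
# The new marker matches in both cases, so the check tracks only site availability.
print(status_page_ok(page_all_up, new_marker), status_page_ok(page_outage, new_marker))
```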
Stephen Smoogen
8b9c9a395b fix cloud nagios 2020-04-24 21:34:27 +02:00
Rick Elrod
0135fc1102 nagios: Add script and check for checking that a timestamp within a file is within a delta of now, and then use this for alerting when websites stop building
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:26 +02:00
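The check described in this commit might look roughly like the sketch below. It assumes the file holds a single Unix epoch timestamp as text, which the commit message does not state. The function names are hypothetical and this is not the script that was committed. The return values 0 and 2 follow the usual Nagios plugin convention for OK and CRITICAL.

```python
import time


def timestamp_is_fresh(path, max_delta_seconds, now=None):
    """Return True if the timestamp stored in `path` is within the delta of now.

    Assumption: the file contains one Unix epoch timestamp, e.g. "1587756866".
    """
    with open(path) as fh:
        stamp = float(fh.read().strip())
    if now is None:
        now = time.time()
    return abs(now - stamp) <= max_delta_seconds


def check(path, max_delta_seconds, now=None):
    """Map the freshness result to a Nagios-style exit code (0 OK, 2 CRITICAL)."""
    return 0 if timestamp_is_fresh(path, max_delta_seconds, now) else 2
```

A website build job would write the current time to the file on each successful build, and the check would alert once the stored time falls too far behind.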
Stephen Smoogen
ba1b6c933d ok ping doesnt need to be a template. all.cfg needs a group which says you cant ping it 2020-04-24 21:34:25 +02:00
Stephen Smoogen
5dafec9444 == != = 2020-04-24 21:34:25 +02:00
Stephen Smoogen
a2ba26c5f4 this is ugly but its been a 12 hour day 2020-04-24 21:34:25 +02:00
Stephen Smoogen
0b6f47fe3d fix jinja2 syntax. really I know how to code 2020-04-24 21:34:25 +02:00
Stephen Smoogen
fa91e4d065 fix broken endif in all.cfg 2020-04-24 21:34:25 +02:00
Stephen Smoogen
8ea853e92b add aarch64-test to cloud_aws group 2020-04-24 21:34:25 +02:00
Stephen Smoogen
f963dfa4d5 move the ping test to its own variable 2020-04-24 21:34:25 +02:00
Stephen Smoogen
8cf6069e6d test to see why we put this in 2020-04-24 21:34:25 +02:00
Stephen Smoogen
b7f9164fb9 Add a test to see if sshd:true will allow some hosts to work in nagios all.cfg.j2 2020-04-24 21:34:25 +02:00
Stephen Smoogen
83f0c19f07 and we try yet another cargo cult dance to make the nagios gods happy 2020-04-24 21:34:25 +02:00
Stephen Smoogen
dffa4d33ea try to set this for hosts in aws cleanly and in one spot 2020-04-24 21:34:25 +02:00