Stephen Smoogen
7b4872c557
IAD2: this should work.. I bet it won't
2020-05-15 20:23:26 -04:00
Stephen Smoogen
2f02466f90
ok this should revert change
2020-05-15 20:15:41 -04:00
Stephen Smoogen
5327f28dfb
nagios syntax is weirdly specific at times
2020-05-15 20:10:40 -04:00
Stephen Smoogen
cb8b68bc18
put more checks for able to connect
2020-05-15 19:54:12 -04:00
Stephen Smoogen
0163e646ad
try a different way to get this working
2020-05-15 19:32:44 -04:00
Stephen Smoogen
d7080ffee1
add in an endif for nagios
2020-05-15 19:18:03 -04:00
Stephen Smoogen
3ba920f5bc
put in tests to remove nagios from trying to talk to iad2 boxes
2020-05-15 19:06:56 -04:00
Kevin Fenzi
b606ddb322
iad2: try putting nagios_Can_Connect in
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 16:14:10 -07:00
Kevin Fenzi
7b424a3a49
iad2: nagios_server: tweak raid check to only add hosts where raid check is true
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-05-11 15:19:34 -07:00
Stephen Smoogen
328113f81a
remove hosts from cloud and nagios
2020-04-24 21:34:29 +02:00
Stephen Smoogen
91bac1fc02
remove templates which no longer exist
2020-04-24 21:34:28 +02:00
Pierre-Yves Chibon
22153994ba
nagios: change the string checked on the status page
...
Nagios warns us if status.fp.o isn't running, that's the goal.
But nagios was checking for the presence of the string:
"All systems go".
This is fine, until one system goes down. Nagios tells us about
this system, we go look at it, we (manually) update status.fp.o
so our users know that we know about the outage.
Then nagios tells us that status.fp.o isn't how it should be and
we need to go tell nagios that we know status isn't how it should
be since we updated it ourselves.
So instead of checking for "All systems go" we'll now check for
"Fedora Infrastructure Status" which is at the top of the page
and will remain there as long as status.fp.o is up and regardless
of the state of the rest of the infra.
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
2020-04-24 21:34:27 +02:00
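
The reasoning in commit 22153994ba above can be sketched as a small content check. This is only an illustration, not the actual Nagios configuration from the repository: the function names `check_body` and `fetch` are hypothetical, and the sample page text is invented. The point it shows is that a string that changes during an outage makes the check fail for the wrong reason, while a string that is always present on the page only fails when the page itself is unavailable.

```python
import urllib.request
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_body(body: str, expected: str) -> Tuple[int, str]:
    """Return a Nagios-style (exit code, message) for a page body.

    Hypothetical helper: passes when `expected` appears anywhere in `body`.
    """
    if expected in body:
        return OK, f"OK - found {expected!r}"
    return CRITICAL, f"CRITICAL - {expected!r} not found"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a page as text (hypothetical helper, not called in the demo)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Invented sample of what the status page might contain during an outage.
outage_page = (
    "<h1>Fedora Infrastructure Status</h1>"
    "<p>Some systems are experiencing issues</p>"
)

# Old string: fails as soon as any outage is posted on the page.
old_result = check_body(outage_page, "All systems go")
# New string: keeps passing for as long as the page is served at all.
new_result = check_body(outage_page, "Fedora Infrastructure Status")
```

With the old string the check goes critical whenever an outage is posted, even though the status site is up; with the new string it stays OK.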
Stephen Smoogen
8b9c9a395b
fix cloud nagios
2020-04-24 21:34:27 +02:00
Rick Elrod
0135fc1102
nagios: Add script and check for checking that a timestamp within a file is within a delta of now, and then use this for alerting when websites stop building
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:26 +02:00
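
A minimal sketch of the kind of check commit 0135fc1102 describes, under stated assumptions: the real script is not shown in this log, so the file format (a single Unix epoch timestamp), the function name `check_timestamp`, and the choice of exit codes per result are all assumptions made for illustration.

```python
import time
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_timestamp(
    content: str, max_delta: float, now: Optional[float] = None
) -> Tuple[int, str]:
    """Check that the timestamp in `content` is within `max_delta` seconds of now.

    Assumes `content` holds one Unix epoch timestamp (an assumption, not
    something the commit message states).
    """
    if now is None:
        now = time.time()
    try:
        stamp = float(content.strip())
    except ValueError:
        # Unparseable input is reported as UNKNOWN rather than CRITICAL.
        return UNKNOWN, "UNKNOWN - could not parse timestamp"
    age = abs(now - stamp)
    if age <= max_delta:
        return OK, f"OK - timestamp is {age:.0f}s from now"
    return CRITICAL, (
        f"CRITICAL - timestamp is {age:.0f}s from now "
        f"(limit {max_delta:.0f}s)"
    )
```

For a website build that writes its completion time to a file, a stale timestamp means the build has stopped, so the check goes critical once the age exceeds the allowed delta.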
Stephen Smoogen
ba1b6c933d
ok ping doesn't need to be a template. all.cfg needs a group which says you can't ping it
2020-04-24 21:34:25 +02:00
Stephen Smoogen
5dafec9444
== != =
2020-04-24 21:34:25 +02:00
Stephen Smoogen
a2ba26c5f4
this is ugly but it's been a 12 hour day
2020-04-24 21:34:25 +02:00
Stephen Smoogen
0b6f47fe3d
fix jinja2 syntax. really I know how to code
2020-04-24 21:34:25 +02:00
Stephen Smoogen
fa91e4d065
fix broken endif in all.cfg
2020-04-24 21:34:25 +02:00
Stephen Smoogen
8ea853e92b
add aarch64-test to cloud_aws group
2020-04-24 21:34:25 +02:00
Stephen Smoogen
f963dfa4d5
move the ping test to its own variable
2020-04-24 21:34:25 +02:00
Stephen Smoogen
8cf6069e6d
test to see why we put this in
2020-04-24 21:34:25 +02:00
Stephen Smoogen
b7f9164fb9
Add a test to see if sshd:true will allow some hosts to work in nagios all.cfg.j2
2020-04-24 21:34:25 +02:00
Stephen Smoogen
83f0c19f07
and we try yet another cargo cult dance to make the nagios gods happy
2020-04-24 21:34:25 +02:00
Stephen Smoogen
dffa4d33ea
try to set this for hosts in aws cleanly and in one spot
2020-04-24 21:34:25 +02:00
Stephen Smoogen
68c936375e
clean up nagios cloud to use cloud_phx2
2020-04-24 21:34:25 +02:00
Stephen Smoogen
a26e0ab63e
add a restart_httpd on the badges to see if it helps cut down manual restarts
2020-04-24 21:34:25 +02:00
Stephen Smoogen
6efa60181e
[nagios] small change to cloud-hosts where I forgot to remove an endif
2020-04-24 21:34:25 +02:00
Stephen Smoogen
3a2cdba4bb
copr is not in the cloud and this is messing up nagios complicated templates.
2020-04-24 21:34:25 +02:00
Kevin Fenzi
4fe85007f0
nagios: fix up aws group template
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:25 +02:00
Kevin Fenzi
05ca7b4ae8
nagios: also use aws template on noc01
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:24 +02:00
Kevin Fenzi
9a255e3c41
nagios_server: try and adjust for all the aws copr instances
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:24 +02:00
Rick Elrod
6cfe3f18f0
Add some more hostgroups to excludes and extract the list out to group_vars/nagios
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:23 +02:00
Rick Elrod
cdc6091079
Revert "Revert "just use the hostname as the address here, because aws is weird with networking""
...
This reverts commit cecefbe1206479326f3b579bafef971532ac92bf.
2020-04-24 21:34:22 +02:00
Rick Elrod
db023609f6
Revert "try to get nagios to talk to it over the vpn"
...
This reverts commit 6d717abd15c08418ebcb44bef39bf8df6ad4874a.
2020-04-24 21:34:22 +02:00
Rick Elrod
0cbad706fb
try to get nagios to talk to it over the vpn
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Rick Elrod
a40e911e7b
Revert "just use the hostname as the address here, because aws is weird with networking"
...
This reverts commit 7c1646d89358852faf5d6ef28c4e9232304af145.
2020-04-24 21:34:22 +02:00
Rick Elrod
bc95213c59
just use the hostname as the address here, because aws is weird with networking
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Rick Elrod
8802e2a98f
try to fix noc playbook
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:22 +02:00
Stephen Smoogen
8cd7809278
aws does not have parents
2020-04-24 21:34:21 +02:00
Stephen Smoogen
32d31b1d9d
add a minimum aws template for nagios
2020-04-24 21:34:21 +02:00
Stephen Smoogen
1961cb9cb7
turns out nixnagios is not what I wanted
2020-04-24 21:34:21 +02:00
Stephen Smoogen
a54e3ce9f6
remove ci-cc-rdu from files it was hidden in
2020-04-24 21:34:19 +02:00
Rick Elrod
ea96618bd4
Get rid of modernpaste everywhere, redirect it to paste.centos.org everywhere
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:18 +02:00
Kevin Fenzi
779fa01877
autocloud: farewell autocloud, you served long and well...
...
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-04-24 21:34:17 +02:00
Stephen Smoogen
6174e107e0
time to remove the spanner in the works
2020-04-24 21:34:17 +02:00
Clement Verna
a75cc6f246
Bodhi: update the nagios string to check if the page is correctly loaded
...
Signed-off-by: Clement Verna <cverna@tutanota.com>
2020-04-24 21:34:15 +02:00
Rick Elrod
cdd8a99a92
nagios_server: s/repospanner-temp/repospanner_temp/
...
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-04-24 21:34:14 +02:00
Stephen Smoogen
011102c216
make sure repospanner-temp is not an empty group
2020-04-24 21:34:13 +02:00
Mikolaj Izdebski
42b2399faf
nagios_server: Run internal Koschei check only against OS infra nodes
2020-04-24 21:34:11 +02:00