Make nagios check_ssl_cert checks happen from one server instead of from all proxies #8209
Reference: Infrastructure/fedora-infrastructure#8209
In some cases, when a cert is about to expire, we get an alert about it from every single proxy, which is excessive. This notably happens with the checks in nagios_server/files/nagios/services/ssl.cfg which have hostgroup_name proxies. These should all just be listed under one server, not all the proxies, for the sake of our sanity.
This is a relatively simple change and would make a good easyfix for someone looking to make one of their first patches.
This is easy enough to change, but... one reason we wanted all the proxies checked was in case we rolled out a new cert and somehow some subset of them didn't update and still had the old cert.
i.e., we could just check one, but then we might miss it if there are others that are affected.
Not sure what's best here...
@codeblock you still want to try and do this some different way?
I wonder if we could make them all depend on each other as far as nagios knows: the other ones would all be dependents of the first one, so in the case where they are all bad only the first would alert, and in the case of one bad only that one would alert?
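For context, nagios expresses this kind of relationship with a servicedependency object. The following is only a minimal sketch of the idea, not an actual or proposed change to ssl.cfg: the master host name and both service descriptions are placeholders made up for illustration, and only the proxies hostgroup comes from this ticket.

```
# Sketch only: host and service names are hypothetical placeholders.
define servicedependency {
    host_name                       proxy01.example.org   ; master host (placeholder)
    service_description             https-cert-example    ; cert check on the master (placeholder)
    dependent_hostgroup_name        proxies               ; hostgroup named in this ticket
    dependent_service_description   https-cert-example    ; same cert check on the other proxies (placeholder)
    notification_failure_criteria   w,c                   ; hold back dependents' notifications while the master is WARNING or CRITICAL
    execution_failure_criteria      n                     ; never skip running the dependent checks
}
```

With something like this, a cert that is bad everywhere should notify once from the master, while a cert that is bad on only one of the other proxies still notifies because the master is OK. Whether the master host has to be excluded from the dependent hostgroup is something to confirm by validating the config with nagios -v before rolling it out.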
I'll take a look, let's see what I can come up with.
@lgriffin
I wonder if this will work. It checks the certs on koji.fp.o and proxy03.fp.o and suppresses warnings of the koji and proxies hostgroups when this check already returns WARN or CRIT. pagure as a single host can send out notifications whenever necessary.
Caveat: we won't get notifications about ssl certs from the koji and proxies hostgroups until the certs on koji.fp.o and proxy03.fp.o get fixed.
diff --git a/roles/nagios_server/files/nagios/services/ssl.cfg b/roles/nagios_server/files/nagios/services/ssl.cfg
index 275571cc9..b857c8fd2 100644
--- a/roles/nagios_server/files/nagios/services/ssl.cfg
+++ b/roles/nagios_server/files/nagios/services/ssl.cfg
@@ -39,3 +39,19 @@ define service {
check_command check_ssl_cert!pagure.io!60
use defaulttemplate
}
+
+define servicedependency {
+}
+define servicedependency {
+}
Added.