Add monitoring for the queue used by resultsdb #8392
Reference: Infrastructure/fedora-infrastructure#8392
resultsdb01.qa has a fedora-messaging listener that has its own queue.
Monitoring the state of that queue in Nagios would alert us when the consumer misbehaves (as it did this weekend because of a message without a body).
Is this in its own RabbitMQ instance, or in the main one?
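As a rough illustration of what such a check could look at, here is a minimal sketch that maps the JSON the RabbitMQ management API returns for a single queue (`GET /api/queues/<vhost>/<queue>`, which includes `messages` and `consumers` fields) to a Nagios status. The thresholds, the vhost and queue names, and the helper names are assumptions made for the example, not the check actually deployed in the ansible repo.

```python
from urllib.parse import quote

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def queue_api_path(vhost, queue):
    """Build the management API path for one queue.

    The vhost and queue names must be percent-encoded, so a vhost
    such as "/pubsub" becomes "%2Fpubsub".
    """
    return "/api/queues/{}/{}".format(quote(vhost, safe=""), quote(queue, safe=""))


def evaluate_queue(info, warn=100, crit=1000):
    """Turn a parsed queue description into (status, message).

    `warn` and `crit` are hypothetical backlog thresholds.
    """
    messages = info.get("messages")
    consumers = info.get("consumers")
    if messages is None or consumers is None:
        return UNKNOWN, "UNKNOWN: queue stats missing from response"
    if consumers == 0:
        # Nobody is reading the queue, so it can only grow.
        return CRITICAL, "CRITICAL: no consumer attached ({} queued)".format(messages)
    if messages >= crit:
        return CRITICAL, "CRITICAL: {} messages queued".format(messages)
    if messages >= warn:
        return WARNING, "WARNING: {} messages queued".format(messages)
    return OK, "OK: {} messages queued, {} consumer(s)".format(messages, consumers)
```

A wrapper script would fetch the path from the broker's management endpoint with suitable credentials, print the message, and exit with the status code.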
@abompard has done some rabbit monitoring work...
Metadata Update from @kevin:
It is in the main one.
Metadata Update from @mizdebsk:
I imagine https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=ee4a746691a0cb6edd81505a275ce82a33a40542 fixes this (sans any threshold adjustments)?
That would alert if the queue got too large, but it wouldn't alert if no messages were coming out of that service at all.
We need something like the checks we already have that use datanommer to query for the last message from a service.
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
There are some Nagios checks like that for other services.
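For illustration, a freshness check of that kind could be sketched as below. It assumes the datagrepper `/raw` endpoint accepts `rows_per_page` and `order` parameters and returns a JSON object with a `raw_messages` list whose entries carry a Unix `timestamp`; the age thresholds and function names are made up for the example and are not taken from the existing Nagios checks.

```python
import json
import time
from urllib.request import urlopen

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

DATAGREPPER = "https://apps.fedoraproject.org/datagrepper/raw"


def evaluate_freshness(payload, now, warn_age=4 * 3600, crit_age=12 * 3600):
    """Map a datagrepper response to (status, message) by last-message age.

    `warn_age` and `crit_age` are hypothetical thresholds in seconds.
    """
    messages = payload.get("raw_messages") or []
    if not messages:
        return CRITICAL, "CRITICAL: no messages found"
    # Assumes newest-first ordering and an epoch timestamp per message.
    age = int(now - float(messages[0]["timestamp"]))
    if age >= crit_age:
        return CRITICAL, "CRITICAL: last message {}s ago".format(age)
    if age >= warn_age:
        return WARNING, "WARNING: last message {}s ago".format(age)
    return OK, "OK: last message {}s ago".format(age)


def main(category="resultsdb"):
    """Fetch the newest message for a category and report its age."""
    url = "{}?category={}&rows_per_page=1&order=desc".format(DATAGREPPER, category)
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    status, text = evaluate_freshness(payload, time.time())
    print(text)
    return status
```

Run as a plugin, the script would end with `sys.exit(main())` so that Nagios picks up the exit status. Unlike a queue-size threshold, this catches the case where the service goes silent.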
Sorry if I sound like a broken record, but what does https://infrastructure.fedoraproject.org/cgit/ansible.git/commit/?id=93d0eeaf54ff57fea9c8942f25618c826b454c54 achieve, then?
Oh man, you are in fact right. I was somehow looking at another commit or something. ;(
My bad... you are right, this does what we want. :)
Sorry about that.
Metadata Update from @kevin:
Those are indeed two different commits. Sorry if my link-fu confused you. :p