Greenwave fails to access rabbitmq in iad2 #8977
Reference: Infrastructure/fedora-infrastructure#8977
When trying to deploy greenwave in openshift in iad2, it looks like the fedora-messaging consumer/pod is failing to start.
The logs show the error:
Could this be a port issue?
Note: I expect this will also affect bodhi when we get to it, but solving it here should fix it for bodhi as well.
@abompard Can you take a look at it? Thanks.
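To help answer the port question above, here is a minimal sketch of a reachability check that could be run from inside the pod. It only tests whether a TCP connection opens; it says nothing about TLS or AMQP authentication. The function name `can_connect` is made up for illustration, and port 5671 (the registered AMQP-over-TLS port) is an assumption, since the thread does not say which port the consumer uses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (port 5671 is an assumption, not taken from the logs):
# can_connect("rabbitmq.fedoraproject.org", 5671)
```

A `False` result would point at routing, firewall, or DNS; a `True` result would suggest the problem is further up, in TLS or authentication.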
Metadata Update from @mohanboddu:
This is also an issue for ODCS (ticket closed as duplicate: https://pagure.io/fedora-infrastructure/issue/8978).
At least copr deployment scripts seem to be affected:
OK, @praiskup's problem here is slightly different:
the defaults file roles/rabbit/queue/defaults/main.yml was changed to use an environment variable that tries to select the rabbitmq cluster per datacenter. Your system is in the aws datacenter, so it is going to break. I will try to figure out a fix.
So, rabbitmq.fedoraproject.org in iad2 is resolving to:
rabbitmq.fedoraproject.org has address 209.132.181.15
rabbitmq.fedoraproject.org has address 209.132.181.16
It should be connecting to the phx2 cluster (the currently active one). I am not sure why it's not; that's the part that needs more investigation.
The second issue is a change I made to make the iad2 playbooks work back when we were initially installing and testing things. We can likely revert that to phx2 now, and then switch it to iad2 when we move the rabbitmq cluster to iad2 on Monday. Or we could just leave it and switch it on Monday.
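For anyone who wants to repeat the resolution check from their own host, here is a rough sketch. The helper names `resolve_ipv4` and `unexpected_addresses` are hypothetical, and the set of expected addresses has to be supplied by whoever knows which cluster is active; nothing here decides that for you.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(hostname: str, expected: set[str]) -> list[str]:
    """Return resolved addresses that are not in the caller-supplied `expected` set."""
    return [addr for addr in resolve_ipv4(hostname) if addr not in expected]


# Example call, using the two addresses quoted above purely as sample input:
# unexpected_addresses("rabbitmq.fedoraproject.org",
#                      {"209.132.181.15", "209.132.181.16"})
```

An empty list means the host sees only the addresses you expected; anything else is worth comparing against what other datacenters resolve.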
Can everyone please try this again? I think it's working (I see greenwave connected at least).
@kevin, ODCS can connect to rabbitmq now, but it times out on
Authenticating with server using x509 (certfile: /etc/odcs/odcs-rabbitmq.crt, keyfile: /etc/odcs/odcs-rabbitmq.key)
I'm also not sure how sane the current configuration is. Based on your comment, it seems iad2 services are connected to the same rabbitmq instance as phx2 services which probably mixes them together.
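To narrow down where a timeout like the x509 one above happens, something along these lines might help. It is only a sketch: it separates "TCP connect failed" from "TLS handshake timed out" from "TLS handshake rejected", and it does not speak AMQP at all, so a timeout during the AMQP-level authentication step would still show up as `ok` here. The function name `probe_tls` and its return strings are made up for illustration, and the CA file the broker needs is not given in this thread.

```python
import socket
import ssl


def probe_tls(host, port, certfile=None, keyfile=None, cafile=None, timeout=5.0):
    """Classify how far a TLS connection attempt gets.

    Returns one of: 'ok', 'connect-failed', 'handshake-timeout', 'handshake-failed'.
    """
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Present a client certificate, as x509 authentication requires.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect-failed"
    try:
        # wrap_socket performs the handshake immediately on a connected socket.
        with context.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:
        return "handshake-timeout"
    except (ssl.SSLError, OSError):
        return "handshake-failed"
    finally:
        sock.close()


# Example call (port 5671 and the CA path are assumptions; the cert and key
# paths are the ones from the ODCS log line above):
# probe_tls("rabbitmq.fedoraproject.org", 5671,
#           certfile="/etc/odcs/odcs-rabbitmq.crt",
#           keyfile="/etc/odcs/odcs-rabbitmq.key",
#           cafile="/path/to/broker-ca.crt")
```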
Right now there are 2 independent clusters: one in iad2 and one in phx2.
All instances everywhere should resolve rabbitmq.fedoraproject.org to the phx2 one.
However, we are moving that one later today and then there will be only one cluster... in iad2.
It seems this is still/again broken for ODCS:
The greenwave consumer is now up and running, so I am closing this ticket. Another ticket was created to track the other hosts that have trouble connecting to rabbitmq: https://pagure.io/fedora-infrastructure/issue/9003
Metadata Update from @cverna: