aarch64 osbs cluster production #7411
Labels
No labels
No milestone
No project
No assignees
4 participants
No due date set.
No dependencies set.
Reference: infrastructure/fedora-infrastructure#7411
I finally have OSBS building aarch64 images in stg, so we can move forward and get some machines to deploy a cluster in production.
Link to the original RFR https://pagure.io/fedora-infrastructure/issue/7184
Port 8443 needs to be open with the following hosts:
When do you need this? (YYYY/MM/DD)
When time allows
When is this no longer needed or useful? (YYYY/MM/DD)
If we cannot complete your request, what is the impact?
Metadata Update from @smooge:
Can you clarify how many nodes we need here? Can we just do one like we did for staging?
It does work with one VM being master and node at the same time (this is what we have in stg), but I am not sure how that will scale. If possible, I think it would be better to have 2 VMs in prod: 1 master and 1 node. If we can't get that, let's go with only 1 VM and see how it performs.
Just for everyone at home to understand the problem: every OSBS server we bring up reduces the number of regular aarch64 builders we can have on the 23 systems we can use in the moonshot. Those are the second slowest part of the build system, so we like to have a lot of them to spread the load around. So we can either have 3 OSBS nodes or we can have 3 builders.
osbs-aarch64-master has been built. Please see how this works.
And it won't because of something in the plays:
Thanks @smooge, I am taking it from here.
The Origin container images are not yet available in our registry for aarch64, so they need to be manually pulled on the box for the first deployment.
Metadata Update from @cverna:
So it seems that osbs-control01.phx2.fedoraproject.org cannot ssh to osbs-aarch64-master01.arm.fedoraproject.org. I checked that osbs-aarch64-master01.arm.fedoraproject.org has the correct ssh key in .ssh/authorized_keys. Is this something related to the firewall? Full error from the playbook:
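The manual check described above (confirming the control host's key is really present in the master's authorized_keys) can be scripted. The following is an illustrative sketch only; the function name, file path, and key strings are my own, not taken from this ticket:

```python
from pathlib import Path

def key_authorized(pubkey_line, authorized_keys="~/.ssh/authorized_keys"):
    """Return True if the given public key appears in authorized_keys.

    Entries are compared on the key type and base64 key material; the
    trailing comment field (e.g. user@host) is ignored, since it does
    not identify the key.
    """
    path = Path(authorized_keys).expanduser()
    if not path.is_file():
        return False
    want = tuple(pubkey_line.split()[:2])
    for line in path.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and (fields[0], fields[1]) == want:
            return True
    return False
```

Note that a key being present in authorized_keys still says nothing about network reachability, which is why the firewall remains a suspect here.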
Metadata Update from @cverna:
Metadata Update from @bowlofeggs:
So I tried to manually ssh from osbs-control01.phx2.fp.o to the aarch64 master and I get a Connection refused. That makes me think that the firewall does not allow ssh between these 2 boxes, since I can successfully ssh to osbs-aarch64-master01.arm.fedoraproject.org through bastion.

The network firewall is indeed putting a block here. I will put in a ticket to have this opened.
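A side note for readers following along: "Connection refused" means the remote end answered with a reset (nothing listening, or a firewall REJECT rule), while a firewall that silently DROPs packets shows up as a hang or timeout instead. A minimal sketch of that distinction, assuming plain TCP probing (the function name is my own; port 22 for ssh and 8443 for the OpenShift API are the ports discussed in this ticket):

```python
import socket

def probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt.

    'open'     -> something accepted the connection
    'refused'  -> the host answered with a reset ("Connection refused":
                  no listener, or a firewall REJECT rule)
    'filtered' -> the attempt timed out (typical of a network firewall
                  that silently DROPs the packets)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "filtered"
```

So the "Connection refused" seen from osbs-control01 is consistent with a firewall actively rejecting the connection rather than dropping it.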
This should be all done now. @cverna can you please test?
Reopen if you hit some issue.
Metadata Update from @kevin: