aarch64 osbs cluster production #7411
Labels
No labels
No milestone
No project
No assignees
4 participants
No due date set.
No dependencies set.
Reference: infrastructure/fedora-infrastructure#7411
I finally have OSBS building aarch64 images in stg, so we can move forward and get some machines to deploy a cluster in production.
Link to the original RFR https://pagure.io/fedora-infrastructure/issue/7184
Port 8443 needs to be open with the following hosts:
When do you need this? (YYYY/MM/DD)
When time allows
When is this no longer needed or useful? (YYYY/MM/DD)
If we cannot complete your request, what is the impact?
Metadata Update from @smooge:
Can you clarify how many nodes we need here? Can we just do one like we did for staging?
It does work with one VM being master and node at the same time (this is what we have in stg), but I am not sure how that will scale. If possible, I think it would be better to have 2 VMs in prod: 1 master and 1 node. If we can't get that, let's go with only 1 VM and see how it performs.
Just for everyone at home to understand the problem: every OSBS server we bring up reduces the number of regular aarch64 builders we can have on the 23 systems we can use in the moonshot. Those are the second slowest part of the build system, so we like to have a lot of them to spread the load around. So we can either have 3 OSBS nodes or we can have 3 builders.
osbs-aarch64-master has been built. Please see how this works.
And it won't because of something in the plays:
Thanks @smooge, I am taking it from here.
The Origin container images are not yet available in our registry for aarch64, so they need to be manually pulled on the box for the first deployment.
Metadata Update from @cverna:
So it seems that osbs-control01.phx2.fedoraproject.org cannot ssh to osbs-aarch64-master01.arm.fedoraproject.org. I checked that osbs-aarch64-master01.arm.fedoraproject.org has the correct ssh key in .ssh/authorized_keys. Is this something related to the firewall? Full error from the playbook:
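The manual check described above (confirming the control host's key is really present in the master's authorized_keys) can be scripted. The following is an illustrative sketch only; the function name, file path, and key strings are my own, not taken from this ticket:

```python
from pathlib import Path

def key_authorized(pubkey_line, authorized_keys="~/.ssh/authorized_keys"):
    """Return True if the given public key appears in authorized_keys.

    Entries are compared on the key type and base64 key material; the
    trailing comment field (e.g. user@host) is ignored, since it does
    not identify the key.
    """
    path = Path(authorized_keys).expanduser()
    if not path.is_file():
        return False
    want = tuple(pubkey_line.split()[:2])
    for line in path.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and (fields[0], fields[1]) == want:
            return True
    return False
```

Note that a key being present in authorized_keys still says nothing about network reachability, which is why the firewall remains a suspect here.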
Metadata Update from @cverna:
Metadata Update from @bowlofeggs:
So I tried to manually ssh from osbs-control01.phx2.fp.o to the aarch64 master and I get a Connection refused. That makes me think that the firewall does not allow ssh between these 2 boxes, since I can successfully ssh to osbs-aarch64-master01.arm.fedoraproject.org through bastion.

The network firewall is indeed putting a block here. I will put in a ticket to have this opened.
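A side note for readers following along: "Connection refused" means the remote end answered with a reset (nothing listening, or a firewall REJECT rule), while a firewall that silently DROPs packets shows up as a hang or timeout instead. A minimal sketch of that distinction, assuming plain TCP probing (the function name is my own; port 22 for ssh and 8443 for the OpenShift API are the ports discussed in this ticket):

```python
import socket

def probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt.

    'open'     -> something accepted the connection
    'refused'  -> the host answered with a reset ("Connection refused":
                  no listener, or a firewall REJECT rule)
    'filtered' -> the attempt timed out (typical of a network firewall
                  that silently DROPs the packets)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "filtered"
```

So the "Connection refused" seen from osbs-control01 is consistent with a firewall actively rejecting the connection rather than dropping it.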
This should be all done now. @cverna can you please test?
Reopen if you hit some issue.
Metadata Update from @kevin: