We must create an `install-config.yaml` file. Use the following example for inspiration, or refer to the documentation[1] for more detailed information and explanations.
----
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
----
* Log in to the `os-control01` host corresponding to the environment
* Make a directory to hold the installation files: `mkdir ocp4-<ENV>`
* Enter this newly created directory: `cd ocp4-<ENV>`
* Generate a fresh SSH keypair: `ssh-keygen -f ./ocp4-<ENV>-ssh`
* Create an `ssh` directory and place this keypair into it.
* Put the contents of the public key in the `sshKey` value in the `install-config.yaml` file
* Put the contents of your Pull Secret in the `pullSecret` value in the `install-config.yaml`
* Take a backup of `install-config.yaml` as `install-config.yaml.bak`. Running the next steps consumes this file, so having a backup allows you to recover from mistakes quickly.
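The preparation steps above might look roughly like the following shell session (a sketch only; `<ENV>` and file names are placeholders and should be adapted to the environment):
----
# On os-control01 for the target environment
mkdir ocp4-<ENV>
cd ocp4-<ENV>

# Generate a fresh SSH keypair for the cluster
ssh-keygen -f ./ocp4-<ENV>-ssh

# Keep the keypair together in an ssh directory
mkdir ssh
mv ocp4-<ENV>-ssh ocp4-<ENV>-ssh.pub ssh/

# Create install-config.yaml (paste the public key into sshKey and the pull
# secret into pullSecret), then take a backup because the next steps consume it
cp install-config.yaml install-config.yaml.bak
----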
Using the `openshift-install` tool we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-<ENV>` location before attempting the next steps.
The manifest files are human readable; at this stage you can apply any customisations required before the installation begins.
* Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-<ENV>`
* All configuration for RHCOS must be done via MachineConfig resources. If there is known configuration that must be applied, such as NTP, copy the relevant MachineConfigs into the `/path/to/ocp4-<ENV>/openshift` directory now.
* At this point, edit `/path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml` and change the `mastersSchedulable` value to `false`.
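After that edit, the scheduler manifest should look roughly like the following (a sketch; fields other than `mastersSchedulable` are whatever `openshift-install` generated for your cluster):
----
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}
----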
The ignition files are generated from the manifests and MachineConfig files to produce the final installation files for the three roles: `bootstrap`, `master` and `worker`. In Fedora we prefer not to use the term `master`, so we rename this role to `controlplane`.
* Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>`
* At this point you should have the following three files: `bootstrap.ign`, `master.ign` and `worker.ign`.
* Rename the `master.ign` to `controlplane.ign`.
* An `auth` directory has also been created, containing two files: `kubeadmin-password` and `kubeconfig`. These grant `cluster-admin` access to the cluster.
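Those steps might look roughly like this (a sketch; paths are placeholders):
----
openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>
cd /path/to/ocp4-<ENV>

# Fedora convention: rename the master role's ignition file
mv master.ign controlplane.ign

ls
# auth/  bootstrap.ign  controlplane.ign  worker.ign  (plus other generated files)
----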
On `batcave01`, at the following location: `/srv/web/infra/bigfiles/openshiftboot/`:
* Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>`
* Copy the ignition files, the SSH files and the auth files generated in the previous steps to this newly created directory. Users in the `sysadmin-openshift` group should have the necessary permissions to write to this location.
* When this is complete, it should look like the following:
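A sketch of the expected layout (file names assume the keypair and ignition files generated earlier):
----
/srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>/
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── controlplane.ign
├── worker.ign
└── ssh
    ├── ocp4-<ENV>-ssh
    └── ocp4-<ENV>-ssh.pub
----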
The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod and can be done in Ansible.
However, the DNS changes may only be performed by `sysadmin-main`. For this reason any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.
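One possible way to prepare such a patch snippet (a sketch; the checkout path, file names and commit message are placeholders):
----
# In a checkout of the repository containing the DNS configuration
git add <changed zone files>
git commit -m "Add DNS records for ocp4-<ENV>"

# Produce a patch snippet to attach to the mail sent to the list for review
git format-patch -1 --stdout > ocp4-<ENV>-dns.patch
----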
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:
There are a number of playbooks which need to be run. Once all the previous steps are complete, we can run these playbooks from the `batcave01` instance.
Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the `kvm_deploy` tag entirely; if VMs are used for some of the roles, make sure to leave it in.
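For example (a sketch only; `<ocp4-cluster-playbook>` and `<other-tags>` are placeholders for the actual playbook and tags used in the ansible repository):
----
# Some roles run as VMs: keep the kvm_deploy tag
sudo rbac-playbook <ocp4-cluster-playbook>.yml -t 'kvm_deploy,<other-tags>'

# Entire cluster is baremetal: omit kvm_deploy
sudo rbac-playbook <ocp4-cluster-playbook>.yml -t '<other-tags>'
----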
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. Via DHCP/DNS, the baremetal nodes should have the configuration necessary to reach out to the `noc01.rdu3.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE.
Once booted up, you should visit the management console for this node, and manually choose the UEFI configuration appropriate for its role.
The node will begin booting, and during the boot process it will reach out to the `os-control01` instance specific to the `<ENV>` to retrieve the ignition file appropriate to its role.
The system then becomes autonomous: it will install, and potentially reboot several times, as updates are retrieved and applied.
Eventually you will be presented with an SSH login prompt, where it should have the correct hostname, e.g. `ocp01`, to match what is in the DNS configuration.
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, e.g. https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].
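The control plane can also be checked from `os-control01` using the kubeconfig generated earlier (a sketch; the path is a placeholder):
----
export KUBECONFIG=/path/to/ocp4-<ENV>/auth/kubeconfig
oc get nodes
oc get clusteroperators
----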
At this time we should take the `bootstrap` instance out of the haproxy load balancer.
- Make the necessary changes to ansible at: `ansible/roles/haproxy/templates/haproxy.cfg` (see the sketch after this list).
- Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`
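The haproxy change might look roughly like the following (a sketch only; the server names, addresses and options are illustrative and not the actual template contents):
----
backend ocp-masters-backend-kapi
    mode tcp
    # Remove (or comment out) the bootstrap entry once the control plane is healthy
    # server bootstrap <bootstrap-ip>:6443 check
    server ocp01 <controlplane01-ip>:6443 check
    server ocp02 <controlplane02-ip>:6443 check
    server ocp03 <controlplane03-ip>:6443 check
----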