= SOP Installation/Configuration of OCP4 on Fedora Infra

== Resources

- [1]: https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/[Official OCP4 Installation Documentation]

== Install

To install OCP4 on Fedora Infra, one must be a member of the following groups:

- `sysadmin-openshift`
- `sysadmin-noc`

=== Prerequisites

Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[OpenShift Console] and download the following OpenShift tools:

* A Red Hat Access account is required.
* OC client tools: https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* OC installation tool: https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* Ensure the downloaded tools are available on the `PATH`; see the sketch after this list.
* A valid OCP4 subscription is required to complete the installation configuration; by default you have a 60-day trial.
* Keep a copy of your pull secret file; you will need to put its contents into the `install-config.yaml` file in the next step.

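A minimal sketch of unpacking the tools, assuming the tarballs were downloaded to the current directory and that `/usr/local/bin` is on the `PATH` (archive names vary by version):

----
# Unpack the client and installer tools
tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl
tar -xzf openshift-install-linux.tar.gz -C /usr/local/bin openshift-install

# Verify both tools resolve on the PATH
oc version --client
openshift-install version
----
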
=== Generate install-config.yaml file

We must create an `install-config.yaml` file. Use the following example for inspiration, or refer to the documentation[1] for more detailed information and explanations.

----
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
----

* Log in to the `os-control01` host corresponding to the environment.
* Make a directory to hold the installation files: `mkdir ocp4-<ENV>`
* Enter this newly created directory: `cd ocp4-<ENV>`
* Generate a fresh SSH keypair: `ssh-keygen -f ./ocp4-<ENV>-ssh`
* Create an `ssh` directory and place this keypair in it.
* Put the contents of the public key in the `sshKey` value in the `install-config.yaml` file.
* Put the contents of your pull secret in the `pullSecret` value in the `install-config.yaml` file.
* Take a backup of the `install-config.yaml` as `install-config.yaml.bak`; running the next steps consumes this file, and having a backup allows you to recover from mistakes quickly. The commands are sketched after this list.

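A minimal sketch of these steps on `os-control01` (`<ENV>` and the key file names are illustrative):

----
mkdir ocp4-<ENV> && cd ocp4-<ENV>

# Generate a dedicated keypair and keep it in an ssh/ subdirectory
ssh-keygen -f ./ocp4-<ENV>-ssh
mkdir ssh && mv ocp4-<ENV>-ssh ocp4-<ENV>-ssh.pub ssh/

# Keep a backup: "create manifests/ignition-configs" consumes install-config.yaml
cp install-config.yaml install-config.yaml.bak
----
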
=== Create the Installation Files

Using the `openshift-install` tool, we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-<ENV>` location before attempting the next steps.

==== Create the Manifest Files

The manifest files are human readable; at this stage you can add any customisations required before the installation begins.

* Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-<ENV>`
* All configuration for RHCOS must be done via MachineConfig files. If there is known configuration which must be performed, such as NTP, you can copy the MachineConfigs into the `/path/to/ocp4-<ENV>/openshift` directory now.
* The following step should be performed at this point: edit `/path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml` and change the `mastersSchedulable` value to `false`, as shown in the sketch below.

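A minimal sketch of these manifest steps (paths are illustrative):

----
openshift-install create manifests --dir=/path/to/ocp4-<ENV>

# Prevent user workloads from being scheduled on the control plane nodes
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' \
    /path/to/ocp4-<ENV>/manifests/cluster-scheduler-02-config.yml
----
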
==== Create the Ignition Files

The ignition files are generated from the manifests and MachineConfig files and form the final installation files for the three roles: `bootstrap`, `master` and `worker`. In Fedora we prefer not to use the term `master` here, so we have renamed this role to `controlplane`.

* Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>`
* At this point you should have the following three files: `bootstrap.ign`, `master.ign` and `worker.ign`.
* Rename `master.ign` to `controlplane.ign`.
* A directory, `auth`, has also been created. It contains two files: `kubeadmin-password` and `kubeconfig`, which allow `cluster-admin` access to the cluster. A sketch of these steps follows this list.

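A minimal sketch of the ignition steps (paths are illustrative):

----
openshift-install create ignition-configs --dir=/path/to/ocp4-<ENV>

# Fedora prefers the term "controlplane" over "master" for this role
mv /path/to/ocp4-<ENV>/master.ign /path/to/ocp4-<ENV>/controlplane.ign
----
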
=== Copy the Ignition files to the `batcave01` server

On `batcave01`, at the following location: `/srv/web/infra/bigfiles/openshiftboot/`:

* Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<ENV>`
* Copy the ignition files, the ssh files and the auth files generated in the previous steps to this newly created directory (a copy sketch follows the directory listing below). Users in `sysadmin-openshift` should have the necessary permissions to write to this location.
* When this is complete, it should look like the following:

----
├── <ENV>
│   ├── auth
│   │   ├── kubeadmin-password
│   │   └── kubeconfig
│   ├── bootstrap.ign
│   ├── controlplane.ign
│   ├── ssh
│   │   ├── id_rsa
│   │   └── id_rsa.pub
│   └── worker.ign
----

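A minimal sketch of the copy, assuming the layout above (the transfer command and directory names are illustrative):

----
# On batcave01, as a member of sysadmin-openshift
mkdir /srv/web/infra/bigfiles/openshiftboot/<ENV>

# Copy the generated files from os-control01 into the new directory (illustrative)
scp -r ocp4-<ENV>/auth ocp4-<ENV>/ssh ocp4-<ENV>/*.ign \
    batcave01:/srv/web/infra/bigfiles/openshiftboot/<ENV>/
----
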
=== Update the ansible inventory

The ansible inventory, host vars and group vars should be updated with the new hosts' information.

For inspiration, see the following https://pagure.io/fedora-infra/ansible/pull-request/765[PR], where we added the ocp4 production changes.

=== Update the DNS/DHCP configuration

The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod and can be done in ansible.

However, the DNS changes may only be performed by `sysadmin-main`. For this reason, any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.

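For illustration of the DHCP side mentioned above only, and assuming an ISC dhcpd setup, such a change typically amounts to a per-host entry along these lines (every value below is a placeholder, not the real configuration):

----
host ocp01-<ENV> {
    hardware ethernet 00:11:22:33:44:55;        # placeholder MAC address
    fixed-address 10.0.0.10;                    # placeholder IP address
    option host-name "ocp01.ocp.<ENV>.example"; # placeholder hostname
    next-server 10.0.0.1;                       # placeholder TFTP/PXE server
    filename "uefi/grubx64.efi";                # placeholder UEFI boot loader path
}
----
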
=== Generate the TLS Certs for the new environment

This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:

- `*.apps.<ENV>.fedoraproject.org`
- `api.<ENV>.fedoraproject.org`
- `api-int.<ENV>.fedoraproject.org`

=== Run the Playbooks

A number of playbooks need to be run. Once all the previous steps have been completed, we can run these playbooks from the `batcave01` instance.

- `sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'`
- `sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'`

==== Baremetal / VMs

Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal, you can skip the `kvm_deploy` tag entirely, as shown in the example after the command below.

If VMs are used for some of the roles, make sure to leave the tag in.

- `sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"`

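For example, for an all-baremetal cluster the same playbook would be run without the `kvm_deploy` tag:

----
sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo"
----
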
==== Baremetal

At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. The baremetal nodes should, via DHCP/DNS, have the configuration necessary to reach out to the `noc01.rdu3.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE.

Once booted up, you should visit the management console for this node and manually choose the UEFI configuration appropriate for its role.

The node will begin booting, and during the boot process it will reach out to the `os-control01` instance specific to the `<ENV>` to retrieve the ignition file appropriate to its role.

The system will then become autonomous; it will install and potentially reboot multiple times as updates are retrieved and applied.

Eventually you will be presented with an SSH login prompt, which should show the correct hostname, e.g. `ocp01`, matching what is in the DNS configuration.

=== Bootstrapping completed

When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, e.g. https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].

At this time we should take the `bootstrap` instance out of the haproxy load balancer.

- Make the necessary changes to ansible at: `ansible/roles/haproxy/templates/haproxy.cfg`
- Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`

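Before taking the `bootstrap` instance out, you can confirm that the bootstrap phase has finished; a minimal sketch using the installer's wait command, run from the installation directory on `os-control01` (path is illustrative):

----
openshift-install wait-for bootstrap-complete --dir=/path/to/ocp4-<ENV> --log-level=info
----
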
=== Begin installation of the worker nodes

Follow the same processes listed in the Baremetal section above to switch on the worker nodes and begin installation.

=== Configure the `os-control01` to authenticate with the new OCP4 cluster

Copy the `kubeconfig` to `~root/.kube/config` on the `os-control01` instance.

This will allow the `root` user to be automatically authenticated to the new OCP4 cluster with `cluster-admin` privileges.

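A minimal sketch, assuming the installation directory layout used earlier in this SOP:

----
# On os-control01, as root
mkdir -p /root/.kube
cp /path/to/ocp4-<ENV>/auth/kubeconfig /root/.kube/config

# Verify cluster-admin access
oc whoami
oc get clusterversion
----
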
=== Accept Node CSR Certs

To accept the worker/compute nodes into the cluster, we need to accept their CSR certs.

List the CSR certs. The ones we're interested in will show as pending:

----
oc get csr
----

To accept all the OCP4 node CSRs in a one-liner, do the following:

----
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
----

Once completed, the output of `oc get nodes` should look something like this:

----
[root@os-control01 ocp4][STG]= oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
ocp01.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp02.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp03.ocp.stg.rdu3.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
worker01.ocp.stg.rdu3.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387
worker02.ocp.stg.rdu3.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker03.ocp.stg.rdu3.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker04.ocp.stg.rdu3.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
worker05.ocp.stg.rdu3.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
----

At this point the cluster is basically up and running.

== Follow-on SOPs

Several other SOPs should be followed to perform the post-installation configuration of the cluster.

- xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]
- xref:sop_create_machineconfigs.adoc[SOP Create MachineConfigs to Configure RHCOS]
- xref:sop_retrieve_ocp4_cacert.adoc[SOP Retrieve OCP4 CACERT]
- xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]
- xref:sop_disable_provisioners_role.adoc[SOP Disable the Provisioners Role]
- xref:sop_configure_oauth_ipa.adoc[SOP Configure oauth Authentication via IPA/Noggin]
- xref:sop_configure_local_storage_operator.adoc[SOP Configure the Local Storage Operator]
- xref:sop_configure_openshift_container_storage.adoc[SOP Configure the Openshift Container Storage Operator]
- xref:sop_configure_userworkload_monitoring_stack.adoc[SOP Configure the Userworkload Monitoring Stack]