diff --git a/modules/ocp4/pages/sop_installation.adoc b/modules/ocp4/pages/sop_installation.adoc
new file mode 100644
index 0000000..45f03d2
--- /dev/null
+++ b/modules/ocp4/pages/sop_installation.adoc
@@ -0,0 +1,213 @@
== SOP Installation/Configuration of OCP4 on Fedora Infra

=== Resources

- [1]: https://docs.openshift.com/container-platform/4.8/installing/installing_bare_metal/[Official OCP4 Installation Documentation]

=== Install
To install OCP4 on Fedora Infra, one must be a member of the following groups:

- `sysadmin-openshift`
- `sysadmin-noc`


==== Prerequisites
Visit the https://console.redhat.com/openshift/install/metal/user-provisioned[OpenShift Console] and download the following OpenShift tools. A Red Hat Access account is required.

* OC client tools: https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* OC installation tool: https://access.redhat.com/downloads/content/290/ver=4.8/rhel---8/4.8.10/x86_64/product-software[Here]
* Ensure the downloaded tools are available on the `PATH`
* A valid OCP4 subscription is required to complete the installation configuration; by default you have a 60-day trial
* Take a copy of your pull secret file; you will need to put it in the `install-config.yaml` file in the next step


==== Generate the `install-config.yaml` file
We must create an `install-config.yaml` file. Use the following example for inspiration, or refer to the documentation[1] for more detailed information and explanations.

----
apiVersion: v1
baseDomain: stg.fedoraproject.org
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: 'ocp'
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'PUT PULL SECRET HERE'
sshKey: 'PUT SSH PUBLIC KEY HERE kubeadmin@core'
----

* Log in to the `os-control01` corresponding with the environment
* Make a directory to hold the installation files: `mkdir ocp4-<env>`
* Enter this newly created directory: `cd ocp4-<env>`
* Generate a fresh SSH keypair: `ssh-keygen -f ./ocp4-<env>-ssh`
* Create an `ssh` directory and place this keypair into it
* Put the contents of the public key in the `sshKey` value in the `install-config.yaml` file
* Put the contents of your pull secret in the `pullSecret` value in the `install-config.yaml` file
* Take a backup of the `install-config.yaml` to `install-config.yaml.bak`: running the next steps consumes this file, and having a backup allows you to recover from mistakes quickly


==== Create the Installation Files
Using the `openshift-install` tool we can generate the installation files. Make sure that the `install-config.yaml` file is in the `/path/to/ocp4-<env>` location before attempting the next steps.

===== Create the Manifest Files
The manifest files are human readable; at this stage you can apply any customisations required before the installation begins.

* Create the manifests: `openshift-install create manifests --dir=/path/to/ocp4-<env>`
* All configuration of RHCOS must be done via MachineConfigs. If there is known configuration which must be performed, such as NTP, you can copy the MachineConfigs into the `/path/to/ocp4-<env>/openshift` directory now (see the sketch after this list).
* Edit `/path/to/ocp4-<env>/manifests/cluster-scheduler-02-config.yml` and change the `mastersSchedulable` value to `false`.
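As a concrete illustration of the MachineConfig step above, the sketch below writes a hypothetical chrony (NTP) MachineConfig for the worker role into the `openshift` directory. Everything in it is an assumption for illustration — the file name, the MachineConfig name, the NTP pool, and the environment variable — substitute the real configuration your cluster needs.

----
# Hypothetical NTP MachineConfig for worker nodes; names and values are
# illustrative, not the actual Fedora Infra configuration.
ENV=stg  # assumed environment name
B64=$(printf 'pool 2.fedora.pool.ntp.org iburst\ndriftfile /var/lib/chrony/drift\n' | base64 -w0)

cat > /path/to/ocp4-${ENV}/openshift/99-worker-chrony.yaml <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-chrony
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/chrony.conf
          mode: 420
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,${B64}
EOF
----

An equivalent file with `role: master` would be needed to apply the same configuration to the control plane nodes.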
===== Create the Ignition Files
The ignition files are generated from the manifests and MachineConfig files, and are the final installation files for the three roles: `bootstrap`, `master`, `worker`. In Fedora we prefer not to use the term `master` here, so we have renamed this role to `controlplane`.

* Create the ignition files: `openshift-install create ignition-configs --dir=/path/to/ocp4-<env>`
* At this point you should have the following three files: `bootstrap.ign`, `master.ign` and `worker.ign`.
* Rename the `master.ign` to `controlplane.ign`.
* A directory has been created, `auth`. This contains two files: `kubeadmin-password` and `kubeconfig`. These allow `cluster-admin` access to the cluster.


==== Copy the Ignition files to the `batcave01` server
On the `batcave01`, under `/srv/web/infra/bigfiles/openshiftboot/`:

* Create a directory to match the environment: `mkdir /srv/web/infra/bigfiles/openshiftboot/ocp4-<env>`
* Copy the ignition files, the ssh files and the auth files generated in the previous steps to this newly created directory. Users in `sysadmin-openshift` should have the necessary permissions to write to this location.
* When this is complete it should look like the following:

----
├── ocp4-<env>
│   ├── auth
│   │   ├── kubeadmin-password
│   │   └── kubeconfig
│   ├── bootstrap.ign
│   ├── controlplane.ign
│   ├── ssh
│   │   ├── id_rsa
│   │   └── id_rsa.pub
│   └── worker.ign
----


==== Update the ansible inventory
The ansible inventory, host vars and group vars should be updated with the new hosts' information.

For inspiration see the following https://pagure.io/fedora-infra/ansible/pull-request/765[PR], where we added the ocp4 production changes.


==== Update the DNS/DHCP configuration
The DNS and DHCP configuration must also be updated. This https://pagure.io/fedora-infra/ansible/pull-request/765[PR] contains the necessary DHCP changes for prod, which can be made in ansible.

However, DNS changes may only be performed by `sysadmin-main`. For this reason any DNS changes must go via a patch snippet which is emailed to the `infrastructure@lists.fedoraproject.org` mailing list for review and approval. This process may take several days.


==== Generate the TLS Certs for the new environment
This is beyond the scope of this SOP; the best option is to create a ticket for Fedora Infra to request that these certs are created and made available for use. The following certs should be available:

- `*.apps.<env>.fedoraproject.org`
- `api.<env>.fedoraproject.org`
- `api-int.<env>.fedoraproject.org`


==== Run the Playbooks
Several playbooks need to be run. Once all the previous steps are complete, run them from the `batcave01` instance:

- `sudo rbac-playbook groups/noc.yml -t 'tftp_server,dhcp_server'`
- `sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd,iptables'`


===== Baremetal / VMs
Depending on whether some of the nodes are VMs or baremetal, different tags should be supplied to the following playbook. If the entire cluster is baremetal you can skip the `kvm_deploy` tag entirely (see the variant shown after the command); if VMs are used for some of the roles, leave it in.

- `sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo,kvm_deploy"`
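For example, if every node in the cluster is baremetal, the same playbook is run without the `kvm_deploy` tag, as noted above:

----
sudo rbac-playbook manual/ocp4-place-ignitionfiles.yml -t "ignition,repo"
----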
===== Baremetal
At this point we can switch on the baremetal nodes and begin the PXE/UEFI boot process. Via DHCP/DNS, the baremetal nodes should have the configuration necessary to reach out to the `noc01.iad2.fedoraproject.org` server and retrieve the UEFI boot configuration via PXE.

Once booted up, you should visit the management console for the node and manually choose the UEFI configuration appropriate for its role.

The node will begin booting, and during the boot process it will reach out to the `os-control01` instance specific to the `<env>` to retrieve the ignition file appropriate to its role.

The system will then become autonomous: it will install, and potentially reboot multiple times, as updates are retrieved and applied.

Eventually you will be presented with an SSH login prompt, where it should have the correct hostname, eg: `ocp01`, to match what is in the DNS configuration.


==== Bootstrapping completed
When the control plane is up, we should see all controlplane instances available in the appropriate haproxy dashboard, eg: https://admin.fedoraproject.org/haproxy/proxy01=ocp-masters-backend-kapi[haproxy].

At this time we should take the `bootstrap` instance out of the haproxy load balancer.

- Make the necessary changes to ansible at: `ansible/roles/haproxy/templates/haproxy.cfg`
- Once merged, run the following playbook once more: `sudo rbac-playbook groups/proxies.yml -t 'haproxy'`


==== Begin installation of the worker nodes
Follow the same process listed in the Baremetal section above to switch on the worker nodes and begin their installation.


==== Configure the `os-control01` to authenticate with the new OCP4 cluster
Copy the `kubeconfig` to `~root/.kube/config` on the `os-control01` instance.
This will allow the `root` user to be automatically authenticated to the new OCP4 cluster with `cluster-admin` privileges.


==== Accept Node CSR Certs
To accept the worker/compute nodes into the cluster we need to accept their CSR certs.

List the CSR certs. The ones we're interested in will show as pending:

----
oc get csr
----

To accept all the pending OCP4 node CSRs in a one-liner, do the following:

----
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
----

Note that this may need to be run more than once: after a node's initial client CSR is approved, the node submits a serving-certificate CSR which must also be approved.

This should look something like this once completed:

----
[root@os-control01 ocp4][STG]= oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
ocp01.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp02.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
ocp03.ocp.stg.iad2.fedoraproject.org      Ready    master   34d   v1.21.1+9807387
worker01.ocp.stg.iad2.fedoraproject.org   Ready    worker   21d   v1.21.1+9807387
worker02.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker03.ocp.stg.iad2.fedoraproject.org   Ready    worker   20d   v1.21.1+9807387
worker04.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
worker05.ocp.stg.iad2.fedoraproject.org   Ready    worker   34d   v1.21.1+9807387
----

At this point the cluster is basically up and running.
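As an optional sanity check at this point, a few standard `oc` queries (generic OpenShift commands, nothing Fedora Infra specific) will confirm that the nodes, cluster operators and cluster version have all settled:

----
oc get nodes -o wide        # all nodes should report Ready
oc get clusteroperators     # all operators should be Available, not Degraded
oc get clusterversion       # reports the installed version once complete
----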
=== Follow on SOPs
Several other SOPs should be followed to perform the post-installation configuration on the cluster:

- http://linkmeh[Retrieve the OCP4 Cluster's CA Cert to configure haproxy]
- http://linkmeh[Configure the Image Registry Operator to use NFS Storage]
- http://linkmeh[Configure OIDC for Noggin/IPA in OCP4]
- http://linkmeh[Disable self provisioners role]
- http://linkmeh[Installation/Configuration of the Local Storage Operator]
- http://linkmeh[Installation/Configuration of the Openshift Container Storage Operator]
- http://linkmeh[Configure the OCP4 User Workload Monitoring Stack]

diff --git a/modules/ocp4/pages/sops.adoc b/modules/ocp4/pages/sops.adoc
new file mode 100644
index 0000000..0da08bb
--- /dev/null
+++ b/modules/ocp4/pages/sops.adoc
@@ -0,0 +1,30 @@
== SOPs

- xref:sop_installation.adoc[SOP Openshift 4 Installation on Fedora Infra]

=== Configure the baremetal nodes to pxeboot with UEFI into RHCOS

=== Create MachineConfigs to configure RHCOS

=== Retrieve the OCP4 Cluster's CA Cert to configure haproxy

=== Configure the Image Registry Operator to use NFS Storage

=== Configure OIDC for Noggin/IPA in OCP4

=== Disable self provisioners role

=== Installation/Configuration of the Local Storage Operator

=== Installation/Configuration of the Openshift Container Storage Operator

=== Configure the OCP4 User Workload Monitoring Stack

diff --git a/modules/sysadmin_guide/nav.adoc b/modules/sysadmin_guide/nav.adoc
index 8d50a04..269824d 100644
--- a/modules/sysadmin_guide/nav.adoc
+++ b/modules/sysadmin_guide/nav.adoc
@@ -1,4 +1,5 @@
 * xref:orientation.adoc[Orientation for Sysadmin Guide]
+** xref:ocp4:sops.adoc[Openshift 4 SOPs]
 * xref:index.adoc[Sysadmin Guide]
 ** xref:2-factor.adoc[Two factor auth]
 ** xref:accountdeletion.adoc[Account Deletion SOP]