Fix Antora warnings about list item numbering.

Bussi Andrea 2022-04-04 22:22:56 +02:00
parent eac62f0679
commit e27f2e56da
5 changed files with 43 additions and 47 deletions

View file

@@ -12,9 +12,9 @@ This SOP should be used in the following scenario:
== Steps
1. Add the new nodes to the Ansible inventory file in the appropriate group.
+
eg:
+
----
[ocp_workers]
worker01.ocp.iad2.fedoraproject.org
@@ -31,7 +31,7 @@ worker05.ocp.stg.iad2.fedoraproject.org
----
2. Add the new hostvars for each new host being added; see the following examples for `VM` vs `baremetal` hosts.
+
----
# control plane VM
inventory/host_vars/ocp01.ocp.iad2.fedoraproject.org
@@ -41,14 +41,14 @@ inventory/host_vars/worker01.ocp.iad2.fedoraproject.org
----
3. If the nodes are `compute` or `worker` nodes, they must also be added to the following group_vars: `proxies` for prod, `proxies_stg` for staging.
+
----
inventory/group_vars/proxies:ocp_nodes:
inventory/group_vars/proxies_stg:ocp_nodes_stg:
----
4. Changes must be made to the `roles/dhcp_server/files/dhcpd.conf.noc01.iad2.fedoraproject.org` file for DHCP to ensure that the node receives an IP address based on its MAC address and is told to reach out to the `next-server`, where it can find the UEFI boot configuration.
+
----
host worker01-ocp { # UPDATE THIS
hardware ethernet 68:05:CA:CE:A3:C9; # UPDATE THIS
@@ -61,9 +61,9 @@ host worker01-ocp { # UPDATE THIS
----
5. Changes must be made to DNS. To do this, one must be a member of `sysadmin-main`; if you are not, send a patch request to the Fedora Infra mailing list for review, and it will be merged by the `sysadmin-main` members.
+
See the following examples for the `worker01.ocp` nodes for production and staging.
+
----
master/163.3.10.in-addr.arpa:123 IN PTR worker01.ocp.iad2.fedoraproject.org.
master/166.3.10.in-addr.arpa:118 IN PTR worker01.ocp.stg.iad2.fedoraproject.org.
@@ -72,15 +72,14 @@ master/stg.iad2.fedoraproject.org:worker01.ocp IN A 10.3.1
----
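+
Once the DNS change has been merged and deployed, the new records can be sanity checked with `dig` (a quick verification, assuming you are on a host that can resolve the internal IAD2 zones):
+
----
dig +short worker01.ocp.iad2.fedoraproject.org
dig +short worker01.ocp.stg.iad2.fedoraproject.org
----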
6. Run the following playbooks to apply the dhcp/tftp changes and to update the haproxy config, so the new nodes are monitored and added to the load balancer.
+
----
sudo rbac-playbook groups/noc.yml -t "tftp_server,dhcp_server"
sudo rbac-playbook groups/proxies.yml -t 'haproxy,httpd'
----
7. DHCP instructs the node to reach out to the `next-server` when it is handed an IP address. The `next-server` runs a tftp server which contains the kernel, initramfs, and UEFI boot configuration (`uefi/grub.cfg`). Contained in this `grub.cfg` is the following entry, which relates to the OCP4 nodes:
+
----
menuentry 'RHCOS 4.8 worker staging' {
linuxefi images/RHCOS/4.8/x86_64/rhcos-4.8.2-x86_64-live-kernel-x86_64 ip=dhcp nameserver=10.3.163.33 coreos.inst.install_dev=/dev/sda
@@ -93,23 +92,23 @@ coreos.live.rootfs_url=http://10.3.163.65/rhcos/rhcos-4.8.2-x86_64-live-rootfs.x
initrdefi images/RHCOS/4.8/x86_64/rhcos-4.8.2-x86_64-live-initramfs.x86_64.img
}
----
+
When a node boots and reads this UEFI boot configuration, the appropriate menu option must be selected manually:
+
- To add a node to the staging cluster choose: `RHCOS 4.8 worker staging`
- To add a node to the production cluster choose: `RHCOS 4.8 worker production`
8. Connect to the `os-control01` node which corresponds with the ENV which the new node is being added to.
+
Verify that you are authenticated correctly to the OpenShift cluster as the `system:admin` user.
+
----
oc whoami
system:admin
----
9. The UEFI boot menu configuration contains links to the web server running on the `os-control01` host specific to the ENV. This server should only run when we wish to reinstall an existing node or install a new one; start it manually using systemctl:
+
----
systemctl start httpd.service
----
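+
Optionally, confirm that the web server is actually serving before booting the new node (a quick check, assuming httpd listens on the default port):
+
----
systemctl is-active httpd.service
curl -I http://localhost/
----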
@@ -119,7 +118,7 @@ Wait until the node displays an SSH login prompt with the node's name. It may reboot
11. As the new nodes are provisioned, they will attempt to join the cluster. They must first be accepted.
From the `os-control01` node, run the following:
+
----
# List the CSRs. If you see status Pending, these are the worker/compute nodes attempting to join the cluster; they must be approved.
oc get csr
@@ -127,7 +126,7 @@ oc get csr
# Accept all node CSRs one liner
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
----
+
This process usually needs to be repeated twice for each new node.
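+
Once all CSRs have been approved, the new node should eventually report `Ready` (a quick check from `os-control01`):
+
----
oc get nodes
----
+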
For more information about adding new worker/compute nodes to a user-provisioned infrastructure (UPI) OCP4 cluster, see the detailed steps at [1], [2].

View file

@@ -13,26 +13,27 @@ This SOP should be used in the following scenario:
- [4] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9[Openshift Data Foundation Product Notes]
== Steps
1. Once a new node has been added to the Openshift cluster, we can manage the extra local storage devices on this node from within Openshift itself, provided that they do not contain partitions/filesystems. In the case of a node being repurposed, please first ensure that all storage devices except `/dev/sda` are partition and filesystem free before starting.
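+
For example, `lsblk` can be used to confirm the extra devices are clean, and `wipefs` can clear any leftover signatures (a minimal sketch; `/dev/sdb` is only an example device, double-check before wiping anything):
+
----
# Show partitions and filesystem signatures on all block devices
lsblk -f

# If an extra device still carries old partitions/filesystems, wipe it (example device only)
wipefs -a /dev/sdb
----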
2. From within the Openshift web console, or via the CLI, search for all `LocalVolumeDiscovery` objects.
+
----
[root@os-control01 ~][PROD-IAD2]# oc get localvolumediscovery --all-namespaces
NAMESPACE NAME AGE
openshift-local-storage auto-discover-devices 167d
----
+
There should really be only a single LocalVolumeDiscovery object called `auto-discover-devices` in the `openshift-local-storage` namespace/project.
+
Edit this object:
+
----
oc edit localvolumediscovery auto-discover-devices -n openshift-local-storage
----
+
Add the hostname of the new node to the existing list, like:
+
----
...
spec:
@@ -49,22 +50,21 @@ spec:
- worker05.ocp.iad2.fedoraproject.org
...
----
+
Write and save the change.
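+
To confirm that discovery has picked up the new node, the per-node discovery results can be listed (assuming the Local Storage Operator version in use exposes `LocalVolumeDiscoveryResult` objects):
+
----
oc get localvolumediscoveryresults -n openshift-local-storage
----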
3. From within the Openshift web console, or via the CLI, search for all `LocalVolumeSet` objects.
+
There should really be only a single LocalVolumeSet object called `local-block` in the `openshift-local-storage` namespace/project.
+
Edit this object:
+
----
oc edit localvolumeset local-block -n openshift-local-storage
----
+
Add the hostname of the new node to the existing list, like:
+
----
...
spec:
@@ -82,7 +82,7 @@ spec:
- worker05.ocp.iad2.fedoraproject.org
...
----
+
Write and save the change.
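+
After this change is saved, new `local-block` PVs should eventually be created for the node's devices (a quick check):
+
----
oc get pv | grep local-block
----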
4. From the Openshift web console visit `Storage, OpenShift Data Foundation`, then in the `Storage Systems` sub-menu, click the three-dot menu on the right beside the `ocs-storagecluster-storage` object and choose the `Add Capacity` option. From the popup menu that appears, ensure that the storage class `local-block` is selected in the list. Finally, confirm with `Add`.
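+
Once the capacity has been added, additional OSD pods should appear in the `openshift-storage` namespace (a rough verification; exact pod names vary by ODF version):
+
----
oc get pods -n openshift-storage | grep osd
----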

View file

@@ -9,7 +9,7 @@ This SOP should be followed in the following scenarios:
1. Connect to the `os-control01` host associated with this ENV. Become root with `sudo su -`.
2. Mark the node as unschedulable:
+
----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes
@@ -19,19 +19,18 @@ node/<node> cordoned
----
3. Check that the node status is `NotReady,SchedulingDisabled`
+
----
oc get node <node1>
NAME STATUS ROLES AGE VERSION
<node1> NotReady,SchedulingDisabled worker 1d v1.18.3
----
+
Note: It might not switch to `NotReady` immediately; there may be many pods still running.
4. Evacuate the pods from **worker nodes** using one of the following methods.
This will drain node `<node1>`, delete any local data, ignore daemonsets, and give pods a grace period (set via `--grace-period`) to terminate gracefully.
+
----
oc adm drain <node1> --delete-emptydir-data=true --ignore-daemonsets=true --grace-period=15
----
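+
To verify that the pods have been evacuated, list what is still scheduled on the node; only daemonset-managed pods should remain:
+
----
oc get pods --all-namespaces -o wide | grep <node1>
----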
@@ -39,10 +38,9 @@ oc adm drain <node1> --delete-emptydir-data=true --ignore-daemonsets=true --grac
5. Perform the scheduled maintenance on the node.
Do whatever is required in the scheduled maintenance window.
6. Once the node is ready to be added back into the cluster, we must uncordon it. This allows it to be marked schedulable once more.
+
----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes
@@ -50,7 +48,6 @@ echo $nodes
for node in ${nodes[@]}; do oc adm uncordon $node; done
----
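+
Afterwards, confirm that the node reports `Ready` and is no longer marked `SchedulingDisabled`:
+
----
oc get nodes
----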
== Resources
- [1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-working.html[Nodes - working with nodes]

View file

@@ -13,38 +13,38 @@ This SOP should be followed in the following scenarios:
1. Connect to the `os-control01` node associated with the ENV.
2. Use the `oc` tool to make a debug connection to a control plane node
+
----
oc debug node/<node_name>
----
3. Chroot to the `/host` directory on the container's filesystem
+
----
sh-4.2# chroot /host
----
4. Run the `cluster-backup.sh` script and pass in the location to save the backup to
+
----
sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/assets/backup
----
5. Chown the backup files to be owned by user `core` and group `core`
+
----
chown -R core:core /home/core/assets/backup
----
6. From the admin machine (see the inventory group `ocp-ci-management`), become the Openshift service account: check the inventory hostvars for the host identified in the previous step and note the `ocp_service_account` variable.
+
----
ssh <host>
sudo su - <ocp_service_account>
----
7. Copy the files down to the `os-control01` machine.
+
----
scp -i <ssh_key> core@<node_name>:/home/core/assets/backup/* ocp_backups/
----
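+
A quick way to confirm the copy succeeded is to list the backup directory; the backup script produces a `snapshot_*.db` file and a `static_kuberesources_*.tar.gz` archive (exact file names depend on the OpenShift version):
+
----
ls -l ocp_backups/
----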

View file

@@ -13,13 +13,13 @@ Prerequisite steps:
1. Connect to the `os-control01` host associated with this ENV. Become root with `sudo su -`.
2. Get a list of the nodes
+
----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
----
3. Shut down the nodes from the administration box associated with the cluster `ENV`, e.g. production/staging.
+
----
for node in ${nodes[@]}; do ssh -i /root/ocp4/ocp-<ENV>/ssh/id_rsa core@$node sudo shutdown -h now; done
----