Metrics-for-apps: Added SOPs

- cordoning nodes
- graceful shutdown
- graceful startup

Signed-off-by: David Kirwan <dkirwan@redhat.com>

parent 23c84096dd
commit 6c17d91dbb
4 changed files with 182 additions and 5 deletions

== Cordoning Nodes and Draining Pods

This SOP should be followed in the following scenarios:

- If maintenance is scheduled to be carried out on an Openshift node.

=== Steps

1. Connect to the `os-control01` host associated with this ENV. Become root: `sudo su -`.

2. Mark the nodes as unschedulable:

----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes

for node in ${nodes[@]}; do oc adm cordon $node; done
node/<node> cordoned
----
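The `sed` call above strips the `node/` prefix that `oc get nodes -o name` adds, so the bare names can be passed to `oc adm cordon`. The extraction step can be checked offline against sample output; the node names below are illustrative, not real cluster hosts:

```shell
# Sample output of `oc get nodes -o name` (node names are illustrative)
sample='node/worker01.example.org
node/worker02.example.org
node/worker03.example.org'

# Strip the "node/" resource prefix, exactly as in the step above
nodes=$(printf '%s\n' "$sample" | sed -E "s/node\///")
echo "$nodes"
```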

3. Check that the node status is `NotReady,SchedulingDisabled`:

----
oc get node <node1>
NAME      STATUS                        ROLES    AGE   VERSION
<node1>   NotReady,SchedulingDisabled   worker   1d    v1.18.3
----

Note: The node might not switch to `NotReady` immediately; there may still be many pods running.

4. Evacuate the pods from *worker nodes*:

The following drains node `<node1>`, deletes any local data, ignores daemonsets, and gives pods a grace period of 15 seconds to terminate gracefully.

----
oc adm drain <node1> --delete-emptydir-data=true --ignore-daemonsets=true --grace-period=15
----
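When several workers need maintenance, the cordon loop from step 2 can be reused with `oc adm drain`; adding `--timeout` bounds how long each drain may block. This is a sketch only: the node names are illustrative and `oc` is stubbed so the loop logic can be dry-run without a cluster; remove the stub when running for real.

```shell
# Stub `oc` for an offline dry run -- remove when running against a real cluster
oc() { echo "oc $*"; }

nodes="worker01.example.org worker02.example.org"   # illustrative names

for node in $nodes; do
  oc adm drain "$node" --delete-emptydir-data=true --ignore-daemonsets=true \
    --grace-period=15 --timeout=120s
done
```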

5. Perform the scheduled maintenance on the node.

Do whatever is required in the scheduled maintenance window.

6. Once the node is ready to be added back into the cluster, uncordon it.

This marks the node as schedulable once more.

----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
echo $nodes

for node in ${nodes[@]}; do oc adm uncordon $node; done
----

=== Resources

- [1] https://docs.openshift.com/container-platform/4.8/nodes/nodes/nodes-nodes-working.html[Nodes - working with nodes]

modules/ocp4/pages/sop_graceful_shutdown_ocp_cluster.adoc (new file, +30)

== Graceful Shutdown of an Openshift 4 Cluster

This SOP should be followed in the following scenarios:

- Graceful full shut down of the Openshift 4 cluster is required.

=== Steps

Prerequisite steps:

- Follow the SOP for cordoning and draining the nodes.
- Follow the SOP for creating an `etcd` backup.

1. Connect to the `os-control01` host associated with this ENV. Become root: `sudo su -`.

2. Get a list of the nodes:

----
nodes=$(oc get nodes -o name | sed -E "s/node\///")
----

3. Shut down the nodes from the administration box associated with the cluster `ENV`, e.g. production/staging.

----
for node in ${nodes[@]}; do ssh -i /root/ocp4/ocp-<ENV>/ssh/id_rsa core@$node sudo shutdown -h now; done
----
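The shutdown loop can be sanity-checked before touching real nodes. In this sketch `ssh` is stubbed so each command is only printed, the node names are illustrative, and `ENV` is a shell variable standing in for the `<ENV>` placeholder above:

```shell
# Stub `ssh` for an offline dry run -- remove when running against real nodes
ssh() { echo "ssh $*"; }

ENV=production                                      # or staging
nodes="worker01.example.org master01.example.org"   # illustrative names

for node in $nodes; do
  ssh -i /root/ocp4/ocp-${ENV}/ssh/id_rsa core@${node} sudo shutdown -h now
done
```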

=== Resources

- [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-shutdown.html[Graceful Cluster Shutdown]

modules/ocp4/pages/sop_graceful_startup_ocp_cluster.adoc (new file, +88)

== Graceful Startup of an Openshift 4 Cluster

This SOP should be followed in the following scenarios:

- Graceful start up of an Openshift 4 cluster.

=== Steps

Prerequisite steps:

==== Start the VM Control Plane instances

Ensure that the control plane instances start first.

----
# Virsh command to start the VMs
----
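The exact start command depends on the hypervisor layout, which the placeholder above leaves open. Assuming the control plane VMs are libvirt domains, a sketch might look like the following; the `ocp-master0X` domain names are hypothetical, and `virsh` is stubbed here so the loop can be dry-run off the hypervisor:

```shell
# Stub `virsh` for an offline dry run -- remove on the real hypervisor
virsh() { echo "virsh $*"; }

# Control-plane VM domain names as known to libvirt (hypothetical)
masters="ocp-master01 ocp-master02 ocp-master03"

for vm in $masters; do
  virsh start "$vm"
done
```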

==== Start the physical nodes

To connect to `idrac`, you must be connected to the Red Hat VPN. Then find the management IP associated with each node.

In the DNS configuration on the `batcave01` instance, the following bare metal machines make up the production/staging OCP4 worker nodes.

----
oshift-dell01 IN A 10.3.160.180 # worker01 prod
oshift-dell02 IN A 10.3.160.181 # worker02 prod
oshift-dell03 IN A 10.3.160.182 # worker03 prod
oshift-dell04 IN A 10.3.160.183 # worker01 staging
oshift-dell05 IN A 10.3.160.184 # worker02 staging
oshift-dell06 IN A 10.3.160.185 # worker03 staging
----

Log in to the `idrac` interface that corresponds with each worker, one at a time. Ensure the node is booting from its hard drive, then power it on.

==== Uncordon the nodes once they have started, if appropriate

----
oc get nodes
NAME                       STATUS                     ROLES    AGE    VERSION
dumpty-n1.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8
dumpty-n2.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8
dumpty-n3.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8
dumpty-n4.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8
dumpty-n5.ci.centos.org    Ready,SchedulingDisabled   worker   77d    v1.18.3+6c42de8
kempty-n10.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8
kempty-n11.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8
kempty-n12.ci.centos.org   Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8
kempty-n6.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8
kempty-n7.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8
kempty-n8.ci.centos.org    Ready,SchedulingDisabled   master   106d   v1.18.3+6c42de8
kempty-n9.ci.centos.org    Ready,SchedulingDisabled   worker   106d   v1.18.3+6c42de8

nodes=$(oc get nodes -o name | sed -E "s/node\///")

for node in ${nodes[@]}; do oc adm uncordon $node; done
node/dumpty-n1.ci.centos.org uncordoned
node/dumpty-n2.ci.centos.org uncordoned
node/dumpty-n3.ci.centos.org uncordoned
node/dumpty-n4.ci.centos.org uncordoned
node/dumpty-n5.ci.centos.org uncordoned
node/kempty-n10.ci.centos.org uncordoned
node/kempty-n11.ci.centos.org uncordoned
node/kempty-n12.ci.centos.org uncordoned
node/kempty-n6.ci.centos.org uncordoned
node/kempty-n7.ci.centos.org uncordoned
node/kempty-n8.ci.centos.org uncordoned
node/kempty-n9.ci.centos.org uncordoned

oc get nodes
NAME                       STATUS   ROLES    AGE    VERSION
dumpty-n1.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8
dumpty-n2.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8
dumpty-n3.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8
dumpty-n4.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8
dumpty-n5.ci.centos.org    Ready    worker   77d    v1.18.3+6c42de8
kempty-n10.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8
kempty-n11.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8
kempty-n12.ci.centos.org   Ready    worker   106d   v1.18.3+6c42de8
kempty-n6.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8
kempty-n7.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8
kempty-n8.ci.centos.org    Ready    master   106d   v1.18.3+6c42de8
kempty-n9.ci.centos.org    Ready    worker   106d   v1.18.3+6c42de8
----
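After uncordoning, it is worth confirming that no node is left in a non-`Ready` state. The check below counts non-`Ready` rows; it runs here against sample output so it can be tested offline, but in practice you would substitute `status=$(oc get nodes --no-headers)`:

```shell
# Sample `oc get nodes --no-headers` output (names and ages illustrative);
# in practice: status=$(oc get nodes --no-headers)
status='dumpty-n1.ci.centos.org   Ready   worker   77d    v1.18.3+6c42de8
kempty-n6.ci.centos.org   Ready   master   106d   v1.18.3+6c42de8'

# Count rows whose STATUS column is not exactly "Ready"
not_ready=$(printf '%s\n' "$status" | awk '$2 != "Ready" {n++} END {print n+0}')
echo "$not_ready nodes not ready"   # prints "0 nodes not ready"
```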

=== Resources

- [1] https://docs.openshift.com/container-platform/4.5/backup_and_restore/graceful-cluster-restart.html[Graceful Cluster Startup]
- [2] https://docs.openshift.com/container-platform/4.5/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state[Cluster disaster recovery]

== SOPs

- xref:sop_configure_baremetal_pxe_uefi_boot.adoc[SOP Configure Baremetal PXE-UEFI Boot]
- xref:sop_configure_image_registry_operator.adoc[SOP Configure the Image Registry Operator]
- xref:sop_configure_local_storage_operator.adoc[SOP Configure the Local Storage Operator]
- xref:sop_configure_oauth_ipa.adoc[SOP Configure oauth Authentication via IPA/Noggin]
- xref:sop_configure_openshift_container_storage.adoc[SOP Configure the Openshift Container Storage Operator]
- xref:sop_configure_userworkload_monitoring_stack.adoc[SOP Configure the Userworkload Monitoring Stack]
- xref:sop_cordoning_nodes_and_draining_pods.adoc[SOP Cordoning and Draining Nodes]
- xref:sop_create_machineconfigs.adoc[SOP Create MachineConfigs to Configure RHCOS]
- xref:sop_disable_provisioners_role.adoc[SOP Disable the Provisioners Role]
- xref:sop_graceful_shutdown_ocp_cluster.adoc[SOP Graceful Cluster Shutdown]
- xref:sop_graceful_startup_ocp_cluster.adoc[SOP Graceful Cluster Startup]
- xref:sop_installation.adoc[SOP Openshift 4 Installation on Fedora Infra]
- xref:sop_retrieve_ocp4_cacert.adoc[SOP Retrieve OCP4 CACERT]