== Upgrade OCP4 Cluster
Please see the official documentation for more information [1][3]; this SOP can be used as a rough guide.
=== Resources
- [1] https://docs.openshift.com/container-platform/4.8/updating/updating-cluster-between-minor.html[Upgrading OCP4 Cluster Between Minor Versions]
- [2] xref:sop_etcd_backup.adoc[SOP Create etcd backup]
- [3] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-upgrading-operators.html[Upgrading Installed Operators]
- [4] https://docs.openshift.com/container-platform/4.8/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state[Restore etcd backup]
- [5] https://docs.openshift.com/container-platform/4.8/operators/admin/olm-upgrading-operators.html#olm-upgrading-operators[Upgrading Operators Prior to Cluster Update]
=== Prerequisites
- In case an upgrade fails, it is wise to first take an `etcd` backup. To do so, follow the SOP [2].
- Ensure that all installed Operators are at the latest versions for their channel [5].
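
As a sketch, the installed Operator versions can be checked from the CLI before upgrading (assuming `oc` is logged in with cluster-admin privileges):

```shell
# List installed Operators and their versions across all namespaces
oc get clusterserviceversions --all-namespaces

# Show install plans, including any pending Operator upgrades awaiting approval
oc get installplan --all-namespaces
```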
=== Upgrade OCP
At the time of writing, the version installed on the cluster is `4.8.11` and the `upgrade channel` is set to `stable-4.8`. It is easiest to update the cluster via the web console. Go to:
- Administration
- Cluster Settings
- To upgrade between `z` or `patch` versions (`x.y.z`), click the update button when one is available.
- When moving between `y` or `minor` versions, you must first switch the `upgrade channel` (for example, to `fast-4.9`). You should also be on the very latest `z`/`patch` version before upgrading.
- When the upgrade has finished, switch the `upgrade channel` back to the corresponding `stable` channel.
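
The same steps can also be performed from the CLI. A minimal sketch, using `fast-4.9`/`stable-4.9` as example channels:

```shell
# Show the current version, channel, and any available updates
oc adm upgrade

# For a minor upgrade, first switch the upgrade channel
oc patch clusterversion version --type merge -p '{"spec":{"channel":"fast-4.9"}}'

# Start the upgrade to the latest version available in the channel
oc adm upgrade --to-latest=true

# Afterwards, switch back to the stable channel
oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.9"}}'
```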
=== Upgrade failures
In the worst-case scenario we may have to restore `etcd` from the backup taken at the start [4], or reinstall a node entirely.
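
Before resorting to a restore, the upgrade progress and any reported failure conditions can be inspected (a sketch, assuming cluster-admin access):

```shell
# Watch overall upgrade progress and failure conditions
oc get clusterversion
oc describe clusterversion version

# Check which cluster operators are degraded or not yet upgraded
oc get clusteroperators
```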
==== Troubleshooting
There are many possible ways an upgrade can fail midway through.
- Check the monitoring alerts currently firing; these often hint at the problem.
- Often individual nodes fail to apply the new `MachineConfig` changes; this will show up when examining the `MachineConfigPool` status.
- This might require a manual reboot of the affected node.
- This might require killing pods stuck on the affected node.
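
The checks above can be sketched with the following commands. The node name `worker-0` and the pod/namespace placeholders are examples only:

```shell
# Check whether a MachineConfigPool is degraded or stuck mid-rollout
oc get machineconfigpool
oc describe machineconfigpool worker

# Identify nodes that are not Ready or remain cordoned
oc get nodes

# As a last resort, reboot a stuck node (node name is an example)
oc debug node/worker-0 -- chroot /host systemctl reboot

# Or delete pods pinned to that node so they reschedule elsewhere
oc delete pod <pod-name> -n <namespace>
```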