Review massupgrade SOP

Signed-off-by: Michal Konečný <mkonecny@redhat.com>
Michal Konečný 2021-09-06 16:44:44 +02:00
parent 7675a5e0b4
commit 13e9b3d32c
2 changed files with 106 additions and 112 deletions

@ -65,7 +65,7 @@
** xref:layered-image-buildsys.adoc[Layered Image Build System - SOP] ** xref:layered-image-buildsys.adoc[Layered Image Build System - SOP]
** xref:mailman.adoc[Mailman Infrastructure - SOP] ** xref:mailman.adoc[Mailman Infrastructure - SOP]
** xref:making-ssl-certificates.adoc[SSL Certificate Creation - SOP] ** xref:making-ssl-certificates.adoc[SSL Certificate Creation - SOP]
** xref:massupgrade.adoc[massupgrade - SOP in review ] ** xref:massupgrade.adoc[Mass Upgrade Infrastructure - SOP]
** xref:mastermirror.adoc[mastermirror - SOP in review ] ** xref:mastermirror.adoc[mastermirror - SOP in review ]
** xref:mbs.adoc[mbs - SOP in review ] ** xref:mbs.adoc[mbs - SOP in review ]
** xref:memcached.adoc[memcached - SOP in review ] ** xref:memcached.adoc[memcached - SOP in review ]

@ -5,26 +5,22 @@ various security and other upgrades.
== Contents == Contents
[arabic] * <<_contact_information>>
. Contact Information * <<_preparation>>
. Preparation * <<_staging>>
. Staging * <<_special_considerations>>
. Special Considerations ** <<_disable_builders>>
+ ** <<_post_reboot_action>>
____ ** <<_schedule_autoqa01_reboot>>
* Disable builders ** <<_bastion01_and_bastion02_and_openvpn_server>>
* Post reboot action ** <<_special_yum_directives>>
* Schedule autoqa01 reboot * <<_update_leader>>
* Bastion01 and Bastion02 and openvpn server * <<_group_a_reboots>>
* Special yum directives * <<_group_b_reboots>>
____ * <<_group_c_reboots>>
. Update Leader * <<_doing_the_upgrade>>
. Group A reboots * <<_doing_the_reboot>>
. Group B reboots * <<_aftermath>>
. Group C reboots
. Doing the upgrade
. Doing the reboot
. Aftermath
== Contact Information == Contact Information
@ -56,7 +52,7 @@ Group "C"::
servers that infrastructure will notice are down, or are redundent servers that infrastructure will notice are down, or are redundent
enough to reboot some with others taking the load. enough to reboot some with others taking the load.
. Appoint an 'Update Leader' for the updates. . Appoint an 'Update Leader' for the updates.
. Follow the [61]Outage Infrastructure SOP and send advance notification . Follow the xref:outage.adoc[Outage Infrastructure SOP] and send advance notification
to the appropriate lists. Try to schedule the update at a time when many to the appropriate lists. Try to schedule the update at a time when many
admins are around to help/watch for problems and when impact for the admins are around to help/watch for problems and when impact for the
group affected is less. Do NOT do multiple groups on the same day if group affected is less. Do NOT do multiple groups on the same day if
@ -112,10 +108,10 @@ koji enable-host NAME
.... ....
[NOTE] [NOTE]
.Note
==== ====
you must be a koji admin you must be a koji admin
==== ====
Additionally, rel-eng and builder boxes may need a special version Additionally, rel-eng and builder boxes may need a special version
of rpm. Make sure to check with rel-eng on any rpm upgrades for them. of rpm. Make sure to check with rel-eng on any rpm upgrades for them.
@ -142,12 +138,12 @@ iptables -t nat -I POSTROUTING -s 192.168.122.3/32 -j SNAT --to-source 66.135.62
.... ....
[NOTE] [NOTE]
.Note
==== ====
The source is the internal guest ips, the to-source is the external ips The source is the internal guest ips, the to-source is the external ips
that map to that guest ip. If there are multiple guests, each one needs that map to that guest ip. If there are multiple guests, each one needs
the above SNAT rule inserted. the above SNAT rule inserted.
==== ====
=== Schedule autoqa01 reboot === Schedule autoqa01 reboot
There is currently an autoqa01.c host on cnode01. Check with QA folks There is currently an autoqa01.c host on cnode01. Check with QA folks
@ -157,7 +153,7 @@ before rebooting this guest/host.
We need one of the bastion machines to be up to provide openvpn for all We need one of the bastion machines to be up to provide openvpn for all
machines. Before rebooting bastion02, modify: machines. Before rebooting bastion02, modify:
`manifests/nodes/bastion0*.phx2.fedoraproject.org.pp` files to start `manifests/nodes/bastion0*.iad2.fedoraproject.org.pp` files to start
openvpn server on bastion01, wait for all clients to re-connect, reboot openvpn server on bastion01, wait for all clients to re-connect, reboot
bastion02 and then revert back to it as openvpn hub. bastion02 and then revert back to it as openvpn hub.
@ -166,7 +162,8 @@ bastion02 and then revert back to it as openvpn hub.
Sometimes we will wish to exclude or otherwise modify the yum.conf on a Sometimes we will wish to exclude or otherwise modify the yum.conf on a
machine. For this purpose, all machines have an include, making them machine. For this purpose, all machines have an include, making them
read read
[62]http://infrastructure.fedoraproject.org/infra/hosts/FQHN/yum.conf.include http://infrastructure.fedoraproject.org/infra/hosts/FQHN/yum.conf.include
(TODO Fix link)
from the infrastructure repo. If you need to make such changes, add them from the infrastructure repo. If you need to make such changes, add them
to the infrastructure repo before doing updates. to the infrastructure repo before doing updates.
@ -197,35 +194,35 @@ These hosts are grouped based on the virt host they reside on:
* ibiblio03.fedoraproject.org * ibiblio03.fedoraproject.org
* collab01.fedoraproject.org * collab01.fedoraproject.org
* serverbeach09.fedoraproject.org * serverbeach09.fedoraproject.org
* db05.phx2.fedoraproject.org * db05.iad2.fedoraproject.org
* virthost03.phx2.fedoraproject.org * virthost03.iad2.fedoraproject.org
* db01.phx2.fedoraproject.org * db01.iad2.fedoraproject.org
* virthost04.phx2.fedoraproject.org * virthost04.iad2.fedoraproject.org
* db-fas01.phx2.fedoraproject.org * db-fas01.iad2.fedoraproject.org
* proxy01.phx2.fedoraproject.org * proxy01.iad2.fedoraproject.org
* virthost05.phx2.fedoraproject.org * virthost05.iad2.fedoraproject.org
* ask01.phx2.fedoraproject.org * ask01.iad2.fedoraproject.org
* virthost06.phx2.fedoraproject.org * virthost06.iad2.fedoraproject.org
These are the rest: These are the rest:
* bapp02.phx2.fedoraproject.org * bapp02.iad2.fedoraproject.org
* bastion02.phx2.fedoraproject.org * bastion02.iad2.fedoraproject.org
* app05.fedoraproject.org * app05.fedoraproject.org
* backup02.fedoraproject.org * backup02.fedoraproject.org
* bastion01.phx2.fedoraproject.org * bastion01.iad2.fedoraproject.org
* fas01.phx2.fedoraproject.org * fas01.iad2.fedoraproject.org
* fas02.phx2.fedoraproject.org * fas02.iad2.fedoraproject.org
* log02.phx2.fedoraproject.org * log02.iad2.fedoraproject.org
* memcached03.phx2.fedoraproject.org * memcached03.iad2.fedoraproject.org
* noc01.phx2.fedoraproject.org * noc01.iad2.fedoraproject.org
* ns02.fedoraproject.org * ns02.fedoraproject.org
* ns04.phx2.fedoraproject.org * ns04.iad2.fedoraproject.org
* proxy04.fedoraproject.org * proxy04.fedoraproject.org
* smtp-mm03.fedoraproject.org * smtp-mm03.fedoraproject.org
* batcave02.phx2.fedoraproject.org * batcave02.iad2.fedoraproject.org
* mm3test.fedoraproject.org * mm3test.fedoraproject.org
* packages02.phx2.fedoraproject.org * packages02.iad2.fedoraproject.org
=== Group B reboots === Group B reboots
@ -235,20 +232,20 @@ devel-announce list.
These hosts are grouped based on the virt host they reside on: These hosts are grouped based on the virt host they reside on:
* db04.phx2.fedoraproject.org * db04.iad2.fedoraproject.org
* bvirthost01.phx2.fedoraproject.org * bvirthost01.iad2.fedoraproject.org
* nfs01.phx2.fedoraproject.org * nfs01.iad2.fedoraproject.org
* bvirthost02.phx2.fedoraproject.org * bvirthost02.iad2.fedoraproject.org
* pkgs01.phx2.fedoraproject.org * pkgs01.iad2.fedoraproject.org
* bvirthost03.phx2.fedoraproject.org * bvirthost03.iad2.fedoraproject.org
* kojipkgs02.phx2.fedoraproject.org * kojipkgs02.iad2.fedoraproject.org
* bvirthost04.phx2.fedoraproject.org * bvirthost04.iad2.fedoraproject.org
These are the rest: These are the rest:
* koji04.phx2.fedoraproject.org * koji04.iad2.fedoraproject.org
* releng03.phx2.fedoraproject.org * releng03.iad2.fedoraproject.org
* releng04.phx2.fedoraproject.org * releng04.iad2.fedoraproject.org
=== Group C reboots === Group C reboots
@ -280,69 +277,68 @@ Group C hosts that have proxy servers on them:
* app08.fedoraproject.org * app08.fedoraproject.org
* proxy08.fedoraproject.org * proxy08.fedoraproject.org
* coloamer01.fedoraproject.org * coloamer01.fedoraproject.org
+
____
Other Group C hosts: Other Group C hosts:
____
* ask01.stg.phx2.fedoraproject.org * ask01.stg.iad2.fedoraproject.org
* app02.stg.phx2.fedoraproject.org * app02.stg.iad2.fedoraproject.org
* proxy01.stg.phx2.fedoraproject.org * proxy01.stg.iad2.fedoraproject.org
* releng01.stg.phx2.fedoraproject.org * releng01.stg.iad2.fedoraproject.org
* value01.stg.phx2.fedoraproject.org * value01.stg.iad2.fedoraproject.org
* virthost13.phx2.fedoraproject.org * virthost13.iad2.fedoraproject.org
* db-fas01.stg.phx2.fedoraproject.org * db-fas01.stg.iad2.fedoraproject.org
* pkgs01.stg.phx2.fedoraproject.org * pkgs01.stg.iad2.fedoraproject.org
* packages01.stg.phx2.fedoraproject.org * packages01.stg.iad2.fedoraproject.org
* virthost11.phx2.fedoraproject.org * virthost11.iad2.fedoraproject.org
* app01.stg.phx2.fedoraproject.org * app01.stg.iad2.fedoraproject.org
* koji01.stg.phx2.fedoraproject.org * koji01.stg.iad2.fedoraproject.org
* db02.stg.phx2.fedoraproject.org * db02.stg.iad2.fedoraproject.org
* fas01.stg.phx2.fedoraproject.org * fas01.stg.iad2.fedoraproject.org
* virthost10.phx2.fedoraproject.org * virthost10.iad2.fedoraproject.org
* autoqa01.qa.fedoraproject.org * autoqa01.qa.fedoraproject.org
* autoqa-stg01.qa.fedoraproject.org * autoqa-stg01.qa.fedoraproject.org
* bastion-comm01.qa.fedoraproject.org * bastion-comm01.qa.fedoraproject.org
* batcave-comm01.qa.fedoraproject.org * batcave-comm01.qa.fedoraproject.org
* virthost-comm01.qa.fedoraproject.org * virthost-comm01.qa.fedoraproject.org
* compose-x86-01.phx2.fedoraproject.org * compose-x86-01.iad2.fedoraproject.org
* compose-x86-02.phx2.fedoraproject.org * compose-x86-02.iad2.fedoraproject.org
* download01.phx2.fedoraproject.org * download01.iad2.fedoraproject.org
* download02.phx2.fedoraproject.org * download02.iad2.fedoraproject.org
* download03.phx2.fedoraproject.org * download03.iad2.fedoraproject.org
* download04.phx2.fedoraproject.org * download04.iad2.fedoraproject.org
* download05.phx2.fedoraproject.org * download05.iad2.fedoraproject.org
* download-rdu01.vpn.fedoraproject.org * download-rdu01.vpn.fedoraproject.org
* download-rdu02.vpn.fedoraproject.org * download-rdu02.vpn.fedoraproject.org
* download-rdu03.vpn.fedoraproject.org * download-rdu03.vpn.fedoraproject.org
* fas03.phx2.fedoraproject.org * fas03.iad2.fedoraproject.org
* secondary01.phx2.fedoraproject.org * secondary01.iad2.fedoraproject.org
* memcached04.phx2.fedoraproject.org * memcached04.iad2.fedoraproject.org
* virthost01.phx2.fedoraproject.org * virthost01.iad2.fedoraproject.org
* app02.phx2.fedoraproject.org * app02.iad2.fedoraproject.org
* value03.phx2.fedoraproject.org * value03.iad2.fedoraproject.org
* virthost07.phx2.fedoraproject.org * virthost07.iad2.fedoraproject.org
* app03.phx2.fedoraproject.org * app03.iad2.fedoraproject.org
* value04.phx2.fedoraproject.org * value04.iad2.fedoraproject.org
* ns03.phx2.fedoraproject.org * ns03.iad2.fedoraproject.org
* darkserver01.phx2.fedoraproject.org * darkserver01.iad2.fedoraproject.org
* virthost08.phx2.fedoraproject.org * virthost08.iad2.fedoraproject.org
* app04.phx2.fedoraproject.org * app04.iad2.fedoraproject.org
* packages02.phx2.fedoraproject.org * packages02.iad2.fedoraproject.org
* virthost09.phx2.fedoraproject.org * virthost09.iad2.fedoraproject.org
* hosted03.fedoraproject.org * hosted03.fedoraproject.org
* serverbeach06.fedoraproject.org * serverbeach06.fedoraproject.org
* hosted04.fedoraproject.org * hosted04.fedoraproject.org
* serverbeach07.fedoraproject.org * serverbeach07.fedoraproject.org
* collab02.fedoraproject.org * collab02.fedoraproject.org
* serverbeach08.fedoraproject.org * serverbeach08.fedoraproject.org
* dhcp01.phx2.fedoraproject.org * dhcp01.iad2.fedoraproject.org
* relepel01.phx2.fedoraproject.org * relepel01.iad2.fedoraproject.org
* sign-bridge02.phx2.fedoraproject.org * sign-bridge02.iad2.fedoraproject.org
* koji03.phx2.fedoraproject.org * koji03.iad2.fedoraproject.org
* bvirthost05.phx2.fedoraproject.org * bvirthost05.iad2.fedoraproject.org
* (disable each builder in turn, update and reenable). * (disable each builder in turn, update and reenable).
* ppc11.phx2.fedoraproject.org * ppc11.iad2.fedoraproject.org
* ppc12.phx2.fedoraproject.org * ppc12.iad2.fedoraproject.org
* backup03 * backup03
== Doing the upgrade == Doing the upgrade
@ -350,7 +346,7 @@ ____
If possible, system upgrades should be done in advance of the reboot If possible, system upgrades should be done in advance of the reboot
(with relevant testing of new packages on staging). To do the upgrades, (with relevant testing of new packages on staging). To do the upgrades,
make sure that the Infrastructure RHEL repo is updated as necessary to make sure that the Infrastructure RHEL repo is updated as necessary to
pull in the new packages ([63]Infrastructure Yum Repo SOP) pull in the new packages (xref:infra-repo.adoc[Infrastructure Yum Repo SOP])
On batcave01, as root run: On batcave01, as root run:
@ -379,7 +375,7 @@ sudo func-command --timeout=10 --oneline /usr/local/bin/needs-reboot.py after-up
In the order determined above, reboots will usually be grouped by the In the order determined above, reboots will usually be grouped by the
virtualization hosts that the servers are on. You can see the guests per virtualization hosts that the servers are on. You can see the guests per
virt host on batcave01 in /var/log/virthost-lists.out virt host on batcave01 in `/var/log/virthost-lists.out`
To reboot sets of boxes based on which virthost they are we've written a To reboot sets of boxes based on which virthost they are we've written a
special script which facilitates it: special script which facilitates it:
@ -391,7 +387,7 @@ func-vhost-reboot virthost-fqdn
ex: ex:
.... ....
sudo func-vhost-reboot virthost13.phx2.fedoraproject.org sudo func-vhost-reboot virthost13.iad2.fedoraproject.org
.... ....
== Aftermath == Aftermath
@ -399,13 +395,11 @@ sudo func-vhost-reboot virthost13.phx2.fedoraproject.org
[arabic] [arabic]
. Make sure that everything's running fine . Make sure that everything's running fine
. Reenable nagios notification as needed . Reenable nagios notification as needed
. {blank} . Make sure to perform any manual post-boot setup (such as entering
+
Make sure to perform any manual post-boot setup (such as entering::
passphrases for encrypted volumes) passphrases for encrypted volumes)
. Close outage ticket. . Close outage ticket.
=== Non virthost reboots: === Non virthost reboots
If you need to reboot specific hosts and make sure they recover - If you need to reboot specific hosts and make sure they recover -
consider using: consider using:
@ -415,4 +409,4 @@ sudo func-host-reboot hostname hostname1 hostname2 ...
.... ....
If you want to reboot the hosts one at a time waiting for each to come If you want to reboot the hosts one at a time waiting for each to come
back before rebooting the next pass a -o to func-host-reboot. back before rebooting the next pass a `-o` to `func-host-reboot`.