ansible/playbooks/vhost_reboot.yml

#
# This playbook lets you safely reboot a virthost and all its guests.
# 
# requires --extra-vars="target=<fqdn of the vhost>"

# General overview:
# talk to the vhost
# get back list of instances
# add each of their hostnames to an ad hoc group
# halt each of them in a second play
# wait for them to die
# third play, reboot the vhost
#     wait for vhost to come back

# TODO: Figure out how to compare virt info pre and post boot. 

- name: find instances
  hosts: "{{ target }}"
  gather_facts: False
  user: root

  tasks:
  - name: get list of guests
    virt: command=list_vms
    register: vmlist

#  - name: get info on guests (prereboot)
#    virt: command=info
#    register: vminfo_pre

  - name: add them to myvms_new group
    local_action: add_host hostname={{ item }} groupname=myvms_new
    with_items: "{{ vmlist.list_vms }}"

- name: halt instances
  hosts: myvms_new
  user: root
  gather_facts: False
  serial: 1

  tasks:
  - name: schedule regular host downtime
    nagios: action=downtime minutes=30 service=host host={{ inventory_hostname_short }}
    delegate_to: noc01.phx2.fedoraproject.org
    ignore_errors: true
    when: not inventory_hostname.startswith('stg')

  - name: schedule stg host downtime
    nagios: action=downtime minutes=30 service=host host={{ inventory_hostname_short }}.stg
    delegate_to: noc01.phx2.fedoraproject.org
    ignore_errors: true
    when: inventory_hostname.startswith('stg')

  - name: halt the vm instances - to poweroff
    command: /sbin/halt -p
    ignore_errors: true
    # if one of them is down we don't care

- name: wait for the whole set to die.
  hosts: myvms_new
  gather_facts: False
  user: root

  tasks:
  - name: wait for them to die
    local_action: wait_for port=22 delay=30 timeout=300 state=stopped host={{ inventory_hostname }}

- name: reboot vhost
  hosts: "{{ target }}"
  gather_facts: False
  user: root

  tasks:
  - name: tell nagios to shush
    nagios: action=downtime minutes=60 service=host host={{ inventory_hostname }}
    delegate_to: noc01.phx2.fedoraproject.org
    ignore_errors: true

  - name: reboot the virthost
    command: /sbin/reboot

  - name: wait for virthost to come back - up to 7 minutes
    local_action: wait_for host={{ target }} port=22 delay=120 timeout=420

  - name: wait for libvirtd to come back on the virthost
    wait_for: path=/var/run/libvirtd.pid state=present

  - name: look up vmlist
    virt: command=list_vms
    register: newvmlist

  - name: sync time
    command: ntpdate -u 66.187.233.4

  - name: serverbeach hosts need a special iptables config
    command: /root/fix-iptables.sh
    when: inventory_hostname_short.startswith('serverbeach')

#  - name: get info on guests (postreboot)
#    virt: command=info
#    register: vminfo_post
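
# A possible starting point for the TODO above (a sketch, not a tested fix).
# It assumes the vmlist result registered in the first play is still
# available for this host when this play runs. It only catches guests that
# disappeared from the libvirt list, not guests that are still defined but
# failed to start.
  - name: report guests missing after the reboot (sketch)
    debug:
      msg: "guests missing after reboot {{ vmlist.list_vms | difference(newvmlist.list_vms) }}"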