148 lines
4.5 KiB
Text
148 lines
4.5 KiB
Text
= Loopabull
|
|
|
|
https://github.com/maxamillion/loopabull[Loopabull] is an event-driven
|
|
https://www.ansible.com/[Ansible]-based automation engine. This is used
|
|
for various tasks, originally slated for
|
|
https://pagure.io/releng-automation[Release Engineering Automation].
|
|
|
|
== Contents
|
|
|
|
[arabic]
|
|
. Contact Information
|
|
. Overview
|
|
. Setup
|
|
. Outage
|
|
|
|
== Contact Information
|
|
|
|
Owner::
|
|
Adam Miller (maxamillion) Pierre-Yves Chibon (pingou)
|
|
Contact::
|
|
#fedora-admin, #fedora-releng, #fedora-noc, sysadmin-main,
|
|
sysadmin-releng
|
|
Location::
|
|
loopabull01.phx2.fedoraproject.org
|
|
loopabull01.stg.phx2.fedoraproject.org
|
|
Purpose::
|
|
Event Driven Automation of tasks within the Fedora Infrastructure and
|
|
Fedora Release Engineering
|
|
|
|
== Overview
|
|
|
|
The https://github.com/maxamillion/loopabull[loopabull] system is setup
|
|
such that an event will take place within the infrastructure and a
|
|
http://www.fedmsg.com/en/latest/[fedmsg] is sent, then loopabull will
|
|
consume that message, trigger an https://www.ansible.com/[Ansible]
|
|
http://docs.ansible.com/ansible/playbooks.html[playbook] that shares a
|
|
name with the fedmsg topic, and provide the payload of the fedmsg to the
|
|
playbook as
|
|
https://github.com/ansible/ansible/blob/devel/docs/man/man1/ansible-playbook.1.asciidoc.in[extra
|
|
variables].
|
|
|
|
== Setup
|
|
|
|
The setup is relatively simple, the Overview above describes it and a
|
|
more detailed version can be found in the [.title-ref]#releng docs#.
|
|
|
|
....
|
|
+-----------------+ +-------------------------------+
|
|
| | | |
|
|
| fedmsg +------------>| Looper |
|
|
| | | (fedmsg handler plugin) |
|
|
| | | |
|
|
+-----------------+ +-------------------------------+
|
|
|
|
|
|
|
|
+-------------------+ |
|
|
| | |
|
|
| | |
|
|
| Loopabull +<-------------+
|
|
| (Event Loop) |
|
|
| |
|
|
+---------+---------+
|
|
|
|
|
|
|
|
|
|
|
|
|
|
V
|
|
+----------+-----------+
|
|
| |
|
|
| ansible-playbook |
|
|
| |
|
|
+----------------------+
|
|
....
|
|
|
|
=== Deployment
|
|
|
|
Loopabull is deployed on two hosts, one for the production instance:
|
|
`loopabull01.prod.phx2.fedoraproject.org` and one for the staging
|
|
instance: `loopabull01.stg.phx2.fedoraproject.org`.
|
|
|
|
Each host is running loopabull with 5 workers reacting to fedmsg
|
|
notifications.
|
|
|
|
== Expanding loopabull
|
|
|
|
The documentation to expand loopabull's usage is documented at:
|
|
https://pagure.io/Fedora-Infra/loopabull-tasks
|
|
|
|
== Outage
|
|
|
|
In the event that loopabull isn't responding or isn't running playbooks
|
|
as it should be, the following scenarios should be approached.
|
|
|
|
=== What is going on?
|
|
|
|
There are a few commands that may help figuring out what is going:
|
|
|
|
* Check the status of the different services:
|
|
|
|
....
|
|
systemctl |grep loopabull
|
|
....
|
|
|
|
* Follow the logs of the different services:
|
|
|
|
....
|
|
journalctl -lfu loopabull -u loopabull@1 -u loopabull@2 -u loopabull@3 \
|
|
-u loopabull@4 -u loopabull@5
|
|
....
|
|
|
|
If a playbook returns a non-zero error code, the worker running it will
|
|
be stopped. If that happens, you may want to carefully review the logs
|
|
to assess what lead to this situation so it can be prevented in the
|
|
future.
|
|
|
|
* Monitoring the queue size
|
|
|
|
The loopabull service listens to the fedmsg bus and puts the messages as
|
|
they come into a rabbitmq/amqp queue for the workers to process. If you
|
|
want to see the number of messages pending to be processed by the
|
|
workers you can check the queue size using:
|
|
|
|
....
|
|
rabbitmqctl list_queues
|
|
....
|
|
|
|
The output will be something like:
|
|
|
|
....
|
|
Listing queues ...
|
|
workers 489989
|
|
...done.
|
|
....
|
|
|
|
Where `workers` is the name of the queue used by loopabull and `489989`
|
|
the number of messages in that queue (yes that day we were recovering
|
|
from a several-day long outage).
|
|
|
|
=== Network Interruption
|
|
|
|
Sometimes if the network is interrupted, the loopabull service will hang
|
|
because the fedmsg listener will hold a dead socket open. The service
|
|
and its workers simply needs to be restarted at that point.
|
|
|
|
....
|
|
systemctl restart loopabull loopabull@1 loopabull@2 loopabull@3 \
|
|
loopabull@4 loopabull@5
|
|
....
|