= RabbitMQ SOP

https://www.rabbitmq.com/[RabbitMQ] is the message broker Fedora uses to allow applications
to send each other (or themselves) messages.

== Contact Information

=== Owner

Fedora Infrastructure Team

=== Contact

#fedora-admin

=== Servers

* rabbitmq0[1-3].rdu3.fedoraproject.org
* rabbitmq0[1-3].stg.rdu3.fedoraproject.org

=== Purpose

General purpose publish-subscribe message broker as well as
application-specific messaging.

== Description

RabbitMQ is a message broker written in Erlang that offers a number of
interfaces including AMQP 0.9.1, AMQP 1.0, STOMP, and MQTT. At this time
only AMQP 0.9.1 is made available to clients.

Fedora uses the RabbitMQ packages provided by the Red Hat OpenStack
repository because it ships a more up-to-date version.

=== The Cluster

RabbitMQ supports https://www.rabbitmq.com/clustering.html[clustering]
a set of hosts into a single logical message broker. The Fedora cluster
is composed of 3 nodes, rabbitmq01 through rabbitmq03, in both staging
and production. `groups/rabbitmq.yml` is the playbook that deploys the
cluster.

=== Virtual Hosts

The cluster contains a number of virtual hosts. Each virtual host has
its own set of resources - exchanges, bindings, queues - and users are
given permissions by virtual host.

==== /pubsub

The /pubsub virtual host is the generic publish-subscribe virtual host
used by most applications. Messages published via AMQP are sent to the
"amq.topic" exchange.
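
Because the virtual host names here start with a slash, the name has to
be percent-encoded when it appears in the path of an AMQP URI (the
federation status output in this document shows `%2Fpubsub`). A minimal
Python sketch of that encoding; the helper name `amqp_uri` and the
default `amqps` scheme are assumptions made for illustration:

```python
from urllib.parse import quote


def amqp_uri(host: str, vhost: str, scheme: str = "amqps") -> str:
    """Build an AMQP URI for a vhost (hypothetical helper, not a Fedora tool)."""
    # safe="" makes quote() encode "/" as "%2F" instead of leaving it as-is.
    return f"{scheme}://{host}/{quote(vhost, safe='')}"


print(amqp_uri("rabbitmq01.rdu3.fedoraproject.org", "/pubsub"))
```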

==== /public_pubsub

This virtual host has the "amq.topic" and "zmq.topic" exchanges from
/pubsub https://www.rabbitmq.com/federation.html[federated] to it,
and anyone on the Internet is allowed to connect to it. For the moment
it is on the same broker cluster, but if it is abused it can be moved
to a separate cluster.

=== Authentication

Clients authenticate to the broker using X.509 certificates. The common
name of the certificate needs to match the username of a user in
RabbitMQ.
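
Since the common name has to match a RabbitMQ username, it can help to
check which name a certificate actually carries. A minimal sketch,
assuming the certificate is available as a dictionary in the format
returned by Python's `ssl.SSLSocket.getpeercert()`; the helper
`common_name` and the sample values are hypothetical:

```python
from typing import Optional


def common_name(cert: dict) -> Optional[str]:
    """Return the subject commonName of a getpeercert()-style dict, if any."""
    # "subject" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Made-up certificate data, for illustration only.
sample_cert = {
    "subject": (
        (("organizationName", "Example Org"),),
        (("commonName", "example-app"),),
    )
}
print(common_name(sample_cert))
```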

== Troubleshooting

RabbitMQ offers a CLI, `rabbitmqctl`, which you can use on any node in the
cluster. It also offers a web interface for management and monitoring,
but that is not currently configured.

=== Network Partition

If a network partition occurs, the RabbitMQ cluster should handle it and
recover on its own. If it has not recovered once the network problem is
fixed, the partition can be diagnosed with `rabbitmqctl cluster_status`.
The output should include the line `{partitions,[]},` (empty array).
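
This check can be scripted. The following Python sketch looks for the
empty `{partitions,[]}` term in text captured from
`rabbitmqctl cluster_status` (for example with `subprocess.run`). It
assumes the Erlang-term output format described here; other RabbitMQ
versions may format the output differently. The function name and both
sample strings are made up for illustration:

```python
import re


def partitions_empty(cluster_status: str) -> bool:
    """Return True if the output contains an empty `{partitions,[]}` term."""
    # Tolerate optional whitespace inside the Erlang term.
    return re.search(r"\{partitions,\s*\[\s*\]\}", cluster_status) is not None


# Shortened, illustrative samples (not real cluster output).
healthy = "[{nodes,[{disc,[rabbit@rabbitmq01]}]},\n {partitions,[]},\n {alarms,[]}]"
partitioned = "[{partitions,[{rabbit@rabbitmq01,[rabbit@rabbitmq02]}]}]"

print(partitions_empty(healthy))
print(partitions_empty(partitioned))
```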

If the array is not empty, the first nodes in the array can be
restarted one by one, but make sure you give them plenty of time to
sync messages after each restart (this can be watched in the
`/var/log/rabbitmq/rabbit.log` file).

=== Federation Status

Federation is the process of copying messages from the internal
`/pubsub` vhost to the external `/public_pubsub` vhost. During network
partitions, the federation relay process has been observed not to come
back up. The federation status can be checked with the command
`rabbitmqctl eval 'rabbit_federation_status:status().'` on `rabbitmq01`.
It should not return the empty array (`[]`) but something like:

....
[[{exchange,<<"amq.topic">>},
{upstream_exchange,<<"amq.topic">>},
{type,exchange},
{vhost,<<"/public_pubsub">>},
{upstream,<<"pubsub-to-public_pubsub">>},
{id,<<"b40208be0a999cc93a78eb9e41531618f96d4cb2">>},
{status,running},
{local_connection,<<"<rabbit@rabbitmq01.rdu3.fedoraproject.org.2.8709.481>">>},
{uri,<<"amqps://rabbitmq01.rdu3.fedoraproject.org/%2Fpubsub">>},
{timestamp,{{2020,3,11},{16,45,18}}}],
[{exchange,<<"zmq.topic">>},
{upstream_exchange,<<"zmq.topic">>},
{type,exchange},
{vhost,<<"/public_pubsub">>},
{upstream,<<"pubsub-to-public_pubsub">>},
{id,<<"c1e7747425938349520c60dda5671b2758e210b8">>},
{status,running},
{local_connection,<<"<rabbit@rabbitmq01.rdu3.fedoraproject.org.2.8718.481>">>},
{uri,<<"amqps://rabbitmq01.rdu3.fedoraproject.org/%2Fpubsub">>},
{timestamp,{{2020,3,11},{16,45,17}}}]]
....
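
A rough way to script this check is to pull the `{status,...}` entries
out of the captured output and require that at least one link exists and
that every link is `running`. This is only a sketch: the function names
are hypothetical, the samples are trimmed versions of the output shown
in this section, and treating any status other than `running` as
unhealthy is an assumption:

```python
import re


def link_statuses(eval_output: str) -> list:
    """Return the status value of every federation link in the output."""
    # Capture whatever follows "{status," up to the next comma, brace or space.
    return re.findall(r"\{status,\s*([^,}\s]+)", eval_output)


def federation_healthy(eval_output: str) -> bool:
    """True if at least one link is reported and all links are running."""
    statuses = link_statuses(eval_output)
    # An empty result corresponds to the `[]` case: no links at all.
    return bool(statuses) and all(s == "running" for s in statuses)


# Trimmed, illustrative samples.
running = '[[{exchange,<<"amq.topic">>},\n{status,running}],\n[{exchange,<<"zmq.topic">>},\n{status,running}]]'
print(federation_healthy(running))
print(federation_healthy("[]"))
```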

If the empty array is returned, the following commands will restart the
federation (again on `rabbitmq01`):

....
rabbitmqctl clear_policy -p /public_pubsub pubsub-to-public_pubsub
rabbitmqctl set_policy -p /public_pubsub --apply-to exchanges pubsub-to-public_pubsub "^(amq|zmq)\.topic$" '{"federation-upstream":"pubsub-to-public_pubsub"}'
....
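
The pattern passed to `set_policy` selects which exchanges the policy
applies to. RabbitMQ evaluates it with its own regular expression
engine; the sketch below uses Python's `re` module as a stand-in, which
should behave the same for a pattern this simple, to show which exchange
names it covers. The function name is hypothetical:

```python
import re

# The pattern from the set_policy command.
POLICY_PATTERN = r"^(amq|zmq)\.topic$"


def policy_applies(exchange_name: str) -> bool:
    """Return True if the policy pattern matches the exchange name."""
    return re.search(POLICY_PATTERN, exchange_name) is not None


for name in ("amq.topic", "zmq.topic", "amq.direct", "amq.topic.extra"):
    print(name, policy_applies(name))
```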

After running these commands, the federation link status can be checked
with the same `rabbitmqctl eval` command as before.