= RabbitMQ SOP

https://www.rabbitmq.com/[RabbitMQ] is the message broker Fedora uses to allow applications to send each other (or themselves) messages.

== Contact Information

=== Owner

Fedora Infrastructure Team

=== Contact

#fedora-admin

=== Servers

* rabbitmq0[1-3].iad2.fedoraproject.org
* rabbitmq0[1-3].stg.iad2.fedoraproject.org

=== Purpose

General purpose publish-subscribe message broker as well as application-specific messaging.

== Description

RabbitMQ is a message broker written in Erlang that offers a number of interfaces including AMQP 0.9.1, AMQP 1.0, STOMP, and MQTT. At this time only AMQP 0.9.1 is made available to clients.

Fedora uses the RabbitMQ packages provided by the Red Hat OpenStack repository, as it has a more up-to-date version.

=== The Cluster

RabbitMQ supports https://www.rabbitmq.com/clustering.html[clustering] a set of hosts into a single logical message broker. The Fedora cluster is composed of 3 nodes, rabbitmq01-03, in both staging and production. `groups/rabbitmq.yml` is the playbook that deploys the cluster.

=== Virtual Hosts

The cluster contains a number of virtual hosts. Each virtual host has its own set of resources (exchanges, bindings, queues), and users are given permissions by virtual host.

==== /pubsub

The /pubsub virtual host is the generic publish-subscribe virtual host used by most applications. Messages published via AMQP are sent to the "amq.topic" exchange.

==== /public_pubsub

This virtual host has the "amq.topic" and "zmq.topic" exchanges from /pubsub https://www.rabbitmq.com/federation.html[federated] to it, and we allow anyone on the Internet to connect to this virtual host. For the moment it is on the same broker cluster, but if people abuse it, it can be moved to a separate cluster.

=== Authentication

Clients authenticate to the broker using x509 certificates. The common name of the certificate needs to match the username of a user in RabbitMQ.
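As a rough illustration of what a client needs in order to connect, the sketch below builds the `amqps://` URL for a virtual host and a TLS context that presents a client certificate. It uses only the Python standard library. The function names `amqp_url` and `client_tls_context` and their file-path parameters are hypothetical, and a real application would pass these values to whichever AMQP client library it uses.

```python
import ssl
from urllib.parse import quote


def amqp_url(host: str, vhost: str) -> str:
    """Build an amqps:// URL, percent-encoding the virtual host name."""
    # The leading "/" in "/pubsub" is part of the vhost name, so it has to be
    # encoded as %2F instead of being read as a path separator.
    return f"amqps://{host}/{quote(vhost, safe='')}"


def client_tls_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Create a TLS context that verifies the broker and presents a client certificate.

    All three paths are placeholders supplied by the caller (hypothetical).
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # The common name of this certificate must match a RabbitMQ username.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Example: the URL form for the /pubsub virtual host on one cluster node.
print(amqp_url("rabbitmq01.iad2.fedoraproject.org", "/pubsub"))
```

The encoded form `%2Fpubsub` is the same one that appears in the `uri` field of the federation status output.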

== Troubleshooting

RabbitMQ offers a CLI, `rabbitmqctl`, which you can use on any node in the cluster. It also offers a web interface for management and monitoring, but that is not currently configured.

=== Network Partition

The RabbitMQ cluster should handle network partitions and recover on its own. If it has not recovered once the network situation is fixed, the partition can be diagnosed with `rabbitmqctl cluster_status`. The output should include the line `{partitions,[]},` (an empty array). If the array is not empty, the first nodes in the array can be restarted one by one, but make sure you give each of them plenty of time to sync messages after its restart (this can be watched in the `/var/log/rabbitmq/rabbit.log` file).

=== Federation Status

Federation is the process of copying messages from the internal `/pubsub` vhost to the external `/public_pubsub` vhost. During network partitions, the federation relaying process has been observed not to come back up. The federation status can be checked with the command `rabbitmqctl eval 'rabbit_federation_status:status().'` on `rabbitmq01`. It should not return the empty array (`[]`) but something like:

....
[[{exchange,<<"amq.topic">>},
  {upstream_exchange,<<"amq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"b40208be0a999cc93a78eb9e41531618f96d4cb2">>},
  {status,running},
  {local_connection,<<"">>},
  {uri,<<"amqps://rabbitmq01.iad2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,18}}}],
 [{exchange,<<"zmq.topic">>},
  {upstream_exchange,<<"zmq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"c1e7747425938349520c60dda5671b2758e210b8">>},
  {status,running},
  {local_connection,<<"">>},
  {uri,<<"amqps://rabbitmq01.iad2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,17}}}]]
....

If the empty array is returned, the following commands will restart the federation (again on `rabbitmq01`):

....
rabbitmqctl clear_policy -p /public_pubsub pubsub-to-public_pubsub
rabbitmqctl set_policy -p /public_pubsub --apply-to exchanges pubsub-to-public_pubsub "^(amq|zmq)\.topic$" '{"federation-upstream":"pubsub-to-public_pubsub"}'
....

Afterwards, the federation link status can be checked with the same command as before.
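
The two checks in this section (an empty `partitions` array and a non-empty federation status with running links) could be scripted roughly as follows. This is a minimal sketch, not part of the deployed tooling: the function names are hypothetical, and it assumes `rabbitmqctl` prints the Erlang-term output format shown in this document.

```python
import re
import subprocess


def cluster_is_partitioned(cluster_status: str) -> bool:
    """Return True if the partitions entry in cluster_status output is not empty."""
    match = re.search(r"\{partitions,\[(.*?)\]\}", cluster_status, re.DOTALL)
    if match is None:
        # Assumption: a missing entry means the output format is not the expected one.
        raise ValueError("no partitions entry found in cluster_status output")
    return match.group(1).strip() != ""


def federation_is_running(status_output: str) -> bool:
    """Return True if at least one federation link is listed and all are running."""
    total = status_output.count("{status,")
    running = status_output.count("{status,running}")
    return total > 0 and running == total


def check_cluster() -> bool:
    """Run both rabbitmqctl commands locally and report overall health."""
    cluster = subprocess.run(
        ["rabbitmqctl", "cluster_status"],
        capture_output=True, text=True, check=True,
    ).stdout
    federation = subprocess.run(
        ["rabbitmqctl", "eval", "rabbit_federation_status:status()."],
        capture_output=True, text=True, check=True,
    ).stdout
    return not cluster_is_partitioned(cluster) and federation_is_running(federation)
```

`check_cluster()` is meant to be run as a user allowed to call `rabbitmqctl`, on `rabbitmq01` for the federation check. The two parsing functions take plain strings, so they can also be applied to saved command output.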