Add a note about duplicated messages with timescaledb

Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
Pierre-Yves Chibon 2021-02-15 14:57:01 +01:00
parent dc8d7b0d87
commit 8971cc5538

@ -66,6 +66,7 @@ timescaledb uses table partitioning as well.
This leads to the same issue with the foreign key constraints that we have seen
in the plain partitioning approach we took.

Foreign key considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -93,6 +94,22 @@ database is mostly about inserts and has no updates or deletes, we don't foresee
many problems with this.

Duplicated messages
~~~~~~~~~~~~~~~~~~~

When testing datagrepper and datanommer with the timescaledb plugin in our test
instance, we saw a number of duplicated messages showing up in the `/raw` endpoint.

While checking whether we could fix this server side, we found that the previous
database schema had a `UNIQUE` constraint on the `msg_id` field. With the
timescaledb plugin, however, that constraint now covers both the `msg_id` and
`timestamp` fields. As a result, the same message can be inserted twice in the
database if the two inserts carry different timestamps, for example because of a
small delay between them.
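To make this failure mode concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL/timescaledb. The `messages` table, its two columns and the sample `msg_id` are hypothetical simplifications, not the actual datanommer schema. (timescaledb requires unique constraints on a hypertable to include the partitioning column, which is why `timestamp` ends up in the constraint.) The same message is inserted twice, 250 ms apart, against each schema; the `GROUP BY ... HAVING` query is one way to spot the resulting duplicates.

```python
import sqlite3

# Hypothetical, simplified "messages" table. sqlite3 stands in for
# PostgreSQL/timescaledb so the example runs without a database server.
OLD_SCHEMA = """
CREATE TABLE messages (
    msg_id    TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    UNIQUE (msg_id)              -- previous schema: msg_id alone is unique
)
"""

NEW_SCHEMA = """
CREATE TABLE messages (
    msg_id    TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    UNIQUE (msg_id, timestamp)   -- hypertable schema: time column included
)
"""

# The same message received twice, with a small delay between the inserts.
ROWS = [
    ("2021-abc123", "2021-02-15 14:57:01.000"),
    ("2021-abc123", "2021-02-15 14:57:01.250"),
]


def load(schema):
    """Create the table, try to insert every row, and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    for row in ROWS:
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT INTO messages VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected the duplicate
    return conn


def duplicate_ids(conn):
    """Return every msg_id that is stored more than once."""
    query = "SELECT msg_id FROM messages GROUP BY msg_id HAVING COUNT(*) > 1"
    return [row[0] for row in conn.execute(query)]


old = load(OLD_SCHEMA)
new = load(NEW_SCHEMA)
print(duplicate_ids(old))  # []
print(duplicate_ids(new))  # ['2021-abc123']
```

With the old constraint the second insert fails and only one row is kept; with the composite constraint the two rows differ in `timestamp`, so both are accepted.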

That said, migrating datanommer from fedmsg to fedora-messaging should resolve this
issue client side, as rabbitmq will ensure that only one consumer at a time handles
a given message.

Open questions
--------------