From 8971cc553863c2884a996f49e39dc58e2f57cd56 Mon Sep 17 00:00:00 2001
From: Pierre-Yves Chibon
Date: Mon, 15 Feb 2021 14:57:01 +0100
Subject: [PATCH] Add a note about duplicated messages with timescaledb

Signed-off-by: Pierre-Yves Chibon
---
 docs/datanommer_datagrepper/pg_timescaledb.rst | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/datanommer_datagrepper/pg_timescaledb.rst b/docs/datanommer_datagrepper/pg_timescaledb.rst
index cf377ff..030a505 100644
--- a/docs/datanommer_datagrepper/pg_timescaledb.rst
+++ b/docs/datanommer_datagrepper/pg_timescaledb.rst
@@ -66,6 +66,7 @@ timescaledb uses table partitioning as well. This leads to the same issue with
 the foreign key constraints that we have seen in the plain partitioning approach
 we took.
 
+
 Foreign key considerations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -93,6 +94,22 @@ database is mostly about inserts and has no updates or deletes, we don't foresee
 much problems with this.
 
 
+Duplicated messages
+~~~~~~~~~~~~~~~~~~~
+
+When testing datagrepper and datanommer in our test instance with the timescaledb
+plugin, we saw a number of duplicated messages showing up in the `/raw` endpoint.
+Checking whether we could fix this server side, we found that the previous database
+schema had a `UNIQUE` constraint on the `msg_id` field. However, with the timescaledb
+plugin, that constraint is now on both the `msg_id` and `timestamp` fields, meaning
+a message can be inserted twice in the database if there is a small delay between
+the two inserts.
+
+However, migrating datanommer from fedmsg to fedora-messaging should resolve that
+issue client side as rabbitmq will ensure there is only one consumer at a time
+handling a message.
+
+
 Open questions
 --------------