fix parsing errors and sphinx warnings

Signed-off-by: Ryan Lerch <rlerch@redhat.com>

parent 8fb9b2fdf0
commit ba720c3d77

98 changed files with 4799 additions and 4788 deletions
@@ -3,15 +3,15 @@
 Datanommer
 ==========

-* Reads-in messages from the bus
-* Stores them into the database
+- Reads-in messages from the bus
+- Stores them into the database

 Database tables
 ---------------

 Here is how the database schema looks like currently:

-::
+.. code-block::

     datanommer=# \dt
              List of relations
@@ -24,14 +24,12 @@ Here is how the database schema looks like currently:
  public | user          | table | datanommer
  public | user_messages | table | datanommer

-
 Table sizes
 -----------

 Here is the size of each table:

-
-::
+.. code-block::

     datanommer-#
     SELECT
@@ -49,16 +47,15 @@ Here is the size of each table:
     alembic_version | 8192 bytes | 0 bytes
    (6 rows)

 The 3 columns are:

-The 3 columns are::
+.. code-block::

     Table – The name of the table
     Size – The total size that this table takes
     External Size – The size that related objects of this table like indices take

-
-::
+.. code-block::

     datanommer=#
     SELECT
@@ -109,12 +106,15 @@ The 3 columns are::
     sql_features | r | 716 | 64 kB
    (37 rows)

 The 4 columns are:

-The 4 columns are::
+.. code-block::

     objectname – The name of the object
     objecttype – r for the table, i for an index, t for toast data, ...
     #entries – The number of entries in the object (e.g. rows)
     size – The size of the object

-(source for these queries: https://wiki-bsse.ethz.ch/display/ITDOC/Check+size+of+tables+and+objects+in+PostgreSQL+database )
+(source for these queries:
+https://wiki-bsse.ethz.ch/display/ITDOC/Check+size+of+tables+and+objects+in+PostgreSQL+database
+)
@@ -1,18 +1,17 @@
 Default delta
 =============

-Checking the current status of datagrepper, we realized that not specifying a
-`delta` value in the URL led to timeouts while specifying one, makes datagrepper
-return properly.
+Checking the current status of datagrepper, we realized that not specifying a `delta`
+value in the URL led to timeouts while specifying one, makes datagrepper return
+properly.

-Investigating the configuration options of datagrepper, we found out that
-there is a `DEFAULT_QUERY_DELTA` configuration key that allows to specify a
-default delta value when one is not specified.
+Investigating the configuration options of datagrepper, we found out that there is a
+`DEFAULT_QUERY_DELTA` configuration key that allows to specify a default delta value
+when one is not specified.

 Just setting that configuration key to ``60*60*24*3`` (ie: 3 days) improves the
-datagrepper performances quite a bit (as in queries actually return instead of
-timing out).
+datagrepper performances quite a bit (as in queries actually return instead of timing
+out).

-
-That configuration change, does break the API a little bit as with it, it will
-limit the messages returned to the last 3 days.
+That configuration change, does break the API a little bit as with it, it will limit the
+messages returned to the last 3 days.
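As a side note (not part of the commit), the delta arithmetic the hunk above relies on can be sketched in Python. The key name and the ``60*60*24*3`` value come from the text; the ``query_window`` helper is a hypothetical illustration, not actual datagrepper code.

```python
from datetime import datetime, timedelta, timezone

# From the text above: 60*60*24*3 seconds, i.e. 3 days.
DEFAULT_QUERY_DELTA = 60 * 60 * 24 * 3

def query_window(delta=None, end=None):
    """Hypothetical helper: the [start, end] time range a datagrepper-style
    query would be bounded to when the URL carries no delta value."""
    if delta is None:
        delta = DEFAULT_QUERY_DELTA  # fall back to the configured default
    end = end or datetime.now(timezone.utc)
    return end - timedelta(seconds=delta), end

start, end = query_window()
print(int((end - start).total_seconds()))  # 259200
```

Bounding every unqualified query to such a window is what turns the previously unbounded (and timing-out) scans into ones that return.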
@@ -4,21 +4,18 @@ Datanommer / Datagrepper
 Datanommer
 ----------

-* Reads-in messages from the bus
-* Stores them into the database
+- Reads-in messages from the bus
+- Stores them into the database

 .. toctree::
    :maxdepth: 1

    datanommer


 Datagrepper
 -----------

-* Exposes the messages in the database via an API with different filtering
-  capacity
-
+- Exposes the messages in the database via an API with different filtering capacity

 Investigation
 -------------
@@ -35,49 +32,43 @@ Here is the list of ideas/things we looked at:
    pg_array_column_postgrest
    stats


 Conclusions
 -----------

-We have investigated different ways to improve the database storing our 180
-millions messages. While we considered looking at the datagrepper application
-itself as well, we considered that replacing datagrepper with another application
-would have too large consequences. We have a number of applications in our
-realm that rely on datagrepper's API and there is an unknown number of applications
-outside our realm that make use of it as well.
-Breaking all of these applications is a non-goal for us. For this reason we
+We have investigated different ways to improve the database storing our 180 millions
+messages. While we considered looking at the datagrepper application itself as well, we
+considered that replacing datagrepper with another application would have too large
+consequences. We have a number of applications in our realm that rely on datagrepper's
+API and there is an unknown number of applications outside our realm that make use of it
+as well. Breaking all of these applications is a non-goal for us. For this reason we
 focused on postgresql first.

-We looked at different solutions, starting with manually partitioning on year,
-then on ``id`` (not ``msg_id``, the primary key field ``id`` which is an integer).
-We then looked at using the postgresql plugin `timescaledb` and finally we looked
-at using this plugin together with a database model change where the relation
-tables are merged into the main ``messages`` table and their is stored using
-arrays.
+We looked at different solutions, starting with manually partitioning on year, then on
+``id`` (not ``msg_id``, the primary key field ``id`` which is an integer). We then
+looked at using the postgresql plugin `timescaledb` and finally we looked at using this
+plugin together with a database model change where the relation tables are merged into
+the main ``messages`` table and their is stored using arrays.

-Based on our investigations, our recommendation is to migrate the postgresql
-database to use the `timescaledb` plugin and configure datagrepper to have a
-default delta value via ``DEFAULT_QUERY_DELTA``.
+Based on our investigations, our recommendation is to migrate the postgresql database to
+use the `timescaledb` plugin and configure datagrepper to have a default delta value via
+``DEFAULT_QUERY_DELTA``.

 As a picture is worth a thousand words:

 .. image:: ../_static/datanommer_percent_sucess.jpg
    :target: ../_images/datanommer_percent_sucess.jpg


-We checked, setting a ``DEFAULT_QUERY_DELTA`` alone provides already some
-performance gain, using `timescaledb` with ``DEFAULT_QUERY_DELTA`` provide the
-most gain but using `timescaledb` without ``DEFAULT_QUERY_DELTA`` brings back
-the time out issues we are seeing today when datagrepper is queried without a
-specified ``delta`` value.
+We checked, setting a ``DEFAULT_QUERY_DELTA`` alone provides already some performance
+gain, using `timescaledb` with ``DEFAULT_QUERY_DELTA`` provide the most gain but using
+`timescaledb` without ``DEFAULT_QUERY_DELTA`` brings back the time out issues we are
+seeing today when datagrepper is queried without a specified ``delta`` value.

 We also believe that the performance gain observed with `timescaledb` could be
-reproduced if we were to do the partitioning ourself on the ``timestamp`` field
-of the ``messages`` table. However, it would mean that we have to manually
-maintain that partitioning, take care of creating the new partitions as needed
-and so on, while `timescaledb` provides all of this for us automatically, thus
-simplifying the long term maintenance of that database.
-
+reproduced if we were to do the partitioning ourself on the ``timestamp`` field of the
+``messages`` table. However, it would mean that we have to manually maintain that
+partitioning, take care of creating the new partitions as needed and so on, while
+`timescaledb` provides all of this for us automatically, thus simplifying the long term
+maintenance of that database.

 Proposed roadmap
 ~~~~~~~~~~~~~~~~
@@ -86,40 +77,34 @@ We propose the following roadmap to improve datanommer and datagrepper:

 0/ Announce the upcoming API breakage and outage of datagrepper

-Be loud about the upcoming changes and explain how the API breakage can be
-mitigated.
-
+Be loud about the upcoming changes and explain how the API breakage can be mitigated.

 1/ Port datanommer to fedora-messaging and openshift

-This will ensure that there are no duplicate messages are saved in the database
-(cf our ref:`timescaledb_findings`).
-It will also provide a way to store the messages while datagrepper is being
-upgraded (which will require an outage). Using lazy queues in rabbitmq may be
-a way to store the high number of messages that will pile up during the outage
-window (which will be over 24h).
+This will ensure that there are no duplicate messages are saved in the database (cf our
+ref:`timescaledb_findings`). It will also provide a way to store the messages while
+datagrepper is being upgraded (which will require an outage). Using lazy queues in
+rabbitmq may be a way to store the high number of messages that will pile up during the
+outage window (which will be over 24h).

 Rabbitmq lazy queues: https://www.rabbitmq.com/lazy-queues.html


 2/ Port datagrepper to timescaledb.

-This will improve the performance of the UI. Thanks to rabbitmq, no messages will
-be lost, they will only show up in datagrepper at the end of the outage and
-with a delayed timestamp.
+This will improve the performance of the UI. Thanks to rabbitmq, no messages will be
+lost, they will only show up in datagrepper at the end of the outage and with a delayed
+timestamp.

 3/ Configure datagrepper to have a ``DEFAULT_QUERY_DELTA``.

-This will simply bound a number of queries which otherwise run slow and lead to
-timeouts at the application level.
-
+This will simply bound a number of queries which otherwise run slow and lead to timeouts
+at the application level.

 4/ Port datagrepper to openshift

 This will make it easier to maintain and/or scale as needed.


 5/ Port datagrepper to fedora-messaging

-This will allow to make use of the fedora-messaging schemas provided by the
-applications instead of relying on `fedmsg_meta_fedora_infrastructure`.
+This will allow to make use of the fedora-messaging schemas provided by the applications
+instead of relying on `fedmsg_meta_fedora_infrastructure`.
@@ -4,64 +4,74 @@ Using the array type for user and package queries
 Currently, we use auxiliary tables to query for messages related to packages or users,
 in the standard RDBS fashion.

-We came to some problems when trying to enforce foreign key constrains while using the timescaledb
-extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
+We came to some problems when trying to enforce foreign key constrains while using the
+timescaledb extension. We decided to try, if just using a column with array type with
+proper indes would have simmilar performace.

-Array columns support indexing with Generalized Inverted Index, GIN,
-that allows for fast searches on membership and intersection. Because we mostly search for memebership,
+Array columns support indexing with Generalized Inverted Index, GIN, that allows for
+fast searches on membership and intersection. Because we mostly search for memebership,
 array column could be performant enough for our purposes.

 Resources
 ---------

-* PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
-* GIN index: https://www.postgresql.org/docs/12/gin.html
-* GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html
-
+- PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
+- GIN index: https://www.postgresql.org/docs/12/gin.html
+- GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html

 Installing/enabling/activating
 ------------------------------

-To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
+To have comparable results, we enabled timescaledb in same fashion as in our other
+experiment.

 To add new column
-::

-    alter table messages2 add column packages text[];
+.. code-block::
+
+    alter table messages2 add column packages text[];

 To populate it
-::

-    update messages2 set packages=t_agg.p_agg from
-    (select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
+.. code-block::
+
+    update messages2 set packages=t_agg.p_agg from
+    (select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;

-We need to enable the btree_gin extension to be able to create index with array as well as timestamp
-::
+We need to enable the btree_gin extension to be able to create index with array as well
+as timestamp

-    CREATE EXTENSION btree_gin;
+.. code-block::
+
+    CREATE EXTENSION btree_gin;

 To create the index
-::

+.. code-block::
+
     CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");

 To help reuse our testing script, we setup postgrest locally
-::

-    podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
+.. code-block::
+
+    podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.

-Because we focused only on package queries, as user colun couldn't be populated due to constraints on size,
-we chose two as representative. There is implicit limit to return just 25 rows.
+Because we focused only on package queries, as user colun couldn't be populated due to
+constraints on size, we chose two as representative. There is implicit limit to return
+just 25 rows.

 A simple membership:
-::

-    /messages_ts?packages=ov.{{kernel}}
+.. code-block::
+
+    /messages_ts?packages=ov.{{kernel}}

 A simple membership ordered by time.
-::

-    /messages_ts?order=timestamp.desc&packages=ov.{{kernel}}
+.. code-block::
+
+    /messages_ts?order=timestamp.desc&packages=ov.{{kernel}}

 Findings
 --------
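As an aside (not in the commit): the ``ov`` (overlap) queries above map to postgresql's array overlap operator ``&&``. A rough Python model of that membership test, with made-up sample rows, illustrates why a GIN index can answer plain membership on its own but cannot help with ordering:

```python
# Hypothetical sample rows mimicking a messages2 table with a text[] "packages" column.
rows = [
    {"id": 1, "timestamp": 100, "packages": ["kernel", "glibc"]},
    {"id": 2, "timestamp": 200, "packages": ["firefox"]},
    {"id": 3, "timestamp": 300, "packages": ["kernel"]},
]

def overlaps(row, wanted):
    # postgresql's && operator: true when the two arrays share any element.
    return bool(set(row["packages"]) & set(wanted))

# /messages_ts?packages=ov.{kernel} — membership only; an inverted index can
# hand back the matching row ids directly.
hits = [r["id"] for r in rows if overlaps(r, ["kernel"])]
print(hits)  # [1, 3]

# order=timestamp.desc — the GIN index does not store order, so the matches
# still have to be sorted (or scanned) separately, which is the slow part.
ordered = sorted((r for r in rows if overlaps(r, ["kernel"])),
                 key=lambda r: r["timestamp"], reverse=True)
print([r["id"] for r in ordered])  # [3, 1]
```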
@@ -69,36 +79,40 @@ Findings
 Querying just the package membership
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The queries were surprisingly fast, with maximum under 4 seconds and
-mean around half a second. This encouraged us to do further experiments.
+The queries were surprisingly fast, with maximum under 4 seconds and mean around half a
+second. This encouraged us to do further experiments.

-Results ::
+Results

-    test_filter_by_package
-    Requests: 300, pass: 300, fail: 0, exception: 0
-    For pass requests:
-            Request per Second - mean: 3.63
-            Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
+.. code-block::
+
+    test_filter_by_package
+    Requests: 300, pass: 300, fail: 0, exception: 0
+    For pass requests:
+            Request per Second - mean: 3.63
+            Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548

 Querying just the package membership ordered by timestamp desc
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Usually we want to see most recent messages. So we ammended the query,
-to include "order by timestamp desc". The result was less encouraging,
-with longest succesful query taking more than 90 seconds and several timing out.
+Usually we want to see most recent messages. So we ammended the query, to include "order
+by timestamp desc". The result was less encouraging, with longest succesful query taking
+more than 90 seconds and several timing out.

 This seems to be the result of GIN index not supporting order in the index.

-Results ::
+Results

-    test_filter_by_package
-    Requests: 300, pass: 280, fail: 0, exception: 20
-    For pass requests:
-            Request per Second - mean: 0.53
-            Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
+.. code-block::
+
+    test_filter_by_package
+    Requests: 300, pass: 280, fail: 0, exception: 20
+    For pass requests:
+            Request per Second - mean: 0.53
+            Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939

 Conclusion
 ----------

-While array support seems interesting, and for simple queries very fast, indexes that require ordering
-don't seem to be supported. This makes strong case against using them.
+While array support seems interesting, and for simple queries very fast, indexes that
+require ordering don't seem to be supported. This makes strong case against using them.
@@ -1,60 +1,55 @@
 Partitioning the database
 =========================

-In the database used by datanommer and datagrepper one table stands out from the
-other ones by its size, the ``messages`` table. This can be observed in
-:ref:`datanommer`.
+In the database used by datanommer and datagrepper one table stands out from the other
+ones by its size, the ``messages`` table. This can be observed in :ref:`datanommer`.

-One possibility to speed things up in datagrepper is to partition that table
-into a set of smaller sized partitions.
+One possibility to speed things up in datagrepper is to partition that table into a set
+of smaller sized partitions.

 Here are some resources regarding partitioning postgresql tables:

-* Table partitioning at postgresql's documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
-* How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
-* Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/
-
+- Table partitioning at postgresql's documentation:
+  https://www.postgresql.org/docs/13/ddl-partitioning.html
+- How to use table partitioning to scale PostgreSQL:
+  https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
+- Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/

 Attempt #1
 ----------

 For our first attempt at partitioning the `messages` table, we thought we would
-partition it by year. Having a different partition for each year.
-We thus started by adding a ``year`` field to the table and fill it by extracting
-the year from the ``timestamp`` field of the table.
+partition it by year. Having a different partition for each year. We thus started by
+adding a ``year`` field to the table and fill it by extracting the year from the
+``timestamp`` field of the table.

-However, one thing to realize when using partitioned table is that each partition
-needs to be considered as an independant table. Meaning an unique constraint has
-to involve the field on which the table is partitioned.
-In other words, if you partition the table by a year field, that year field will
-need to be part of the primary key as well as any ``UNIQUE`` constraint on the
-table.
+However, one thing to realize when using partitioned table is that each partition needs
+to be considered as an independant table. Meaning an unique constraint has to involve
+the field on which the table is partitioned. In other words, if you partition the table
+by a year field, that year field will need to be part of the primary key as well as any
+``UNIQUE`` constraint on the table.

-So to partition the `messages` table on ``year``, we had to add the ``year``
-field to the primary key. However, that broke the foreign key constraints on
-the ``user_messages`` and ``package_messages`` tables which rely on the ``id``
-field to link the tables.
+So to partition the `messages` table on ``year``, we had to add the ``year`` field to
+the primary key. However, that broke the foreign key constraints on the
+``user_messages`` and ``package_messages`` tables which rely on the ``id`` field to link
+the tables.

 Attempt #2
 ----------

-Since partitioning on ``year`` did not work, we reconsidered and decided to
-partition on the ``id`` field instead using `RANGE PARTITION`.
+Since partitioning on ``year`` did not work, we reconsidered and decided to partition on
+the ``id`` field instead using `RANGE PARTITION`.

-We partitioned the ``messages`` table on the ``id`` field with partition of 10
-million records each. This has the advantage of making each partition of similar
-sizes.
-
+We partitioned the ``messages`` table on the ``id`` field with partition of 10 million
+records each. This has the advantage of making each partition of similar sizes.

 More resources
 --------------

 These are a few more resources we looked at and thought were worth bookmarking:

-* Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
-* pg_partman, partition manager: https://github.com/pgpartman/pg_partman
-* How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/
+- Automatic partitioning by day - PostgreSQL:
+  https://stackoverflow.com/questions/55642326/
+- pg_partman, partition manager: https://github.com/pgpartman/pg_partman
+- How to scale PostgreSQL 10 using table inheritance and declarative partitioning:
+  https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/
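A side note on the 10-million-record `RANGE PARTITION` scheme described above: routing an ``id`` to its partition bounds is simple integer arithmetic. The sketch below (partition names hypothetical) shows the bounds a ``PARTITION BY RANGE (id)`` child table would use; postgresql range partitions include their FROM value and exclude their TO value.

```python
PARTITION_SIZE = 10_000_000  # 10 million records per partition, as in the text

def partition_bounds(record_id, size=PARTITION_SIZE):
    """Return (from_value, to_value) of the range partition holding record_id.
    FROM is inclusive, TO is exclusive, matching postgresql's semantics."""
    start = (record_id // size) * size
    return start, start + size

# Hypothetical DDL for the partition that would hold id 25_000_000:
lo, hi = partition_bounds(25_000_000)
ddl = (f"CREATE TABLE messages_p{lo // PARTITION_SIZE} PARTITION OF messages "
       f"FOR VALUES FROM ({lo}) TO ({hi});")
print(ddl)
```

Equal-sized id ranges are what gives the "each partition of similar sizes" property the text mentions, unlike per-year partitions whose sizes track message volume.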
@@ -1,14 +1,16 @@
 Postgresql's pg_stat_statements
 ===============================

-This is a postgresql module allowing to track planning and execution statistics
-of all SQL statements executed by a server.
+This is a postgresql module allowing to track planning and execution statistics of all
+SQL statements executed by a server.

-Using this, we can monitor/figure out what the slowest queries executed
-on the server are.
+Using this, we can monitor/figure out what the slowest queries executed on the server
+are.

 Resources:

-* Postgresql doc: https://www.postgresql.org/docs/13/pgstatstatements.html
-* How to enable it: https://www.virtual-dba.com/postgresql-performance-enabling-pg-stat-statements/
-* How to use it: https://www.virtual-dba.com/postgresql-performance-identifying-hot-and-slow-queries/
+- Postgresql doc: https://www.postgresql.org/docs/13/pgstatstatements.html
+- How to enable it:
+  https://www.virtual-dba.com/postgresql-performance-enabling-pg-stat-statements/
+- How to use it:
+  https://www.virtual-dba.com/postgresql-performance-identifying-hot-and-slow-queries/
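For context (not part of the commit), a typical "slowest queries" lookup against this view can be sketched as a query string. Note the timing column is ``total_time`` before postgresql 13 and ``total_exec_time`` from 13 on, so the exact name depends on the server version; the helper below is an illustration, not anything from the linked resources.

```python
def slowest_queries_sql(limit=10, time_column="total_exec_time"):
    """Build a 'top slow statements' query against pg_stat_statements.
    time_column is version dependent (total_time before postgresql 13)."""
    return (
        f"SELECT query, calls, {time_column}, rows "
        f"FROM pg_stat_statements "
        f"ORDER BY {time_column} DESC "
        f"LIMIT {limit};"
    )

print(slowest_queries_sql(5))
```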
@@ -2,61 +2,64 @@ Using the timescaledb extension
 ===============================

 timescaledb (https://docs.timescale.com/latest/) is a postgresql extension for
-time-series database.
-Considering a lot of the actions done on datagrepper involve the timestamp field
-(for example: all the messages with that topic in this time range), we figured
-this extension is worth investigating.
+time-series database. Considering a lot of the actions done on datagrepper involve the
+timestamp field (for example: all the messages with that topic in this time range), we
+figured this extension is worth investigating.

-A bonus point being for this extension to already packaged and available in
-Fedora and EPEL.
+A bonus point being for this extension to already packaged and available in Fedora and
+EPEL.

 Resources
 ---------

-* Setting up/enabling timescaledb: https://severalnines.com/database-blog/how-enable-timescaledb-existing-postgresql-database
-* Migrating an existing database to timescaledb: https://docs.timescale.com/latest/getting-started/migrating-data#same-db
-
+- Setting up/enabling timescaledb:
+  https://severalnines.com/database-blog/how-enable-timescaledb-existing-postgresql-database
+- Migrating an existing database to timescaledb:
+  https://docs.timescale.com/latest/getting-started/migrating-data#same-db

 Installing/enabling/activating
 ------------------------------

 To install the plugin, simply run:
-::

+.. code-block::
+
     dnf install timescaledb

 The edit ``/var/lib/pgsql/data/postgresql.conf`` to tell postgresql to load it:
-::

+.. code-block::
+
     shared_preload_libraries = 'pg_stat_statements,timescaledb'
     timescaledb.max_background_workers=4


 It will then need a restart of the entire database server:
-::

+.. code-block::
+
     systemctl restart postgresql

 You can then check if the extension loaded properly:
-::

+.. code-block::
+
     $ sudo -u postgres psql
     SELECT * FROM pg_available_extensions ORDER BY name;

 Then, you will need to activate it for your database:
-::

+.. code-block::
+
     $ sudo -u postgres psql <database_name>
     CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

 Finally, you can check that the extension was activated for your database:
-::

+.. code-block::
+
     $ sudo -u postgres psql <database_name>
     \dx


 .. _timescaledb_findings:

 Findings
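One step the installation hunk above stops short of showing is actually converting a table into a hypertable, which timescaledb does via its ``create_hypertable`` function. A sketch of the corresponding SQL, built as a string: the table and column names (``messages``, ``timestamp``) come from this document, while the chunk interval is an illustrative guess, not a value the experiment used.

```python
def create_hypertable_sql(table="messages", time_column="timestamp",
                          chunk_interval="7 days"):
    """Sketch of the timescaledb conversion call. The chunk interval is an
    assumption for illustration; timescaledb has its own default."""
    return (f"SELECT create_hypertable('{table}', '{time_column}', "
            f"chunk_time_interval => INTERVAL '{chunk_interval}');")

print(create_hypertable_sql())
```

As the Findings below note, conversion only works on an empty table, so in practice this runs before the existing rows are re-imported.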
@@ -66,10 +69,8 @@ Partitioned table
 ~~~~~~~~~~~~~~~~~

 After converting the `messages` table to use timescaledb, we've realized that
-timescaledb uses table partitioning as well.
-This leads to the same issue with the foreign key constraints that we have seen
-in the plain partitioning approach we took.
-
+timescaledb uses table partitioning as well. This leads to the same issue with the
+foreign key constraints that we have seen in the plain partitioning approach we took.

 Foreign key considerations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -78,25 +79,25 @@ For a better understanding on the challenges we've encountered with foreign key
 constraints, here is a graphical representation of the datanommer database:

 .. image:: ../_static/datanommer_db.jpeg
-    :target: ../_images/datanommer_db.jpeg
+   :target: ../_images/datanommer_db.jpeg

 So here are the issues we've faced:

-* To make the `messages` table a hypertable (ie: activate the timescaledb plugin
-  on it), the tables need to be empty and the data imported in a second step.
-* Once the `messages` table is a hypertable, we cannot add foreign key constraints
-  from the `user_messages` or `package_messages` tables to it. It is just not
-  supported in timescaledb (cf https://docs.timescale.com/latest/using-timescaledb/schema-management#constraints )
-* We tried creating the foreign key constraints before making the `messages` table
-  a hypertable and then importing the data in (tweaking the primary key and
-  foreign keys to include the timestamp, following https://stackoverflow.com/questions/64570143/ )
+- To make the `messages` table a hypertable (ie: activate the timescaledb plugin on it),
+  the tables need to be empty and the data imported in a second step.
+- Once the `messages` table is a hypertable, we cannot add foreign key constraints from
+  the `user_messages` or `package_messages` tables to it. It is just not supported in
+  timescaledb (cf
+  https://docs.timescale.com/latest/using-timescaledb/schema-management#constraints )
+- We tried creating the foreign key constraints before making the `messages` table a
+  hypertable and then importing the data in (tweaking the primary key and foreign keys
+  to include the timestamp, following https://stackoverflow.com/questions/64570143/ )
   but that resulted in an error when importing the data.

-So we ended up with: Keep the same data structure but to not enforce the foreign
-key constaints on `user_messages` and `package_messages` to `messages`. As that
-database is mostly about inserts and has no updates or deletes, we don't foresee
-much problems with this.
-
+So we ended up with: Keep the same data structure but to not enforce the foreign key
+constaints on `user_messages` and `package_messages` to `messages`. As that database is
+mostly about inserts and has no updates or deletes, we don't foresee much problems with
+this.

 Duplicated messages
 ~~~~~~~~~~~~~~~~~~~
@ -105,31 +106,29 @@ When testing datagrepper and datanommer in our test instance with the timescaled
|
|||
plugin, we saw a number of duplicated messages showing up in the `/raw` endpoint.
|
||||
Checking if we could fix this server side, we found out that the previous database
|
||||
schema had an `UNIQUE` constraint on `msg_id` field. However, with the timescaledb
|
||||
plugin, that constraint is now on both `msg_id` and `timestamp` fields, meaning
|
||||
a message can be inserted twice in the database if there is a little delay between
|
||||
the two inserts.
|
||||
|
||||
However, migrating datanommer from fedmsg to fedora-messaging should resolve that
|
||||
issue client side as rabbitmq will ensure there is only one consumer at a time
|
||||
handling a message.
|
||||
plugin, that constraint is now on both `msg_id` and `timestamp` fields, meaning a
|
||||
message can be inserted twice in the database if there is a little delay between the two
|
||||
inserts.
|
||||
|
||||
However, migrating datanommer from fedmsg to fedora-messaging should resolve that issue
|
||||
client side as rabbitmq will ensure there is only one consumer at a time handling a
|
||||
message.
|
||||
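The effect of the relaxed constraint is easy to reproduce with a toy table (sketched in sqlite rather than postgresql/timescaledb; the column names are illustrative): uniqueness on `(msg_id, timestamp)` happily accepts the same `msg_id` twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Toy version of the constraint change: with timescaledb, uniqueness is on
# (msg_id, timestamp) rather than on msg_id alone.
con.execute(
    "CREATE TABLE messages ("
    "  msg_id TEXT, timestamp TEXT,"
    "  UNIQUE (msg_id, timestamp))"
)

# The same message handled twice with a small delay gets two different
# timestamps, so both rows satisfy the constraint: a duplicate is stored.
con.execute("INSERT INTO messages VALUES ('abc-123', '2021-03-01 10:00:00.000')")
con.execute("INSERT INTO messages VALUES ('abc-123', '2021-03-01 10:00:00.250')")

dupes = con.execute(
    "SELECT COUNT(*) FROM messages WHERE msg_id = 'abc-123'"
).fetchone()[0]
```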

Open questions
--------------

- What will upgrading the postgresql version with the timescaledb plugin look like?

  It looks like the timescaledb folks are involved enough in postgresql itself that we
  think things will work, but we have not had hands-on experience with it.

Patch
-----

Here is the patch that needs to be applied to ``datanommer/models/__init__.py`` to get
it working with timescaledb's adjusted postgresql model.

.. code-block::

    diff --git a/datanommer.models/datanommer/models/__init__.py b/datanommer.models/datanommer/models/__init__.py
    index ada58fa..7780433 100644

Lies, Damn lies and Statistics
==============================

In order to compare the performance of datagrepper in the different configurations we
looked at, we wrote a small script that runs 30 requests in 10 parallel threads.
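The measurement loop of such a script can be sketched like this (a hedged reconstruction, not the actual script; ``fetch`` stands in for one HTTP request against datagrepper and should return True on success):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(fetch, n_requests=30, n_threads=10):
    """Run `fetch` n_requests times across n_threads and summarize the runs."""

    def one_call(_):
        start = time.monotonic()
        try:
            ok = bool(fetch())
        except Exception:
            ok = False
        return ok, time.monotonic() - start

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(one_call, range(n_requests)))
    wall = time.monotonic() - wall_start

    durations = [d for _, d in results]
    successes = [d for ok, d in results if ok]
    return {
        "req_per_sec": n_requests / wall if wall else float("inf"),
        # the mean is only meaningful over successful requests ("NA" otherwise)
        "mean_time": sum(successes) / len(successes) if successes else None,
        "max_time": max(durations),
        "percent_success": 100.0 * len(successes) / n_requests,
    }
```

The four returned fields correspond to the four columns reported in the result tables.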

These requests are:

We then have 4 different environments:

- prod/openshift: this is an openshift deployment of datagrepper hitting the production
  database, without any configuration change.
- prod/aws: this is an AWS deployment of datagrepper, hitting its own local database,
  with the ``DEFAULT_QUERY_DELTA`` configuration key set to 3 days.
- partition/aws: this is an AWS deployment of datagrepper, hitting its own local
  postgresql database where the ``messages`` table is partitioned by ``id`` with each
  partition holding 10 million records, and the ``DEFAULT_QUERY_DELTA`` configuration
  key set to 3 days.
- timescaledb/aws: this is an AWS deployment of datagrepper, hitting its own local
  postgresql database where the ``messages`` table has been partitioned via the
  `timescaledb` plugin, and the ``DEFAULT_QUERY_DELTA`` configuration key set to 3 days.
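``DEFAULT_QUERY_DELTA`` bounds how far back an open-ended query looks. A hypothetical sketch of the idea (the function name and exact behavior are illustrative, not datagrepper's actual code):

```python
from datetime import datetime, timedelta, timezone

DEFAULT_QUERY_DELTA = timedelta(days=3)  # mirrors the setting described above

def effective_window(start=None, end=None, delta=DEFAULT_QUERY_DELTA):
    """Clamp an open-ended query to the last `delta` of messages.

    Without such a default, a /raw query with no time filter would scan
    the whole messages table.
    """
    end = end or datetime.now(timezone.utc)
    start = start or end - delta
    return start, end
```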

Results
-------

Here are the results for each environment and request.

prod/openshift
~~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    0.32             NA                45.857601        0.00%
plain_raw          0.32             NA                31.955371        0.00%
filter_by_category 0.32             NA                31.632514        0.00%
filter_by_username 0.32             NA                33.549061        0.00%
filter_by_package  0.32             NA                34.531207        0.00%
get_by_id          1.57             1.575608          31.259095        86.67%
================== ================ ================= ================ ===============

prod/aws
~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    7.6              1.0068            11.2743          100.00%
plain_raw          9.06             0.712975          3.323922         100.00%
filter_by_category 12.43            0.489915          1.676223         100.00%
filter_by_username 1.49             5.83623           10.661274        100.00%
filter_by_package  0                52.69256          120.229874       1.00%
get_by_id          0.73             1.534168          60.455334        83.33%
================== ================ ================= ================ ===============

partition/aws
~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    9.98             0.711219          3.204178         100.00%
plain_raw          9.70             0.641497          1.24704          100.00%
filter_by_category 13.32            0.455219          0.594465         100.00%
filter_by_username 1.3              7.084018          12.079198        100.00%
filter_by_package  0                55.231556         120.125013       1.00%
get_by_id          0.48             2.198211          60.444765        76.67%
================== ================ ================= ================ ===============

timescaledb/aws
~~~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    14.1             0.4286            0.514617         100.00%
plain_raw          12.89            0.48235           0.661073         100.00%
filter_by_category 13.94            0.423172          0.507337         100.00%
filter_by_username 2.68             3.188782          5.096244         100.00%
filter_by_package  0.26             33.216631         57.901159        100.00%
get_by_id          12.69            0.749068          1.73515          100.00%
================== ================ ================= ================ ===============

Graphs
------

Percentage of success
~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_percent_sucess.jpg
   :target: ../_images/datanommer_percent_sucess.jpg

Requests per second
~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_req_per_sec.jpg
   :target: ../_images/datanommer_req_per_sec.jpg

Mean time per request
~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_mean_per_req.jpg
   :target: ../_images/datanommer_mean_per_req.jpg

Maximum time per request
~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_max_per_req.jpg
   :target: ../_images/datanommer_max_per_req.jpg