fix parsing errors and sphinx warnings

Signed-off-by: Ryan Lerch <rlerch@redhat.com>

parent 8fb9b2fdf0
commit ba720c3d77

98 changed files with 4799 additions and 4788 deletions
@@ -3,15 +3,15 @@
 Datanommer
 ==========

-* Reads-in messages from the bus
-* Stores them into the database
+- Reads-in messages from the bus
+- Stores them into the database

 Database tables
 ---------------

 Here is how the database schema looks like currently:

-::
+.. code-block::

     datanommer=# \dt
              List of relations
@@ -24,14 +24,12 @@ Here is how the database schema looks like currently:
  public | user          | table | datanommer
  public | user_messages | table | datanommer

-
 Table sizes
 -----------

 Here is the size of each table:

-
-::
+.. code-block::

     datanommer-#
     SELECT
@@ -49,16 +47,15 @@ Here is the size of each table:
     alembic_version | 8192 bytes | 0 bytes
    (6 rows)

 The 3 columns are:

-The 3 columns are::
+.. code-block::

     Table – The name of the table
     Size – The total size that this table takes
     External Size – The size that related objects of this table like indices take

-
-::
+.. code-block::

     datanommer=#
     SELECT
@@ -109,12 +106,15 @@ The 3 columns are::
     sql_features | r | 716 | 64 kB
    (37 rows)

 The 4 columns are:

-The 4 columns are::
+.. code-block::

     objectname – The name of the object
     objecttype – r for the table, i for an index, t for toast data, ...
     #entries – The number of entries in the object (e.g. rows)
     size – The size of the object

-(source for these queries: https://wiki-bsse.ethz.ch/display/ITDOC/Check+size+of+tables+and+objects+in+PostgreSQL+database )
+(source for these queries:
+https://wiki-bsse.ethz.ch/display/ITDOC/Check+size+of+tables+and+objects+in+PostgreSQL+database
+)
@@ -1,18 +1,17 @@
 Default delta
 =============

-Checking the current status of datagrepper, we realized that not specifying a
-`delta` value in the URL led to timeouts while specifying one, makes datagrepper
-return properly.
+Checking the current status of datagrepper, we realized that not specifying a `delta`
+value in the URL led to timeouts while specifying one, makes datagrepper return
+properly.

-Investigating the configuration options of datagrepper, we found out that
-there is a `DEFAULT_QUERY_DELTA` configuration key that allows to specify a
-default delta value when one is not specified.
+Investigating the configuration options of datagrepper, we found out that there is a
+`DEFAULT_QUERY_DELTA` configuration key that allows to specify a default delta value
+when one is not specified.

 Just setting that configuration key to ``60*60*24*3`` (ie: 3 days) improves the
-datagrepper performances quite a bit (as in queries actually return instead of
-timing out).
+datagrepper performances quite a bit (as in queries actually return instead of timing
+out).

-
-That configuration change, does break the API a little bit as with it, it will
-limit the messages returned to the last 3 days.
+That configuration change, does break the API a little bit as with it, it will limit the
+messages returned to the last 3 days.
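As a side note (not part of the commit), the delta arithmetic the hunk above relies on can be sketched in Python. The key name and the ``60*60*24*3`` value come from the text; the ``query_window`` helper is a hypothetical illustration, not actual datagrepper code.

```python
from datetime import datetime, timedelta, timezone

# From the text above: 60*60*24*3 seconds, i.e. 3 days.
DEFAULT_QUERY_DELTA = 60 * 60 * 24 * 3

def query_window(delta=None, end=None):
    """Hypothetical helper: the [start, end] time range a datagrepper-style
    query would be bounded to when the URL carries no delta value."""
    if delta is None:
        delta = DEFAULT_QUERY_DELTA  # fall back to the configured default
    end = end or datetime.now(timezone.utc)
    return end - timedelta(seconds=delta), end

start, end = query_window()
print(int((end - start).total_seconds()))  # 259200
```

Bounding every unqualified query to such a window is what turns the previously unbounded (and timing-out) scans into ones that return.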
@@ -4,21 +4,18 @@ Datanommer / Datagrepper
 Datanommer
 ----------

-* Reads-in messages from the bus
-* Stores them into the database
+- Reads-in messages from the bus
+- Stores them into the database

 .. toctree::
    :maxdepth: 1

    datanommer


 Datagrepper
 -----------

-* Exposes the messages in the database via an API with different filtering
-  capacity
-
+- Exposes the messages in the database via an API with different filtering capacity

 Investigation
 -------------
@@ -35,49 +32,43 @@ Here is the list of ideas/things we looked at:
    pg_array_column_postgrest
    stats


 Conclusions
 -----------

-We have investigated different ways to improve the database storing our 180
-millions messages. While we considered looking at the datagrepper application
-itself as well, we considered that replacing datagrepper with another application
-would have too large consequences. We have a number of applications in our
-realm that rely on datagrepper's API and there is an unknown number of applications
-outside our realm that make use of it as well.
-Breaking all of these applications is a non-goal for us. For this reason we
+We have investigated different ways to improve the database storing our 180 millions
+messages. While we considered looking at the datagrepper application itself as well, we
+considered that replacing datagrepper with another application would have too large
+consequences. We have a number of applications in our realm that rely on datagrepper's
+API and there is an unknown number of applications outside our realm that make use of it
+as well. Breaking all of these applications is a non-goal for us. For this reason we
 focused on postgresql first.

-We looked at different solutions, starting with manually partitioning on year,
-then on ``id`` (not ``msg_id``, the primary key field ``id`` which is an integer).
-We then looked at using the postgresql plugin `timescaledb` and finally we looked
-at using this plugin together with a database model change where the relation
-tables are merged into the main ``messages`` table and their is stored using
-arrays.
+We looked at different solutions, starting with manually partitioning on year, then on
+``id`` (not ``msg_id``, the primary key field ``id`` which is an integer). We then
+looked at using the postgresql plugin `timescaledb` and finally we looked at using this
+plugin together with a database model change where the relation tables are merged into
+the main ``messages`` table and their is stored using arrays.

-Based on our investigations, our recommendation is to migrate the postgresql
-database to use the `timescaledb` plugin and configure datagrepper to have a
-default delta value via ``DEFAULT_QUERY_DELTA``.
+Based on our investigations, our recommendation is to migrate the postgresql database to
+use the `timescaledb` plugin and configure datagrepper to have a default delta value via
+``DEFAULT_QUERY_DELTA``.

 As a picture is worth a thousand words:

 .. image:: ../_static/datanommer_percent_sucess.jpg
    :target: ../_images/datanommer_percent_sucess.jpg


-We checked, setting a ``DEFAULT_QUERY_DELTA`` alone provides already some
-performance gain, using `timescaledb` with ``DEFAULT_QUERY_DELTA`` provide the
-most gain but using `timescaledb` without ``DEFAULT_QUERY_DELTA`` brings back
-the time out issues we are seeing today when datagrepper is queried without a
-specified ``delta`` value.
+We checked, setting a ``DEFAULT_QUERY_DELTA`` alone provides already some performance
+gain, using `timescaledb` with ``DEFAULT_QUERY_DELTA`` provide the most gain but using
+`timescaledb` without ``DEFAULT_QUERY_DELTA`` brings back the time out issues we are
+seeing today when datagrepper is queried without a specified ``delta`` value.

 We also believe that the performance gain observed with `timescaledb` could be
-reproduced if we were to do the partitioning ourself on the ``timestamp`` field
-of the ``messages`` table. However, it would mean that we have to manually
-maintain that partitioning, take care of creating the new partitions as needed
-and so on, while `timescaledb` provides all of this for us automatically, thus
-simplifying the long term maintenance of that database.
-
+reproduced if we were to do the partitioning ourself on the ``timestamp`` field of the
+``messages`` table. However, it would mean that we have to manually maintain that
+partitioning, take care of creating the new partitions as needed and so on, while
+`timescaledb` provides all of this for us automatically, thus simplifying the long term
+maintenance of that database.

 Proposed roadmap
 ~~~~~~~~~~~~~~~~
@@ -86,40 +77,34 @@ We propose the following roadmap to improve datanommer and datagrepper:

 0/ Announce the upcoming API breakage and outage of datagrepper

-Be loud about the upcoming changes and explain how the API breakage can be
-mitigated.
-
+Be loud about the upcoming changes and explain how the API breakage can be mitigated.

 1/ Port datanommer to fedora-messaging and openshift

-This will ensure that there are no duplicate messages are saved in the database
-(cf our ref:`timescaledb_findings`).
-It will also provide a way to store the messages while datagrepper is being
-upgraded (which will require an outage). Using lazy queues in rabbitmq may be
-a way to store the high number of messages that will pile up during the outage
-window (which will be over 24h).
+This will ensure that there are no duplicate messages are saved in the database (cf our
+ref:`timescaledb_findings`). It will also provide a way to store the messages while
+datagrepper is being upgraded (which will require an outage). Using lazy queues in
+rabbitmq may be a way to store the high number of messages that will pile up during the
+outage window (which will be over 24h).

 Rabbitmq lazy queues: https://www.rabbitmq.com/lazy-queues.html


 2/ Port datagrepper to timescaledb.

-This will improve the performance of the UI. Thanks to rabbitmq, no messages will
-be lost, they will only show up in datagrepper at the end of the outage and
-with a delayed timestamp.
+This will improve the performance of the UI. Thanks to rabbitmq, no messages will be
+lost, they will only show up in datagrepper at the end of the outage and with a delayed
+timestamp.

 3/ Configure datagrepper to have a ``DEFAULT_QUERY_DELTA``.

-This will simply bound a number of queries which otherwise run slow and lead to
-timeouts at the application level.
-
+This will simply bound a number of queries which otherwise run slow and lead to timeouts
+at the application level.

 4/ Port datagrepper to openshift

 This will make it easier to maintain and/or scale as needed.


 5/ Port datagrepper to fedora-messaging

-This will allow to make use of the fedora-messaging schemas provided by the
-applications instead of relying on `fedmsg_meta_fedora_infrastructure`.
+This will allow to make use of the fedora-messaging schemas provided by the applications
+instead of relying on `fedmsg_meta_fedora_infrastructure`.
@@ -4,64 +4,74 @@ Using the array type for user and package queries
 Currently, we use auxiliary tables to query for messages related to packages or users,
 in the standard RDBS fashion.

-We came to some problems when trying to enforce foreign key constrains while using the timescaledb
-extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
+We came to some problems when trying to enforce foreign key constrains while using the
+timescaledb extension. We decided to try, if just using a column with array type with
+proper indes would have simmilar performace.

-Array columns support indexing with Generalized Inverted Index, GIN,
-that allows for fast searches on membership and intersection. Because we mostly search for memebership,
+Array columns support indexing with Generalized Inverted Index, GIN, that allows for
+fast searches on membership and intersection. Because we mostly search for memebership,
 array column could be performant enough for our purposes.

 Resources
 ---------

-* PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
-* GIN index: https://www.postgresql.org/docs/12/gin.html
-* GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html
-
+- PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
+- GIN index: https://www.postgresql.org/docs/12/gin.html
+- GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html

 Installing/enabling/activating
 ------------------------------

-To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
+To have comparable results, we enabled timescaledb in same fashion as in our other
+experiment.

 To add new column
-::

-    alter table messages2 add column packages text[];
+.. code-block::
+
+    alter table messages2 add column packages text[];

 To populate it
-::

-    update messages2 set packages=t_agg.p_agg from
-    (select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
+.. code-block::
+
+    update messages2 set packages=t_agg.p_agg from
+    (select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;

-We need to enable the btree_gin extension to be able to create index with array as well as timestamp
-::
+We need to enable the btree_gin extension to be able to create index with array as well
+as timestamp

-    CREATE EXTENSION btree_gin;
+.. code-block::
+
+    CREATE EXTENSION btree_gin;

 To create the index
-::

+.. code-block::
+
     CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");

 To help reuse our testing script, we setup postgrest locally
-::

-    podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
+.. code-block::
+
+    podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.

-Because we focused only on package queries, as user colun couldn't be populated due to constraints on size,
-we chose two as representative. There is implicit limit to return just 25 rows.
+Because we focused only on package queries, as user colun couldn't be populated due to
+constraints on size, we chose two as representative. There is implicit limit to return
+just 25 rows.

 A simple membership:
-::

-    /messages_ts?packages=ov.{{kernel}}
+.. code-block::
+
+    /messages_ts?packages=ov.{{kernel}}

 A simple membership ordered by time.
-::

-    /messages_ts?order=timestamp.desc&packages=ov.{{kernel}}
+.. code-block::
+
+    /messages_ts?order=timestamp.desc&packages=ov.{{kernel}}

 Findings
 --------
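As an aside (not in the commit): the ``ov`` (overlap) queries above map to postgresql's array overlap operator ``&&``. A rough Python model of that membership test, with made-up sample rows, illustrates why a GIN index can answer plain membership on its own but cannot help with ordering:

```python
# Hypothetical sample rows mimicking a messages2 table with a text[] "packages" column.
rows = [
    {"id": 1, "timestamp": 100, "packages": ["kernel", "glibc"]},
    {"id": 2, "timestamp": 200, "packages": ["firefox"]},
    {"id": 3, "timestamp": 300, "packages": ["kernel"]},
]

def overlaps(row, wanted):
    # postgresql's && operator: true when the two arrays share any element.
    return bool(set(row["packages"]) & set(wanted))

# /messages_ts?packages=ov.{kernel} — membership only; an inverted index can
# hand back the matching row ids directly.
hits = [r["id"] for r in rows if overlaps(r, ["kernel"])]
print(hits)  # [1, 3]

# order=timestamp.desc — the GIN index does not store order, so the matches
# still have to be sorted (or scanned) separately, which is the slow part.
ordered = sorted((r for r in rows if overlaps(r, ["kernel"])),
                 key=lambda r: r["timestamp"], reverse=True)
print([r["id"] for r in ordered])  # [3, 1]
```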
@@ -69,36 +79,40 @@ Findings
 Querying just the package membership
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The queries were surprisingly fast, with maximum under 4 seconds and
-mean around half a second. This encouraged us to do further experiments.
+The queries were surprisingly fast, with maximum under 4 seconds and mean around half a
+second. This encouraged us to do further experiments.

-Results ::
+Results

-    test_filter_by_package
-    Requests: 300, pass: 300, fail: 0, exception: 0
-    For pass requests:
-            Request per Second - mean: 3.63
-            Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
+.. code-block::
+
+    test_filter_by_package
+    Requests: 300, pass: 300, fail: 0, exception: 0
+    For pass requests:
+            Request per Second - mean: 3.63
+            Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548

 Querying just the package membership ordered by timestamp desc
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Usually we want to see most recent messages. So we ammended the query,
-to include "order by timestamp desc". The result was less encouraging,
-with longest succesful query taking more than 90 seconds and several timing out.
+Usually we want to see most recent messages. So we ammended the query, to include "order
+by timestamp desc". The result was less encouraging, with longest succesful query taking
+more than 90 seconds and several timing out.

 This seems to be the result of GIN index not supporting order in the index.

-Results ::
+Results

-    test_filter_by_package
-    Requests: 300, pass: 280, fail: 0, exception: 20
-    For pass requests:
-            Request per Second - mean: 0.53
-            Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
+.. code-block::
+
+    test_filter_by_package
+    Requests: 300, pass: 280, fail: 0, exception: 20
+    For pass requests:
+            Request per Second - mean: 0.53
+            Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939

 Conclusion
 ----------

-While array support seems interesting, and for simple queries very fast, indexes that require ordering
-don't seem to be supported. This makes strong case against using them.
+While array support seems interesting, and for simple queries very fast, indexes that
+require ordering don't seem to be supported. This makes strong case against using them.
@@ -1,60 +1,55 @@
 Partitioning the database
 =========================

-In the database used by datanommer and datagrepper one table stands out from the
-other ones by its size, the ``messages`` table. This can be observed in
-:ref:`datanommer`.
+In the database used by datanommer and datagrepper one table stands out from the other
+ones by its size, the ``messages`` table. This can be observed in :ref:`datanommer`.

-One possibility to speed things up in datagrepper is to partition that table
-into a set of smaller sized partitions.
+One possibility to speed things up in datagrepper is to partition that table into a set
+of smaller sized partitions.

 Here are some resources regarding partitioning postgresql tables:

-* Table partitioning at postgresql's documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
-* How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
-* Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/
-
+- Table partitioning at postgresql's documentation:
+  https://www.postgresql.org/docs/13/ddl-partitioning.html
+- How to use table partitioning to scale PostgreSQL:
+  https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
+- Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/

 Attempt #1
 ----------

 For our first attempt at partitioning the `messages` table, we thought we would
-partition it by year. Having a different partition for each year.
-We thus started by adding a ``year`` field to the table and fill it by extracting
-the year from the ``timestamp`` field of the table.
+partition it by year. Having a different partition for each year. We thus started by
+adding a ``year`` field to the table and fill it by extracting the year from the
+``timestamp`` field of the table.

-However, one thing to realize when using partitioned table is that each partition
-needs to be considered as an independant table. Meaning an unique constraint has
-to involve the field on which the table is partitioned.
-In other words, if you partition the table by a year field, that year field will
-need to be part of the primary key as well as any ``UNIQUE`` constraint on the
-table.
+However, one thing to realize when using partitioned table is that each partition needs
+to be considered as an independant table. Meaning an unique constraint has to involve
+the field on which the table is partitioned. In other words, if you partition the table
+by a year field, that year field will need to be part of the primary key as well as any
+``UNIQUE`` constraint on the table.

-So to partition the `messages` table on ``year``, we had to add the ``year``
-field to the primary key. However, that broke the foreign key constraints on
-the ``user_messages`` and ``package_messages`` tables which rely on the ``id``
-field to link the tables.
+So to partition the `messages` table on ``year``, we had to add the ``year`` field to
+the primary key. However, that broke the foreign key constraints on the
+``user_messages`` and ``package_messages`` tables which rely on the ``id`` field to link
+the tables.

 Attempt #2
 ----------

-Since partitioning on ``year`` did not work, we reconsidered and decided to
-partition on the ``id`` field instead using `RANGE PARTITION`.
+Since partitioning on ``year`` did not work, we reconsidered and decided to partition on
+the ``id`` field instead using `RANGE PARTITION`.

-We partitioned the ``messages`` table on the ``id`` field with partition of 10
-million records each. This has the advantage of making each partition of similar
-sizes.
-
+We partitioned the ``messages`` table on the ``id`` field with partition of 10 million
+records each. This has the advantage of making each partition of similar sizes.

 More resources
 --------------

 These are a few more resources we looked at and thought were worth bookmarking:

-* Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
-* pg_partman, partition manager: https://github.com/pgpartman/pg_partman
-* How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/
+- Automatic partitioning by day - PostgreSQL:
+  https://stackoverflow.com/questions/55642326/
+- pg_partman, partition manager: https://github.com/pgpartman/pg_partman
+- How to scale PostgreSQL 10 using table inheritance and declarative partitioning:
+  https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/
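A side note on the 10-million-record `RANGE PARTITION` scheme described above: routing an ``id`` to its partition bounds is simple integer arithmetic. The sketch below (partition names hypothetical) shows the bounds a ``PARTITION BY RANGE (id)`` child table would use; postgresql range partitions include their FROM value and exclude their TO value.

```python
PARTITION_SIZE = 10_000_000  # 10 million records per partition, as in the text

def partition_bounds(record_id, size=PARTITION_SIZE):
    """Return (from_value, to_value) of the range partition holding record_id.
    FROM is inclusive, TO is exclusive, matching postgresql's semantics."""
    start = (record_id // size) * size
    return start, start + size

# Hypothetical DDL for the partition that would hold id 25_000_000:
lo, hi = partition_bounds(25_000_000)
ddl = (f"CREATE TABLE messages_p{lo // PARTITION_SIZE} PARTITION OF messages "
       f"FOR VALUES FROM ({lo}) TO ({hi});")
print(ddl)
```

Equal-sized id ranges are what gives the "each partition of similar sizes" property the text mentions, unlike per-year partitions whose sizes track message volume.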
@@ -1,14 +1,16 @@
 Postgresql's pg_stat_statements
 ===============================

-This is a postgresql module allowing to track planning and execution statistics
-of all SQL statements executed by a server.
+This is a postgresql module allowing to track planning and execution statistics of all
+SQL statements executed by a server.

-Using this, we can monitor/figure out what the slowest queries executed
-on the server are.
+Using this, we can monitor/figure out what the slowest queries executed on the server
+are.

 Resources:

-* Postgresql doc: https://www.postgresql.org/docs/13/pgstatstatements.html
-* How to enable it: https://www.virtual-dba.com/postgresql-performance-enabling-pg-stat-statements/
-* How to use it: https://www.virtual-dba.com/postgresql-performance-identifying-hot-and-slow-queries/
+- Postgresql doc: https://www.postgresql.org/docs/13/pgstatstatements.html
+- How to enable it:
+  https://www.virtual-dba.com/postgresql-performance-enabling-pg-stat-statements/
+- How to use it:
+  https://www.virtual-dba.com/postgresql-performance-identifying-hot-and-slow-queries/
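For context (not part of the commit), a typical "slowest queries" lookup against this view can be sketched as a query string. Note the timing column is ``total_time`` before postgresql 13 and ``total_exec_time`` from 13 on, so the exact name depends on the server version; the helper below is an illustration, not anything from the linked resources.

```python
def slowest_queries_sql(limit=10, time_column="total_exec_time"):
    """Build a 'top slow statements' query against pg_stat_statements.
    time_column is version dependent (total_time before postgresql 13)."""
    return (
        f"SELECT query, calls, {time_column}, rows "
        f"FROM pg_stat_statements "
        f"ORDER BY {time_column} DESC "
        f"LIMIT {limit};"
    )

print(slowest_queries_sql(5))
```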
@@ -2,61 +2,64 @@ Using the timescaledb extension
 ===============================

 timescaledb (https://docs.timescale.com/latest/) is a postgresql extension for
-time-series database.
-Considering a lot of the actions done on datagrepper involve the timestamp field
-(for example: all the messages with that topic in this time range), we figured
-this extension is worth investigating.
+time-series database. Considering a lot of the actions done on datagrepper involve the
+timestamp field (for example: all the messages with that topic in this time range), we
+figured this extension is worth investigating.

-A bonus point being for this extension to already packaged and available in
-Fedora and EPEL.
+A bonus point being for this extension to already packaged and available in Fedora and
+EPEL.

 Resources
 ---------

-* Setting up/enabling timescaledb: https://severalnines.com/database-blog/how-enable-timescaledb-existing-postgresql-database
-* Migrating an existing database to timescaledb: https://docs.timescale.com/latest/getting-started/migrating-data#same-db
-
+- Setting up/enabling timescaledb:
+  https://severalnines.com/database-blog/how-enable-timescaledb-existing-postgresql-database
+- Migrating an existing database to timescaledb:
+  https://docs.timescale.com/latest/getting-started/migrating-data#same-db

 Installing/enabling/activating
 ------------------------------

 To install the plugin, simply run:
-::

+.. code-block::
+
     dnf install timescaledb

 The edit ``/var/lib/pgsql/data/postgresql.conf`` to tell postgresql to load it:
-::

+.. code-block::
+
     shared_preload_libraries = 'pg_stat_statements,timescaledb'
     timescaledb.max_background_workers=4


 It will then need a restart of the entire database server:
-::

+.. code-block::
+
     systemctl restart postgresql

 You can then check if the extension loaded properly:
-::

+.. code-block::
+
     $ sudo -u postgres psql
     SELECT * FROM pg_available_extensions ORDER BY name;

 Then, you will need to activate it for your database:
-::

+.. code-block::
+
     $ sudo -u postgres psql <database_name>
     CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

 Finally, you can check that the extension was activated for your database:
-::

+.. code-block::
+
     $ sudo -u postgres psql <database_name>
     \dx


 .. _timescaledb_findings:

 Findings
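One step the installation hunk above stops short of showing is actually converting a table into a hypertable, which timescaledb does via its ``create_hypertable`` function. A sketch of the corresponding SQL, built as a string: the table and column names (``messages``, ``timestamp``) come from this document, while the chunk interval is an illustrative guess, not a value the experiment used.

```python
def create_hypertable_sql(table="messages", time_column="timestamp",
                          chunk_interval="7 days"):
    """Sketch of the timescaledb conversion call. The chunk interval is an
    assumption for illustration; timescaledb has its own default."""
    return (f"SELECT create_hypertable('{table}', '{time_column}', "
            f"chunk_time_interval => INTERVAL '{chunk_interval}');")

print(create_hypertable_sql())
```

As the Findings below note, conversion only works on an empty table, so in practice this runs before the existing rows are re-imported.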
@@ -66,10 +69,8 @@ Partitioned table
 ~~~~~~~~~~~~~~~~~

 After converting the `messages` table to use timescaledb, we've realized that
-timescaledb uses table partitioning as well.
-This leads to the same issue with the foreign key constraints that we have seen
-in the plain partitioning approach we took.
-
+timescaledb uses table partitioning as well. This leads to the same issue with the
+foreign key constraints that we have seen in the plain partitioning approach we took.

 Foreign key considerations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -78,25 +79,25 @@ For a better understanding on the challenges we've encountered with foreign key
 constraints, here is a graphical representation of the datanommer database:

 .. image:: ../_static/datanommer_db.jpeg
-    :target: ../_images/datanommer_db.jpeg
+   :target: ../_images/datanommer_db.jpeg

 So here are the issues we've faced:

-* To make the `messages` table a hypertable (ie: activate the timescaledb plugin
-  on it), the tables need to be empty and the data imported in a second step.
-* Once the `messages` table is a hypertable, we cannot add foreign key constraints
-  from the `user_messages` or `package_messages` tables to it. It is just not
-  supported in timescaledb (cf https://docs.timescale.com/latest/using-timescaledb/schema-management#constraints )
-* We tried creating the foreign key constraints before making the `messages` table
-  a hypertable and then importing the data in (tweaking the primary key and
-  foreign keys to include the timestamp, following https://stackoverflow.com/questions/64570143/ )
+- To make the `messages` table a hypertable (ie: activate the timescaledb plugin on it),
+  the tables need to be empty and the data imported in a second step.
+- Once the `messages` table is a hypertable, we cannot add foreign key constraints from
+  the `user_messages` or `package_messages` tables to it. It is just not supported in
+  timescaledb (cf
+  https://docs.timescale.com/latest/using-timescaledb/schema-management#constraints )
+- We tried creating the foreign key constraints before making the `messages` table a
+  hypertable and then importing the data in (tweaking the primary key and foreign keys
+  to include the timestamp, following https://stackoverflow.com/questions/64570143/ )
   but that resulted in an error when importing the data.

-So we ended up with: Keep the same data structure but to not enforce the foreign
-key constaints on `user_messages` and `package_messages` to `messages`. As that
-database is mostly about inserts and has no updates or deletes, we don't foresee
-much problems with this.
-
+So we ended up with: Keep the same data structure but to not enforce the foreign key
+constaints on `user_messages` and `package_messages` to `messages`. As that database is
+mostly about inserts and has no updates or deletes, we don't foresee much problems with
+this.

 Duplicated messages
 ~~~~~~~~~~~~~~~~~~~
@ -105,31 +106,29 @@ When testing datagrepper and datanommer in our test instance with the timescaled
|
|||
plugin, we saw a number of duplicated messages showing up in the `/raw` endpoint.
|
||||
Checking if we could fix this server side, we found out that the previous database
|
||||
schema had an `UNIQUE` constraint on `msg_id` field. However, with the timescaledb
|
||||
plugin, that constraint is now on both `msg_id` and `timestamp` fields, meaning
|
||||
a message can be inserted twice in the database if there is a little delay between
|
||||
the two inserts.
|
||||
|
||||
However, migrating datanommer from fedmsg to fedora-messaging should resolve that
|
||||
issue client side as rabbitmq will ensure there is only one consumer at a time
|
||||
handling a message.
|
||||
plugin, that constraint is now on both `msg_id` and `timestamp` fields, meaning a
|
||||
message can be inserted twice in the database if there is a little delay between the two
|
||||
inserts.
|
||||
|
||||
However, migrating datanommer from fedmsg to fedora-messaging should resolve that issue
|
||||
client side as rabbitmq will ensure there is only one consumer at a time handling a
|
||||
message.
|
||||
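The effect of the relaxed constraint is easy to reproduce with a toy table (sketched in sqlite rather than postgresql/timescaledb; the column names are illustrative): uniqueness on `(msg_id, timestamp)` happily accepts the same `msg_id` twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Toy version of the constraint change: with timescaledb, uniqueness is on
# (msg_id, timestamp) rather than on msg_id alone.
con.execute(
    "CREATE TABLE messages ("
    "  msg_id TEXT, timestamp TEXT,"
    "  UNIQUE (msg_id, timestamp))"
)

# The same message handled twice with a small delay gets two different
# timestamps, so both rows satisfy the constraint: a duplicate is stored.
con.execute("INSERT INTO messages VALUES ('abc-123', '2021-03-01 10:00:00.000')")
con.execute("INSERT INTO messages VALUES ('abc-123', '2021-03-01 10:00:00.250')")

dupes = con.execute(
    "SELECT COUNT(*) FROM messages WHERE msg_id = 'abc-123'"
).fetchone()[0]
```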

Open questions
--------------

- What will upgrading the postgresql version with the timescaledb plugin look like?

  It looks like the timescaledb folks are involved enough in postgresql itself that we
  think things will work, but we have not had hands-on experience with it.

Patch
-----

Here is the patch that needs to be applied to ``datanommer/models/__init__.py`` to get
it working with timescaledb's adjusted postgresql model.

.. code-block::

    diff --git a/datanommer.models/datanommer/models/__init__.py b/datanommer.models/datanommer/models/__init__.py
    index ada58fa..7780433 100644

Lies, Damn lies and Statistics
==============================

In order to compare the performance of datagrepper in the different configurations we
looked at, we wrote a small script that runs 30 requests in 10 parallel threads.
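The measurement loop of such a script can be sketched like this (a hedged reconstruction, not the actual script; ``fetch`` stands in for one HTTP request against datagrepper and should return True on success):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(fetch, n_requests=30, n_threads=10):
    """Run `fetch` n_requests times across n_threads and summarize the runs."""

    def one_call(_):
        start = time.monotonic()
        try:
            ok = bool(fetch())
        except Exception:
            ok = False
        return ok, time.monotonic() - start

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(one_call, range(n_requests)))
    wall = time.monotonic() - wall_start

    durations = [d for _, d in results]
    successes = [d for ok, d in results if ok]
    return {
        "req_per_sec": n_requests / wall if wall else float("inf"),
        # the mean is only meaningful over successful requests ("NA" otherwise)
        "mean_time": sum(successes) / len(successes) if successes else None,
        "max_time": max(durations),
        "percent_success": 100.0 * len(successes) / n_requests,
    }
```

The four returned fields correspond to the four columns reported in the result tables.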

These requests are:

We then have 4 different environments:

- prod/openshift: this is an openshift deployment of datagrepper hitting the production
  database, without any configuration change.
- prod/aws: this is an AWS deployment of datagrepper, hitting its own local database,
  with the ``DEFAULT_QUERY_DELTA`` configuration key set to 3 days.
- partition/aws: this is an AWS deployment of datagrepper, hitting its own local
  postgresql database where the ``messages`` table is partitioned by ``id`` with each
  partition holding 10 million records, and the ``DEFAULT_QUERY_DELTA`` configuration
  key set to 3 days.
- timescaledb/aws: this is an AWS deployment of datagrepper, hitting its own local
  postgresql database where the ``messages`` table has been partitioned via the
  `timescaledb` plugin, and the ``DEFAULT_QUERY_DELTA`` configuration key set to 3 days.
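``DEFAULT_QUERY_DELTA`` bounds how far back an open-ended query looks. A hypothetical sketch of the idea (the function name and exact behavior are illustrative, not datagrepper's actual code):

```python
from datetime import datetime, timedelta, timezone

DEFAULT_QUERY_DELTA = timedelta(days=3)  # mirrors the setting described above

def effective_window(start=None, end=None, delta=DEFAULT_QUERY_DELTA):
    """Clamp an open-ended query to the last `delta` of messages.

    Without such a default, a /raw query with no time filter would scan
    the whole messages table.
    """
    end = end or datetime.now(timezone.utc)
    start = start or end - delta
    return start, end
```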

Results
-------

Here are the results for each environment and request.

prod/openshift
~~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    0.32             NA                45.857601        0.00%
plain_raw          0.32             NA                31.955371        0.00%
filter_by_category 0.32             NA                31.632514        0.00%
filter_by_username 0.32             NA                33.549061        0.00%
filter_by_package  0.32             NA                34.531207        0.00%
get_by_id          1.57             1.575608          31.259095        86.67%
================== ================ ================= ================ ===============

prod/aws
~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    7.6              1.0068            11.2743          100.00%
plain_raw          9.06             0.712975          3.323922         100.00%
filter_by_category 12.43            0.489915          1.676223         100.00%
filter_by_username 1.49             5.83623           10.661274        100.00%
filter_by_package  0                52.69256          120.229874       1.00%
get_by_id          0.73             1.534168          60.455334        83.33%
================== ================ ================= ================ ===============

partition/aws
~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    9.98             0.711219          3.204178         100.00%
plain_raw          9.70             0.641497          1.24704          100.00%
filter_by_category 13.32            0.455219          0.594465         100.00%
filter_by_username 1.3              7.084018          12.079198        100.00%
filter_by_package  0                55.231556         120.125013       1.00%
get_by_id          0.48             2.198211          60.444765        76.67%
================== ================ ================= ================ ===============

timescaledb/aws
~~~~~~~~~~~~~~~

================== ================ ================= ================ ===============
Request            Requests per sec Mean time per Req Max time per Req Percent success
================== ================ ================= ================ ===============
filter_by_topic    14.1             0.4286            0.514617         100.00%
plain_raw          12.89            0.48235           0.661073         100.00%
filter_by_category 13.94            0.423172          0.507337         100.00%
filter_by_username 2.68             3.188782          5.096244         100.00%
filter_by_package  0.26             33.216631         57.901159        100.00%
get_by_id          12.69            0.749068          1.73515          100.00%
================== ================ ================= ================ ===============

Graphs
------

Percentage of success
~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_percent_sucess.jpg
   :target: ../_images/datanommer_percent_sucess.jpg

Requests per second
~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_req_per_sec.jpg
   :target: ../_images/datanommer_req_per_sec.jpg

Mean time per request
~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_mean_per_req.jpg
   :target: ../_images/datanommer_mean_per_req.jpg

Maximum time per request
~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../_static/datanommer_max_per_req.jpg
   :target: ../_images/datanommer_max_per_req.jpg