Update the documentation about the datanommer/datagrepper work
Signed-off-by: Pierre-Yves Chibon <pingou@pingoured.fr>
parent 1ff0d8fbd2
commit f61f8c482a
3 changed files with 117 additions and 0 deletions
@ -25,3 +25,5 @@ Here is the list of ideas/things we looked at:
:maxdepth: 1

pg_stat_statements
pg_partitioning
pg_timescaledb
60
docs/datanommer_datagrepper/pg_partitioning.rst
Normal file
@ -0,0 +1,60 @
Partitioning the database
=========================

In the database used by datanommer and datagrepper, one table stands out from
the others by its size: the ``messages`` table. This can be observed in
:ref:`datanommer`.

One possibility to speed things up in datagrepper is to partition that table
into a set of smaller partitions.

Here are some resources regarding partitioning PostgreSQL tables:

* Table partitioning in the PostgreSQL documentation: https://www.postgresql.org/docs/13/ddl-partitioning.html
* How to use table partitioning to scale PostgreSQL: https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
* Definition of PostgreSQL Partition: https://www.educba.com/postgresql-partition/


Attempt #1
----------

For our first attempt at partitioning the ``messages`` table, we thought we
would partition it by year, with a separate partition for each year.
We thus started by adding a ``year`` field to the table and filling it by
extracting the year from the ``timestamp`` field of the table.
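
A minimal sketch of that step, assuming an ``integer`` column; the exact
statements that were run are not recorded here and may have differed:
::

    -- Assumption: "year" is stored as a plain integer column.
    ALTER TABLE messages ADD COLUMN year integer;

    -- Fill it from the existing timestamp column. The column name is
    -- quoted because "timestamp" is also the name of a data type.
    UPDATE messages
       SET year = EXTRACT(YEAR FROM "timestamp")::integer;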

However, one thing to realize when using a partitioned table is that each
partition needs to be considered as an independent table. This means that a
unique constraint has to involve the field on which the table is partitioned.
In other words, if you partition the table by a ``year`` field, that field will
need to be part of the primary key as well as of any ``UNIQUE`` constraint on
the table.
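
The rule can be illustrated with a small sketch. The table below is
hypothetical and heavily simplified (``messages_demo`` and its columns are
assumptions made for the example, not the real datanommer schema):
::

    -- The primary key has to include the partition column "year".
    CREATE TABLE messages_demo (
        id    integer NOT NULL,
        year  integer NOT NULL,
        topic text,
        PRIMARY KEY (id, year)
    ) PARTITION BY RANGE (year);

    -- One partition per year; the upper bound is exclusive.
    CREATE TABLE messages_demo_2020 PARTITION OF messages_demo
        FOR VALUES FROM (2020) TO (2021);

    -- Declaring PRIMARY KEY (id) alone on messages_demo would be
    -- rejected, because it does not cover the partition column.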

So to partition the ``messages`` table on ``year``, we had to add the ``year``
field to the primary key. However, that broke the foreign key constraints on
the ``user_messages`` and ``package_messages`` tables, which rely on the ``id``
field to link the tables: ``id`` on its own was no longer covered by a unique
constraint.
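
To illustrate the foreign key problem, here is a sketch using hypothetical,
simplified tables (``messages_demo``, ``user_messages_demo`` and their columns
are assumptions, not the real schema). A foreign key must reference columns
covered by a unique constraint, and ``id`` alone does not have one here:
::

    CREATE TABLE messages_demo (
        id   integer NOT NULL,
        year integer NOT NULL,
        PRIMARY KEY (id, year)
    ) PARTITION BY RANGE (year);

    -- Expected to be rejected: no unique constraint matches (id) alone.
    CREATE TABLE user_messages_demo (
        username text,
        msg      integer REFERENCES messages_demo (id)
    );

Referencing ``(id, year)`` instead would satisfy PostgreSQL, but it would
require adding a ``year`` column to the linking tables as well.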


Attempt #2
----------

Since partitioning on ``year`` did not work, we reconsidered and decided to
partition on the ``id`` field instead, using ``RANGE`` partitioning.

We partitioned the ``messages`` table on the ``id`` field with partitions of 10
million records each. This has the advantage of making all partitions similar
in size.
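
A sketch of what such a layout can look like, again on a hypothetical and
simplified table (the table names, partition names and exact bounds are
assumptions chosen for the example):
::

    CREATE TABLE messages_demo (
        id    integer NOT NULL,
        topic text,
        PRIMARY KEY (id)          -- the partition column is the key itself
    ) PARTITION BY RANGE (id);

    -- Lower bounds are inclusive, upper bounds are exclusive.
    CREATE TABLE messages_demo_p0 PARTITION OF messages_demo
        FOR VALUES FROM (0) TO (10000000);
    CREATE TABLE messages_demo_p1 PARTITION OF messages_demo
        FOR VALUES FROM (10000000) TO (20000000);

    -- Optional catch-all for ids that fall outside every range above.
    CREATE TABLE messages_demo_default PARTITION OF messages_demo DEFAULT;

Because ``id`` is both the partition column and the primary key, the foreign
keys of the linking tables can keep referencing ``id`` alone. Without a default
partition, inserting a row whose ``id`` matches no partition fails, so new
partitions have to be created ahead of time.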


More resources
--------------

These are a few more resources we looked at and thought were worth bookmarking:

* Automatic partitioning by day - PostgreSQL: https://stackoverflow.com/questions/55642326/
* pg_partman, partition manager: https://github.com/pgpartman/pg_partman
* How to scale PostgreSQL 10 using table inheritance and declarative partitioning: https://blog.timescale.com/blog/scaling-partitioning-data-postgresql-10-explained-cd48a712a9a1/
55
docs/datanommer_datagrepper/pg_timescaledb.rst
Normal file
@ -0,0 +1,55 @
Using the timescaledb extension
===============================

timescaledb (https://docs.timescale.com/latest/) is a PostgreSQL extension for
time-series data.
Considering that a lot of the queries made on datagrepper involve the
``timestamp`` field (for example: all the messages with a given topic in a
given time range), we figured this extension was worth investigating.

A bonus point is that this extension is already packaged and available in
Fedora and EPEL.


Resources
---------

* Setting up/enabling timescaledb: https://severalnines.com/database-blog/how-enable-timescaledb-existing-postgresql-database
* Migrating an existing database to timescaledb: https://docs.timescale.com/latest/getting-started/migrating-data#same-db


Installing/enabling/activating
------------------------------

To install the extension, simply run:
::

    dnf install timescaledb

Then edit ``/var/lib/pgsql/data/postgresql.conf`` to tell PostgreSQL to load it:
::

    shared_preload_libraries = 'pg_stat_statements,timescaledb'
    timescaledb.max_background_workers=4


It will then need a restart of the entire database server:
::

    systemctl restart postgresql

You can then check that the extension is listed as available:
::

    $ sudo -u postgres psql
    SELECT * FROM pg_available_extensions ORDER BY name;
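
The query above lists every extension known to the server. A narrower check,
sketched here with the standard ``pg_available_extensions`` view and the
``SHOW`` command, could be:
::

    -- Only the timescaledb row; installed_version stays empty until
    -- CREATE EXTENSION has been run in the current database.
    SELECT name, default_version, installed_version
      FROM pg_available_extensions
     WHERE name = 'timescaledb';

    -- Shows whether the library is in the preload list after the restart.
    SHOW shared_preload_libraries;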

Then, you will need to activate it for your database:
::

    $ sudo -u postgres psql <database_name>
    CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

Finally, you can check that the extension was activated for your database:
::

    $ sudo -u postgres psql <database_name>
    \dx
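
If a plain SQL query is more convenient than the ``\dx`` meta-command (for
example in a script), the same information can be read from the
``pg_extension`` catalog. This is a sketch, run while connected to the same
database:
::

    -- Returns one row if timescaledb is activated in this database,
    -- and no row otherwise.
    SELECT extname, extversion
      FROM pg_extension
     WHERE extname = 'timescaledb';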