Added short report on performance of pg column of array type
This commit is contained in:
parent
8971cc5538
commit
bb9779faf3
1 changed files with 101 additions and 0 deletions
101
docs/datanommer_datagrepper/pg_array_column_postgrest.rst
Normal file
101
docs/datanommer_datagrepper/pg_array_column_postgrest.rst
Normal file
|
@ -0,0 +1,101 @@
|
||||||
|
Using the array type for user and package queries
|
||||||
|
===============================
|
||||||
|
|
||||||
|
Currently, we use auxiliary tables to query for messages related to packages or users,
|
||||||
|
in the standard RDBS fashion.
|
||||||
|
|
||||||
|
We came to some problems when trying to enforce foreign key constrains while using the timescaledb
|
||||||
|
extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
|
||||||
|
|
||||||
|
|
||||||
|
Resources
|
||||||
|
---------
|
||||||
|
|
||||||
|
* PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
|
||||||
|
* GIN index: https://www.postgresql.org/docs/12/gin.html
|
||||||
|
* GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html
|
||||||
|
|
||||||
|
|
||||||
|
Installing/enabling/activating
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
|
||||||
|
|
||||||
|
To add new column
|
||||||
|
::
|
||||||
|
|
||||||
|
alter table messages2 add column packages text[];
|
||||||
|
|
||||||
|
To populate it
|
||||||
|
::
|
||||||
|
|
||||||
|
update messages2 set packages=t_agg.p_agg from
|
||||||
|
(select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
|
||||||
|
|
||||||
|
We need to enable the btree_gin extension to be able to create index with array as well as timestamp
|
||||||
|
::
|
||||||
|
|
||||||
|
CREATE EXTENSION btree_gin;
|
||||||
|
|
||||||
|
To create the index
|
||||||
|
::
|
||||||
|
|
||||||
|
CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");
|
||||||
|
|
||||||
|
To help reuse our testing script, we setup postgrest locally
|
||||||
|
::
|
||||||
|
|
||||||
|
podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
|
||||||
|
|
||||||
|
Because we focused only on package queries, as user colun couldn't be populated due to constraints on size,
|
||||||
|
we chose two as representative. There is implicit limit to return just 25 rows.
|
||||||
|
|
||||||
|
A simple membership:
|
||||||
|
::
|
||||||
|
|
||||||
|
/messages_ts?packages=ov.{{kernel}}
|
||||||
|
|
||||||
|
A simple membership ordered by time.
|
||||||
|
::
|
||||||
|
|
||||||
|
/messages_ts?order=timestamp.desc&packages=ov.{{kernel}}
|
||||||
|
|
||||||
|
Findings
|
||||||
|
--------
|
||||||
|
|
||||||
|
Querying just the package membership
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The queries were surprisingly fast, with maximum under 4 seconds and
|
||||||
|
mean around half a second. This encouraged us to do further experiments.
|
||||||
|
|
||||||
|
Results ::
|
||||||
|
|
||||||
|
test_filter_by_package
|
||||||
|
Requests: 300, pass: 300, fail: 0, exception: 0
|
||||||
|
For pass requests:
|
||||||
|
Request per Second - mean: 3.63
|
||||||
|
Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
|
||||||
|
|
||||||
|
Querying just the package membership ordered by timestamp desc
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Usually we want to see most recent messages. So we ammended the query,
|
||||||
|
to include "order by timestamp desc". The result was less encouraging,
|
||||||
|
with longest succesful query taking more than 90 seconds and several timing out.
|
||||||
|
|
||||||
|
This seems to be the result of GIN index not supporting order in the index.
|
||||||
|
|
||||||
|
Results ::
|
||||||
|
|
||||||
|
test_filter_by_package
|
||||||
|
Requests: 300, pass: 280, fail: 0, exception: 20
|
||||||
|
For pass requests:
|
||||||
|
Request per Second - mean: 0.53
|
||||||
|
Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
|
||||||
|
|
||||||
|
Conclusion
|
||||||
|
--------------
|
||||||
|
|
||||||
|
While array support seems interesting, and for simple queries very fast, indexes that require ordering
|
||||||
|
don't seem to be supported. This makes strong case against using them.
|
Loading…
Add table
Add a link
Reference in a new issue