Added short report on performance of pg column of array type
This commit is contained in:
parent
8971cc5538
commit
bb9779faf3
1 changed files with 101 additions and 0 deletions
101
docs/datanommer_datagrepper/pg_array_column_postgrest.rst
Normal file
101
docs/datanommer_datagrepper/pg_array_column_postgrest.rst
Normal file
|
@ -0,0 +1,101 @@
|
|||
Using the array type for user and package queries
|
||||
===============================
|
||||
|
||||
Currently, we use auxiliary tables to query for messages related to packages or users,
|
||||
in the standard RDBS fashion.
|
||||
|
||||
We came to some problems when trying to enforce foreign key constrains while using the timescaledb
|
||||
extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
|
||||
|
||||
|
||||
Resources
|
||||
---------
|
||||
|
||||
* PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
|
||||
* GIN index: https://www.postgresql.org/docs/12/gin.html
|
||||
* GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html
|
||||
|
||||
|
||||
Installing/enabling/activating
|
||||
------------------------------
|
||||
|
||||
To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
|
||||
|
||||
To add new column
|
||||
::
|
||||
|
||||
alter table messages2 add column packages text[];
|
||||
|
||||
To populate it
|
||||
::
|
||||
|
||||
update messages2 set packages=t_agg.p_agg from
|
||||
(select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
|
||||
|
||||
We need to enable the btree_gin extension to be able to create index with array as well as timestamp
|
||||
::
|
||||
|
||||
CREATE EXTENSION btree_gin;
|
||||
|
||||
To create the index
|
||||
::
|
||||
|
||||
CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");
|
||||
|
||||
To help reuse our testing script, we setup postgrest locally
|
||||
::
|
||||
|
||||
podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
|
||||
|
||||
Because we focused only on package queries, as user colun couldn't be populated due to constraints on size,
|
||||
we chose two as representative. There is implicit limit to return just 25 rows.
|
||||
|
||||
A simple membership:
|
||||
::
|
||||
|
||||
/messages_ts?packages=ov.{{kernel}}
|
||||
|
||||
A simple membership ordered by time.
|
||||
::
|
||||
|
||||
/messages_ts?order=timestamp.desc&packages=ov.{{kernel}}
|
||||
|
||||
Findings
|
||||
--------
|
||||
|
||||
Querying just the package membership
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
The queries were surprisingly fast, with maximum under 4 seconds and
|
||||
mean around half a second. This encouraged us to do further experiments.
|
||||
|
||||
Results ::
|
||||
|
||||
test_filter_by_package
|
||||
Requests: 300, pass: 300, fail: 0, exception: 0
|
||||
For pass requests:
|
||||
Request per Second - mean: 3.63
|
||||
Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
|
||||
|
||||
Querying just the package membership ordered by timestamp desc
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Usually we want to see most recent messages. So we ammended the query,
|
||||
to include "order by timestamp desc". The result was less encouraging,
|
||||
with longest succesful query taking more than 90 seconds and several timing out.
|
||||
|
||||
This seems to be the result of GIN index not supporting order in the index.
|
||||
|
||||
Results ::
|
||||
|
||||
test_filter_by_package
|
||||
Requests: 300, pass: 280, fail: 0, exception: 20
|
||||
For pass requests:
|
||||
Request per Second - mean: 0.53
|
||||
Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
|
||||
|
||||
Conclusion
|
||||
--------------
|
||||
|
||||
While array support seems interesting, and for simple queries very fast, indexes that require ordering
|
||||
don't seem to be supported. This makes strong case against using them.
|
Loading…
Add table
Add a link
Reference in a new issue