Added short report on performance of pg column of array type

This commit is contained in:
Adam Saleh 2021-02-15 15:52:47 +01:00
parent 8971cc5538
commit bb9779faf3

View file

@ -0,0 +1,101 @@
Using the array type for user and package queries
===============================
Currently, we use auxiliary tables to query for messages related to packages or users,
in the standard RDBS fashion.
We came to some problems when trying to enforce foreign key constrains while using the timescaledb
extension. We decided to try, if just using a column with array type with proper indes would have simmilar performace.
Resources
---------
* PG 12 Array type: https://www.postgresql.org/docs/12/arrays.html
* GIN index: https://www.postgresql.org/docs/12/gin.html
* GIN operators for BTREE: https://www.postgresql.org/docs/11/btree-gin.html
Installing/enabling/activating
------------------------------
To have comparable results, we enabled timescaledb in same fashion as in our other experiment.
To add new column
::
alter table messages2 add column packages text[];
To populate it
::
update messages2 set packages=t_agg.p_agg from
(select msg, array_agg(package) as p_agg from package_messages group by msg) as t_agg where messages.id=t_agg.msg;
We need to enable the btree_gin extension to be able to create index with array as well as timestamp
::
CREATE EXTENSION btree_gin;
To create the index
::
CREATE INDEX idx_msg_user on "messages2" USING GIN ("timestamp", "packages");
To help reuse our testing script, we setup postgrest locally
::
podman run --rm --net=host -p 3000:3000 -e PGRST_DB_URI=$DBURI -e PGRST_DB_ANON_ROLE="datagrepper" -e PGRST_MAX_ROWS=25 postgrest/postgrest:v7.0.
Because we focused only on package queries, as user colun couldn't be populated due to constraints on size,
we chose two as representative. There is implicit limit to return just 25 rows.
A simple membership:
::
/messages_ts?packages=ov.{{kernel}}
A simple membership ordered by time.
::
/messages_ts?order=timestamp.desc&packages=ov.{{kernel}}
Findings
--------
Querying just the package membership
~~~~~~~~~~~~~~~~~
The queries were surprisingly fast, with maximum under 4 seconds and
mean around half a second. This encouraged us to do further experiments.
Results ::
test_filter_by_package
Requests: 300, pass: 300, fail: 0, exception: 0
For pass requests:
Request per Second - mean: 3.63
Time per Request - mean: 0.522946, min: 0.000000, max: 3.907548
Querying just the package membership ordered by timestamp desc
~~~~~~~~~~~~~~~~~
Usually we want to see most recent messages. So we ammended the query,
to include "order by timestamp desc". The result was less encouraging,
with longest succesful query taking more than 90 seconds and several timing out.
This seems to be the result of GIN index not supporting order in the index.
Results ::
test_filter_by_package
Requests: 300, pass: 280, fail: 0, exception: 20
For pass requests:
Request per Second - mean: 0.53
Time per Request - mean: 7.474040, min: 0.000000, max: 99.880939
Conclusion
--------------
While array support seems interesting, and for simple queries very fast, indexes that require ordering
don't seem to be supported. This makes strong case against using them.