PostgreSQL looks very competitive with other mainstream databases on OLTP workloads (execution of a large number of simple queries). But on OLAP queries, which require processing of large volumes of data, DBMSes oriented towards analytic query processing can be an order of magnitude faster. Let's investigate why this happens and whether we can do something to make PostgreSQL efficient for OLAP queries as well.
Profiling query execution shows several main factors limiting Postgres performance, most notably interpretation overhead and the cost of heap tuple deforming.
There are many different ways of addressing these issues. For example, we can use a JIT (Just-In-Time) compiler to generate native code for a query, eliminating interpretation overhead and increasing heap deforming speed. We can rewrite the optimizer from the pull to the push model. We can try to optimize the tuple format to make heap deforming more efficient. Or we can generate byte code for the query execution plan, whose interpretation is more efficient than recursive invocation of an evaluate function for each node because of better access locality. But all these approaches require significant rewriting of the Postgres executor, and some of them also require changes across the whole Postgres architecture.
But there is an approach which addresses most of these issues without radical changes to the executor: vector operations. It is explained in the next section.
A traditional query executor (like the Postgres executor) deals with a single row of data at a time. If it has to evaluate the expression (x+y), it fetches the value of "x", then the value of "y", performs the "+" operation and returns the result to the upper node. In contrast, a vectorized executor is able to process multiple values in one operation. In this case "x" and "y" represent not just single scalar values but vectors of values, and the result is also a vector of values. In the vector execution model, interpretation and function call overhead is divided by the size of the vector. The price of performing a function call is the same, but since the function processes N values instead of just one, this overhead becomes much less critical.
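The difference can be sketched in a few lines of Python (an illustration of the execution models, not VOPS code): row-at-a-time execution pays one interpreter dispatch per row, while the vectorized version pays one dispatch per vector.

```python
# Illustrative sketch: the same expression x + y evaluated
# row-at-a-time versus on a whole vector of values.

def eval_row(x, y):
    # One interpreter dispatch (function call) per row.
    return x + y

def eval_vector(xs, ys):
    # One dispatch for the whole vector: the per-row overhead
    # is divided by the vector size N.
    return [x + y for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]

per_row = [eval_row(x, y) for x, y in zip(xs, ys)]   # 4 dispatches
vectorized = eval_vector(xs, ys)                     # 1 dispatch
assert per_row == vectorized == [11, 22, 33, 44]
```

Both produce the same result; the win comes purely from amortizing the per-call overhead over N values.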
What is the optimal size for the vector? From the explanation above it is clear that the larger the vector, the less per-row overhead we have. So we could form a vector from all values of the corresponding table attribute. This is the so-called vertical data model, or columnar store. Unlike the classical "horizontal" data model, where the unit of stored data is a row (tuple), here we store vertical columns. A columnar store has the following main advantages:
There are several DBMSes implementing the columnar store model; the most popular are Vertica and MonetDB. But performing an operation on a whole column at once is actually not such a good idea. The table can be very large (OLAP queries typically work with large data sets), so the vector can also be very big and may not even fit in memory. And even if it fits in memory, working with such large vectors prevents efficient utilization of the CPU caches (L1, L2, ...). Consider the expression (x+y)*(x-y). A vector executor performs the addition of the two vectors "x" and "y" and produces the result vector "r1". But by the time the last element of "r1" is produced, its first elements have already been evicted from the CPU cache, as well as the first elements of the "x" and "y" vectors. So when we need to calculate (x-y), we once again have to load the data for "x" and "y" from slow memory into the fast cache. Then we produce "r2" and perform the multiplication of "r1" and "r2", for which we again first have to load the data for these vectors into the CPU cache.
So it is more efficient to split a column into relatively small chunks (or tiles - there is no single universally accepted term). Such a chunk is the unit of processing for the vectorized executor. The chunk size is chosen so that all operands of vector operations stay in cache even for complex expressions; a typical size is from 100 to 1000 elements. So in the case of the (x+y)*(x-y) expression, we calculate it not for the whole column but only for 100 values at a time (assuming the chunk size is 100). Splitting columns into chunks in the successors of MonetDB, X100 (VectorWise) and HyPer, increases speed by up to ten times.
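The chunked evaluation strategy can be sketched as follows (a toy model, not the actual executor): the intermediate vectors r1 and r2 never grow beyond the chunk size, so they are still cache-resident when the final multiplication consumes them.

```python
# Sketch of chunked (tile-at-a-time) evaluation of (x + y) * (x - y).
# With a chunk of ~100 elements, r1 and r2 stay small enough to remain
# in the CPU cache between the operations that produce and consume them.

CHUNK = 100

def eval_chunked(xs, ys):
    result = []
    for i in range(0, len(xs), CHUNK):
        cx, cy = xs[i:i+CHUNK], ys[i:i+CHUNK]
        r1 = [a + b for a, b in zip(cx, cy)]     # small vector, stays hot
        r2 = [a - b for a, b in zip(cx, cy)]     # cx/cy still cached here
        result.extend(a * b for a, b in zip(r1, r2))
    return result

xs = list(range(1000))
ys = list(range(1000, 2000))
# (x + y) * (x - y) == x^2 - y^2
assert eval_chunked(xs, ys) == [x*x - y*y for x, y in zip(xs, ys)]
```

In Python the cache effect is of course invisible; the point is the shape of the loop a vectorized executor runs.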
There have been several attempts to integrate a columnar store in PostgreSQL. The best known is CStore FDW by CitusDB. It is implemented as a foreign data wrapper (FDW) and is efficient for queries fetching a relatively small fraction of the columns. But it uses the standard row-based Postgres executor and so is not able to take advantage of vector processing. There is an interesting project done by a CitusDB intern, who implemented vector operations on top of CStore using executor hooks for some nodes. He reports a 4-6 times speedup for grand aggregates and a 3 times speedup for aggregation with group by.
Another project is IMCS: In-Memory Columnar Store. Here the columnar store is implemented in memory and accessed using special functions, so you cannot use standard SQL queries to work with this storage - you have to rewrite them using IMCS functions. IMCS provides vector operations (using tiles) and parallel execution.
Both CStore and IMCS keep data outside Postgres. But what if we want to use vector operations on data kept in standard Postgres tables? Definitely, the best approach would be to implement an alternative heap format. Or, going even further, to eliminate the notion of the heap altogether and treat the heap as just another access method, similar to the indexes.
But such radical changes require a deep redesign of the whole Postgres architecture. It is better to first estimate the possible advantages we can expect from the use of vector operations. A vectorized executor is widely discussed in the Postgres forums, but an efficient vectorized executor is not possible without underlying support at the storage layer: the advantages of vector processing will be annihilated if vectors have to be formed from attributes of rows extracted from existing Postgres heap pages.
The idea of the VOPS extension is to implement vector operations for tiles represented as special Postgres types. Tiles should be used as table column types instead of scalar types. For example, instead of "real" we should use "vops_float4", which is a tile representing up to 64 values of the corresponding column. Why 64? There are several reasons for choosing this number:
VOPS is implemented as a Postgres extension. It doesn't change anything in the Postgres executor or page format. It also doesn't set up any executor hooks or alter the query execution plan. The main idea of this project was to measure the speedup which can be reached by using vector operations with the existing executor and heap manager. VOPS provides a set of standard operators for tile types, allowing SQL queries to be written in a way similar to normal SQL queries. Right now vector operators can be used inside predicates and aggregate expressions; joins are not currently supported. Details of the VOPS architecture are described below.
VOPS supports all basic Postgres numeric types: 1, 2, 4 and 8 byte integers and 4 and 8 byte floats. It also supports the date and timestamp types, but they use the same implementation as int4 and int8 respectively.
SQL type | C type | VOPS tile type |
---|---|---|
bool | bool | vops_bool |
"char", char or char(1) | char | vops_char |
int2 | int16 | vops_int2 |
int4 | int32 | vops_int4 |
int8 | int64 | vops_int8 |
float4 | float4 | vops_float4 |
float8 | float8 | vops_float8 |
date | DateADT | vops_date |
timestamp | Timestamp | vops_timestamp |
VOPS doesn't support work with strings (char or varchar types), except for the case of a single character. If strings are used as identifiers, in most cases it is preferable to place them in a dictionary and use integer identifiers instead of the original strings.
VOPS provides implementations of all built-in SQL arithmetic operators for numeric types: + - / *. Certainly it also implements all comparison operators: = <> > >= < <=. Operands of these operators can be either tiles or scalar constants: x=y or x=1. The boolean operators and, or, not cannot be overloaded.
This is why VOPS provides the operators & | ! instead. Please notice that the precedence of these operators differs from that of the and, or, not operators. So you cannot write a predicate as x=1 | x=2 - it will cause a syntax error. To solve this problem, please use parentheses: (x=1) | (x=2).
VOPS also provides an analog of the between operator. In SQL the expression (x BETWEEN a AND b) is equivalent to (x >= a AND x <= b). But since the AND operator cannot be overloaded, such a substitution will not work for VOPS tiles. This is why VOPS provides a special function for range checks. Unfortunately BETWEEN is a reserved keyword, so no function with this name can be defined; the synonym BETWIXT is used instead.
Postgres requires a predicate expression to have boolean type, but the result of the vector boolean operators is vops_bool, not bool. This is why the compiler doesn't allow them in a predicate. The problem is solved by introducing a special filter function. This function takes an arbitrary vector boolean expression and returns a normal boolean which ... is always true. So from the Postgres executor's point of view the predicate value is always true. But the filter function sets a filter_mask which is actually used in subsequent operators to determine the selected records.
So query in VOPS looks something like this:
select sum(price) from trades where filter(day >= '2017-01-01'::date);
Please notice one more difference from a normal query: we have to explicitly cast the string constant to the appropriate data type (date in this example). For the betwixt function this is not needed:
select sum(price) from trades where filter(betwixt(day, '2017-01-01', '2017-02-01'));
For the char, int2 and int4 types VOPS provides a concatenation operator || which produces the doubled integer type: (char || char) -> int2, (int2 || int2) -> int4, (int4 || int4) -> int8. These can be used for grouping by several columns (see below).
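One natural way for such a pairing operator to work is to pack the two narrower values into the two halves of the wider type; the sketch below illustrates the idea for int2 pairs (the exact bit layout used by VOPS is an assumption here, not confirmed by the text).

```python
# Hypothetical illustration of combining two int2 values into one
# int4 group-by key by packing them into the high and low halves.
# The actual VOPS encoding may differ.

def concat_int2(x, y):
    assert 0 <= x < (1 << 16) and 0 <= y < (1 << 16)
    return (x << 16) | y    # x in high 16 bits, y in low 16 bits

# Distinct (returnflag, linestatus) pairs yield distinct keys:
assert concat_int2(1, 2) != concat_int2(2, 1)
assert concat_int2(1, 2) == 0x00010002
```

Whatever the encoding, the important property is that the combined key is unique per pair, so grouping by it is equivalent to grouping by the two original columns.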
Operator | Description |
---|---|
+ | Addition |
- | Binary subtraction or unary negation |
* | Multiplication |
/ | Division |
= | Equals |
<> | Not equals |
< | Less than |
<= | Less than or Equals |
> | Greater than |
>= | Greater than or equals |
& | Boolean AND |
| | Boolean OR |
! | Boolean NOT |
BETWIXT | Analog of BETWEEN |
OLAP queries usually perform some kind of aggregation over large volumes of data. These include grand aggregates, which are calculated for the whole table, and aggregates with group by, which are calculated for each group. VOPS implements all standard SQL aggregates: count, min, max, sum, avg. They can be used exactly the same way as in normal SQL queries:
select sum(l_extendedprice*l_discount) as revenue from vops_lineitem where filter(betwixt(l_shipdate, '1996-01-01', '1997-01-01') & betwixt(l_discount, 0.08, 0.1) & (l_quantity < 24));
Using aggregation with group by is more complex. VOPS provides two functions for it: map and reduce. The work is actually done by map(group_by_expression, aggregate_list, expr {, expr }). VOPS implements aggregation using a hash table whose entries collect the aggregate states for all groups. The set-returning function reduce just iterates through the hash table constructed by map. The reduce function is needed because the result of an aggregate in Postgres cannot be a set. So an aggregate query with group by looks something like this:
select reduce(map(l_returnflag||l_linestatus, 'sum,sum,sum,sum,avg,avg,avg', l_quantity, l_extendedprice, l_extendedprice*(1-l_discount), l_extendedprice*(1-l_discount)*(1+l_tax), l_quantity, l_extendedprice, l_discount)) from vops_lineitem where filter(l_shipdate <= '1998-12-01'::date);
Here we use the concatenation operator to perform grouping by two columns. Right now VOPS supports grouping only by integer types.
Another serious restriction is that all aggregated expressions should have the same type, for example vops_float4. It is not possible to calculate aggregates for vops_float4 and vops_int8 columns in one call of the map function, because it accepts the aggregation arguments as a variadic array, so all elements of this array must have the same type. The aggregate string in the map function should contain the list of requested aggregate functions, separated by commas. Standard lowercase names should be used: count, sum, avg, min, max. Count is executed for a particular column: count(x). There is no need to explicitly specify count(*), because the number of records in each group is returned by the reduce function in any case.
The reduce function returns a set of vops_aggregates type. It contains three components: the value of the group by expression, the number of records in the group and an array of floats with the aggregate values. Please notice that the values of all aggregates, including count and min/max, are returned as floats.
create type vops_aggregates as(group_by int8, count int8, aggs float8[]);
create function reduce(bigint) returns setof vops_aggregates;
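The map/reduce scheme can be modeled in a few lines of Python (a simplified sketch, not the VOPS implementation): map fills a hash table of per-group aggregate states, and reduce iterates over it, yielding tuples shaped like the vops_aggregates type.

```python
# Simplified model of VOPS map/reduce aggregation. For brevity only
# count and sum states are tracked; VOPS keeps a state per requested
# aggregate function.

def vops_map(keys, values):
    groups = {}  # group_by key -> [count, running sum]
    for k, v in zip(keys, values):
        state = groups.setdefault(k, [0, 0.0])
        state[0] += 1
        state[1] += v
    return groups

def vops_reduce(groups):
    # Yields (group_by, count, aggs), mirroring vops_aggregates.
    for k, (count, total) in groups.items():
        yield (k, count, [total])

groups = vops_map([1, 1, 2], [10.0, 20.0, 5.0])
assert sorted(vops_reduce(groups)) == [(1, 2, [30.0]), (2, 1, [5.0])]
```

This also shows why count(*) needs no explicit entry in the aggregate list: the group's record count is part of every state anyway.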
Analytic queries are usually performed on data for which no indexes are defined, and this is where columnar store vector operations are most efficient. But it is still possible to use indexes with VOPS. Since each VOPS tile represents multiple values, an index can only be used for some preliminary, imprecise filtering of data. This is similar to how BRIN indexes work.
VOPS provides four functions: first, last, high, low, which can be used to obtain the high/low boundary of the values stored in a tile. The first two functions, first and last, should be used for sorted data sets: in this case the first value is the smallest value in the tile and the last value is the largest. If the data is not sorted, then the low and high functions should be used; they are more expensive because they need to inspect all the tile values. Using these four functions it is possible to construct functional indexes for a VOPS table.
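The behavior of the four boundary functions can be sketched as follows (an illustration of the semantics, not the C implementation): on sorted tiles the boundaries are just the endpoints, while on unsorted tiles the whole tile must be scanned.

```python
# Sketch of the four tile-boundary functions. first/last are O(1)
# but only give correct bounds when the tile is sorted; low/high
# scan all tile values and work for any ordering.

def first(tile): return tile[0]    # minimum, if tile is sorted
def last(tile):  return tile[-1]   # maximum, if tile is sorted
def low(tile):   return min(tile)  # true minimum, any ordering
def high(tile):  return max(tile)  # true maximum, any ordering

sorted_tile = [1, 3, 7, 9]
assert (first(sorted_tile), last(sorted_tile)) == (1, 9)

unsorted_tile = [7, 1, 9, 3]
assert (low(unsorted_tile), high(unsorted_tile)) == (1, 9)
```

An index built over these per-tile bounds can only say "this tile may contain matching values", which is why the precise predicate must still be rechecked, just as with BRIN.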
BRIN index seems to be the best choice for VOPS table:
create index low_boundary on trades using brin(first(day));   -- trades table is ordered by day
create index high_boundary on trades using brin(last(day));   -- trades table is ordered by day
Now it is possible to use these indexes in a query. Please notice that we have to recheck the precise condition, because the index gives only an approximate result:
select sum(price) from trades where first(day) >= '2015-01-01' and last(day) <= '2016-01-01' and filter(betwixt(day, '2015-01-01', '2016-01-01'));
Now the most interesting question (with which perhaps we should have started): how do we prepare data for VOPS queries? Who will combine the attribute values of several rows inside one VOPS tile, and how? This is done by the populate function provided by the VOPS extension. First of all you need to create a table with columns of VOPS tile types. It can map all columns of the original table or just some most frequently used subset of them. This table can be treated as a projection of the original table (this concept of projections is taken from Vertica). A projection should include the columns which are most frequently used together in queries.
Original table from TPC-H benchmark:
create table lineitem(
    l_orderkey integer,
    l_partkey integer,
    l_suppkey integer,
    l_linenumber integer,
    l_quantity real,
    l_extendedprice real,
    l_discount real,
    l_tax real,
    l_returnflag char,
    l_linestatus char,
    l_shipdate date,
    l_commitdate date,
    l_receiptdate date,
    l_shipinstruct char(25),
    l_shipmode char(10),
    l_comment char(44));
VOPS projection of this table:
create table vops_lineitem(
    l_shipdate vops_date not null,
    l_quantity vops_float4 not null,
    l_extendedprice vops_float4 not null,
    l_discount vops_float4 not null,
    l_tax vops_float4 not null,
    l_returnflag vops_char not null,
    l_linestatus vops_char not null);
The original table can be treated as write-optimized storage (WOS). If it has no indexes, Postgres can provide very fast insertion speed, comparable with raw disk write speed. The projection in VOPS format can be treated as read-optimized storage (ROS), most efficient for execution of OLAP queries.
Data can be transferred from the original to the projected table using the VOPS populate function:
create function populate(destination regclass, source regclass, predicate cstring default null, sort cstring default null);
The first two mandatory arguments of this function specify the target and source tables. The optional predicate and sort clauses allow restricting the amount of imported data and enforcing the requested order. By specifying a predicate it is possible to update the VOPS table using only the most recently received records. Example of a populate function invocation:
select populate(destination := 'vops_lineitem'::regclass, source := 'lineitem'::regclass, sort := 'l_shipdate');
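Conceptually, populate transposes horizontal rows into vertical tiles: for each projected column it collects the values of consecutive rows and slices them into tiles of up to 64 elements. A minimal Python model of this transformation (illustration only; column names and the dict-based representation are assumptions):

```python
# Conceptual model of populate(): transpose rows into per-column
# tiles of at most TILE_SIZE values each. The last tile of each
# column may be only partially filled.

TILE_SIZE = 64

def populate(rows):
    """rows: list of dicts sharing the same keys.
    Returns: dict mapping column name -> list of tiles."""
    if not rows:
        return {}
    columns = {name: [r[name] for r in rows] for name in rows[0]}
    return {name: [vals[i:i + TILE_SIZE]
                   for i in range(0, len(vals), TILE_SIZE)]
            for name, vals in columns.items()}

rows = [{'l_shipdate': d, 'l_quantity': float(d)} for d in range(100)]
tiles = populate(rows)
assert len(tiles['l_quantity']) == 2          # tiles of 64 and 36 values
assert len(tiles['l_quantity'][0]) == 64
```

The real populate function additionally applies the optional predicate and sort order before tiling, and writes the tiles into the projection table.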
A query against a VOPS projection returns a set of tiles; the output function of the tile type is able to print the tile's content. But in some cases it is preferable to transfer the result to the normal (horizontal) format, where each tuple represents one record. This can be done using the unnest function:
postgres=# select unnest(l.*) from vops_lineitem l where filter(l_shipdate <= '1998-12-01'::date) limit 3;
                unnest
---------------------------------------
 (1996-03-13,17,33078.9,0.04,0.02,N,O)
 (1996-04-12,36,38306.2,0.09,0.06,N,O)
 (1996-01-29,8,15479.7,0.1,0.02,N,O)
(3 rows)
The most popular benchmark for OLAP is TPC-H. It contains 22 different queries. We adapted only two of them for VOPS: Q1 and Q6, which do not use joins. Most fragments of this code have already been shown above, but here we collect them together:
-- Standard way of creating extension
create extension vops;

-- Original TPC-H table
create table lineitem(
    l_orderkey integer,
    l_partkey integer,
    l_suppkey integer,
    l_linenumber integer,
    l_quantity real,
    l_extendedprice real,
    l_discount real,
    l_tax real,
    l_returnflag char,
    l_linestatus char,
    l_shipdate date,
    l_commitdate date,
    l_receiptdate date,
    l_shipinstruct char(25),
    l_shipmode char(10),
    l_comment char(44),
    l_dummy char(1)); -- this column is needed because of the terminator after the last column in the generated data

-- Import data
copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;

-- Create VOPS projection
create table vops_lineitem(
    l_shipdate vops_date not null,
    l_quantity vops_float4 not null,
    l_extendedprice vops_float4 not null,
    l_discount vops_float4 not null,
    l_tax vops_float4 not null,
    l_returnflag vops_char not null,
    l_linestatus vops_char not null);

-- Copy data to the projection table
select populate(destination := 'vops_lineitem'::regclass, source := 'lineitem'::regclass);

-- For an honest comparison create the same projection without VOPS types
create table lineitem_projection as
    (select l_shipdate, l_quantity, l_extendedprice, l_discount, l_tax,
            l_returnflag::"char", l_linestatus::"char"
     from lineitem);

-- Let's measure time
\timing

-- Original Q6 query performing filtering with calculation of a grand aggregate
select sum(l_extendedprice*l_discount) as revenue
from lineitem
where l_shipdate between '1996-01-01' and '1997-01-01'
  and l_discount between 0.08 and 0.1
  and l_quantity < 24;

-- VOPS version of Q6
select sum(l_extendedprice*l_discount) as revenue
from vops_lineitem
where filter(betwixt(l_shipdate, '1996-01-01', '1997-01-01')
             & betwixt(l_discount, 0.08, 0.1)
             & (l_quantity < 24));

-- Original version of Q1: filtering + grouping + aggregation
select l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
       sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
from lineitem
where l_shipdate <= '1998-12-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

-- VOPS version of Q1, sorry - no final sorting
select reduce(map(l_returnflag||l_linestatus, 'sum,sum,sum,sum,avg,avg,avg',
                  l_quantity,
                  l_extendedprice,
                  l_extendedprice*(1-l_discount),
                  l_extendedprice*(1-l_discount)*(1+l_tax),
                  l_quantity,
                  l_extendedprice,
                  l_discount))
from vops_lineitem
where filter(l_shipdate <= '1998-12-01'::date);
Now the most interesting part: comparing performance results on the original table against vector operations on the VOPS projection. All measurements were performed on a desktop with 16Gb of RAM and a quad-core i7-4770 CPU @ 3.40GHz with hyper-threading enabled. The data set for the benchmark was generated by the dbgen utility included in the TPC-H benchmark, with scale factor 10, which corresponds to about an 8Gb database. It completely fits in memory, so we are measuring the best query execution time for warm data. Postgres was configured with a shared buffer size of 8Gb. For each query we measured the time of sequential execution and of parallel execution with 8 parallel workers.
Query | Sequential execution (msec) | Parallel execution (msec) |
---|---|---|
Original Q1 for lineitem | 38028 | 10997 |
Original Q1 for lineitem_projection | 33872 | 9656 |
Vectorized Q1 for vops_lineitem | 3372 | 951 |
Original Q6 for lineitem | 16796 | 4110 |
Original Q6 for lineitem_projection | 10896 | 2679 |
Vectorized Q6 for vops_lineitem | 875 | 284 |
As you can see from the performance results, VOPS can provide a more than 10 times improvement in query speed. And this result is achieved without changing anything in the query planner or executor. It is better than any of the existing attempts to speed up execution of OLAP queries using JIT (4 times for Q1, 3 times for Q6): Speeding up query execution in PostgreSQL using LLVM JIT compiler.
Definitely the VOPS extension is just a prototype whose main role is to demonstrate the potential of a vectorized executor. But I hope that it can also be useful in practice to speed up execution of OLAP aggregation queries for existing databases. And in the future we should think about the best approach for integrating a vectorized executor into the Postgres core.
All sources of the VOPS project can be obtained from this GIT repository. Please send any feedback, complaints, bug reports and change requests to Konstantin Knizhnik.