This post has benchmark results for Postgres 12 through 17beta3 using the Insert Benchmark and a medium server. By small, medium, or large server I mean fewer than 10 cores for small, 10 to 19 cores for medium, and 20 or more cores for large. A recent result up to Postgres 17beta2 from the same server is here.
This work was done by Small Datum LLC.
tl;dr
- 17beta3 looks (mostly) good
- There might be regressions in 17 beta1, beta2 and beta3 on the l.i1 and l.i2 benchmark steps related to get_actual_variable_range
- The benchmark was run in two setups:
- cached - the database fits in the Postgres buffer pool
- IO-bound - the database is larger than memory and there are many reads from disk
- l.i0
- insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. The value of X is 10 for cached and 128 for IO-bound.
- l.x
- create 3 secondary indexes per table. There is one connection per client.
- l.i1
- use 2 connections/client. One inserts X rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. The value of X is 40M for cached and 4M for IO-bound.
- l.i2
- like l.i1 but each transaction modifies 5 rows (small transactions) and X rows are inserted and deleted per table. The value of X is 10M for cached and 1M for IO-bound.
- Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of X is a function of the table size.
- qr100
- use 3 connections/client. One does range queries and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1800 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
- qp100
- like qr100 except uses point queries on the PK index
- qr500
- like qr100 but the insert and delete rates are increased from 100/s to 500/s
- qp500
- like qp100 but the insert and delete rates are increased from 100/s to 500/s
- qr1000
- like qr100 but the insert and delete rates are increased from 100/s to 1000/s
- qp1000
- like qp100 but the insert and delete rates are increased from 100/s to 1000/s
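The qr and qp steps above imply client-side rate limiting on the insert and delete connections: each must sustain a fixed target rate or the run counts as an SLA failure. A minimal sketch of sleep-based pacing, not the actual benchmark client (the function name and the no-op workload are hypothetical):

```python
import time

def run_at_rate(op, target_per_sec, duration_sec):
    """Call op() at roughly target_per_sec for duration_sec seconds.
    Returns the number of completed operations; if the client cannot
    keep up, the count falls short of target_per_sec * duration_sec
    (the SLA-failure case)."""
    interval = 1.0 / target_per_sec
    deadline = time.monotonic() + duration_sec
    next_at = time.monotonic()
    done = 0
    while time.monotonic() < deadline:
        op()
        done += 1
        # Schedule the next call one interval later and sleep off any slack.
        next_at += interval
        delay = next_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return done

# Example: a no-op "insert" paced at 100/s for 0.2 seconds -> about 20 ops.
count = run_at_rate(lambda: None, target_per_sec=100, duration_sec=0.2)
print(count)
```

If the paced connections meet their targets, every system tested does the same amount of background write work during the measured query interval, which is what makes the qr/qp numbers comparable across versions.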
- Throughput per benchmark step is reported as:
- insert/s for l.i0, l.i1, l.i2
- indexed rows/s for l.x
- range queries/s for qr100, qr500, qr1000
- point queries/s for qp100, qp500, qp1000
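The l.i1 and l.i2 steps insert and delete at the same rate, so the table size stays roughly constant while the PK range keeps advancing. A minimal sketch of that pattern, using SQLite from the Python standard library as a stand-in for Postgres (the table name, schema, and row counts are hypothetical; the real benchmark uses separate connections for inserts and deletes):

```python
import sqlite3

def run_insert_delete_step(con, total_rows, rows_per_txn):
    """Insert total_rows and delete the same number, in batches of
    rows_per_txn, so the table size stays roughly constant."""
    cur = con.cursor()
    next_pk = cur.execute("SELECT COALESCE(MAX(pk), 0) FROM t").fetchone()[0] + 1
    inserted = 0
    while inserted < total_rows:
        batch = min(rows_per_txn, total_rows - inserted)
        # Insert `batch` new rows at the head of the PK range.
        cur.executemany("INSERT INTO t (pk, val) VALUES (?, ?)",
                        [(next_pk + i, "x") for i in range(batch)])
        # Delete `batch` rows from the tail so the row count is unchanged.
        cur.execute("DELETE FROM t WHERE pk IN "
                    "(SELECT pk FROM t ORDER BY pk LIMIT ?)", (batch,))
        con.commit()
        next_pk += batch
        inserted += batch
    return inserted

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, val TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, "x") for i in range(1, 101)])
con.commit()

# l.i1-style big transactions: 50 rows modified per batch.
done = run_insert_delete_step(con, total_rows=500, rows_per_txn=50)
rows = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(done, rows)  # 500 inserted, table size still 100
```

The batch size is the knob that separates l.i1 (50 rows per transaction) from l.i2 (5 rows per transaction): smaller batches mean more transactions, and more per-transaction overhead, for the same number of rows.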
Results: cached
- Results for 17beta3 are similar to 16.4 with one exception. Results for the l.i2 benchmark step in 17 beta1, beta2 and beta3 are between 5% and 8% worse than in 16.4. I assume this is another problem related to get_actual_variable_range.
- This is confusing because 17 beta3 does better than the base case on l.i1 and the l.i1 workload is similar to l.i2 except there are more rows modified per transaction (so the optimizer overhead is amortized over more work).
- For Postgres 17beta I see a slight increase in CPU per operation (cpupq) and a slight reduction in context switches per operation (cspq) in the metrics section relative to Postgres 16.4.
- l.i0
- relative QPS is 1.02 in PG 16.4
- relative QPS is 0.98 in PG 17 beta3
- l.x - I ignore this for now
- l.i1, l.i2
- relative QPS is 1.06, 1.00 in PG 16.4
- relative QPS is 1.09, 0.92 in PG 17 beta3
- qr100, qr500, qr1000
- relative QPS is 1.03, 1.04, 1.04 in PG 16.4
- relative QPS is 1.04, 1.05, 1.08 in PG 17 beta3
- qp100, qp500, qp1000
- relative QPS is 0.99, 0.99, 0.99 in PG 16.4
- relative QPS is 0.98, 0.98, 0.98 in PG 17 beta3
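Relative QPS throughout these lists is the throughput for a given version divided by the throughput for the base case, so values above 1.0 are improvements and values below 1.0 are regressions. A minimal sketch, with made-up throughput numbers for illustration only:

```python
def relative_qps(version_qps, base_qps):
    """Throughput for a version divided by throughput for the base case;
    > 1.0 means faster than the base, < 1.0 means a regression."""
    return round(version_qps / base_qps, 2)

# Hypothetical throughput numbers, for illustration only.
base = 100000.0
print(relative_qps(98000.0, base))   # a 2% regression
print(relative_qps(104000.0, base))  # a 4% improvement
```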
Results: IO-bound
- Results for 17beta3 are similar to 16.4 with one exception. Results for the l.i1 and l.i2 benchmark steps in 17 beta1, beta2 and beta3 are mostly much worse than in 16.4. I assume this is another problem related to get_actual_variable_range.
- This is confusing because the cached pattern does not repeat here: l.i1 regresses more than l.i2 relative to the base case, even though l.i1 modifies more rows per transaction (so the optimizer overhead is amortized over more work).
- For Postgres 17beta I see an increase in CPU per operation (cpupq) in the metrics section relative to Postgres 16.4.
- l.i0
- relative QPS is 1.00 in PG 16.4
- relative QPS is 0.96 in PG 17 beta3
- l.x - I ignore this for now
- l.i1, l.i2
- relative QPS is 1.01, 1.45 in PG 16.4
- relative QPS is 0.88, 1.01 in PG 17 beta3
- qr100, qr500, qr1000
- relative QPS is 0.98, 0.98, 0.98 in PG 16.4
- relative QPS is 1.00, 1.00, 0.99 in PG 17 beta3
- qp100, qp500, qp1000
- relative QPS is 1.01, 1.01, 1.01 in PG 16.4
- relative QPS is 1.01, 1.01, 1.01 in PG 17 beta3