Small Datum: Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database, v3

I now have 4 server types at home (8 cores + 16G RAM, 8 cores + 32G RAM, 24 cores, 32 cores) and am trying to finish a round of the Insert Benchmark for each. This has results for the smallest (8 cores + 16G RAM) using a cached workload and Postgres.

In previous blog posts I claimed that there are large regressions from old to new MySQL but not from old to new Postgres, and I shared results for MySQL 5.6, 5.7 and 8.0 along with Postgres versions 10 through 16. One comment on those results was that the comparison was unfair because the first GA MySQL 5.6 release is 5.6.10 from 2013 while the first Postgres 10 GA release is 10.0 from 2017.

Here I have results going back to Postgres 9.0.23. The first 9.0 release is 9.0.0 from 2010.

tl;dr

the song remains the same: MySQL has large regressions over time while Postgres avoids them
comparing Postgres 16.1 with Postgres 9.0.23

for write-heavy benchmark steps PG 16.1 gets between 1.2X and 2.8X more throughput
for range queries PG 16.1 gets ~1.2X more throughput
for point queries PG 16.1 gets ~1.1X more throughput

Build + Configuration

See the previous report for more details. I used these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1.

The configuration files are in subdirectories from here. Search for files named conf.diff.cx9a2_bee which exist for each major version of Postgres.

The Benchmark

The benchmark is explained here, except that the first benchmark step, l.i0, now loads 30M rows/table while previously it loaded 20M. The database still fits in memory as the test server has 16G of RAM and the database tables are ~8G. The benchmark is run with 1 client.

The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.

The benchmark steps are:

l.i0

insert 30 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.

l.x

create 3 secondary indexes per table. There is one connection per client.

l.i1

use 2 connections/client. One inserts 40M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.

l.i2

like l.i1 but each transaction modifies 5 rows (small transactions) and 10M rows total
Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.
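
The number of insert transactions implied by l.i1 and l.i2 follows from the row counts and rows per transaction above. This is a minimal sketch, assuming the row totals refer to the inserting connection and that every transaction is full. The names are illustrative and not taken from the benchmark client.

```python
# Rows inserted and rows per transaction, as described for l.i1 and l.i2.
# Assumption: the totals count rows written by the inserting connection.
STEPS = {
    "l.i1": {"rows": 40_000_000, "rows_per_txn": 50},
    "l.i2": {"rows": 10_000_000, "rows_per_txn": 5},
}


def insert_transactions(rows: int, rows_per_txn: int) -> int:
    """Number of insert transactions needed to write `rows` rows."""
    return -(-rows // rows_per_txn)  # ceiling division


txn_counts = {name: insert_transactions(**cfg) for name, cfg in STEPS.items()}

for name, count in txn_counts.items():
    print(f"{name}: {count:,} insert transactions")
```

Under these assumptions l.i2 inserts a quarter of the rows that l.i1 does but needs 2.5X as many commits, which is why it is the small-transaction case.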

qr100

use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.

qp100

like qr100 except uses point queries on the PK index

qr500

like qr100 but the insert and delete rates are increased from 100/s to 500/s

qp500

like qp100 but the insert and delete rates are increased from 100/s to 500/s

qr1000

like qr100 but the insert and delete rates are increased from 100/s to 1000/s

qp1000

like qp100 but the insert and delete rates are increased from 100/s to 1000/s
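
Because the read-write steps run for a fixed time at a fixed background rate, the number of inserts each one should do can be computed up front. This is a rough sketch, assuming all six steps run for the 1800 seconds stated for qr100. The `sla_met` check is hypothetical and is not the benchmark client's actual rule.

```python
QUERY_SECONDS = 1800  # stated for qr100; assumed here for the other steps

# Target background insert (and delete) rate per second for each step.
TARGET_RATE = {
    "qr100": 100, "qp100": 100,
    "qr500": 500, "qp500": 500,
    "qr1000": 1000, "qp1000": 1000,
}


def expected_inserts(step: str, seconds: int = QUERY_SECONDS) -> int:
    """Inserts done by the background writer if the target rate is sustained."""
    return TARGET_RATE[step] * seconds


def sla_met(step: str, actual_inserts: int, seconds: int = QUERY_SECONDS) -> bool:
    """Hypothetical SLA check: the writer kept up with the target rate."""
    return actual_inserts >= expected_inserts(step, seconds)


for step in TARGET_RATE:
    print(f"{step}: {expected_inserts(step):,} inserts expected")
```

With these numbers a system that sustains the target does 180,000 inserts in qr100, 900,000 in qr500 and 1,800,000 in qr1000, so query throughput is compared under the same write load for every version.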

Results

The performance report is here.

The summary has 3 tables. The first shows absolute throughput for each DBMS tested and each benchmark step. The second has throughput relative to the version on the first row of the table, which makes it easy to see how performance changes over time. The third shows the background insert rate for benchmark steps with background inserts, and all systems sustained the target rates.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures:

insert/s for l.i0, l.i1, l.i2
indexed rows/s for l.x
range queries/s for qr100, qr500, qr1000
point queries/s for qp100, qp500, qp1000

Below I use colors to highlight the relative QPS values with red for <= 0.95, green for >= 1.05 and grey for values between 0.95 and 1.05.
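
The relative QPS calculation and the color rule above can be sketched in a few lines. The function names and the sample throughput numbers are made up for illustration and are not taken from the performance report.

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """(QPS for $me / QPS for $base); values > 1.0 mean $me is faster."""
    return qps_me / qps_base


def color(rel: float) -> str:
    """Map a relative QPS value to the highlight color used in this post."""
    if rel <= 0.95:
        return "red"
    if rel >= 1.05:
        return "green"
    return "grey"


# Hypothetical throughput numbers, only to show the calculation.
base_qps = 1000.0
for label, qps in [("faster", 1230.0), ("flat", 1010.0), ("slower", 900.0)]:
    rel = relative_qps(qps, base_qps)
    print(f"{label}: relative QPS {rel:.2f} -> {color(rel)}")
```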

From the summary:

The base case is pg9023_def which means Postgres 9.0.23
For most of the read-write benchmark steps throughput improves a lot from 9.1.24 to 9.2.24 and has been stable since then. The exception is the last step (qp1000) for which throughput is flat. It might be that writeback and/or vacuum hurts query throughput by that point.
For the write-heavy steps (l.i0, l.x, l.i1, l.i2) throughput improves a lot

l.i0 - things get a lot better in Postgres 11.22
l.x - things get a lot better in Postgres 9.6.24
l.i1 - things get a lot better in Postgres 9.5.25 and then again in 12.17
l.i2 - improvements are similar to l.i1 but not as good because of the query planner overhead during DELETE statements (see the comments about get_actual_variable_range)

Comparing throughput in Postgres 16.1 to 9.0.23

Write-heavy

l.i0, l.x, l.i1, l.i2 - relative QPS is 1.23, 1.81, 2.82, 2.69

Range queries

qr100, qr500, qr1000 - relative QPS is 1.20, 1.24, 1.25

Point queries

qp100, qp500, qp1000 - relative QPS is 1.10, 1.09, 1.00
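
The ranges quoted in the tl;dr can be recovered from the relative QPS values listed above. This is only a small sketch that takes the min and max per group. The dictionary layout is my own and is not the format of the performance report.

```python
# Relative QPS for Postgres 16.1 vs 9.0.23, copied from the lists above.
REL_QPS = {
    "write-heavy": {"l.i0": 1.23, "l.x": 1.81, "l.i1": 2.82, "l.i2": 2.69},
    "range queries": {"qr100": 1.20, "qr500": 1.24, "qr1000": 1.25},
    "point queries": {"qp100": 1.10, "qp500": 1.09, "qp1000": 1.00},
}

# Smallest and largest relative QPS for each group of benchmark steps.
ranges = {
    group: (min(steps.values()), max(steps.values()))
    for group, steps in REL_QPS.items()
}

for group, (low, high) in ranges.items():
    print(f"{group}: {low:.2f}X to {high:.2f}X")
```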

Wednesday, January 24, 2024