Small Datum: Sysbench: PostgreSQL 12, 13 and 14 on a small server

This has results for sysbench with PostgreSQL versions 12.4, 13.4 and 14.0 using the same setup as previously used for the insert benchmark.

Executive summary:

Postgres continues to be boring. There are a few regressions and more improvements.
There were ~40 tests and each ran for 5 minutes. Running each test for such a short time means there is more chance for confusing results. Alas, running each test for 30 minutes or more would have taken too long because I repeated the benchmark many times for different DBMS, configurations and database sizes. I started another round just for the Postgres IO-bound configuration that will take about 3 days to get results.

Details

See the insert benchmark post for more details on the hardware and Postgres configurations. As always, there are layers upon layers of shell scripts. The file all_small.sh lists the sequence in which the test steps are run and each step is run for 300 seconds with 1 connection (low concurrency!). From the all_small.sh script there were 42 different invocations of sysbench, each one is a microbenchmark. There should have been 43 but a typo got in the way. From 42 X 5 minutes for the tests plus more time for the load it takes 4 to 8 hours for all_small.sh to get results for each DBMS + configuration.

Three configurations were tested: 10M rows without prepared statements, 10M rows with prepared statements, 400M rows without prepared statements. The 10M row tests are CPU-bound as the database fits in memory. The 400M row test is usually IO-bound as the database is larger than memory. Below I call the test configurations 10m.prep0, 10m.prep1 and 400m.prep0. The database sizes are 2.6G at 10M rows and 99G at 400M rows after the initial load.

All of the shell scripts are here and the tests were run using a script like this.

I use my fork of sysbench that includes a few more tests (Lua scripts). The Lua scripts are here. My fork might be a few years behind upstream.

How to read the results

The presentation for these results is just text files pasted into gists. There are two types of files: qps and metrics.

The qps file has the relative QPS for each test where the relative QPS is the ratio: QPS-for-me / QPS-for-base.

QPS-for-me is the QPS for one of Postgres 12.4, 13.4 or 14.0
QPS-for-base is the QPS for Postgres 12.4.
Thus each line in the qps file has three numeric columns and the first one has the value 1.00 (QPS- for-12.4 / QPS-for-12.4). The value in the second column is the QPS for 13.4 relative to 12.4, and in the third is the QPS for 14.0 relative to 12.4. When the value in the second or third column is less than 1.0 then that version is slower than 12.4.

The metrics file has absolute and relative (to results for 12.4) values for each of the 42 test steps. This file is long so I start with the qps file and then consult the metrics file to understand why one result is better than another.

cpu/o - CPU per operation, measured by iostat. While there is a unit for this, I don't worry about that as it is most useful for comparing numbers between different test configurations so the units drop out.
r/o - storage reads per operation, measured by iostat
rKB/o - KB read from storage per operation, measured by iostat
wKB/o - KB written to storage per operation, measured by iostat
o/s - operations/second (QPS, inserts/s, etc). I tend to use QPS below for everything.
dbms - the Postgres version and configuration file

Results

Here are the qps and metrics files for the 10m.prep0 (CPU-bound, no prepared statements):

scan.range.pk1 has a regression in 13.4 that was mostly fixed in 14.0. The QPS relative to 12.4 is 0.82 for 13.4 and 0.96 for 14.0. From the metrics file there is a 30% increase in CPU overhead (see the cpu/o column) in 13.4. The scan test does a full scan of the test table using a WHERE clause that filters all rows: SELECT * from %s WHERE LENGTH(c) < 0.

Here are the qps and metrics files for the 10m.prep1 (CPU-bound, prepared statements):

Results are similar to 10m.prep0, the only regression is for scan.range.pk1 for 13.4.

Here are the qps and metrics files for the 400m.prep0 (IO-bound, no prepared statements):

scan.range.pk1 has a regression from CPU overhead for 13.4
read-write.range10.pk has a ~10% regression in 13.4 and 14.0. The biggest difference from the metrics file is for wKB/o. Perhaps this test needs to run for more time to get a better signal.
Five of the tests improve by ~10% or more in 13.4 and 14.0. I don't count the improvement for random-points.pre.range1000.pk1 because the QPS is too small (5 & 6) and rounding might explain the difference. Reduced CPU/query doesn't explain all of the improvements as in many cases there is also less IO/query but my Postgres expertise isn't strong enough to have a good guess. There have been improvements to vacuum and b-tree indexes that are likely part of the reason.

Updates

I repeated the IO-bound tests with more time per microbenchmark, 1800 seconds rather than 300. Here are the qps and metrics files. For 14.0, there were 3 tests for which perf improved more than ~5% and 5 for which it got worse by more than ~5%.

This lists microbenchmarks for which the QPS ratio is > 1.05 for Postgres 13.4 or 14.0 relative to 12.4. I picked 1.05 as a cutoff. There aren't any huge speedups, but it is nice to see some improvements.

12.4 13.4 14.0

1.00 1.12 1.13 point-query.warm.range100.pk1

1.00 1.04 1.07 points-notcovered-si.pre.range100.pk1

1.00 1.02 1.07 read-only.range10.pk1

1.00 1.05 1.07 points-notcovered-si.range100.pk1

And this lists microbenchmarks for which the QPS ratio is < =.95 for Postgres 13.4 or 14.0 relative to 12.4. From the metrics file, new CPU overhead isn't the root cause for 14.0.

12.4 13.4 14.0

1.00 0.98 0.94 update-inlist.range100.pk1

1.00 0.84 0.93 scan.range100.pk1

1.00 1.04 0.93 insert.range100.pk1

Small Datum

Wednesday, December 1, 2021

Sysbench: PostgreSQL 12, 13 and 14 on a small server

No comments:

Post a Comment

Postgres 18 beta2: large server, Insert Benchmark, part 2