Small Datum: Postgres 16beta1 looks good vs sysbench

This has results for Postgres 16 beta1 using sysbench on a small server with low concurrency and a cached database. The goal is to determine whether there are CPU regressions from Postgres 15.3 to 16-beta1.

tl;dr

There were no regressions
A few queries are ~1.7X faster in PG 16-beta1

Benchmark

A description of how I run sysbench is here. The sysbench microbenchmarks were run for 20 clients, 300 seconds per read microbenchmark and 600 seconds per write microbenchmark, using 1 table with 20M rows. The test database was <= 6G and fits in the Postgres buffer pool.

The test server is a Beelink SER 4700u with 8 cores, 16G of RAM, 1T of NVMe SSD with XFS running Ubuntu 22.04. I compiled Postgres from source using gcc 11.3.

The config file, conf.diff.cx8_bee, is here for Postgres 15.3 and 16-beta1.

The previous post explains the builds that I used where each build uses different compiler optimizations. I used the def, o2_nofp and o3_native_lto builds and ran the benchmark 6 times -- once per build for each of Postgres 15.3 and 16-beta1.

I use sysbench to run 42 microbenchmarks and each microbenchmark is put in one of three groups based on the dominant operation: point query, range query, writes.

Results

The table below lists the relative throughput which is: (QPS for PG 16-beta1 / QPS for PG 15.3). When the relative throughput is less than 1 then PG 16-beta1 is slower than PG 15.3 and there might be a CPU regression.
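As a minimal sketch of the ratio defined above, the helper below divides the QPS of the newer version by the QPS of the older one. The function name and the QPS numbers are hypothetical and are not measurements from this benchmark.

```python
def relative_throughput(qps_new: float, qps_base: float) -> float:
    """Return (QPS for the new version) / (QPS for the base version)."""
    if qps_base <= 0:
        raise ValueError("base QPS must be positive")
    return qps_new / qps_base


# Hypothetical QPS values, chosen only to illustrate the arithmetic.
ratio = relative_throughput(9700.0, 10000.0)
print(round(ratio, 2))  # a value below 1 means the new version is slower
```

A value above 1 means the newer version sustained more queries per second on that benchmark step.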

With a few exceptions the relative throughput is between 0.97 and 1.04. I consider values between 0.97 and 1.0 to be noise rather than a regression so I will declare there are no regressions. The exceptions are:

point-query.pre_range=100 which has a noisy relative throughput of 0.89, 0.97 and 1.07. This benchmark step runs early in the benchmark and I assume this is just noise.
scan_range=100 which has a noisy relative throughput of 1.00, 0.92, 0.98. I have yet to explain it but the scan benchmark step seems to have more noise so I ignore this.
read-only.pre_range=10000 which has a relative throughput of 1.60, 1.46, 1.63
read-only_range=10000 which has a relative throughput of 1.50, 1.39, 1.50

The results show that PG 16-beta1 does ~1.5X better for the read-only.*range=10000 benchmark step which is a big improvement. That is discussed in the Follow Up section below.
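The way the exceptions were picked out can be sketched roughly as follows. The 0.97 and 1.04 bounds come from the text above, but the function name and the label strings are my own assumptions, and the sample rows are copied from the results table.

```python
NOISE_FLOOR = 0.97    # values from 0.97 to 1.0 are treated as noise in this post
NOISE_CEILING = 1.04  # upper end of the range that most results fall in


def classify(values, floor=NOISE_FLOOR, ceiling=NOISE_CEILING):
    """Label a benchmark step from its relative throughput per build."""
    if all(v > ceiling for v in values):
        return "improvement"
    if any(v < floor for v in values):
        return "check: possible regression or noise"
    if any(v > ceiling for v in values):
        return "check: possible improvement or noise"
    return "within noise"


# (def, o2, o3) relative throughput for a few steps from the table.
steps = {
    "hot-points_range=100": (1.00, 0.97, 1.01),
    "point-query.pre_range=100": (0.89, 0.97, 1.07),
    "scan_range=100": (1.00, 0.92, 0.98),
    "read-only.pre_range=10000": (1.60, 1.46, 1.63),
    "read-only_range=10000": (1.50, 1.39, 1.50),
}

for name, values in steps.items():
    print(f"{name}: {classify(values)}")
```

This only flags steps for a closer look. Deciding that point-query.pre and scan are noise still takes the judgment described above.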

Legend:

* def - uses the def build

* o2 - uses the o2_nofp build

* o3 - uses the o3_native_lto build

------ build ------     --- benchmark step ---
def     o2      o3
1.00    0.97    1.01    hot-points_range=100
0.89    0.97    1.07    point-query.pre_range=100
1.00    1.02    1.01    point-query_range=100
1.00    1.03    1.03    points-covered-pk.pre_range=100
0.99    1.04    1.03    points-covered-pk_range=100
1.00    1.03    1.04    points-covered-si.pre_range=100
0.99    1.01    1.03    points-covered-si_range=100
1.00    1.00    1.01    points-notcovered-pk.pre_range=100
1.00    1.00    1.02    points-notcovered-pk_range=100
1.00    1.00    1.02    points-notcovered-si.pre_range=100
1.00    1.00    1.01    points-notcovered-si_range=100
0.99    0.99    1.00    random-points.pre_range=1000
1.00    1.00    0.99    random-points.pre_range=100
1.01    0.99    1.02    random-points.pre_range=10
0.99    1.00    1.01    random-points_range=1000
1.00    1.00    1.01    random-points_range=100
1.02    0.99    1.02    random-points_range=10
0.99    0.98    1.01    range-covered-pk.pre_range=100
1.00    0.99    1.01    range-covered-pk_range=100
0.98    0.98    1.01    range-covered-si.pre_range=100
1.00    0.97    1.01    range-covered-si_range=100
0.97    0.98    0.99    range-notcovered-pk.pre_range=100
0.98    0.97    1.00    range-notcovered-pk_range=100
0.98    1.00    1.00    range-notcovered-si.pre_range=100
0.98    0.99    1.00    range-notcovered-si_range=100
1.60    1.46    1.63    read-only.pre_range=10000
1.00    0.99    1.04    read-only.pre_range=100
1.01    0.98    0.99    read-only.pre_range=10
1.50    1.39    1.50    read-only_range=10000
1.01    0.99    1.02    read-only_range=100
1.00    1.00    0.97    read-only_range=10
1.00    0.92    0.98    scan_range=100
1.02    1.01    1.04    delete_range=100
1.01    1.02    0.99    insert_range=100
1.02    1.00    1.03    read-write_range=100
1.00    0.98    0.98    read-write_range=10
1.00    0.99    0.98    update-index_range=100
1.00    0.98    1.00    update-inlist_range=100
1.01    1.00    0.98    update-nonindex_range=100
1.01    0.97    0.98    update-one_range=100
1.02    0.99    0.97    update-zipf_range=100
1.01    1.02    1.00    write-only_range=10000

Follow up

The read-only.pre_range=10000 and read-only_range=10000 benchmark steps run a read-only transaction that has 5 types of queries. They use the same queries but read-only.pre_range=10000 runs before a large number of write-heavy benchmark steps while read-only_range=10000 runs after them. The range=10000 in the name indicates that the range queries in the read-only transaction each scan 10,000 rows.

The Lua for the benchmark step is oltp_read_only.lua which relies on oltp_common.lua. The loop that does the queries is here.

There are five types of queries in the read-only*range=10000 benchmark step. The first (q1) is run 10 times per read-only transaction while the others are run once. For q2, q3, q4, q5 the value of N is 10,000. The read-only transaction is run in a loop and sysbench reports the number of transactions/second.

q1: fetch one row by PK
SELECT c FROM sbtest WHERE id=?

q2: scan N rows via PK index, return 1 column/row
SELECT c FROM sbtest WHERE id BETWEEN ? AND ?

q3: scan N rows via an index, return sum of one column
SELECT SUM(k) FROM sbtest WHERE id BETWEEN ? AND ?

q4: like q2 but sorts the result
SELECT c FROM sbtest WHERE id BETWEEN ? AND ? ORDER BY c

q5: like q4 but adds DISTINCT
SELECT DISTINCT c FROM sbtest WHERE id BETWEEN ? AND ? ORDER BY c
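To make the shape of the read-only transaction concrete, here is a small sketch that runs the five queries against an in-memory SQLite table. This is not how sysbench or Postgres runs them: the simplified three-column schema, the synthetic rows, the fixed start id and N = 100 (the benchmark step uses 10,000) are all assumptions made so the example is self-contained.

```python
import sqlite3

N = 100  # rows per range query here; the benchmark step uses N = 10,000

# name -> (SQL text, times run per read-only transaction)
QUERIES = {
    "q1": ("SELECT c FROM sbtest WHERE id=?", 10),
    "q2": ("SELECT c FROM sbtest WHERE id BETWEEN ? AND ?", 1),
    "q3": ("SELECT SUM(k) FROM sbtest WHERE id BETWEEN ? AND ?", 1),
    "q4": ("SELECT c FROM sbtest WHERE id BETWEEN ? AND ? ORDER BY c", 1),
    "q5": ("SELECT DISTINCT c FROM sbtest WHERE id BETWEEN ? AND ? ORDER BY c", 1),
}

# Simplified stand-in for the sysbench table (assumed schema, synthetic data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sbtest (id INTEGER PRIMARY KEY, k INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO sbtest (id, k, c) VALUES (?, ?, ?)",
    [(i, i % 7, f"c-{i % 50:03d}") for i in range(1, 1001)],
)


def run_transaction(conn, start_id, n=N):
    """Run q1 ten times and q2..q5 once; return rows returned per query type."""
    counts = {}
    for name, (sql, repeat) in QUERIES.items():
        # q1 takes one id; the range queries take an inclusive [lo, hi] pair.
        params = (start_id,) if name == "q1" else (start_id, start_id + n - 1)
        for _ in range(repeat):
            rows = conn.execute(sql, params).fetchall()
        counts[name] = len(rows)
    return counts


counts = run_transaction(conn, start_id=1)
statements = sum(repeat for _, repeat in QUERIES.values())
print(counts, statements)
```

With this mix there are 14 statements per transaction, and q4 and q5 are the only ones that sort, which is consistent with them being the ones that got faster.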

To explain the perf improvement from PG 15.3 to 16-beta1 I tried the following:

Looked at query plans - they did not change. Plans are here.
Ran each of the queries separately

performance for q1, q2 and q3 did not change
q4 is ~1.7X faster in PG 16-beta1
q5 is ~1.5X faster in PG 16-beta1

Started to look at CPU profiles from the perf tool. Unfortunately, my work on this was incomplete so I might revisit this in a few days and for now will mumble something about PG 16 using less CPU thanks to ICU and less time doing sorts.

Saturday, May 27, 2023