Postgres has done a great job at avoiding performance regressions over time. This post has results from sysbench on a large server for all point releases of Postgres 17.x and the latest point release of each major version from Postgres 10 through 16.
This work was done by Small Datum LLC and not sponsored.
tl;dr - over time ...
- For the 27 microbenchmarks there is one regression that arrives in 11.x and remains through 17.x
- Postgres has many big improvements
Updates:
- I am still trying to explain the regression but the problem isn't obvious.
Builds, configuration and hardware
I compiled Postgres from source using -O2 -fno-omit-frame-pointer for versions 10.23, 11.22, 12.22, 13.20, 14.17, 15.12, 16.8, 17.0, 17.1, 17.2, 17.3 and 17.4.
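As a minimal sketch, assuming a throwaway install prefix, the build for each version looks like:
# pass the optimization and frame pointer flags to configure (install prefix is made up)
./configure --prefix=$HOME/pg/174 CFLAGS="-O2 -fno-omit-frame-pointer"
make -j $(nproc)
make install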
The server is an ax162-s from Hetzner with 48 cores (AMD EPYC 9454P), 128G RAM and AMD SMT disabled. It uses Ubuntu 22.04 and storage is ext4 with SW RAID 1 over 2 locally attached NVMe devices. More details on it are here. At list prices a similar server from Google Cloud costs 10X more than from Hetzner.
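A rough sketch of how SW RAID 1 over two NVMe devices gets assembled, with made-up device names and mount point:
# create the mirror, make an ext4 filesystem and mount it (device names are assumptions)
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.ext4 /dev/md2
mount /dev/md2 /data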
The configuration files for the large server are in the pg* subdirectories here with the name conf.diff.cx10a_c32r128.
Benchmark
I used sysbench and my usage is explained here. To save time I only run 27 of the 42 microbenchmarks, and most of them test only one type of SQL statement. Benchmarks are run with the database cached by Postgres.
The tests run with 8 tables and 10M rows/table. There are 40 client threads, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds.
The command line to run all tests is: bash r.sh 8 10000000 180 300 md2 1 1 40
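r.sh is my helper script so the argument order above is specific to it. For anyone using stock sysbench, a read-heavy microbenchmark with the same table count, table size, thread count and duration would be invoked roughly like this (connection options are assumptions, and my microbenchmarks use customized Lua scripts so they are not identical to oltp_read_only):
sysbench oltp_read_only --db-driver=pgsql --pgsql-user=postgres --pgsql-db=sbtest \
  --tables=8 --table-size=10000000 --threads=40 --time=180 run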
Results
For the results below I split the microbenchmarks into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. The spreadsheet with all data is here.
Values from iostat and vmstat divided by QPS are here. This can help to explain why something is faster or slower because it shows how much HW is used per request.
The relative QPS is the following where $version is >= 11.22.
(QPS for $version) / (QPS for Postgres 10.23)
The numbers in the spreadsheets are the relative QPS. When the relative QPS is > 1 then $version is faster than Postgres 10.23. When it is 3.0 then $version is 3X faster than the base case.
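As an example with made-up numbers: if Postgres 10.23 gets 20,000 QPS on a microbenchmark and Postgres 17.4 gets 24,000 QPS, then the relative QPS for 17.4 is 24000 / 20000 = 1.20 and 17.4 is 1.2X faster than the base case.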
Results: charts
Notes on the charts
- the y-axis shows the relative QPS
- the y-axis starts at 0.80 to make it easier to see differences
- in some cases the y-axis truncates the good outliers, cases where the relative QPS is greater than 1.5. I do this to improve readability for values near 1.0. Regardless, the improvements are nice.
Point queries
- performance is stable over time in most cases
- performance gets much better for hot-points with Postgres 17.0
Range queries, part 1
- performance is stable over time with small improvements for most tests
- performance gets ~1.2X better over time on the scan test
Range queries, part 2
- these do a short range scan with aggregation, and =10, =100 and =10000 give the number of rows scanned per query (a sketch of this type of query follows this list)
- performance is stable over time on the tests that do shorter range scans (=100, =10)
- performance drops by ~10% starting in 11.22 on the test with a longer range scan (=10000)
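My microbenchmarks come from a customized sysbench so the Lua differs, but in stock sysbench the aggregating range scan is essentially a SUM over a BETWEEN predicate on the PK. A quick way to look at the plan and per-query work for the =10000 case on a cached table is something like this (database name and id range are assumptions):
psql sbtest -c "EXPLAIN (ANALYZE, BUFFERS) SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 1 AND 10000"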
Writes
- depending on the test, performance over time is either stable, has a small improvement or has a large improvement
A possible regression!
I have yet to explain this.
For the read-only_range=10000 tests where QPS drops by ~10% starting in 11.22 the problem is that Postgres uses more CPU per query starting in 11.22 (see the cpu/o column which stands for CPU per operation or CPU per query).
Metrics per test from vmstat and iostat are here and I highlighted results for the read-only_range=X tests that do a range scan with aggregation.
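Roughly, cpu/o is CPU normalized by throughput, something like cpu/o ~= (us + sy from vmstat) / QPS, so a larger cpu/o at a similar QPS means more CPU is used per query.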
Things I have checked for read-only_range=10000
- output from ps for 10.23, 11.22 and 12.22 looks similar
- output from vmstat for 10.23, 11.22 and 12.22 looks similar with a few small differences. This output is from the middle of the microbenchmark.
- both swpd and free are larger in 11.22 and 12.22
- both buff and cache are larger in 10.23
- for 10.23, 11.22 and 12.22 those values don't change during the microbenchmark
- Flamegraphs for 10.23, 11.22 and 12.22 are here and look similar (a sketch of how they can be collected is below). The numbered (*.1.svg, *.2.svg) files are taken in sequence and then all are combined to create the *.all.svg file.
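A minimal sketch of how CPU flamegraphs like these can be collected, assuming perf plus Brendan Gregg's FlameGraph scripts are on the PATH (sample frequency and duration are assumptions; the -fno-omit-frame-pointer build flag is what keeps the stacks usable):
# sample all CPUs with call graphs for 30 seconds, then fold stacks and render the SVG
perf record -F 99 -a -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > pg.1.svg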