Saturday, November 29, 2025

Using sysbench to measure how Postgres performance changes over time, November 2025 edition

This has results for the sysbench benchmark on a small and big server for Postgres versions 12 through 18. Once again, Postgres is boring because I search for perf regressions and can't find any here. Results from MySQL are here and MySQL is not boring.

While I don't show the results here, I don't see regressions when comparing the latest point releases with their predecessors -- 13.22 vs 13.23, 14.19 vs 14.20, 15.14 vs 15.15, 16.10 vs 16.11, 17.6 vs 17.7 and 18.0 vs 18.1.

tl;dr

  • a few small regressions
  • many more small improvements
  • for write-heavy tests at high concurrency there are many large improvements starting in PG 17

Builds, configuration and hardware

I compiled Postgres from source for versions 12.22, 13.22, 13.23, 14.19, 14.20, 15.14, 15.15, 16.10, 16.11, 17.6, 17.7, 18.0 and 18.1.

I used two servers:
  • small
    • an ASUS ExpertCenter PN53 with AMD Ryzen 7735HS CPU, 32G of RAM, 8 cores with AMD SMT disabled, Ubuntu 24.04 and an NVMe device with ext4 and discard enabled.
  • big
    • an ax162s from Hetzner with an AMD EPYC 9454P 48-Core Processor with SMT disabled
    • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
    • 128G RAM
    • Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)
Configuration files for the small server
  • Configuration files are here for Postgres versions 12, 13, 14, 15, 16 and 17.
  • For Postgres 18 I used io_method=sync and the configuration file is here.
Configuration files for the big server
  • Configuration files are here for Postgres versions 12, 13, 14, 15, 16 and 17.
  • For Postgres 18 I used io_method=sync and the configuration file is here.
Benchmark

I used sysbench and my usage is explained here. I now run 32 of the 42 microbenchmarks listed in that blog post. Most test only one type of SQL statement. Benchmarks are run with the database cached by Postgres.

The read-heavy microbenchmarks are run for 600 seconds and the write-heavy for 900 seconds. On the small server the benchmark is run with 1 client and 1 table with 50M rows. On the big server the benchmark is run with 12 clients and 8 tables with 10M rows per table. 
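While I don't reproduce full sysbench output here, each microbenchmark prints a summary from which QPS can be pulled. A minimal sketch, assuming the usual sysbench report format; the sample line below is made up for illustration:

```shell
# Extract queries-per-second from a sysbench summary line.
# The sample line is illustrative; real runs print one like it per test.
line='    queries:                             1234560 (2057.60 per sec.)'
qps=$(printf '%s\n' "$line" | awk -F'[()]' '{ split($2, a, " "); print a[1] }')
echo "$qps"    # 2057.60
```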

The purpose is to search for regressions from new CPU overhead and mutex contention. I use the small server with low concurrency to find regressions from new CPU overheads and then larger servers with high concurrency to find regressions from new CPU overheads and mutex contention.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for Postgres 12.22)
When the relative QPS is > 1 then some version is faster than Postgres 12.22. When it is < 1 then there might be a regression. When the relative QPS is 1.2 then some version is about 20% faster than Postgres 12.22.
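As a worked example, this is how the relative QPS can be computed in a shell script; the QPS values are made up for illustration, not measured results:

```shell
# Relative QPS = (QPS for some version) / (QPS for the base version).
# The numbers below are hypothetical, not measured results.
base_qps=10000    # e.g. QPS for Postgres 12.22
new_qps=12000     # e.g. QPS for some newer version
awk -v new="$new_qps" -v base="$base_qps" 'BEGIN { printf "%.2f\n", new / base }'
# prints 1.20 -> the newer version is about 20% faster
```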

Values from iostat and vmstat divided by QPS are here for the small server and the big server. These can help to explain why something is faster or slower because they show how much HW is used per request, including CPU overhead per operation (cpu/o) and context switches per operation (cs/o), which are often a proxy for mutex contention.

The spreadsheet and charts are here and in some cases are easier to read than the charts below. Converting the Google Sheets charts to PNG files does the wrong thing for some of the test names listed at the bottom of the charts below.

Results: point queries

This is from the small server.
  • a large improvement arrived in Postgres 17 for the hot-points test
  • otherwise results have been stable from 12.22 through 18.1
This is from the big server.
  • a large improvement arrived in Postgres 17 for the hot-points test
  • otherwise results have been stable from 12.22 through 18.1
Results: range queries without aggregation

This is from the small server.
  • there are small improvements for the scan test
  • otherwise results have been stable from 12.22 through 18.1
This is from the big server.
  • there are small improvements for the scan test
  • otherwise results have been stable from 12.22 through 18.1
Results: range queries with aggregation

This is from the small server.
  • there are small improvements for a few tests
  • otherwise results have been stable from 12.22 through 18.1
This is from the big server.
  • there might be small regressions for a few tests
  • otherwise results have been stable from 12.22 through 18.1
Results: writes

This is from the small server.
  • there are small improvements for most tests
  • otherwise results have been stable from 12.22 through 18.1
This is from the big server.
  • there are large improvements for half of the tests
  • otherwise results have been stable from 12.22 through 18.1
From vmstat results for update-index the per-operation CPU overhead and context switch rate are much smaller starting in Postgres 17.7. The CPU overhead is about 70% of what it was in 16.11 and the context switch rate is about 50% of the rate for 16.11. Note that context switch rates are often a proxy for mutex contention.

Friday, November 28, 2025

Using sysbench to measure how MySQL performance changes over time, November 2025 edition

This has results for the sysbench benchmark on a small and big server for MySQL versions 5.6 through 9.5. The good news is that the arrival rate of performance regressions has mostly stopped as of 8.0.43. The bad news is that there were large regressions from 5.6 through 8.0.

tl;dr for low-concurrency tests

  • for point queries
    • MySQL 5.7.44 gets about 10% less QPS than 5.6.51
    • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • for range queries without aggregation
    • MySQL 5.7.44 gets about 15% less QPS than 5.6.51
    • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • for range queries with aggregation
    • MySQL 5.7.44 is faster than 5.6.51 for two tests, as fast for one and gets about 15% less QPS for the other five
    • MySQL 8.0 to 9.5 are faster than 5.6.51 for one test, as fast for one and get about 30% less QPS for the other six
  • for writes
    • MySQL 5.7.44 gets between 10% and 20% less QPS than 5.6.51 for most tests
    • MySQL 8.0 to 9.5 get between 40% and 50% less QPS than 5.6.51 for most tests
tl;dr for high-concurrency tests
  • for point queries
    • for most tests MySQL 5.7 to 9.5 get at least 1.5X more QPS than 5.6.51
    • for tests that use secondary indexes MySQL 5.7 to 9.5 get about 25% less QPS than 5.6.51
  • for range queries without aggregation
    • MySQL 5.7.44 gets about 10% less QPS than 5.6.51
    • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • for range queries with aggregation
    • MySQL 5.7.44 is faster than 5.6.51 for six tests, as fast for one test and gets about 20% less QPS for one test
    • MySQL 8.0 to 9.5 are a lot faster than 5.6.51 for two tests, about as fast for three tests and get between 10% and 30% less QPS for the other three tests
  • for writes
    • MySQL 5.7.44 gets more QPS than 5.6.51 for all tests
    • MySQL 8.0 to 9.5 get more QPS than 5.6.51 for all tests

Builds, configuration and hardware

I compiled MySQL from source for versions 5.6.51, 5.7.44, 8.0.43, 8.0.44, 8.4.6, 8.4.7, 9.4.0 and 9.5.0.

I used two servers:
  • small
    • an ASUS ExpertCenter PN53 with AMD Ryzen 7735HS CPU, 32G of RAM, 8 cores with AMD SMT disabled, Ubuntu 24.04 and an NVMe device with ext4 and discard enabled.
  • big
    • an ax162s from Hetzner with an AMD EPYC 9454P 48-Core Processor with SMT disabled
    • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
    • 128G RAM
    • Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)
The config files are here.
Benchmark

I used sysbench and my usage is explained here. I now run 32 of the 42 microbenchmarks listed in that blog post. Most test only one type of SQL statement. Benchmarks are run with the database cached by InnoDB.

The read-heavy microbenchmarks are run for 600 seconds and the write-heavy for 900 seconds. On the small server the benchmark is run with 1 client and 1 table with 50M rows. On the big server the benchmark is run with 40 clients and 8 tables with 10M rows per table. 

The purpose is to search for regressions from new CPU overhead and mutex contention. I use the small server with low concurrency to find regressions from new CPU overheads and then larger servers with high concurrency to find regressions from new CPU overheads and mutex contention.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for MySQL 5.6.51)
When the relative QPS is > 1 then some version is faster than MySQL 5.6.51.  When it is < 1 then there might be a regression. When the relative QPS is 1.2 then some version is about 20% faster than MySQL 5.6.51.

Values from iostat and vmstat divided by QPS are here for the small server and the big server. These can help to explain why something is faster or slower because they show how much HW is used per request, including CPU overhead per operation (cpu/o) and context switches per operation (cs/o), which are often a proxy for mutex contention.

The spreadsheet and charts are here and in some cases are easier to read than the charts below. Converting the Google Sheets charts to PNG files does the wrong thing for some of the test names listed at the bottom of the charts below.

Results: point queries

This is from the small server.
  • MySQL 5.7.44 gets about 10% less QPS than 5.6.51
  • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • There are few regressions after MySQL 8.0
  • New CPU overheads explain the regressions. See the vmstat results for the hot-points test.
This is from the large server.
  • For most point query tests MySQL 5.7 to 9.5 get at least 1.5X more QPS than 5.6.51
    • MySQL 5.7 to 9.5 use less CPU, see vmstat results for the hot-points test.
  • For tests that use secondary indexes (*-si) MySQL 5.7 to 9.5 get about 25% less QPS than 5.6.51.
    • This result is similar to what happens on the small server above.
    • The regressions are from extra CPU overhead, see vmstat results
  • MySQL 5.7 does better than 8.0 to 9.5. There are few regressions after MySQL 8.0.
Results: range queries without aggregation

This is from the small server.
  • MySQL 5.7.44 gets about 15% less QPS than 5.6.51
  • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • There are few regressions after MySQL 8.0
  • New CPU overheads explain the regressions. See the vmstat results for the scan test.
This is from the large server.
  • MySQL 5.7.44 gets about 10% less QPS than 5.6.51
  • MySQL 8.0 through 9.5 get about 30% less QPS than 5.6.51
  • There are few regressions after MySQL 8.0
  • New CPU overheads explain the regressions. See the vmstat results for the scan test.
Results: range queries with aggregation

This is from the small server.
  • for the read-only-distinct test, MySQL 5.7 to 9.5 are faster than 5.6.51
  • for the read-only_range=X tests
    • with the longest range scan (*_range=10000), MySQL 5.7.44 is faster than 5.6.51 and 8.0 to 9.5 have the same QPS as 5.6.51
    • with shorter range scans (*_range=100 & *_range=10) MySQL 5.6.51 is faster than 5.7 to 9.5. This implies that the regressions are from code above the storage engine layer.
    • From vmstat results the perf differences are explained by CPU overheads
  • for the other tests
    • MySQL 5.7.44 gets about 15% less QPS than 5.6.51
    • MySQL 8.0 to 9.5 get about 30% less QPS than 5.6.51
    • From vmstat results for read-only-count the reason is new CPU overhead
This is from the large server.
  • for the read-only-distinct test, MySQL 5.7 to 9.5 are faster than 5.6.51
  • for the read-only_range=X tests
    • MySQL 5.7.44 is as fast as 5.6.51 for the longest range scan and faster than 5.6.51 for the shorter range scans
    • MySQL 8.0 to 9.5 are much faster than 5.6.51 for the longest range scan and somewhat faster for the shorter range scans
    • From vmstat results the perf differences are explained by CPU overheads and possibly by changes in mutex contention
  • for the other tests
    • MySQL 5.7.44 gets about 20% less QPS than 5.6.51 for read-only-count and about 10% more QPS than 5.6.51 for read-only-simple and read-only-sum
    • MySQL 8.0 to 9.5 get about 30% less QPS than 5.6.51 for read-only-count and up to 20% less QPS than 5.6.51 for read-only-simple and read-only-sum
    • From vmstat results for read-only-count the reason is new CPU overhead
Results: writes

This is from the small server.
  • For most tests
    • MySQL 5.7.44 gets between 10% and 20% less QPS than 5.6.51
    • MySQL 8.0 to 9.5 get between 40% and 50% less QPS than 5.6.51
    • From vmstat results for the insert test, MySQL 5.7 to 9.5 use a lot more CPU
  • For the update-index test
    • MySQL 5.7.44 is faster than 5.6.51
    • MySQL 8.0 to 9.5 get about 10% less QPS than 5.6.51
    • From vmstat metrics MySQL 5.6.51 has more mutex contention
  • For the update-inlist test
    • MySQL 5.7.44 is as fast as 5.6.51
    • MySQL 8.0 to 9.5 get about 30% less QPS than 5.6.51
    • From vmstat metrics MySQL 5.6.51 has more mutex contention
This is from the large server and the y-axis truncates the result for the update-index test to improve readability for the other results.
  • For all tests MySQL 5.7 to 9.5 get more QPS than 5.6.51
    • From vmstat results for the write-only test MySQL 5.6.51 uses more CPU and has more mutex contention.
  • For some tests (read-write_range=X) MySQL 8.0 to 9.5 get less QPS than 5.7.44
    • These are the classic sysbench transaction with different range scan lengths and the performance is dominated by the range query response time, thus 5.7 is fastest.
  • For most tests MySQL 5.7 to 9.5 have similar perf with two exceptions
    • For the delete test, MySQL 8.0 to 9.5 are faster than 5.7. From vmstat metrics 5.7 uses more CPU and has more mutex contention than 8.0 to 9.5.
    • For the update-inlist test, MySQL 8.0 to 9.5 are faster than 5.7. From vmstat metrics 5.7 uses more CPU than 8.0 to 9.5.
This is also from the large server and does not truncate the update-index test result.

Saturday, November 22, 2025

Challenges compiling old C++ code on modern Linux

I often compile old versions of MySQL, MariaDB, Postgres and RocksDB in my search for performance regressions. Compiling is easy with Postgres as they do a great job at avoiding compilation warnings and I never encounter broken builds. Certainly the community gets the credit for this, but I suspect their task is easier because they use C.  This started as a LinkedIn post.

I expect people to disagree, and I am far from a C++ expert, but here goes ...

tl;dr - if you maintain widely used header files (widely used by C++ projects), consider not removing an include that you don't really need (like <cstdint>), because such removal is likely to break builds for older releases of the projects that use your header.

I have more trouble compiling older releases of C++ projects. For MySQL I have a directory on GitHub that includes patches that must be applied. And for MySQL I have to patch all 5.6 versions, 5.7 versions up to 5.7.33 and 8.0 versions up to 8.0.23. The most common reason for a patch is a missing C++ include (like <cstdint>).

For RocksDB with gcc I don't have to patch files but I need to use gcc-11 for RocksDB 6.x and gcc-12 for RocksDB 7.x.

For RocksDB with clang I don't have to patch files for RocksDB 8.x, 9.x and 10.x while I do have to patch 6.x and 7.x. For RocksDB 7.10 I need to edit two files to add <cstdint>. The files are:

  • table/block_based/data_block_hash_index.h
  • util/string_util.h
All of this is true for Ubuntu 24.04 with clang 18.1.3 and gcc 13.3.0.
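The fix for a missing include is mechanical, so it can be scripted. A sketch of one way to do it, where the file path is just an example and GNU sed's insert syntax is assumed:

```shell
# Prepend '#include <cstdint>' to a header unless it is already there.
# The path below is illustrative; the real files to patch are listed above.
f=util/string_util.h
grep -q '#include <cstdint>' "$f" || sed -i '1i #include <cstdint>' "$f"
```

The grep guard makes the script idempotent, so re-running it against an already-patched tree does nothing.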

One more detail, for my future self, the command line I use to compile RocksDB with clang is one of the following:
  • Rather than remember which of V= and VERBOSE= that I need, I just use both
  • I get errors if I don't define AR and RANLIB when using clang
  • While clang-18 installs clang and clang++ binaries, to get the llvm variants of ar and ranlib I need to use llvm-ar-18 and llvm-ranlib-18 rather than llvm-ar and llvm-ranlib

# without link-time optimization
AR=llvm-ar-18 RANLIB=llvm-ranlib-18 \
CC=clang CXX=clang++ \
make \
DISABLE_WARNING_AS_ERROR=1 DEBUG_LEVEL=0 V=1 VERBOSE=1 -j... \
static_lib db_bench

# with link-time optimization
AR=llvm-ar-18 RANLIB=llvm-ranlib-18 \
CC=clang CXX=clang++ \
make USE_LTO=1 \
DISABLE_WARNING_AS_ERROR=1 DEBUG_LEVEL=0 V=1 VERBOSE=1 -j... \
static_lib db_bench
