Small Datum: Perf regressions in MySQL from 5.6.21 to 8.0.36 using sysbench and a small server

This has results for sysbench vs upstream MySQL on a small server. I have results for some 5.6, 5.7 and 8.0 releases up to 8.0.36. My last report stopped at 8.0.34. The goal is to document where things get faster or slower over time for a low-concurrency and CPU-bound workload. The focus is on CPU regressions.

My results here aren't universal.

The microbenchmarks here mostly measure CPU overheads
Things won't look the same with an IO-bound workload. If nothing else that will make many of the CPU regressions less significant.
Things won't look the same with a workload that has more concurrency. While MySQL tends to get slower over time from more CPU overhead it also gets faster over time on concurrent workloads from improvements to synchronization code. Results from a few months ago on a larger server are here and the regressions are much smaller.
Things won't look the same with a workload that has complex queries. Most of the queries used by sysbench are simple and short running. This amplifies the impact of perf regressions in parse, semantic analysis and query optimization.

tl;dr

Upstream MySQL would benefit from changepoint detection as provided by Nyrkiö.
MySQL 8.0 is the worst for perf regressions, while 5.7 and 5.6 are better at avoiding them. Also, there tend to be large regressions between the last point release in one major version and the first point release in the following major version, for instance from 5.6.51 to 5.7.10.
The scan_range=100 microbenchmark that does a full table scan has a large regression from 8.0.28 to 8.0.36 and bug 111538 is open for this

Comparing 8.0.36 with 5.6.21

For point queries, 8.0.36 gets 19% to 39% less QPS than 5.6.21
For range queries that don't do aggregation (part 1), 8.0.36 gets 29% to 39% less QPS than 5.6.21
For range queries that do aggregation, 8.0.36 gets 3% to 45% less QPS than 5.6.21. The difference depends on the length of the range scan, where shorter scan == larger regression.
Full scan (scan_range=100) has the largest regression (5.6.21 is ~2X faster than 8.0.36)
For most writes (ignoring the update-index microbenchmark), 8.0.36 gets about half of the throughput compared to 5.6.21

Builds

It isn't easy to build older code on newer systems, compilers, etc. Notes on that are here for 5.6, for 5.6 and 5.7, for 5.7 and for 8.0. A note on using cmake is here. The rel builds were used -- everything was compiled using CMAKE_BUILD_TYPE=Release.

Tests were done for:

5.6 - 5.6.21, 5.6.31, 5.6.41, 5.6.51
5.7 - 5.7.10, 5.7.20, 5.7.30, 5.7.44
8.0 - 8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35, 8.0.36

I used the cz10a_bee config and it is here for 5.6, 5.7 and 8.0 (here and here). For 8.0 releases older than 8.0.19 I changed innodb_idle_flush_pct=1 to loose_innodb_idle_flush_pct=1.

Benchmarks

I used sysbench and my usage is explained here. There are 42 microbenchmarks and each tests ~1 type of SQL statement and is run for 1200 seconds.

Tests were run on a small server I have at home (see here). The server is an SER4 from Beelink with 8 cores, 16G of RAM and 1 m.2 storage device with XFS and Ubuntu 22.04. The test tables are cached by InnoDB.

The benchmark is run with:

one connection
30M rows and a database cached by InnoDB
each microbenchmark runs for 1200 seconds
prepared statements were enabled

The command line was: bash r.sh 1 30000000 1200 1200 nvme0n1 1 1 1

Results

For the results below I split the microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. Unfortunately, I included the full scan microbenchmark (scan_range=100) in part 2 but it doesn't do aggregation. The spreadsheet with all data and charts is here and is easier to read.

All of the charts have relative throughput on the y-axis where that is (QPS for $me) / (QPS for $base), $me is a version (for example 5.7.20) and $base is the base version. The base version is specified below and one of 5.6.21, 5.7.10 and 8.0.13 depending on what I am comparing. The y-axis doesn't start at 0 to improve readability.

The legend under the x-axis truncates the names I use for the microbenchmarks and I don't know how to fix that other than sharing the link to the Google Sheet I used. Files I used to create the spreadsheets are here.

From 5.6.21 to 8.0.36

This section uses 5.6.21 as the base version and then compares that with 5.6.51, 5.7.10, 5.7.44, 8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35 and 8.0.36 to show how performance has changed from oldest tested (5.6.21) to newest tested (8.0.36).

The largest regressions might occur between the last point release in one major version and the first point release in the next major version.
For point queries, 8.0.36 gets 19% to 39% less QPS vs 5.6.21
For range queries that don't do aggregation (part 1), 8.0.36 gets 29% to 39% less QPS vs 5.6.21
For range queries that do aggregation, 8.0.36 gets 3% to 45% less QPS vs 5.6.21. The difference depends on the length of the range scan -- shorter scan == larger regression. And full scan (scan_range=100) has the largest regression.
For most writes (ignoring the update-index microbenchmark), 8.0.36 gets about half of the throughput compared to 5.6.21

Summary statistics for each of the benchmark groupings:

	min	max	avg	median	stdev
point-1	0.63	0.78	0.71	0.69	0.05
point-2	0.61	0.81	0.71	0.70	0.09
range-1	0.61	0.71	0.64	0.62	0.04
range-2	0.55	0.97	0.76	0.74	0.16
writes	0.44	1.08	0.63	0.56	0.19

MySQL 8.0: some point releases

This section uses 8.0.13 as the base version and then compares that with 8.0.14, 8.0.20, 8.0.28, 8.0.35 and 8.0.36 to show how performance has changed from 8.0.13 to 8.0.36.

There was a perf bug in 8.0.28 (bug 102037) from the optimizer for queries with large in-lists that explains the two results below in Point query, part 2 that are close to 0.40.

From MySQL 8.0.13 to 8.0.36

Point queries are ~5% slower in 8.0.36
Range queries without aggregation are between 6% and 15% slower in 8.0.36 and for a few microbenchmarks there is a big regression after 8.0.28 (possibly bug 111538)
Range queries with aggregation are mostly ~15% slower in 8.0.36
Full scan is ~32% slower in 8.0.36 with a big regression after 8.0.28 (possibly bug 111538)
Writes are ~20% slower in 8.0.36 with a big regression after 8.0.20

MySQL 5.7: some point releases

This section uses 5.7.10 as the base version and then compares that with 5.7.20, 5.7.30 and 5.7.44 to show how performance has changed from 5.7.10 to 5.7.44.

For most microbenchmarks the throughput in 5.7.44 is no more than 5% less than in 5.7.10. For two microbenchmarks (update-index and update-inlist) the throughput in 5.7.44 is larger than in 5.7.10.

MySQL 5.6: some point releases

This section uses 5.6.21 as the base version and then compares that with 5.6.31, 5.6.41 and 5.6.51 to show how performance has changed from 5.6.21 to 5.6.51.

For most microbenchmarks the throughput in 5.6.51 is no more than 5% less than in 5.6.21. The largest regression is ~10% from full scan (scan_range=100) and 5.6.51 is faster than 5.6.21 for the update-inlist microbenchmark.