Small Datum: Perf regressions in MySQL/InnoDB, a big server & sysbench

I used sysbench to test MySQL/InnoDB performance on a big server. This is similar to the results I shared for InnoDB vs sysbench on a small server. The context for the results is short-running queries on an in-memory workload (the database is cached by InnoDB) with high concurrency (20 clients) on a big server (30 cores). The goals are:

Understand the impact of compiler optimizations
Document how performance has changed from MySQL 5.6 to 5.7 to 8.0

tl;dr

The rel_lto build gets 4%, 0% and 3% more QPS for point query, range query and write microbenchmarks compared to the rel_withdbg build for MySQL 8.0.31. This is similar to the benefit measured on the small server. Link-time optimization is nice.
8.0 releases look much better here with a big server & high-concurrency than on the small server with low-concurrency.
For changes from 5.6 to 8.0

Point queries - version 8.0.32 gets about 4% more QPS (based on the median) versus version 5.6.51. But microbenchmarks that use the PK index do better than that, while ones that use the secondary index do much worse, where much worse means getting about 25% less QPS than 5.6.51.
Range queries - version 8.0.32 gets about 22% less QPS versus version 5.6.51. The regressions have been gradual from 5.6 to 5.7 to 8.0.
Writes - version 8.0.32 gets almost 3X the QPS of version 5.6.51. All of that improvement is between 5.6.51 and 5.7.40.

Benchmark

A description of how I run sysbench is here. Tests use a c2-standard-60 server on GCP with 30 cores, hyperthreading disabled, 240G RAM and 3TB of local attached NVMe. The sysbench tests were run for 20 clients, 600 seconds per microbenchmark using 4 tables with 50M rows per table. All tests use the InnoDB storage engine. The test database fits in the InnoDB buffer pool.
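As a rough illustration of the parameters above (20 clients, 600 seconds, 4 tables, 50M rows per table), here is a minimal sketch that only builds a command line for stock sysbench 1.0. The helper name `sysbench_args` is hypothetical, and the option set is an assumption: the actual scripts and options used for these tests are in the linked description and are not reproduced here.

```python
def sysbench_args(script, threads=20, seconds=600, tables=4,
                  rows_per_table=50_000_000):
    """Build an argv list for one microbenchmark run. Nothing is executed.

    Assumption: stock sysbench 1.0 option names. The tests in this post
    may use different scripts and extra options.
    """
    return [
        "sysbench",
        "--db-driver=mysql",
        f"--threads={threads}",           # concurrent clients
        f"--time={seconds}",              # seconds per microbenchmark
        f"--tables={tables}",             # number of tables
        f"--table-size={rows_per_table}", # rows per table
        script,                           # e.g. a bundled Lua script name
        "run",
    ]


def total_rows(tables=4, rows_per_table=50_000_000):
    """Total rows across all tables for the given sizing."""
    return tables * rows_per_table


if __name__ == "__main__":
    print(" ".join(sysbench_args("oltp_read_only")))
    print("total rows:", total_rows())
```

With the sizing above, the database holds 200M rows in total, which is what has to fit in the InnoDB buffer pool.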

I used a similar configuration (my.cnf) for all versions which is here for 5.6, 5.7, 8.0.2x and 8.0.3x.

Builds

I tested MySQL versions 5.6.51, 5.7.40, 8.0.22, 8.0.28, 8.0.31 and 8.0.32 using multiple builds for each version. For each build+version the full set of sysbench microbenchmarks was repeated. More details on the builds are in the previous post. To save time I only tested all builds for 8.0.31 and for other versions used the rel_lto build.

Results: all versions

The spreadsheet is here. See the 56_to_80.redo tab.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case here is MySQL 5.6.51 using the rel_lto build.
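The definition of relative throughput can be sketched in a few lines. The function names and the QPS values in the usage example are hypothetical and only illustrate the arithmetic; they are not measurements from these tests.

```python
def relative_throughput(my_qps, base_qps):
    """Throughput for the tested build divided by throughput for the base case."""
    if base_qps <= 0:
        raise ValueError("base_qps must be positive")
    return my_qps / base_qps


def percent_change(rel):
    """Convert a relative throughput into a percent change versus the base case."""
    return (rel - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical QPS values, chosen only to show the arithmetic.
    rel = relative_throughput(11_000, 10_000)
    print(f"relative throughput = {rel:.2f}, change = {percent_change(rel):+.0f}%")
    # A relative throughput below 1 is a regression versus the base case.
    print(f"0.78 -> {percent_change(0.78):+.0f}%")
```

Read this way, a relative throughput of 0.78 is about 22% less QPS than the base case, and 2.90 is almost 3X the QPS of the base case.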

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes. There is much variance within each of the microbenchmark groups:

Point queries - most of the regressions, where the relative throughput is much less than 1, occur on microbenchmarks that use the secondary index. See the spreadsheet for the full microbenchmark names as they are cut off on the graphs below. The median result is that 8.0.32 gets about 4% more QPS than 5.6.51, but a single number hides the split between the two kinds of microbenchmark. For microbenchmarks that use the PK index the QPS from 8.0.32 is usually much more than 4% better than 5.6.51. For microbenchmarks that use the secondary index the QPS from 8.0.32 is usually about 25% less than 5.6.51.
Range queries - results are in three classes.

The first class gets about 22% less QPS versus 5.6.51. These do a variety of range scans using the PK or secondary index. For some the index is covering, for others it is not.
The second class gets about 12% more QPS versus 5.6.51. The Lua script for all of these is oltp_read_only.lua which is the classic sysbench transaction excluding writes.
The final class has but one microbenchmark that does a full table scan (scan_range*) and 5.6.51 will soon be 2X faster than modern MySQL for that microbenchmark.

Writes - while there is much variance in the relative throughput across the microbenchmarks in this group, in all cases the throughput with 8.0 is much better than 5.6.51. The read-write* microbenchmarks have the least improvement in 8.0 versus 5.6.51 but those use oltp_read_write.lua which is the classic sysbench transaction and that includes range queries in addition to the writes.

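Summary statistics (values are relative throughput; the first header cell names the base case, and each row has one value per remaining column):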

my5651_rel	my5740_rel_lto	my8022_rel_lto	my8028_rel_lto	my8031_rel_lto	my8032_rel_lto
Point: avg	1.10	0.97	0.94	0.96	0.95
Point: median	1.23	0.93	0.91	1.06	1.04
Point: min	0.81	0.75	0.72	0.72	0.72
Point: max	1.36	1.29	1.15	1.19	1.18
Point: stddev	0.201	0.169	0.153	0.170	0.165

Range: avg	1.04	0.97	0.95	0.90	0.88
Range: median	0.89	0.86	0.81	0.78	0.78
Range: min	0.74	0.76	0.76	0.63	0.60
Range: max	1.40	1.23	1.21	1.16	1.14
Range: stddev	0.253	0.203	0.199	0.210	0.208

Write: avg	3.19	2.94	2.95	2.89	2.81
Write: median	3.15	2.95	3.03	2.96	2.90
Write: min	1.41	1.32	1.28	1.24	1.21
Write: max	5.83	4.77	4.79	4.66	4.35
Write: stddev	1.251	1.075	1.076	1.071	1.023
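Each row in the table above summarizes the relative throughputs of one microbenchmark group. Here is a minimal sketch of those five statistics using Python's `statistics` module. The input list is hypothetical, and using the sample standard deviation is an assumption, since the post does not say which variant the spreadsheet uses.

```python
import statistics


def summarize(rel_values):
    """Summary statistics for one group of relative throughputs."""
    return {
        "avg": statistics.mean(rel_values),
        "median": statistics.median(rel_values),
        "min": min(rel_values),
        "max": max(rel_values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stddev": statistics.stdev(rel_values),
    }


if __name__ == "__main__":
    # Hypothetical relative throughputs for one group of microbenchmarks.
    group = [0.72, 0.75, 1.04, 1.10, 1.18]
    for name, value in summarize(group).items():
        print(f"{name}: {value:.3f}")
```

A group with a few large regressions can have an average below 1 while the median is above 1, which is why both are worth reading.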

Results: version 8.0.31

The spreadsheet is here. See the my8031.redo tab.

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes. For each group of microbenchmarks:

point queries show little variance
range queries show little variance except on the full scan (scan_range=10). I suspect that is noise from the microbenchmark rather than an effect of compiler optimizations.
writes show little variance

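Summary statistics (values are relative throughput; the first header cell names the base case, and each row has one value per remaining column):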

rel_withdbg	rel_o2	rel_native	rel	rel_o2_lto	rel_native_lto	rel_lto
Point: avg	0.98	1.00	1.01	1.01	1.04	1.04
Point: median	0.98	1.00	1.02	1.01	1.04	1.04
Point: min	0.96	0.98	0.97	0.99	1.01	1.01
Point: max	0.99	1.02	1.03	1.03	1.05	1.06
Point: stddev	0.008	0.011	0.020	0.009	0.010	0.014

Range: avg	0.98	0.98	0.99	1.00	0.99	1.01
Range: median	0.98	0.97	0.97	1.00	0.99	1.00
Range: min	0.97	0.94	0.96	0.97	0.94	0.97
Range: max	1.06	1.14	1.14	1.01	1.03	1.04
Range: stddev	0.023	0.048	0.046	0.010	0.025	0.019

Write: avg	0.98	0.99	0.99	1.01	1.03	1.03
Write: median	0.98	0.99	0.99	1.00	1.03	1.03
Write: min	0.97	0.96	0.96	1.00	1.00	1.01
Write: max	1.00	1.03	1.02	1.05	1.05	1.06
Write: stddev	0.008	0.020	0.018	0.021	0.018	0.019
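One way to read the table above is to compare each build with its link-time-optimized counterpart. The sketch below does that for the three "avg" rows. It assumes the column mapping implied by the header (rel_withdbg is the base case and the six values map to the remaining builds in order), and the helper name `lto_gain` is hypothetical.

```python
# Builds in header order, excluding the base case (rel_withdbg).
BUILDS = ["rel_o2", "rel_native", "rel",
          "rel_o2_lto", "rel_native_lto", "rel_lto"]

# The "avg" rows copied from the table above.
AVG = {
    "Point": [0.98, 1.00, 1.01, 1.01, 1.04, 1.04],
    "Range": [0.98, 0.98, 0.99, 1.00, 0.99, 1.01],
    "Write": [0.98, 0.99, 0.99, 1.01, 1.03, 1.03],
}


def lto_gain(group):
    """Ratio of each LTO build to its non-LTO counterpart for one group."""
    by_build = dict(zip(BUILDS, AVG[group]))
    return {
        build: by_build[build + "_lto"] / by_build[build]
        for build in ("rel_o2", "rel_native", "rel")
    }


if __name__ == "__main__":
    for group in AVG:
        gains = ", ".join(
            f"{build}: {ratio:.2f}" for build, ratio in lto_gain(group).items()
        )
        print(f"{group}: {gains}")
```

The ratios are computed from values rounded to two decimal places, so they are only a rough guide to the benefit from link-time optimization.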

Small Datum

Tuesday, April 25, 2023
