Tuesday, April 25, 2023

Perf regressions in MySQL/InnoDB, a big server & sysbench, part 2

My last post has results for MySQL/InnoDB on a big server and the database fit in the InnoDB buffer pool. Here I have results where the database fits in the OS page cache but not the InnoDB buffer pool.

The context for the results is short-running queries, in-memory (cached by the OS) with high-concurrency (20 clients) on a big server (30-cores). The goals are:

  1. Understand the impact of compiler optimizations
  2. Document how performance has changed from MySQL 5.6 to 5.7 to 8.0
  3. Document performance with fast storage (reading from the OS page cache is fast)
tl;dr
  • The rel_lto build improves QPS by up to 3%
  • 8.0 releases look much better here with a big server & high-concurrency than on the small server with low-concurrency.
  • For changes from 5.6 to 8.0
    • Point queries - version 8.0.32 gets about 3X more QPS than 5.6.51 on most of the microbenchmarks. This is much better than the previous result where the database fits in the buffer pool.
    • Range queries - version 8.0.32 gets about 14% more QPS than 5.6.51. This is much better than the previous result where the database fits in the buffer pool.
    • Writes - version 8.0.32 gets the same QPS as version 5.6.51. This is much worse than the previous result where the database fits in the buffer pool.
Benchmark

A description of how I run sysbench is here. Tests use the a c2-standard-60 server on GCP with 30-cores, hyperthreading disabled, 240G RAM and 3TB of local attached NVMe. The sysbench tests were run for 20 clients, 600 seconds per microbenchmark using 4 tables with 50M rows per table. All tests use the InnoDB storage engine. The test database fits in the InnoDB buffer pool.

I used a similar configuration (my.cnf) for all versions which is here for 5.65.78.0.2x and 8.0.3x.

Builds

I tested MySQL versions 5.6.51, 5.7.40, 8.0.22, 8.0.28, 8.0.31 and 8.0.32 using multiple builds for each version. For each build+version the full set of sysbench microbenchmarks was repeated. More details on the builds are in the previous post. To save time I only tested all builds for 8.0.31 and for other versions used the rel_lto build.

Results: all versions

The spreadsheet is here. See the 56_to_80.redo.4g tab.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case here is MySQL 5.6.51 using the rel_lto build.

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes. There is much variance within each of the microbenchmark groups:
  • Point queries - most of the microbenchmarks get about 3X more QPS in 8.0 than 5.6. The exceptions are hot-points_range=100, point-query.pre_range=100 and point-query.range=100. Two of the exceptions select one row per query while the microbenchmarks that are 3X faster tend to have a large in-list.
  • Range queries - most of the microbenchmarks have a relative throughput between 0.8 and 1.2 with 8.0 compared to 5.6.51. There are two outliers that are more than 3X faster in 8.0 -- range-notcovered-si.pre_range=1000 and range-notcovered-si_range=1000 which use oltp_points_covered.lua. The two exceptions do a range scan on a non-covering secondary index so there will be more reads from the OS page cache for these.
  • Writes - there is not much variance in the microbenchmarks except for read-write* which use the classic sysbench transaction that includes range queries. Perhaps their improvement in 8.0 vs 5.6 is mostly do to the improvements in range queries, but their cousin (read-only*) which uses the same SQL excluding the writes doesn't show such an improvement. This is a mystery.
Summary statistics:

my5651_relmy5740_rel_ltomy8022_rel_ltomy8028_rel_ltomy8031_rel_ltomy8032_rel_lto
Point: avg1.712.672.752.712.70
Point: median1.582.882.983.153.19
Point: min1.031.021.010.990.98
Point: max5.113.733.763.703.37
Point: stddev0.8970.7940.8610.8440.840
Range: avg1.291.471.501.421.42
Range: median1.251.241.191.151.14
Range: min0.810.710.730.680.62
Range: max2.293.473.603.443.44
Range: stddev0.3680.8340.8970.8470.858
Write: avg1.221.091.071.061.06
Write: median1.181.021.011.001.01
Write: min0.980.890.950.940.95
Write: max1.521.551.331.331.33
Write: stddev0.2000.2200.1420.1340.132

Results: MySQL 8.0.31

The spreadsheet is here. See the my8031.redo.4g tab.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case here is MySQL 8.0.31 using the rel_withdbg build.

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes. For each group of microbenchmarks:
  • Point queries - there is little variance across the microbenchmarks
  • Range queries - the full table scan test (scan_range=10) shows the best improvement from the rel_lto build. I don't understand the noisy result for the read-only_range=10000 microbenchmark. Perhaps buffer pool writeback was still in progress as that microbenchmark is run shortly after the write-heavy microbenchmarks.
  • Writes -  there is little variance across the microbenchmarks
Summary statistics:

rel_withdbgrel_o2rel_nativerelrel_o2_ltorel_native_ltorel_lto
Point: avg0.981.021.021.011.031.03
Point: median0.991.021.021.011.031.03
Point: min0.950.960.971.001.011.01
Point: max1.011.031.031.021.061.05
Point: stddev0.0170.0210.0180.0070.0110.011
Range: avg0.980.980.981.001.001.01
Range: median0.970.980.981.011.011.02
Range: min0.960.890.860.930.900.81
Range: max1.071.061.031.021.041.11
Range: stddev0.0280.0400.0380.0210.0340.061
Write: avg0.980.980.990.990.991.00
Write: median0.980.980.990.980.980.99
Write: min0.970.980.980.980.980.99
Write: max0.980.991.001.001.011.01
Write: stddev0.0030.0050.0060.0070.0110.008

    No comments:

    Post a Comment

    Speedb vs RocksDB on a large server

    I am happy to read about storage engines that claim to be faster than RocksDB. Sometimes the claims are true and might lead to ideas for mak...