I spent much time in 2024 documenting performance regressions from old to new versions of MySQL with InnoDB and MyRocks. More posts will be published in 2025, including this sequence of posts, but my work on that is winding down. Most of the problems are from many small regressions rather than a few big ones and it is very expensive to find and fix regressions long after they arrive.
Hopefully someone else will emerge to do work like this for MySQL going forward, or upstream will find more resources to prevent new small regressions from arriving. The trend over the past decade has been great for Postgres. And if upstream wants to grow Heatwave, then avoiding regressions in the base product is one way to help with that.
The purpose of this post is to document the regressions from MySQL 5.6 through 8.0 for MyRocks and upstream InnoDB using sysbench with a cached workload.
tl;dr, v1
- Modern InnoDB at low concurrency gets about 70% of the QPS relative to older InnoDB (5.6.51). The regressions aren't as bad at high concurrency.
- Modern MyRocks gets between 80% and 95% of the QPS relative to older MyRocks (5.6.35) in most cases; the exception is high-concurrency writes, where QPS is similar for old and new.
tl;dr, v2
- Both upstream MySQL and the FB MySQL project would benefit from changepoint detection (like Nyrkio) using sysbench microbenchmarks to detect regressions
- Regressions from MySQL 5.6 to 8.0 for InnoDB are worse at low concurrency than at high. MySQL has gotten more efficient at high-concurrency workloads, although some of that benefit is lost to code bloat, which is more visible at low concurrency.
- Regressions from MySQL 5.6 to 8.0 for MyRocks are smaller than for InnoDB at low concurrency, but larger at high concurrency. I suspect the regressions at high concurrency for MyRocks would be smaller than for InnoDB had I enabled the hyper clock cache.
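The changepoint detection idea above can be sketched with a toy filter. All build names and QPS numbers below are hypothetical, and production services like Nyrkio use proper statistical changepoint methods; this trailing-median filter only illustrates flagging sustained drops across builds:

```python
# Toy regression detector over per-build QPS results (all numbers hypothetical).
from statistics import median

def flag_regressions(qps_by_build, threshold=0.05, window=3):
    """Flag builds whose QPS drops more than `threshold` below the
    median of the previous `window` builds."""
    builds = list(qps_by_build)
    flagged = []
    for i in range(window, len(builds)):
        trailing = median(qps_by_build[b] for b in builds[i - window:i])
        if qps_by_build[builds[i]] < trailing * (1 - threshold):
            flagged.append(builds[i])
    return flagged

qps = {"5.6.51": 1000, "5.7.10": 990, "5.7.44": 985, "8.0.28": 900, "8.0.32": 870}
print(flag_regressions(qps))  # the two 8.0 builds are flagged
```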
Builds
- fbmy5635_rel_o2nofp_210407_f896415f_6190
- MyRocks 5.6.35 at git hash f896415f (2021/04/07) and RocksDB 6.19.0
- fbmy5635_rel_o2nofp_231016_4f3a57a1_870
- MyRocks 5.6.35 at git hash 4f3a57a1 (2023/10/16) and RocksDB 8.7.0
- fbmy8028_rel_o2nofp_231202_4edf1eec_870
- MyRocks 8.0.28 at git hash 4edf1eec (2023/12/02) and RocksDB 8.7.0
- fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
- MyRocks 8.0.32 at git hash ba9709c9 (2024/10/23) and RocksDB 9.7.1
- fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
- MyRocks 8.0.32 at git hash ba9709c9 (2024/10/23) and RocksDB 9.7.1 with fixes applied for bugs 1473, 1481, 1482 and 1506. There is also a workaround for bug 1505 via my.cnf changes. But note that bug 1506 is only for InnoDB.
- 1473
- is_thd_db_read_only_by_name accounts for ~2% of CPU time on CPU-bound and write-heavy microbenchmarks
- 1481
- ha_statistic_increment accounts for ~5% of CPU time on CPU-bound table scans
- 1482
- calls to clock_gettime in sql/optimizer.cc account for ~4% of CPU time on several microbenchmarks
- 1505
- the default for yield_check_frequency is zero, which makes MySQL waste much time in calls to thd_wait_yield. A better default is 10. Too many my.cnf options isn't a big deal; too many options with bad default values is a big deal. The workaround is to set yield_check_frequency=10 in my.cnf
- 1506
- this is limited to InnoDB in the FB MySQL tree. It is from changes in the concurrency ticket code and reduces CPU-bound table scan performance by ~20%.
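The my.cnf workaround for bug 1505 is a one-line change. A sketch, assuming the option goes in the usual [mysqld] section:

```ini
[mysqld]
# Bug 1505: the default of yield_check_frequency=0 wastes time in thd_wait_yield
yield_check_frequency=10
```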
- fbmy5635_rel_o2nofp_210407_f896415f_6190
- fbmy5635_rel_o2nofp_231016_4f3a57a1_870
- fbmy8028_rel_o2nofp_231202_4edf1eec_870
- fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
- my.cnf.cza2_c32r128 (this adds yield_check_frequency=10)
- fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
- my.cnf.cza2_c32r128 (this adds yield_check_frequency=10)
Hardware
- c8r16
- The c8r16 name stands for 8 CPU cores and 16G of RAM. This is a Beelink SER4 with an AMD Ryzen 7 4700 CPU with SMT disabled, 8 cores, 16G of RAM, Ubuntu 22.04 and ext4 on 1 NVMe device.
- c8r32
- The c8r32 name stands for 8 CPU cores and 32G of RAM. This is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, with SMT disabled, 8 cores, 32G RAM, Ubuntu 22.04 and ext4 on 1 NVMe device.
- c24r64
- The c24r64 name stands for 24 CPU cores and 64G of RAM. This is a SuperMicro SuperWorkstation 7049A-T with 2 sockets, 12 cores/socket, 64G RAM, one m.2 SSD (2TB, ext4). The CPUs are Intel Xeon Silver 4214R CPU @ 2.40GHz.
- c32r128
- The c32r128 name stands for 32 CPU cores and 128G of RAM. This is a Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM, an AMD Ryzen Threadripper PRO 5975WX with 32 cores, and two m.2 SSDs (each 2TB, SW RAID 0, ext4).
Benchmark
- c8r16 - 1 thread, 1 table, 30M rows
- c8r32 - 1 thread, 1 table, 50M rows
- c24r64 - 1 thread and then 16 threads, 8 tables, 10M rows/table
- c32r128 - 1 thread and then 24 threads, 8 tables, 10M rows/table
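For reference, the per-server configurations above translate into sysbench invocations of roughly this shape. This is a sketch assuming stock sysbench 1.0 Lua scripts; the host, user, and the oltp_point_select test name are placeholders, and the real tests run many more microbenchmarks:

```shell
# Print (not run) sysbench commands matching the c24r64 config above:
# 8 tables, 10M rows/table, first 1 thread and then 16 threads.
cmds=""
for threads in 1 16; do
  cmd="sysbench oltp_point_select --mysql-host=127.0.0.1 --mysql-user=root --tables=8 --table-size=10000000 --threads=$threads --time=300 run"
  cmds="$cmds $cmd"
  echo "$cmd"
done
```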
Relative QPS
- rQPS is: (QPS for my version) / (QPS for base version)
- base version for InnoDB is 5.6.51 and for MyRocks is 5.6.35 (fbmy5635_rel_o2nofp_210407_f896415f_6190)
- my version is one of the other versions
- based on the median values, InnoDB in 8.0.32 gets ~72% of the QPS relative to 5.6.51
- the results here are better than above. For point queries, upstream made InnoDB better for high-concurrency workloads, which counters the performance loss from new CPU overheads. However, that isn't true for writes, and the issue is made worse by some of the refactoring in InnoDB that landed after 8.0.28.
- the regressions here aren't as bad as they are above for InnoDB
- Based on the median values, MyRocks at higher concurrency here does better than at low concurrency above. I am not sure why. Perhaps, had I enabled the hyper clock cache, the MyRocks results here would be better than the InnoDB results above at 24 threads.
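The rQPS arithmetic is simple. A sketch with made-up QPS numbers (the test names and values are hypothetical; values below 1.0 indicate a regression):

```python
# rQPS = (QPS for my version) / (QPS for base version); all numbers hypothetical.
base_qps = {"point-query": 10000, "update-index": 4000}  # e.g. InnoDB 5.6.51
my_qps = {"point-query": 7200, "update-index": 3000}     # e.g. InnoDB 8.0.32

rqps = {test: my_qps[test] / base_qps[test] for test in base_qps}
print(rqps)  # {'point-query': 0.72, 'update-index': 0.75}
```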
- 5635.231016
- fbmy5635_rel_o2nofp_231016_4f3a57a1_870
- 8028.231202
- fbmy8028_rel_o2nofp_231202_4edf1eec_870
- 8032.241023.nofix
- fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
- 8032.241023.fix
- fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
- the worst regression is from hot-points where 5.6 gets almost 2X more QPS than 8.0
- for all tests except one, InnoDB in 8.0 gets less than 75% of the throughput relative to 5.6
- the regression for 8.0.28 in the third group from the right is from bug 102037
- Upstream MySQL has improvements for concurrent workloads in 5.7 and 8.0 which offset the CPU regressions.
- With 5.7.44, 5 tests are much faster than 5.6.51, 2 tests have similar QPS and 2 are slower
- With 8.0.32, 5 tests are much faster than 5.6.51 and 4 are slower
- Regressions from 5.6 to 8.0 are smaller here for MyRocks than above for InnoDB
- In 8.0.32, 7 tests are slower than 5.6.35 and 2 are slightly faster
- The regression for 8.0.28 in the third group from the right is from bug 102037
- The results here are about the same as the results above for MyRocks at 1 thread. Were I to enable the hyper clock cache they might be much better.
Notes for NT=1 (1 thread):
- The regressions are large for everything but long-range scans. And for long-range scans, while 5.7.44 is faster than 5.6, that benefit no longer exists in 8.0, especially in 8.0.30+.
- Results here are much better than above for InnoDB at 1 thread. But about half of the tests still have large regressions.
Notes for NT=1 (1 thread):
- Results here are similar to what I see above for InnoDB at 1 thread. I will explain more elsewhere but it has been hard to track down the root causes to anything other than code bloat, as in more cache and TLB activity.
Notes for higher concurrency:
- Results here at high concurrency are much better than above at low concurrency, with the exception of the results for the full scan (the last group on the right).
Notes for NT=1 (1 thread):
- All of the tests have large regressions in 8.0.32 except for update-index
Notes for higher concurrency:
- Results here are much better than above at 1 thread. Modern InnoDB is much better at high-concurrency workloads.
Notes for NT=1 (1 thread):
- MyRocks has large regressions at low-concurrency.
Notes for higher concurrency:
- The regressions here at high concurrency are about 10% better than above at low concurrency.