Saturday, November 9, 2024

Fixing some of the InnoDB scan perf regressions in a MySQL fork

I recently learned of Advanced MySQL, a MySQL fork, and ran my sysbench benchmarks for it. It fixed some, but not all, of the regressions for write-heavy workloads that landed in InnoDB after MySQL 8.0.28.

In response to my results, the project lead filed a bug for performance regressions and then quickly came up with a diff. The bug in this case is for regressions that are most obvious during full table scans and the problems arrived in MySQL 8.0.29 and 8.0.30 -- see bug 111538 and this post. The bug is closed for upstream but the perf regressions remain so I am excited to see the community working to solve this problem.

tl;dr

  • Advanced MySQL with the fix removes much of the regression in scan performance

Builds

I tried 4 builds

  • my8028 - upstream MySQL 8.0.28
  • my8040 - upstream MySQL 8.0.40
  • my8040adv_pre - Advanced MySQL 8.0.40 without the fix (without d347cdb)
  • my8040adv_post - Advanced MySQL 8.0.40 with the fix (at d347cdb)

Hardware

The servers are

  • dell32
    • Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM, AMD Ryzen Threadripper PRO 5975WX with 32-Cores, 2 m.2 SSD (each 2TB, RAID SW 0, ext4). 
  • ax162-s
    • AMD EPYC 9454P 48-Core Processor with SMT disabled, 128G RAM, Ubuntu 22.04 and ext4 on 2 NVMe devices with SW RAID 1. This is in the Hetzner cloud.
  • bee
    • Beelink SER 4700u with Ryzen 7 4700u, 16G RAM, Ubuntu 22.04 and ext4 on NVMe

Benchmark

I used sysbench and my usage is explained here. A full run has 42 microbenchmarks and most test only 1 type of SQL statement. The database is cached by InnoDB.

The benchmark is run with ...
  • dell32 - 8 tables, 10M rows per table and 24 threads
  • ax162-s - 8 tables, 10M rows per table and 40 threads
  • bee - 1 table, 30M rows and 1 thread
Each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise. Prepared statements were enabled.

Results: overview

All of the results use relative QPS (rQPS) where:
  • rQPS is: (QPS for my version / QPS for base version)
  • base version is MySQL 8.0.28 (my8028)
  • my version is one of the other versions
Here I only share the results for the scan microbenchmark.
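
A minimal sketch of the rQPS arithmetic, assuming the o/s column in the vmstat/iostat tables later in this post is the throughput that rQPS is computed from (the relative o/s values there match the rQPS values). The function name relative_qps is hypothetical and the numbers are the dell32 scan results.

```python
# Throughput (o/s) for the scan microbenchmark on dell32, copied from the
# vmstat/iostat table in this post. Treating o/s as QPS is an assumption.
BASE = "my8028"
qps = {
    "my8028": 246,
    "my8040": 225,
    "my8040adv_pre": 208,
    "my8040adv_post": 228,
}


def relative_qps(qps_by_build, base):
    """Return rQPS = QPS(build) / QPS(base) for every non-base build."""
    base_qps = qps_by_build[base]
    return {
        build: round(value / base_qps, 2)
        for build, value in qps_by_build.items()
        if build != base
    }


rqps = relative_qps(qps, BASE)
for build, value in rqps.items():
    print(f"{build}: {value:.2f}")
```

An rQPS below 1.0 means the build is slower than MySQL 8.0.28 on that microbenchmark, and above 1.0 means it is faster.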

Results: dell32

Summary
  • QPS with the fix in Advanced MySQL is ~9% better than without the fix
  • QPS with the fix in Advanced MySQL is ~2% better than my8040
  • I am not sure why my8040adv_pre did much worse than my8040
From the relative QPS results, QPS with my8040adv_pre was ~15% less than my8028. But my8040adv_post is only ~7% slower than my8028, so the fix removes about half of the regression.

Relative to: my8028
col-1 : my8040
col-2 : my8040adv_pre
col-3 : my8040adv_post

col-1   col-2   col-3
0.91    0.85    0.93    scan
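
The "half of the regression" estimate can be checked with a little arithmetic. This is a sketch under my own definition, not something the benchmark tooling reports: the regression is the gap between rQPS and 1.0, and the share removed is how much of that gap closes with the fix. The helper name regression_removed is hypothetical.

```python
def regression_removed(rqps_pre, rqps_post):
    """Fraction of the gap to the base version (rQPS = 1.0) closed by the fix."""
    gap_before = 1.0 - rqps_pre
    gap_after = 1.0 - rqps_post
    return (gap_before - gap_after) / gap_before


# dell32 scan results: my8040adv_pre = 0.85, my8040adv_post = 0.93
share = regression_removed(0.85, 0.93)
print(f"{share:.0%} of the regression is removed")
```

With the rounded rQPS values from the table this comes out at a bit more than half.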

From vmstat and iostat metrics CPU overhead for my8040adv_pre was ~22% larger than my8028. But with the fix the CPU overhead for my8040adv_post is only ~8% larger than my8028. This is great.

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.093496        3.256   0       0       0.006   246     my8028
0.106105        4.065   0       0       0.006   225     my8040
0.113878        4.344   0       0       0.006   208     my8040adv_pre
0.101104        3.978   0       0       0.006   228     my8040adv_post
--- relative to first result
1.13            1.25    1       1       1.00    0.91    my8040
1.22            1.33    1       1       1.00    0.85    my8040adv_pre
1.08            1.22    1       1       1.00    0.93    my8040adv_post
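
The "relative to first result" rows are each absolute value divided by the matching value for my8028. A small sketch that reproduces them for the cpu/o, cs/o and o/s columns follows. The function name relative_to_first is hypothetical, and I leave out the r/o and rKB/o columns because they are zero here and the ratio is not meaningful.

```python
# (cpu/o, cs/o, o/s) per build, copied from the dell32 "absolute" rows.
absolute = {
    "my8028": (0.093496, 3.256, 246),
    "my8040": (0.106105, 4.065, 225),
    "my8040adv_pre": (0.113878, 4.344, 208),
    "my8040adv_post": (0.101104, 3.978, 228),
}


def relative_to_first(rows):
    """Divide every row by the first row, column by column, rounded to 2 places."""
    names = list(rows)
    base = rows[names[0]]
    return {
        name: tuple(round(value / base_value, 2)
                    for value, base_value in zip(rows[name], base))
        for name in names[1:]
    }


relative = relative_to_first(absolute)
for name, (cpu, cs, ops) in relative.items():
    print(f"{cpu:.2f}  {cs:.2f}  {ops:.2f}  {name}")
```

For cpu/o and cs/o a value above 1.0 means more overhead per operation than my8028, while for o/s a value below 1.0 means less throughput.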

Results: ax162-s

Summary
  • QPS is ~18% larger with the fix in Advanced MySQL
  • CPU overhead is ~15% smaller with the fix
From the relative QPS results the QPS with my8040adv_pre was the same as my8040 and both were ~17% slower than my8028. But my8040adv_post is only ~2% slower than my8028 which is excellent.

Relative to: my8028
col-1 : my8040
col-2 : my8040adv_pre
col-3 : my8040adv_post

col-1   col-2   col-3
0.83    0.83    0.98    scan

From vmstat and iostat metrics CPU overhead for my8040 and my8040adv_pre were ~20% larger than my8028. But with the fix the CPU overhead for my8040adv_post is only ~3% larger than my8028. This is great.

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.018767        0.552   0       0       0.052   872     my8028
0.022533        0.800   0       0       0.013   725     my8040
0.022499        0.808   0       0.001   0.034   727     my8040adv_pre
0.019305        0.731   0       0       0.03    851     my8040adv_post
--- relative to first result
1.20            1.45    1       1       0.25    0.83    my8040
1.20            1.46    1       inf     0.65    0.83    my8040adv_pre
1.03            1.32    1       1       0.58    0.98    my8040adv_post
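
The before/after comparison in the summary for this server can be sketched directly from the absolute rows for my8040adv_pre and my8040adv_post. The helper pct_change is hypothetical, and treating o/s as QPS is an assumption.

```python
def pct_change(new, old):
    """Percent change from old to new; negative means new is smaller."""
    return 100.0 * (new - old) / old


# ax162-s scan results: my8040adv_post (with the fix) vs my8040adv_pre (without)
qps_gain = pct_change(851, 727)               # o/s
cpu_change = pct_change(0.019305, 0.022499)   # cpu/o

print(f"QPS change with the fix: {qps_gain:+.1f}%")
print(f"CPU overhead change with the fix: {cpu_change:+.1f}%")
```

The results land close to the summary figures. Small differences are expected because the summary uses the rounded relative values rather than the absolute ones.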

Results: bee

Summary
  • QPS is ~17% larger with the fix in Advanced MySQL
  • CPU overhead is ~15% smaller with the fix
I did not test my8040adv_pre on this server.

From the relative QPS results the QPS with my8040 is ~22% less than my8028. But QPS from my8040adv_post is only ~9% less than my8028. This is great.

Relative to: my8028
col-1 : my8040
col-2 : my8040adv_post

col-1   col-2
0.78    0.91    scan

From vmstat and iostat metrics CPU overhead for my8040 was ~28% larger than my8028. But with the fix the CPU overhead for my8040adv_post is only ~11% larger than my8028. This is great.

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.222553        2.534   0       0.001   0.035   55      my8028
0.285792        7.622   0       0       0.041   43      my8040
0.246404        6.475   0       0       0.036   50      my8040adv_post
--- relative to first result
1.28            3.01    1       0.00    1.17    0.78    my8040
1.11            2.56    1       0.00    1.03    0.91    my8040adv_post
