Thursday, October 17, 2024

The impact of PGO, LTO and more for MySQL on a small CPU

This has results for MySQL 8.0.28 vs sysbench on a laptop-class CPU (AMD 4700u) to understand the impact of various compile-time optimizations including LTO and PGO. I will have results from a server-class CPU next weeok. I previously published results on this but then retracted some of them as they appeared bogus.

tl;dr

  • clang and gcc have similar performance with a few exceptions
    • clang does ~7% better than gcc with PGO
    • clang does ~25% better than gcc with PGO+LTO
    • oddly, gcc with PGO is ~3% faster than gcc with PGO+LTO
  • LTO makes things ~6% faster
  • PGO or PGO+LTO makes things between 1.4X and 1.8X faster
  • gcc and clang without -march=native -mtune=native are ~2% faster than with those options

Builds

I compiled upstream MySQL 8.0.28 from source. The my.cnf file is here

All work (builds & tests) are done on a server that uses Ubuntu 22.04.5. The gcc version is 11.4.0 and the clang version is 14.0.0-1ubuntu1.1.

The CMake command lines for all of the builds are here. All builds use CMAKE_BUILD_TYPE =Release. The builds are listed below and the name of the CMake command line file per build is cmk.80.$val where $val is the build name listed below:
  • rel_o2nofp
    • uses gcc, -O2 instead of -O3, adds -fno-omit-frame-pointer, works great for flamegraphs
  • rel_o2nofp_clang
    • uses clang, -O2 instead of -O3, adds -fno-omit-frame-pointer, works great for flamegraphs
  • rel
    • uses gcc, -O3, doesn't work great for flamegraphs
  • rel_clang
    • uses clang, -O3, doesn't work great for flamegraphs
  • rel_o2nofp_lto
    • like rel_o2nofp but adds -DWITH_LTO=ON to get link-time optimization
  • rel_o2nofp_lto_clang
    • like rel_o2nofp_clang but adds -DWITH_LTO=ON to get link-time optimization
  • rel_lto
    • like rel but adds -DWITH_LTO=ON to get link-time optimization
  • rel_lto_clang
    • like rel_clang but adds -DWITH_LTO=ON to get link-time optimization
  • rel_o2nofp_native
    • like rel_o2nofp but adds -march=native -mtune=native
  • rel_o2nofp_native_clang
    • like rel_o2nofp_clang but adds -march=native -mtune=native
  • rel_native
    • like rel but adds -march=native -mtune=native
  • rel_native_clang
    • like rel_clang but adds -march=native -mtune=native
  • rel_o2nofp_native_lto
    • like rel_o2nofp but adds -march=native -mtune=native and -DWITH_LTO=ON
  • rel_o2nofp_native_lto_clang
    • like rel_o2nofp_clang but adds -march=native -mtune=native and -DWITH_LTO=ON
  • rel_native_lto
    • like rel but adds -march=native -mtune=native and -DWITH_LTO=ON
  • rel_native_lto_clang
    • like rel_clang but adds -march=native -mtune=native and -DWITH_LTO=ON
  • rel_pgo_use
    • like rel but uses PGO via -fprofile-generate
  • rel_pgo_use_clang
    • like rel_clang but uses PGO via -fprofile-generate
  • rel_lto_pgo_use
    • like rel but uses PGO via -fprofile-generate and -DWITH_LTO=ON
  • rel_lto_pgo_use_clang
    • like rel_clang but uses PGO via -fprofile-generate and -DWITH_LTO=ON
For PGO I do the following:
  • compile mysqld with -fprofile-generate
  • run sysbench to generate profiles
  • recompile mysqld with -fprofile-use using those profiles
  • rerun sysbench to get the results that I report.
For PGO with gcc I use -fprofile-generate rather than -fprofile-generate=$path and when I run mysqld the gcda files are written into the subdirectories of the object files from when I compiled MySQL. That worked because I used the same host for compiling and testing.

For PGO with clang I use -fprofile-generate rather than -fprofile-generate=$path and when I ran mysqld the profdata files were empty (the profiling data). Then I prepended this to the mysqld command line, but the profdata files were still empty:
  LLVM_PROFILE_FILE="code-%p.profraw"

The problem appears to be that mysqld calls chdir($data-dir) at startup which confuses the clang profiling support. But making that path absolute fixes the problem:
    LLVM_PROFILE_FILE="$PWD/code-%p.profraw"

By mysqld command line I mean that I started mysqld via mysqld_safe, did ps to see the command line, and then used that command line in place of mysqld_safe.

Finally, once I had non-empty profdata files, I had to run llvm-profdata on them and the output (rel_pgo_gen_clang.profdata or rel_lto_pgo_gen_clang.profdata) was used as input when I compiled mysqld with clang and -fprofile-use.
    llvm-profdata merge -output=rel_pgo_gen_clang.profdata code-296681.profraw

Hardware

The server here is a Beelink SER4 with an AMD Ryzen 7 4700 CPU with SMT disabled, 8 cores, 16G of RAM and Ubuntu 22.04. The storage is 1 NVMe device.

The CPU used here (AMD 4700u) is described as a laptop class CPU. The server is configured to use the performance frequency governor and acpi-cpufreq scaling driver.

Benchmark

I used sysbench and my usage is explained here. A full run has 42 microbenchmarks and most test only 1 type of SQL statement. But here I skip the read-only tests that run prior to writes to save time. The database is cached by InnoDB.

The benchmark is run with 1 thread, 1 table and 30M rows. Each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise. Prepared statements were enabled.

The command lines for my helper script was:
    bash r.sh 1 30000000 300 600 nvme0n1 1 1 1

Results

All of the results are here using relative QPS (rQPS) where:
  • rQPS is: (QPS for my version / QPS for base version)
  • base version is the rel_o2nofp build
  • my version is one of the other builds
I then compute the median and average of the rQPS for each build. There is no result below for rel_o2nofp because that is the base version and the avg and median for it are 1.

avg     median  build
0.98    0.98    rel_o2nofp_clang
...
1.00    1.00    rel
0.98    0.98    rel_clang
...
1.05    1.05    rel_o2nofp_lto
1.06    1.06    rel_o2nofp_lto_clang
...
1.07    1.07    rel_lto
1.07    1.07    rel_lto_clang
...
0.99    0.99    rel_o2nofp_native
0.97    0.97    rel_o2nofp_native_clang
...
0.99    0.99    rel_native
0.98    0.98    rel_native_clang
...
1.05    1.04    rel_o2nofp_native_lto
1.05    1.05    rel_o2nofp_native_lto_clang
...
1.06    1.06    rel_native_lto
1.06    1.06    rel_native_lto_clang
...
1.42    1.48    rel_pgo_use
1.52    1.59    rel_pgo_use_clang
...
1.39    1.43    rel_lto_pgo_use
1.77    1.76    rel_lto_pgo_use_clang

Wednesday, October 16, 2024

Sysbench vs c-state on an AMD 4700u

I repeated CPU-bound sysbench on my smallest server while using cpupower idle-set to disable some of the c-states to understand the impact on performance.

With a lower-concurrency (1 thread) workload there was up to a 13% performance improvement when some of the c-states were disabled. 

With a higher-concurrency (6 threads) workload there is up to a 14% performance improvement for one of the microbenchmarks, but the average and median benefit is much less than it is for the lower-concurrency tests.

I don't know whether that benefit is worth the impact (higher power consumption) so I don't have an opinion on whether this is a good thing to do. Be careful.

Builds

I compiled upstream MySQL 8.0.28 from source. The my.cnf file is here

Hardware

The server here is a Beelink SER4 with an AMD Ryzen 7 4700 CPU with SMT disabled, 8 cores, 16G of RAM and Ubuntu 22.04. The storage is 1 NVMe device.

The CPU used here (AMD 4700u) is described as a laptop class CPU. The server is configured to use the performance frequency governor and acpi-cpufreq scaling driver.

c-states

For background reading start here.

From cpupower idle-info the c-states and their latencies are listed below. On this CPU the latency gap between C1 and C2 is large:
  • poll - latency=0
  • C1 - latency=1
  • C2 - latency=350
  • C3 - latency=400
The output from cpupower idle-info:

CPUidle driver: acpi_idle
CPUidle governor: menu
analyzing CPU 1:

Number of idle states: 4
Available idle states: POLL C1 C2 C3
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 581127542
Duration: 35202301723
C1:
Flags/Description: ACPI FFH MWAIT 0x0
Latency: 1
Usage: 115404404
Duration: 20416804588
C2:
Flags/Description: ACPI IOPORT 0x414
Latency: 350
Usage: 563498
Duration: 336593281
C3:
Flags/Description: ACPI IOPORT 0x415
Latency: 400
Usage: 13242213
Duration: 240735087110

Benchmark

I used sysbench and my usage is explained here. A full run has 42 microbenchmarks and most test only 1 type of SQL statement. But here I skip the read-only tests that run prior to writes to save time. The database is cached by InnoDB.

The benchmark is run at two levels of concurrency -- 1 thread, 6 threads. In each case there is 1 table, with 30M rows. Each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise. Prepared statements were enabled.

The command lines for my helper script was:
    bash r.sh 1 30000000 300 600 nvme0n1 1 1 1 6

The benchmark was run for 3 c-state configurations:
  • with all c-states enabled
    • named x.my8028_rel.z11a_bee.pk1.cstate.all below
  • with C1, C2 and C3 disabled via cpupower idle-set -D 1
    • named x.my8028_rel.z11a_bee.pk1.cstate.D1 below
  • with C2 and C3 disabled via cpupower idle-set -D 10
    • named x.my8028_rel.z11a_bee.pk1.cstate.D10 below
Results: 1 thread

The numbers below are the relative QPS which is: (QPS for me / QPS for base) where base is the result with all c-states enabled.
  • Disabling C1, C2, and C3 gives up to 13% more QPS
  • Disabling C1 and C2 gives up to 10% more QPS
Relative to: x.my8028_rel.z11a_bee.pk1.cstate.all
col-1 : x.my8028_rel.z11a_bee.pk1.cstate.D1
col-2 : x.my8028_rel.z11a_bee.pk1.cstate.D10

col-1   col-2
1.10    1.09    hot-points_range=100
1.01    1.00    point-query.pre_range=100
1.00    0.99    point-query_range=100
1.12    1.09    points-covered-pk_range=100
1.13    1.10    points-covered-si_range=100
1.10    1.08    points-notcovered-pk_range=100
1.10    1.08    points-notcovered-si_range=100
1.00    1.00    random-points_range=1000
1.10    1.08    random-points_range=100
1.02    1.01    random-points_range=10
1.02    1.00    range-covered-pk_range=100
1.01    1.00    range-covered-si_range=100
1.02    1.01    range-notcovered-pk_range=100
1.12    1.10    range-notcovered-si.pre_range=100
1.12    1.10    range-notcovered-si_range=100
1.00    1.01    read-only_range=10000
1.05    1.04    read-only_range=100
1.05    1.04    read-only_range=10
0.98    0.98    scan_range=100
1.04    1.02    delete_range=100
1.04    1.03    insert_range=100
1.06    1.05    read-write_range=100
1.07    1.06    read-write_range=10
1.09    1.03    update-index_range=100
1.04    1.02    update-inlist_range=100
1.03    1.01    update-nonindex_range=100
1.03    1.02    update-one_range=100
1.03    1.02    update-zipf_range=100
1.06    1.03    write-only_range=10000

Results: 6 threads

The numbers below are the relative QPS which is: (QPS for me / QPS for base) where base is the result with all c-states enabled.
  • Disabling C1, C2, and C3 gives up to 14% more QPS
  • Disabling C1 and C2 gives up to 12% more QPS
  • With the exception of the update-one microbenchmark, the benefit from disabling c-states here is less than it is above for the tests run with 1 client thread. My guess is that update-one is helped here because it suffers from the most contention (all updates are done to the same row).
Relative to: x.my8028_rel.z11a_bee.pk1.cstate.all
col-1 : x.my8028_rel.z11a_bee.pk1.cstate.D1
col-2 : x.my8028_rel.z11a_bee.pk1.cstate.D10

col-1   col-2
1.04    1.04    hot-points_range=100
1.02    1.00    point-query.pre_range=100
1.02    0.99    point-query_range=100
1.03    1.03    points-covered-pk_range=100
1.04    1.03    points-covered-si_range=100
1.04    1.03    points-notcovered-pk_range=100
1.04    1.04    points-notcovered-si_range=100
1.01    1.00    random-points_range=1000
1.04    1.03    random-points_range=100
1.02    1.01    random-points_range=10
1.00    1.00    range-covered-pk_range=100
1.01    1.01    range-covered-si_range=100
1.01    1.01    range-notcovered-pk_range=100
1.04    1.04    range-notcovered-si.pre_range=100
1.04    1.04    range-notcovered-si_range=100
1.01    1.01    read-only_range=10000
1.02    1.01    read-only_range=100
1.03    1.01    read-only_range=10
0.97    0.97    scan_range=100
1.01    1.00    delete_range=100
1.02    1.01    insert_range=100
1.04    1.02    read-write_range=100
1.04    1.03    read-write_range=10
1.03    1.02    update-index_range=100
1.02    1.00    update-inlist_range=100
1.00    0.99    update-nonindex_range=100
1.14    1.12    update-one_range=100
1.00    0.98    update-zipf_range=100
1.05    1.04    write-only_range=10000

Code bloat vs memory system bloat : why is something getting slower

As I document performance regressions over time in MySQL it helps to assign names for common problems that I see. While there are many problems in general including mutex contention and poor usage of IO, my current focus is on the following:

  • code bloat - the system uses more instructions per unit of work
  • memory system bloat - the instructions don't get executed as fast because there is more TLB and cache activity, IPC decreases and CPI increases.
Fortunately, I can use perf stat to measure all of these and what I see for MySQL 5.6 to 8.0 is that both code bloat and memory system bloat are to blame.

An example

I use the results from a recent run of CPU-bound sysbench on an AMD 7735HS CPU. I focus on the scan microbenchmark for InnoDB. And in call cases the relative numbers are relative to the result from InnoDB with MySQL 5.6.51. From the table of results I see:
  • the relative QPS (rQPS) for InnoDB in MySQL 8.0.39 is 0.75. This means that 8.0.39 gets 75% of the throughput relative to 5.6.51 (or 5.6.51 runs ~1.33X faster than 8.0.39).
  • the relative CPU overhead (cpu/o) for 8.0.39 is 1.36. For CPU-bound workloads I expect the relative CPU overhead to be (approximately) the inverse of the relative QPS -- unless there is too much mutex contention. And that is true here as 1/1.36 ~= 0.74.
  • the relative number of instructions per unit of work for 8.0.39 is 1.44. If code bloat weren't partially to blame for the perf regressions then the value for this would be ~1.0. But 1/1.44 ~= 0.69 and it is possible that most of the regression is from code bloat.
Disclaimer - using CPU and instruction overhead as I do here has some risks.

For CPU overhead I measure all CPU usage on a server. That includes my benchmark client, other things on the server and the DBMS process(es). Fortunately there aren't many other things on the server and the overhead from the benchmark client should be somewhat constant across DBMS versions. However the CPU used by the DBMS process(es) include things that matter more to performance time (CPU used processing a request) and things that matter less (CPU used by background tasks). 

For instruction overhead I use perf stat on the DBMS process. For Postgres, with a process per client, this is the process for one of the connections used by my benchmark. For MySQL, with a multithreaded server, this is the process and includes all of the threads (foreground and background). So this still captures some overheads that matter more (foreground) and some that matter less (background).

Human nature

Code bloat is hard to avoid in long-lived software projects. Some of it is human nature. People new to the project want to (or must) show impact and new features are viewed as more impactful than improving existing code. I assume that company culture is to blame in many cases, but this happens outside of companies as well.

Features are added to grow the market, but eventually you reach the point where the new features grow the market by a small amount with a much larger cost in complexity and performance.

Code bloat often leads to memory system bloat. All of that new code means instruction working set is spread out. So now we have clever link-time optimizations (LTO, Bolt) that try to better organize the hot paths to undo some of the damage. We also have an option to use huge pages for text and reduce iTLB activity. These help but are not solutions and have a large cost in manageability.

And on the data path we also have the option to use huge pages for some cases such as large allocations done for buffer pools (see here and here). This can help but is not a solution and has a large cost in manageability.

I am happy to link to content that describes these problems in more detail.

Monday, October 14, 2024

I recently published results with a summary of HW performance counters for sysbench with MySQL on four CPU types. The performance reports were published over several weeks and to make it easier to see the impact of performance regressions over time, this blog post has the relative QPS for all of the microbenchmarks. Having all of the numbers in one place makes it easier for me to spot things to investigate, like the drop in throughput on the AMD Threadripper for the update-index microbenchmark.

The blog posts with details on the HW perf counters are:

How I use sysbench is explained here. There are 42 microbenchmarks and most test only 1 type of SQL statement. The database is cached by MyRocks and InnoDB.

For each microbenchmark I continue to use relative QPS (rQPS) which is the following, the and base version is named in each of the seconds that follow:
     (QPS for my version / QPS for base version)

tl;dr

  • MySQL has large perf regressions from 5.6 to 8.0 for low-concurrency workloads
  • MySQL doesn't have large perf regressions from 5.6 to 8.0 for high-concurrency workloads, but performance in 8.0 would be a lot better if the CPU regressions were fixed. With 8.0 we benefit from many improvements that reduce mutex contention but a lot of the gains are cancelled out by CPU regressions.

Ryzen 7 4700u

Tests were run with 1 thread, 1 tables and 30M rows. The numbers are the relative QPS which is explained above and the base version is MyRocks 5.6.35 at git sha f89641gf with RocksDB 6.19 for MyRocks and MySQL 5.6.51 for InnoDB.

MyRocks

Relative to: x.fbmy5635_rel_o2nofp_210407_f896415f_6190.za1_bee.pk1
col-1 : x.fbmy5635_rel_o2nofp_231016_4f3a57a1_870.za1_bee.pk1
col-2 : x.fbmy8028_rel_o2nofp_220829_a35c8dfe_752.za1_bee.pk1
col-3 : x.fbmy8028_rel_o2nofp_231202_4edf1eec_870.za1_bee.pk1
col-4 : x.fbmy8032_rel_o2nofp_231204_e3a854e8_870.za1_bee.pk1
col-5 : x.fbmy8032_rel_o2nofp_240529_49b37dfe_921.za1_bee.pk1

col-1 col-2 col-3 col-4 col-5
0.90 0.72 0.72 0.80 0.84 hot-points_range=100
0.95 0.77 0.74 0.73 0.73 point-query.pre_range=100
0.96 0.75 0.74 0.72 0.73 point-query_range=100
0.99 0.81 0.82 0.88 0.88 points-covered-pk.pre_range=100
0.98 0.78 0.83 0.89 0.89 points-covered-pk_range=100
1.28 0.99 1.10 1.20 1.17 points-covered-si.pre_range=100
1.33 0.92 1.24 1.63 1.60 points-covered-si_range=100
0.97 0.81 0.81 0.87 0.87 points-notcovered-pk.pre_range=100
0.96 0.78 0.83 0.88 0.89 points-notcovered-pk_range=100
1.13 0.98 1.05 1.10 1.09 points-notcovered-si.pre_range=100
1.18 0.90 1.13 1.35 1.34 points-notcovered-si_range=100
0.95 0.44 0.44 0.87 0.86 random-points.pre_range=1000
0.97 0.81 0.82 0.88 0.88 random-points.pre_range=100
0.96 0.87 0.87 0.87 0.87 random-points.pre_range=10
0.97 0.45 0.47 0.90 0.90 random-points_range=1000
0.96 0.78 0.83 0.89 0.89 random-points_range=100
0.97 0.83 0.87 0.86 0.87 random-points_range=10
0.93 0.77 0.77 0.75 0.75 range-covered-pk.pre_range=100
0.99 0.76 0.80 0.78 0.78 range-covered-pk_range=100
0.95 0.78 0.76 0.75 0.75 range-covered-si.pre_range=100
0.94 0.77 0.78 0.81 0.78 range-covered-si_range=100
0.84 0.74 0.73 0.72 0.70 range-notcovered-pk.pre_range=100
0.93 0.74 0.76 0.77 0.75 range-notcovered-pk_range=100
0.95 0.90 0.91 0.91 0.90 range-notcovered-si.pre_range=100
0.95 0.85 0.92 0.93 0.93 range-notcovered-si_range=100
0.87 0.86 0.84 0.84 0.83 read-only.pre_range=10000
0.93 0.82 0.81 0.79 0.78 read-only.pre_range=100
0.97 0.79 0.81 0.78 0.77 read-only.pre_range=10
0.90 0.87 0.85 0.85 0.85 read-only_range=10000
0.94 0.78 0.82 0.80 0.79 read-only_range=100
0.97 0.77 0.80 0.80 0.77 read-only_range=10
0.96 0.73 0.73 0.77 0.73 scan_range=100
0.98 0.81 0.80 0.76 0.76 delete_range=100
0.89 0.63 0.62 0.58 0.56 insert_range=100
0.94 0.83 0.83 0.81 0.80 read-write_range=100
0.96 0.81 0.81 0.78 0.78 read-write_range=10
0.99 0.77 0.79 0.77 0.75 update-index_range=100
1.10 0.92 0.94 0.91 0.90 update-inlist_range=100
0.98 0.78 0.79 0.75 0.74 update-nonindex_range=100
0.92 0.76 0.76 0.72 0.70 update-one_range=100
0.98 0.78 0.79 0.75 0.74 update-zipf_range=100
1.04 0.88 0.90 0.84 0.84 write-only_range=10000

InnoDB

Relative to: x.my5651_rel_o2nofp.z11a_bee.pk1
col-1 : x.my5710_rel_o2nofp.z11a_bee.pk1
col-2 : x.my5744_rel_o2nofp.z11a_bee.pk1
col-3 : x.my8011_rel_o2nofp.z11a_bee.pk1
col-4 : x.my8028_rel_o2nofp.z11a_bee.pk1
col-5 : x.my8039_rel_o2nofp.z11a_bee.pk1

col-1 col-2 col-3 col-4 col-5
0.79 0.85 0.76 0.58 0.63 hot-points_range=100
0.87 0.86 0.78 0.71 0.66 point-query.pre_range=100
0.87 0.85 0.78 0.71 0.66 point-query_range=100
0.82 0.86 0.80 0.65 0.68 points-covered-pk.pre_range=100
0.82 0.85 0.80 0.66 0.68 points-covered-pk_range=100
0.86 0.89 0.85 0.72 0.76 points-covered-si.pre_range=100
0.87 0.89 0.86 0.73 0.76 points-covered-si_range=100
0.82 0.85 0.79 0.66 0.68 points-notcovered-pk.pre_range=100
0.82 0.86 0.79 0.66 0.68 points-notcovered-pk_range=100
0.83 0.86 0.81 0.73 0.72 points-notcovered-si.pre_range=100
0.83 0.86 0.81 0.74 0.73 points-notcovered-si_range=100
0.92 0.97 0.91 0.34 0.78 random-points.pre_range=1000
0.82 0.85 0.79 0.65 0.68 random-points.pre_range=100
0.79 0.78 0.72 0.64 0.61 random-points.pre_range=10
0.92 0.98 0.91 0.34 0.79 random-points_range=1000
0.82 0.85 0.79 0.66 0.68 random-points_range=100
0.79 0.78 0.72 0.64 0.61 random-points_range=10
0.85 0.80 0.82 0.70 0.65 range-covered-pk.pre_range=100
0.85 0.80 0.82 0.70 0.65 range-covered-pk_range=100
0.84 0.80 0.82 0.70 0.66 range-covered-si.pre_range=100
0.83 0.80 0.82 0.70 0.66 range-covered-si_range=100
0.82 0.81 0.74 0.69 0.65 range-notcovered-pk.pre_range=100
0.82 0.80 0.74 0.69 0.66 range-notcovered-pk_range=100
0.81 0.83 0.77 0.75 0.69 range-notcovered-si.pre_range=100
0.81 0.83 0.77 0.75 0.69 range-notcovered-si_range=100
1.24 1.21 1.13 1.03 0.98 read-only.pre_range=10000
0.97 0.95 0.90 0.81 0.74 read-only.pre_range=100
0.85 0.84 0.78 0.70 0.64 read-only.pre_range=10
1.25 1.22 1.14 1.03 0.98 read-only_range=10000
0.97 0.97 0.90 0.81 0.75 read-only_range=100
0.86 0.85 0.79 0.70 0.65 read-only_range=10
1.03 0.93 1.02 0.92 0.75 scan_range=100
0.79 0.75 0.68 0.56 0.52 delete_range=100
0.79 0.75 0.64 0.52 0.46 insert_range=100
0.97 0.97 0.87 0.79 0.73 read-write_range=100
0.87 0.87 0.77 0.69 0.63 read-write_range=10
1.16 1.28 1.34 1.17 1.01 update-index_range=100
0.80 0.96 0.81 0.71 0.63 update-inlist_range=100
0.82 0.81 0.70 0.60 0.54 update-nonindex_range=100
0.80 0.78 0.67 0.58 0.53 update-one_range=100
0.82 0.81 0.70 0.59 0.54 update-zipf_range=100
1.00 1.00 0.88 0.75 0.66 write-only_range=10000

Ryzen 7 7735HS

Tests were run with 1 thread, 1 tables and 50M rows. The numbers are the relative QPS which is explained above and the base version is MyRocks 5.6.35 at git sha f89641gf with RocksDB 6.19 for MyRocks and MySQL 5.6.51 for InnoDB.

MyRocks

Relative to: x.fbmy5635_rel_o2nofp_210407_f896415f_6190.za1_c8r32.pk1
col-1 : x.fbmy5635_rel_o2nofp_231016_4f3a57a1_870.za1_c8r32.pk1
col-2 : x.fbmy8028_rel_o2nofp_220829_a35c8dfe_752.za1_c8r32.pk1
col-3 : x.fbmy8028_rel_o2nofp_231202_4edf1eec_870.za1_c8r32.pk1
col-4 : x.fbmy8032_rel_o2nofp_231204_e3a854e8_870.za1_c8r32.pk1
col-5 : x.fbmy8032_rel_o2nofp_240529_49b37dfe_921.za1_c8r32.pk1

col-1 col-2 col-3 col-4 col-5
1.06 0.76 0.71 0.89 0.89 hot-points_range=100
0.94 0.83 0.82 0.78 0.78 point-query.pre_range=100
0.95 0.83 0.83 0.80 0.79 point-query_range=100
0.96 0.85 0.72 0.88 0.88 points-covered-pk.pre_range=100
1.04 0.92 0.73 0.94 0.94 points-covered-pk_range=100
1.83 1.56 1.44 1.71 1.68 points-covered-si.pre_range=100
3.43 2.60 1.74 2.42 2.19 points-covered-si_range=100
0.95 0.87 0.69 0.88 0.88 points-notcovered-pk.pre_range=100
1.02 0.93 0.73 0.92 0.93 points-notcovered-pk_range=100
1.23 1.35 1.11 1.18 1.17 points-notcovered-si.pre_range=100
2.07 2.03 1.50 1.77 1.73 points-notcovered-si_range=100
0.95 0.45 0.41 0.89 0.88 random-points.pre_range=1000
0.94 0.87 0.69 0.88 0.87 random-points.pre_range=100
0.95 0.97 0.92 0.91 0.91 random-points.pre_range=10
1.03 0.52 0.49 0.96 0.96 random-points_range=1000
1.01 0.93 0.73 0.90 0.91 random-points_range=100
0.99 1.01 0.95 0.94 0.93 random-points_range=10
0.95 0.88 0.87 0.86 0.83 range-covered-pk.pre_range=100
1.02 0.92 0.93 0.92 0.90 range-covered-pk_range=100
0.96 0.87 0.86 0.85 0.80 range-covered-si.pre_range=100
1.08 0.96 0.91 0.94 0.89 range-covered-si_range=100
0.89 0.87 0.84 0.84 0.82 range-notcovered-pk.pre_range=100
0.98 0.92 0.92 0.92 0.89 range-notcovered-pk_range=100
0.94 1.05 0.91 0.91 0.89 range-notcovered-si.pre_range=100
1.06 1.16 0.99 1.00 0.99 range-notcovered-si_range=100
0.90 0.91 0.88 0.90 0.87 read-only.pre_range=10000
0.93 0.91 0.89 0.88 0.86 read-only.pre_range=100
0.95 0.89 0.88 0.86 0.85 read-only.pre_range=10
0.93 0.89 0.90 0.92 0.89 read-only_range=10000
0.98 0.94 0.93 0.92 0.91 read-only_range=100
1.00 0.92 0.93 0.90 0.89 read-only_range=10
0.98 0.73 0.78 0.78 0.75 scan_range=100
0.94 1.02 0.83 0.80 0.80 delete_range=100
0.94 0.69 0.68 0.64 0.64 insert_range=100
0.96 0.93 0.90 0.88 0.87 read-write_range=100
0.96 0.92 0.88 0.86 0.85 read-write_range=10
0.96 0.86 0.84 0.81 0.81 update-index_range=100
0.95 0.86 0.80 0.79 0.79 update-inlist_range=100
0.95 0.85 0.82 0.80 0.79 update-nonindex_range=100
0.94 0.82 0.80 0.77 0.76 update-one_range=100
0.95 0.85 0.82 0.79 0.79 update-zipf_range=100
0.96 0.87 0.85 0.81 0.81 write-only_range=10000

InnoDB

Relative to: x.my5651_rel_o2nofp.z11a_c8r32.pk1
col-1 : x.my5710_rel_o2nofp.z11a_c8r32.pk1
col-2 : x.my5744_rel_o2nofp.z11a_c8r32.pk1
col-3 : x.my8011_rel_o2nofp.z11a_c8r32.pk1
col-4 : x.my8028_rel_o2nofp.z11a_c8r32.pk1
col-5 : x.my8039_rel_o2nofp.z11a_c8r32.pk1

col-1 col-2 col-3 col-4 col-5
0.85 0.87 0.82 0.60 0.68 hot-points_range=100
0.92 0.87 0.81 0.73 0.70 point-query.pre_range=100
0.92 0.87 0.80 0.74 0.70 point-query_range=100
0.91 0.87 0.89 0.68 0.76 points-covered-pk.pre_range=100
0.91 0.87 0.90 0.68 0.76 points-covered-pk_range=100
0.97 0.91 0.94 0.75 0.85 points-covered-si.pre_range=100
0.98 0.91 0.95 0.76 0.86 points-covered-si_range=100
0.93 0.88 0.89 0.68 0.77 points-notcovered-pk.pre_range=100
0.93 0.89 0.89 0.69 0.77 points-notcovered-pk_range=100
0.96 0.91 0.93 0.67 0.84 points-notcovered-si.pre_range=100
0.95 0.90 0.91 0.66 0.84 points-notcovered-si_range=100
1.06 1.02 1.05 0.38 0.92 random-points.pre_range=1000
0.94 0.89 0.89 0.69 0.78 random-points.pre_range=100
0.91 0.85 0.81 0.74 0.71 random-points.pre_range=10
1.07 1.03 1.05 0.37 0.92 random-points_range=1000
0.94 0.89 0.90 0.70 0.78 random-points_range=100
0.92 0.85 0.82 0.75 0.72 random-points_range=10
0.90 0.83 0.83 0.73 0.69 range-covered-pk.pre_range=100
0.90 0.83 0.83 0.73 0.69 range-covered-pk_range=100
0.89 0.82 0.82 0.75 0.69 range-covered-si.pre_range=100
0.88 0.83 0.81 0.75 0.69 range-covered-si_range=100
0.91 0.85 0.78 0.72 0.68 range-notcovered-pk.pre_range=100
0.91 0.84 0.78 0.72 0.68 range-notcovered-pk_range=100
0.92 0.88 0.88 0.77 0.77 range-notcovered-si.pre_range=100
0.92 0.88 0.88 0.78 0.77 range-notcovered-si_range=100
1.29 1.23 1.18 1.02 0.99 read-only.pre_range=10000
1.04 0.99 0.95 0.88 0.83 read-only.pre_range=100
0.95 0.90 0.86 0.79 0.75 read-only.pre_range=10
1.28 1.23 1.19 1.01 0.99 read-only_range=10000
1.04 0.99 0.95 0.87 0.83 read-only_range=100
0.96 0.92 0.87 0.80 0.76 read-only_range=10
0.98 0.89 0.97 0.84 0.71 scan_range=100
1.16 0.84 1.01 0.67 0.79 delete_range=100
0.89 0.82 0.74 0.63 0.58 insert_range=100
1.03 0.98 0.93 0.84 0.80 read-write_range=100
0.96 0.91 0.86 0.78 0.75 read-write_range=10
1.57 1.17 1.57 1.12 1.19 update-index_range=100
0.95 1.05 0.95 0.83 0.80 update-inlist_range=100
0.89 0.86 0.80 0.67 0.64 update-nonindex_range=100
0.90 0.86 0.81 0.68 0.64 update-one_range=100
0.90 0.87 0.81 0.68 0.64 update-zipf_range=100
1.02 0.93 0.90 0.78 0.75 write-only_range=10000

Intel Xeon Silver 4214R

Tests were run with 16 threads, 8 tables and 10M rows per table. The numbers are the relative QPS which is explained above and the base version is MyRocks 5.6.35 at git sha f89641gf with RocksDB 6.19 for MyRocks and MySQL 5.6.51 for InnoDB.

MyRocks

Relative to: x.fbmy5635_rel_o2nofp_210407_f896415f_6190.za1_c24r64.pk1
col-1 : x.fbmy5635_rel_o2nofp_231016_4f3a57a1_870.za1_c24r64.pk1
col-2 : x.fbmy8028_rel_o2nofp_220829_a35c8dfe_752.za1_c24r64.pk1
col-3 : x.fbmy8028_rel_o2nofp_231202_4edf1eec_870.za1_c24r64.pk1
col-4 : x.fbmy8032_rel_o2nofp_231204_e3a854e8_870.za1_c24r64.pk1
col-5 : x.fbmy8032_rel_o2nofp_240529_49b37dfe_921.za1_c24r64.pk1

col-1 col-2 col-3 col-4 col-5
1.02 0.91 0.96 0.89 0.94 hot-points_range=100
0.97 0.90 0.90 0.87 0.88 point-query.pre_range=100
0.99 0.91 0.91 0.88 0.89 point-query_range=100
0.95 0.84 0.85 0.90 0.89 points-covered-pk.pre_range=100
0.96 0.89 0.94 0.92 0.90 points-covered-pk_range=100
1.25 1.07 1.18 1.27 1.26 points-covered-si.pre_range=100
1.40 1.28 1.37 1.34 1.38 points-covered-si_range=100
0.94 0.84 0.85 0.89 0.89 points-notcovered-pk.pre_range=100
0.96 0.89 0.93 0.92 0.90 points-notcovered-pk_range=100
1.07 1.01 1.05 1.09 1.09 points-notcovered-si.pre_range=100
1.20 1.14 1.20 1.16 1.18 points-notcovered-si_range=100
0.94 0.52 0.53 0.92 0.92 random-points.pre_range=1000
0.94 0.84 0.84 0.90 0.88 random-points.pre_range=100
0.96 1.06 1.08 1.06 1.06 random-points.pre_range=10
0.94 0.62 0.64 0.92 0.90 random-points_range=1000
0.95 0.89 0.93 0.92 0.90 random-points_range=100
0.99 1.06 1.11 1.06 1.05 random-points_range=10
0.95 0.96 0.99 0.98 0.97 range-covered-pk.pre_range=100
1.00 1.00 1.04 1.02 1.00 range-covered-pk_range=100
0.96 0.97 1.01 1.00 0.98 range-covered-si.pre_range=100
1.01 1.02 1.05 1.04 1.02 range-covered-si_range=100
0.90 0.92 0.93 0.95 0.91 range-notcovered-pk.pre_range=100
0.94 0.95 0.97 0.98 0.94 range-notcovered-pk_range=100
0.93 0.93 0.93 0.92 0.93 range-notcovered-si.pre_range=100
0.96 0.96 1.00 0.95 0.93 range-notcovered-si_range=100
0.92 0.98 0.97 0.99 0.96 read-only.pre_range=10000
0.95 0.98 1.00 1.00 0.98 read-only.pre_range=100
0.97 0.97 1.00 0.99 0.98 read-only.pre_range=10
0.92 0.98 0.98 0.99 0.96 read-only_range=10000
0.98 1.01 1.03 1.01 1.00 read-only_range=100
1.01 1.02 1.05 1.02 1.02 read-only_range=10
0.94 0.69 0.77 0.77 0.70 scan_range=100
0.94 0.98 0.94 0.90 0.79 delete_range=100
0.95 0.84 0.83 0.81 0.79 insert_range=100
0.96 0.99 1.00 0.98 0.97 read-write_range=100
0.97 0.99 0.99 0.97 0.96 read-write_range=10
0.96 0.97 0.94 0.92 0.91 update-index_range=100
0.99 0.91 0.95 0.94 0.91 update-inlist_range=100
0.95 0.95 0.92 0.91 0.89 update-nonindex_range=100
0.94 0.99 0.96 0.93 0.92 update-one_range=100
0.95 0.96 0.93 0.91 0.90 update-zipf_range=100
0.96 1.01 0.99 0.97 0.95 write-only_range=10000

InnoDB

Relative to: x.my5651_rel_o2nofp.z11a_c24r64.pk1
col-1 : x.my5710_rel_o2nofp.z11a_c24r64.pk1
col-2 : x.my5744_rel_o2nofp.z11a_c24r64.pk1
col-3 : x.my8011_rel_o2nofp.z11a_c24r64.pk1
col-4 : x.my8028_rel_o2nofp.z11a_c24r64.pk1
col-5 : x.my8039_rel_o2nofp.z11a_c24r64.pk1

col-1 col-2 col-3 col-4 col-5
1.47 1.48 1.45 1.16 1.16 hot-points_range=100
0.99 0.97 0.93 0.87 0.84 point-query.pre_range=100
0.99 0.96 0.92 0.88 0.83 point-query_range=100
1.17 1.25 1.16 0.95 0.97 points-covered-pk.pre_range=100
1.16 1.22 1.15 0.93 0.96 points-covered-pk_range=100
0.87 0.89 0.85 0.75 0.75 points-covered-si.pre_range=100
0.88 0.89 0.87 0.75 0.77 points-covered-si_range=100
1.16 1.22 1.13 0.94 0.95 points-notcovered-pk.pre_range=100
1.14 1.19 1.12 0.92 0.93 points-notcovered-pk_range=100
0.86 0.88 0.84 0.75 0.74 points-notcovered-si.pre_range=100
0.88 0.88 0.86 0.76 0.74 points-notcovered-si_range=100
0.97 1.29 1.18 0.57 1.02 random-points.pre_range=1000
1.15 1.21 1.10 0.92 0.93 random-points.pre_range=100
0.97 0.97 0.90 0.82 0.77 random-points.pre_range=10
1.16 1.27 1.22 0.57 1.03 random-points_range=1000
1.14 1.19 1.12 0.92 0.94 random-points_range=100
0.97 0.96 0.89 0.82 0.77 random-points_range=10
0.91 0.82 0.87 0.77 0.73 range-covered-pk.pre_range=100
0.91 0.82 0.87 0.77 0.73 range-covered-pk_range=100
0.91 0.82 0.89 0.80 0.75 range-covered-si.pre_range=100
0.91 0.82 0.88 0.80 0.75 range-covered-si_range=100
0.90 0.82 0.81 0.75 0.70 range-notcovered-pk.pre_range=100
0.90 0.82 0.81 0.75 0.70 range-notcovered-pk_range=100
0.87 0.85 0.84 0.76 0.71 range-notcovered-si.pre_range=100
0.88 0.84 0.83 0.76 0.70 range-notcovered-si_range=100
1.22 1.28 1.22 1.11 0.89 read-only.pre_range=10000
1.15 1.11 1.08 1.01 0.96 read-only.pre_range=100
1.04 1.03 0.98 0.93 0.89 read-only.pre_range=10
1.35 1.27 1.24 1.11 1.07 read-only_range=10000
1.15 1.12 1.08 1.02 0.96 read-only_range=100
1.04 1.02 0.97 0.93 0.88 read-only_range=10
0.93 0.81 0.98 0.88 0.69 scan_range=100
1.20 1.23 1.17 1.18 1.12 delete_range=100
1.25 1.21 1.26 1.17 1.11 insert_range=100
1.15 1.18 1.06 1.05 0.99 read-write_range=100
1.09 1.14 0.99 1.01 0.95 read-write_range=10
5.18 5.55 3.88 4.06 3.61 update-index_range=100
1.33 1.55 1.64 1.65 1.62 update-inlist_range=100
1.24 1.25 1.27 1.21 1.16 update-nonindex_range=100
1.02 1.04 1.05 1.03 0.97 update-one_range=100
1.25 1.28 1.30 1.23 1.19 update-zipf_range=100
1.32 1.35 1.17 1.21 1.09 write-only_range=10000

AMD Threadripper Pro 5975WX

Tests were run with 24 threads, 8 tables and 10M rows per table. The numbers are the relative QPS which is explained above and the base version is MyRocks 5.6.35 at git sha f89641gf with RocksDB 6.19 for MyRocks and MySQL 5.6.51 for InnoDB.

MyRocks

Relative to: x.fbmy5635_rel_o2nofp_210407_f896415f_6190.za1_c32r128.pk1
col-1 : x.fbmy5635_rel_o2nofp_231016_4f3a57a1_870.za1_c32r128.pk1
col-2 : x.fbmy8028_rel_o2nofp_220829_a35c8dfe_752.za1_c32r128.pk1
col-3 : x.fbmy8028_rel_o2nofp_231202_4edf1eec_870.za1_c32r128.pk1
col-4 : x.fbmy8032_rel_o2nofp_231204_e3a854e8_870.za1_c32r128.pk1
col-5 : x.fbmy8032_rel_o2nofp_240529_49b37dfe_921.za1_c32r128.pk1

col-1 col-2 col-3 col-4 col-5
0.80 0.83 0.76 0.92 0.88 hot-points_range=100
0.97 0.88 0.88 0.85 0.84 point-query.pre_range=100
0.97 0.87 0.87 0.85 0.85 point-query_range=100
0.97 0.88 0.86 0.97 0.96 points-covered-pk.pre_range=100
0.96 0.88 0.88 0.94 1.00 points-covered-pk_range=100
1.29 1.15 1.20 1.38 1.33 points-covered-si.pre_range=100
1.30 0.94 1.16 1.26 1.23 points-covered-si_range=100
0.96 0.88 0.85 0.96 0.94 points-notcovered-pk.pre_range=100
0.96 0.89 0.88 0.94 1.00 points-notcovered-pk_range=100
1.10 1.05 1.07 1.15 1.14 points-notcovered-si.pre_range=100
1.14 0.91 1.05 1.11 1.08 points-notcovered-si_range=100
0.94 0.52 0.53 0.93 0.91 random-points.pre_range=1000
0.96 0.88 0.86 0.96 0.94 random-points.pre_range=100
0.97 1.18 1.19 1.18 1.15 random-points.pre_range=10
0.96 0.65 0.67 0.92 1.00 random-points_range=1000
0.96 0.89 0.88 0.94 1.00 random-points_range=100
0.98 1.12 1.15 1.15 1.17 random-points_range=10
0.97 1.08 1.11 1.10 1.06 range-covered-pk.pre_range=100
0.97 1.02 1.09 1.08 1.02 range-covered-pk_range=100
0.97 1.11 1.14 1.12 1.06 range-covered-si.pre_range=100
0.97 1.05 1.11 1.09 1.05 range-covered-si_range=100
0.91 1.04 1.05 1.03 1.02 range-notcovered-pk.pre_range=100
0.92 0.99 1.05 1.02 1.00 range-notcovered-pk_range=100
0.94 0.96 0.95 0.96 0.93 range-notcovered-si.pre_range=100
0.96 0.93 0.94 0.95 1.01 range-notcovered-si_range=100
0.98 1.00 0.98 0.97 0.97 read-only.pre_range=10000
0.96 1.01 1.01 0.99 0.98 read-only.pre_range=100
0.98 1.01 1.02 0.99 0.98 read-only.pre_range=10
0.98 0.96 0.95 0.96 0.93 read-only_range=10000
0.95 0.99 1.01 1.00 0.99 read-only_range=100
0.97 0.99 1.02 1.00 1.00 read-only_range=10
0.97 0.64 0.71 0.62 0.60 scan_range=100
0.96 0.97 0.94 0.92 0.85 delete_range=100
0.96 0.83 0.80 0.78 0.77 insert_range=100
0.97 1.02 1.01 0.99 0.98 read-write_range=100
0.97 1.01 1.01 0.98 0.99 read-write_range=10
0.98 0.97 0.94 0.92 0.92 update-index_range=100
1.04 1.01 0.97 0.96 0.95 update-inlist_range=100
0.97 0.96 0.93 0.90 0.90 update-nonindex_range=100
0.92 0.98 0.94 0.92 0.92 update-one_range=100
0.97 0.96 0.93 0.90 0.90 update-zipf_range=100
0.98 1.02 1.00 0.97 0.96 write-only_range=10000

InnoDB

Relative to: x.my5651_rel_o2nofp.z11a_c32r128.pk1
col-1 : x.my5710_rel_o2nofp.z11a_c32r128.pk1
col-2 : x.my5744_rel_o2nofp.z11a_c32r128.pk1
col-3 : x.my8011_rel_o2nofp.z11a_c32r128.pk1
col-4 : x.my8028_rel_o2nofp.z11a_c32r128.pk1
col-5 : x.my8039_rel_o2nofp.z11a_c32r128.pk1

col-1 col-2 col-3 col-4 col-5
2.24 2.22 2.08 1.60 1.64 hot-points_range=100
1.03 1.00 0.93 0.88 0.82 point-query.pre_range=100
1.04 1.01 0.94 0.89 0.83 point-query_range=100
1.71 1.77 1.66 1.35 1.37 points-covered-pk.pre_range=100
1.71 1.77 1.66 1.36 1.39 points-covered-pk_range=100
0.88 0.89 0.84 0.73 0.73 points-covered-si.pre_range=100
0.88 0.89 0.84 0.73 0.74 points-covered-si_range=100
1.66 1.74 1.61 1.33 1.35 points-notcovered-pk.pre_range=100
1.66 1.74 1.62 1.34 1.37 points-notcovered-pk_range=100
0.85 0.88 0.82 0.74 0.72 points-notcovered-si.pre_range=100
0.85 0.89 0.82 0.74 0.72 points-notcovered-si_range=100
1.70 1.85 1.72 0.80 1.47 random-points.pre_range=1000
1.66 1.75 1.62 1.33 1.36 random-points.pre_range=100
1.01 1.01 0.93 0.87 0.82 random-points.pre_range=10
1.70 1.85 1.72 0.80 1.48 random-points_range=1000
1.66 1.75 1.62 1.35 1.37 random-points_range=100
1.01 1.02 0.94 0.87 0.83 random-points_range=10
0.93 0.90 0.88 0.80 0.76 range-covered-pk.pre_range=100
0.94 0.89 0.88 0.81 0.76 range-covered-pk_range=100
0.92 0.88 0.87 0.82 0.77 range-covered-si.pre_range=100
0.92 0.88 0.87 0.82 0.77 range-covered-si_range=100
0.94 0.90 0.84 0.78 0.74 range-notcovered-pk.pre_range=100
0.94 0.90 0.84 0.78 0.74 range-notcovered-pk_range=100
0.85 0.86 0.82 0.73 0.67 range-notcovered-si.pre_range=100
0.85 0.87 0.82 0.74 0.67 range-notcovered-si_range=100
0.95 0.94 0.94 1.11 1.07 read-only.pre_range=10000
1.16 1.13 1.08 1.01 0.96 read-only.pre_range=100
1.10 1.07 1.02 0.97 0.91 read-only.pre_range=10
0.95 0.94 0.97 1.14 1.10 read-only_range=10000
1.16 1.13 1.08 1.02 0.96 read-only_range=100
1.10 1.08 1.02 0.97 0.92 read-only_range=10
0.98 0.90 1.03 0.92 0.75 scan_range=100
1.41 1.51 1.44 1.52 1.42 delete_range=100
1.51 1.52 1.59 1.50 1.41 insert_range=100
1.13 1.06 1.06 1.02 0.96 read-write_range=100
1.21 1.12 1.14 1.10 1.04 read-write_range=10
3.12 3.62 1.44 3.21 1.77 update-index_range=100
1.53 1.82 1.99 2.00 2.02 update-inlist_range=100
1.25 1.31 1.33 1.30 1.23 update-nonindex_range=100
0.98 1.01 1.02 1.04 0.98 update-one_range=100
1.20 1.26 1.28 1.26 1.19 update-zipf_range=100
1.38 1.44 1.29 1.33 1.24 write-only_range=10000

The impact of PGO, LTO and more for MySQL on a small CPU

This has results for MySQL 8.0.28 vs sysbench on a laptop-class CPU (AMD 4700u) to understand the impact of various compile-time optimizatio...