This has results for MySQL 8.0.28 vs sysbench on a laptop-class CPU (AMD 4700u) to understand the impact of various compile-time optimizations including LTO and PGO. I will have results from a server-class CPU next weeok. I previously published results on this but then retracted some of them as they appeared bogus.
tl;dr
- clang and gcc have similar performance with a few exceptions
- clang does ~7% better than gcc with PGO
- clang does ~25% better than gcc with PGO+LTO
- oddly, gcc with PGO is ~3% faster than gcc with PGO+LTO
- LTO makes things ~6% faster
- PGO or PGO+LTO makes things between 1.4X and 1.8X faster
- gcc and clang without -march=native -mtune=native are ~2% faster than with those options
Builds
I compiled upstream MySQL 8.0.28 from source. The my.cnf file is here.
All work (builds & tests) are done on a server that uses Ubuntu 22.04.5. The gcc version is 11.4.0 and the clang version is 14.0.0-1ubuntu1.1.
The CMake command lines for all of the builds are here. All builds use CMAKE_BUILD_TYPE =Release. The builds are listed below and the name of the CMake command line file per build is cmk.80.$val where $val is the build name listed below:
- rel_o2nofp
- uses gcc, -O2 instead of -O3, adds -fno-omit-frame-pointer, works great for flamegraphs
- rel_o2nofp_clang
- uses clang, -O2 instead of -O3, adds -fno-omit-frame-pointer, works great for flamegraphs
- rel
- uses gcc, -O3, doesn't work great for flamegraphs
- rel_clang
- uses clang, -O3, doesn't work great for flamegraphs
- rel_o2nofp_lto
- like rel_o2nofp but adds -DWITH_LTO=ON to get link-time optimization
- rel_o2nofp_lto_clang
- like rel_o2nofp_clang but adds -DWITH_LTO=ON to get link-time optimization
- rel_lto
- like rel but adds -DWITH_LTO=ON to get link-time optimization
- rel_lto_clang
- like rel_clang but adds -DWITH_LTO=ON to get link-time optimization
- rel_o2nofp_native
- like rel_o2nofp but adds -march=native -mtune=native
- rel_o2nofp_native_clang
- like rel_o2nofp_clang but adds -march=native -mtune=native
- rel_native
- like rel but adds -march=native -mtune=native
- rel_native_clang
- like rel_clang but adds -march=native -mtune=native
- rel_o2nofp_native_lto
- like rel_o2nofp but adds -march=native -mtune=native and -DWITH_LTO=ON
- rel_o2nofp_native_lto_clang
- like rel_o2nofp_clang but adds -march=native -mtune=native and -DWITH_LTO=ON
- rel_native_lto
- like rel but adds -march=native -mtune=native and -DWITH_LTO=ON
- rel_native_lto_clang
- like rel_clang but adds -march=native -mtune=native and -DWITH_LTO=ON
- rel_pgo_use
- like rel but uses PGO via -fprofile-generate
- rel_pgo_use_clang
- like rel_clang but uses PGO via -fprofile-generate
- rel_lto_pgo_use
- like rel but uses PGO via -fprofile-generate and -DWITH_LTO=ON
- rel_lto_pgo_use_clang
- like rel_clang but uses PGO via -fprofile-generate and -DWITH_LTO=ON
For PGO I do the following:
- compile mysqld with -fprofile-generate
- run sysbench to generate profiles
- recompile mysqld with -fprofile-use using those profiles
- rerun sysbench to get the results that I report.
For PGO with gcc I use -fprofile-generate rather than -fprofile-generate=$path and when I run mysqld the gcda files are written into the subdirectories of the object files from when I compiled MySQL. That worked because I used the same host for compiling and testing.
For PGO with clang I use -fprofile-generate rather than -fprofile-generate=$path and when I ran mysqld the profdata files were empty (the profiling data). Then I prepended this to the mysqld command line, but the profdata files were still empty:
For PGO with clang I use -fprofile-generate rather than -fprofile-generate=$path and when I ran mysqld the profdata files were empty (the profiling data). Then I prepended this to the mysqld command line, but the profdata files were still empty:
LLVM_PROFILE_FILE="code-%p.profraw"
The problem appears to be that mysqld calls chdir($data-dir) at startup which confuses the clang profiling support. But making that path absolute fixes the problem:
The problem appears to be that mysqld calls chdir($data-dir) at startup which confuses the clang profiling support. But making that path absolute fixes the problem:
LLVM_PROFILE_FILE="$PWD/code-%p.profraw"
By mysqld command line I mean that I started mysqld via mysqld_safe, did ps to see the command line, and then used that command line in place of mysqld_safe.
Finally, once I had non-empty profdata files, I had to run llvm-profdata on them and the output (rel_pgo_gen_clang.profdata or rel_lto_pgo_gen_clang.profdata) was used as input when I compiled mysqld with clang and -fprofile-use.
llvm-profdata merge -output=rel_pgo_gen_clang.profdata code-296681.profraw
Hardware
The server here is a Beelink SER4 with an AMD Ryzen 7 4700 CPU with SMT disabled, 8 cores, 16G of RAM and Ubuntu 22.04. The storage is 1 NVMe device.
The CPU used here (AMD 4700u) is described as a laptop class CPU. The server is configured to use the performance frequency governor and acpi-cpufreq scaling driver.
Benchmark
I used sysbench and my usage is explained here. A full run has 42 microbenchmarks and most test only 1 type of SQL statement. But here I skip the read-only tests that run prior to writes to save time. The database is cached by InnoDB.
The benchmark is run with 1 thread, 1 table and 30M rows. Each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise. Prepared statements were enabled.
The command lines for my helper script was:
bash r.sh 1 30000000 300 600 nvme0n1 1 1 1
Results
All of the results are here using relative QPS (rQPS) where:
- rQPS is: (QPS for my version / QPS for base version)
- base version is the rel_o2nofp build
- my version is one of the other builds
I then compute the median and average of the rQPS for each build. There is no result below for rel_o2nofp because that is the base version and the avg and median for it are 1.
avg median build
0.98 0.98 rel_o2nofp_clang
...
1.00 1.00 rel
0.98 0.98 rel_clang
...
1.05 1.05 rel_o2nofp_lto
1.06 1.06 rel_o2nofp_lto_clang
...
1.07 1.07 rel_lto
1.07 1.07 rel_lto_clang
...
0.99 0.99 rel_o2nofp_native
0.97 0.97 rel_o2nofp_native_clang
...
0.99 0.99 rel_native
0.98 0.98 rel_native_clang
...
1.05 1.04 rel_o2nofp_native_lto
1.05 1.05 rel_o2nofp_native_lto_clang
...
1.06 1.06 rel_native_lto
1.06 1.06 rel_native_lto_clang
...
1.42 1.48 rel_pgo_use
1.52 1.59 rel_pgo_use_clang
...
1.39 1.43 rel_lto_pgo_use
1.77 1.76 rel_lto_pgo_use_clang
All good ideas, I just need to find spare time or SmallDatum LLC needs to find a client to sponsor this work on ARM. I have done some work on ARM in the past
ReplyDeletehttps://smalldatum.blogspot.com/2023/02/sysbench-arm-x86-public-cloud.html
https://smalldatum.blogspot.com/2023/01/the-insert-benchmark-on-arm-and-x86.html