I am a non-expert in many build tools -- CMake for MySQL, autoconf for Postgres, scons for MongoDB and Maven for Linkbench. While working to confirm my MySQL builds are OK I used sysbench to compare several of them and was confused by the results. This is part 3 of my adventure - parts 1 and 2 are here and here.
tl;dr
- RelWithDebInfo uses link time optimization by default
- Release does not use link time optimization by default
- Performance is better with link time optimization
- Link time optimization helped point queries more than range queries or writes
- RelWithDebInfo uses -O2, link time optimization (-flto) and -fstack-protector-strong
- Release uses -O3 but does not use link time optimization by default
Then I remembered that RelWithDebInfo used -flto while Release did not. I ignored that last week, but it turned out to explain the difference.
Why don't Release builds use link time optimization? Yura Sorokin explained this to me. In CMakeLists.txt there is this code that sets WITH_PACKAGE_FLAGS_DEFAULT to ON for RelWithDebInfo and OFF for Release. When it is ON, the output from dpkg-buildflags is added to the compiler and linker command lines (dpkg-buildflags --get $X for X in CPPFLAGS, CFLAGS, CXXFLAGS, LDFLAGS). On Ubuntu 22.04 I see:
$ dpkg-buildflags --get CPPFLAGS
-Wdate-time -D_FORTIFY_SOURCE=2
$ dpkg-buildflags --get CFLAGS
-g -O2 -ffile-prefix-map=/home/mdcallag/git/mytools/bench/sysbench.lua/r.1tab.1thr.feb23.repro=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security
$ dpkg-buildflags --get CXXFLAGS
-g -O2 -ffile-prefix-map=/home/mdcallag/git/mytools/bench/sysbench.lua/r.1tab.1thr.feb23.repro=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security
$ dpkg-buildflags --get LDFLAGS
-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro
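The four queries above can be collapsed into one loop that also reports whether each variable carries an LTO flag. This is a minimal sketch, assuming a Debian or Ubuntu host where dpkg-buildflags is installed. The helper name has_lto is hypothetical and does nothing more than look for -flto or -flto=... in a flag string.

```shell
# has_lto is a hypothetical helper: it succeeds when the flag string in $1
# contains -flto or -flto=<something> as a separate word.
has_lto() {
  case " $1 " in
    *" -flto "*|*" -flto="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query dpkg-buildflags when it is available (Debian/Ubuntu).
if command -v dpkg-buildflags >/dev/null 2>&1; then
  for X in CPPFLAGS CFLAGS CXXFLAGS LDFLAGS; do
    flags=$(dpkg-buildflags --get "$X")
    if has_lto "$flags"; then
      echo "$X: LTO flag present"
    else
      echo "$X: no LTO flag"
    fi
  done
fi
```

The check is purely textual, so it says what dpkg-buildflags would contribute, not what a given build actually compiled with.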
Thus, I get -flto by default with RelWithDebInfo but not with Release. To enable link time optimization for Release I can do one of:
- RelWithDebInfo - CMAKE_BUILD_TYPE=RelWithDebInfo, LTO, -O2
- Release - CMAKE_BUILD_TYPE=Release, -O3, does not use LTO
- Release+LTO - CMAKE_BUILD_TYPE=Release, LTO, -O3
- Release+LTO+O2 - uses CMAKE_BUILD_TYPE=Release, LTO, -O2
- Release+LTO+Native - CMAKE_BUILD_TYPE=Release, LTO, -march=native, -mtune=native
- range-notcovered-si.pre_range=100 - uses oltp_range_covered.lua on a non-covering secondary index, runs prior to write heavy tests
- range-notcovered-si_range=100 - uses oltp_range_covered.lua on a non-covering secondary index, runs after write heavy tests
- delete_range=100 - uses oltp_delete.lua, the delete statement is here
- update-index_range=100 - uses oltp_update_index.lua, the update statement is here and requires secondary index maintenance
RelWithDebInfo:
cmake .. \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DMYSQL_MAINTAINER_MODE=0 \
  -DENABLED_LOCAL_INFILE=1 \
  -DCMAKE_INSTALL_PREFIX=$1 \
  -DWITH_BOOST=$PWD/../boost \
  -DWITH_NUMA=ON \
  -DWITH_ROUTER=OFF \
  -DWITH_MYSQLX=OFF \
  -DWITH_UNIT_TESTS=OFF

Release:
BF=" -g1 "
CF=" $BF "
CXXF=" $BF "
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DMYSQL_MAINTAINER_MODE=0 \
  -DENABLED_LOCAL_INFILE=1 \
  -DCMAKE_INSTALL_PREFIX=$1 \
  -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_NUMA=ON \
  -DWITH_ROUTER=OFF \
  -DWITH_MYSQLX=OFF \
  -DWITH_UNIT_TESTS=OFF

Release+LTO:
BF=" -g1 "
CF=" $BF "
CXXF=" $BF "
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DMYSQL_MAINTAINER_MODE=0 \
  -DENABLED_LOCAL_INFILE=1 \
  -DCMAKE_INSTALL_PREFIX=$1 \
  -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_LTO=ON \
  -DWITH_NUMA=ON \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF

Release+LTO+O2:
BF=" -g1 "
CF=" $BF "
CXXF=" $BF "
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DMYSQL_MAINTAINER_MODE=0 \
  -DENABLED_LOCAL_INFILE=1 \
  -DCMAKE_INSTALL_PREFIX=$1 \
  -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DCMAKE_C_FLAGS_RELEASE="-O2 -DNDEBUG" \
  -DCMAKE_CXX_FLAGS_RELEASE="-O2 -DNDEBUG" \
  -DWITH_LTO=ON \
  -DWITH_NUMA=ON \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF

Release+LTO+Native:
BF=" -march=native -mtune=native -g1 "
CF=" $BF "
CXXF=" $BF "
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DMYSQL_MAINTAINER_MODE=0 \
  -DENABLED_LOCAL_INFILE=1 \
  -DCMAKE_INSTALL_PREFIX=$1 \
  -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_LTO=ON \
  -DWITH_NUMA=ON \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF
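The five command lines above share most of their options and differ in only a few. As a sketch of that difference, a helper can print just the options that vary per build. The function name variant_args and the lowercase variant keys are hypothetical and not taken from the original build scripts. The flag values are trimmed of the padding spaces used above, which should not matter to the compiler.

```shell
# variant_args is a hypothetical helper: it prints, one per line, the cmake
# options that differ between the builds described above.
variant_args() {
  local bf="-g1" lto="" relflags=""
  case "$1" in
    relwithdebinfo)
      # This build passes no extra compiler flags on the cmake command line.
      echo "-DCMAKE_BUILD_TYPE=RelWithDebInfo"
      return 0 ;;
    release) ;;
    release_lto) lto=ON ;;
    release_lto_o2) lto=ON; relflags="-O2 -DNDEBUG" ;;
    release_lto_native) lto=ON; bf="-march=native -mtune=native -g1" ;;
    *) echo "unknown variant: $1" >&2; return 1 ;;
  esac
  echo "-DCMAKE_BUILD_TYPE=Release"
  echo "-DCMAKE_C_FLAGS=$bf"
  echo "-DCMAKE_CXX_FLAGS=$bf"
  if [ -n "$relflags" ]; then
    # Overrides the default Release optimization flags.
    echo "-DCMAKE_C_FLAGS_RELEASE=$relflags"
    echo "-DCMAKE_CXX_FLAGS_RELEASE=$relflags"
  fi
  if [ -n "$lto" ]; then
    echo "-DWITH_LTO=$lto"
  fi
}

# Show what distinguishes one variant from the rest.
variant_args release_lto_o2
```

In bash, the output could be read into an array with mapfile -t and passed to cmake together with the shared options (WITH_SSL, WITH_ZLIB and so on), which keeps values that contain spaces intact.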
RelWithDebInfo (<) vs Release (>):
< -D_FORTIFY_SOURCE=2
< -ffat-lto-objects
< -ffile-prefix-map=/foobar/build.rel_withdbg=.
< -flto=auto
< -fstack-protector-strong
< -g
< -O2
---
> -O3

RelWithDebInfo (<) vs Release+LTO (>):
< -D_FORTIFY_SOURCE=2
< -ffat-lto-objects
< -ffile-prefix-map=/foobar/build.rel_withdbg=.
< -fstack-protector-strong
< -g
< -flto=auto
---
> -flto
< -O2
---
> -O3

RelWithDebInfo (<) vs Release+LTO+O2 (>):
< -D_FORTIFY_SOURCE=2
< -ffat-lto-objects
< -ffile-prefix-map=/foobar/build.rel_withdbg=.
< -fstack-protector-strong
< -g
< -flto=auto
---
> -flto

RelWithDebInfo (<) vs Release+LTO+Native (>):
< -D_FORTIFY_SOURCE=2
< -ffat-lto-objects
< -ffile-prefix-map=/foobar/build.rel_withdbg=.
< -fstack-protector-strong
< -g
---
> -march=native
> -mtune=native
< -flto=auto
---
> -flto
< -O2
---
> -O3
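A comparison like the ones above can be produced by splitting each compiler command line into one flag per line, sorting, and running diff. This is only a sketch of one way to do it, not necessarily how the listings above were generated. The helper names flag_lines and flag_diff are hypothetical, the sample inputs are illustrative rather than captured from a real build, and flags whose values contain spaces are not handled.

```shell
# flag_lines is a hypothetical helper: one flag per line, sorted, duplicates removed.
flag_lines() {
  printf '%s\n' "$1" | tr -s ' ' '\n' | grep -v '^$' | LC_ALL=C sort -u
}

# flag_diff prints a normal-format diff of two flag strings. Lines that start
# with "<" appear only in the first string, lines with ">" only in the second.
flag_diff() {
  local left right status
  left=$(mktemp) || return 2
  right=$(mktemp) || return 2
  flag_lines "$1" > "$left"
  flag_lines "$2" > "$right"
  diff "$left" "$right"
  status=$?
  rm -f "$left" "$right"
  return $status
}

# Illustrative inputs based on the flags discussed above.
a="-D_FORTIFY_SOURCE=2 -g -O2 -flto=auto -ffat-lto-objects -fstack-protector-strong"
b="-O3"
flag_diff "$a" "$b" || true   # diff exits with 1 when the inputs differ
```

Sorting with LC_ALL=C puts uppercase flags before lowercase ones, so the line order can differ from a listing sorted in another locale. Unlike the listings above, the output also includes diff's hunk headers.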
Eventually I will consider PGO. But I have so many questions. I have also been burned by odd perf results from code compiled with the wrong PGO profiles and suspect I am not alone -- there is more complexity.
One question is the training workload from which the PGO profiles are collected. I am extra reluctant to collect profiles from workload A for a benchmark of workload A. I might be willing to collect them from multiple workloads, and then create one binary used for all workloads.
But if you have one workload that is widely deployed then doing a mysqld binary per workload might be OK.
Would be nice if more people were to publish on this. And note that my focus isn't peak performance -- I am mostly looking for performance and efficiency changes over time.