The RocksDB benchmark tool, db_bench, includes several microbenchmarks that test the performance of hash and checksum functions and of compression and decompression. The microbenchmarks measure the latency of these operations per block, and the typical block size for me is 4kb or 8kb. A script that I use to run these is here.
The goal for this work is to determine whether there are compiler and other software perf bugs that can be fixed. One such bug has already been found and fixed for clang. These tests can also help me find bugs in the Makefiles used by RocksDB and opportunities to improve the compiler flags.
Disclaimer
- these are microbenchmarks run in a tight loop which can distort results
- it would be great to learn that some of these problems can be fixed via compiler options
tl;dr
- Hopefully perf for crc32c with clang on x86 will improve once the bug fix reaches Ubuntu 22
- Perf for xxh3 on Arm can be improved because c6i.2xl is ~2.4X to ~5X faster than c7g.2xl
- Perf for xxh3 on Arm with gcc can be improved because clang is ~1.6X faster than gcc
- RocksDB uses xxh3 from the xxHash dev branch as of Aug 2021, and with that code c6i.2xl (x86) is ~1.5X faster than c7g.2xl (Graviton3). With the latest code from the dev branch, c6i.2xl is only ~1.14X faster when xxh3 at 4kb is the metric. Scroll down to Update 1 for more details.
- If compiling RocksDB on ARM, you might want to edit CXXFLAGS and CFLAGS in the Makefile (see here). They are hardwired to -march=armv8-a+crc+crypto, and you might want to try -march=native or -mcpu=native instead. I tried all of these. None of them changed xxh3 perf, but the current hardwired value might not be a good choice for modern ARM.
Hardware
I tested several CPUs using RocksDB compiled with gcc and with clang, and I share a few interesting results here. In all cases I used Ubuntu 22.04 with gcc 11.3.0 and clang 14.0.0. The servers tested are:
- Intel at home
  - Intel NUC8i7beh (i7-8559u) with turbo boost disabled via BIOS
- AMD at home
  - Beelink SER 4700u (Ryzen 7 4700u) with CPU frequency boost disabled via: echo '0' > /sys/devices/system/cpu/cpufreq/boost
- x86 on AWS
  - c6i.2xlarge with Intel Xeon Platinum 8375C CPU @ 2.90GHz with hyperthreading disabled
- Arm on AWS
  - c7g.2xlarge with Graviton3
Build command lines, first for gcc and then for clang:
make DISABLE_WARNING_AS_ERROR=1 DEBUG_LEVEL=0 V=1 VERBOSE=1 -j8 static_lib db_bench
make CC=clang CXX=clang++ DISABLE_WARNING_AS_ERROR=1 DEBUG_LEVEL=0 V=1 VERBOSE=1 -j8 static_lib db_bench
The results are here:
Update 1
- RocksDB uses xxhash.h from the xxHash dev branch. It was last updated on Aug 6, 2021 by this commit, which imports xxHash as of this commit.
- Using benchHash from the xxHash repo, xxh3 perf on c7g.2xl improved between the release branch and the latest on the dev branch. The release branch is at version 0.8.1 and its last commit was from Nov 29, 2021.
Using benchHash from the xxHash repo compiled with gcc -O3, here is xxh3 throughput in MB/s at 4kb, all on c7g.2xl (ARM). This is the 4th number on the line that starts with "xxh3":
- 14596 from release branch
- 21705 from latest on dev branch at git hash 4ebd833a2
- 14745 from dev branch at git hash 2c611a76f which is what RocksDB uses
- 21863 from dev branch at git hash 620facc5, which is an ARM-specific optimization from Aug 2022. Other diffs before and after this one also help xxh3 on ARM. For reference, here is perf for the diff (c4359b17) immediately preceding 620facc5.