Small Datum: RocksDB benchmarks: small server, leveled compaction

I shared benchmark results for RocksDB a few weeks ago and there was a suggestion for me to repeat tests using different (older) values for format_version. Then while replacing a failed SSD, I also updated the OS and changed a few kernel-related config options. Thus, I ended up repeating all tests.

This post has results from a small server with leveled compaction. Results from a large server and from universal compaction are in progress.

tl;dr - on a small server with a low concurrency workload

older values of format_version (2 thru 5) don't impact QPS
auto hyperclock cache makes read-heavy tests up to 15% faster

for a cached database

QPS drops by 5% to 15% from RocksDB 6.0.2 to 9.7.2
QPS has been stable since 8.0

for an IO-bound database with buffered IO

bug 12038 hurts QPS for overwrite (will be fixed soon in 9.7)
QPS for fillseq has been stable
QPS for read-heavy tests is 15% to 20% better in RocksDB 9.7.2 vs 6.0.2

for an IO-bound database with O_DIRECT

QPS for fillseq is ~11% less in 9.7.2 vs 6.0.2 but has been stable since 7.0. My vague memory is that the issue is new CPU overhead from better error checking.
QPS for overwrite is stable
QPS for read-heavy tests is 16% to 38% better in RocksDB 9.7.2 vs 6.0.2

Hardware

The small server is named SER7 and is a Beelink SER7 7840HS (see here) with 8 cores, AMD SMT disabled, a Ryzen 7 7840HS CPU, Ubuntu 22.04. Storage is ext4 with data=writeback and 1 NVMe device.

The storage device has 128 for max_hw_sectors_kb and max_sectors_kb. This is relevant for bug 12038 which will be fixed real soon in a 9.7 patch release.

Builds

I compiled db_bench from source on all servers. I used versions:

6.x - 6.0.2, 6.10.4, 6.20.4, 6.29.5
7.x - 7.0.4, 7.3.2, 7.6.0, 7.10.2
8.x - 8.0.0, 8.3.3, 8.6.7, 8.9.2, 8.11.4
9.x - 9.0.1, 9.1.2, 9.2.2, 9.3.2, 9.4.1, 9.5.2, 9.6.1 and 9.7.2 at git sha b5cde68b8a

Benchmark

All tests used the default value for compaction_readahead_size. For all versions tested I used the default values for the block cache (LRU) and format_version. For 9.6.1 I repeated tests using the hyperclock cache (default is LRU) and format_version =2, =3, =4 and =5 (default is =6).

I used my fork of the RocksDB benchmark scripts that are wrappers to run db_bench. These run db_bench tests in a special sequence -- load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was run using 1 thread for the small server and 8 threads for the medium server. How I do benchmarks for RocksDB is explained here and here. The command line to run the tests is:

# Small server, SER7: use 1 thread, 20M KV pairs for cached, 400M for IO-bound

bash x3.sh 1 no 1800 c8r32 20000000 400000000 byrx iobuf iodir

The tests on the charts are named as:

fillseq -- load in key order with the WAL disabled
revrangeww -- reverse range while writing, do short reverse range scans as fast as possible while another thread does writes (Put) at a fixed rate
fwdrangeww -- like revrangeww except do short forward range scans
readww -- like revrangeww except do point queries
overwrite -- do overwrites (Put) as fast as possible

Workloads

There are three workloads, all of which use one client (thread):

byrx - the database is cached by RocksDB
iobuf - the database is larger than memory and RocksDB uses buffered IO
iodir - the database is larger than memory and RocksDB uses O_DIRECT

A spreadsheet with all results is here and performance summaries with more details are linked below:

Relative QPS

The numbers in the spreadsheet and on the y-axis in the charts that follow are the relative QPS which is (QPS for $me) / (QPS for $base). When the value is greater than 1.0 then $me is faster than $base. When it is less than 1.0 then $base is faster (perf regression!).

The base version is RocksDB 6.0.2 for the all versions tests and 9.6.1 with my standard configuration for the 9.6 variations tests.
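
The relative QPS calculation above can be sketched in a few lines of Python. This is a minimal illustration and not part of the benchmark scripts: the function names and the sample QPS numbers are hypothetical and were not taken from the spreadsheet.

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """Return (QPS for $me) / (QPS for $base)."""
    if qps_base <= 0:
        raise ValueError("base QPS must be positive")
    return qps_me / qps_base


def percent_change(rel: float) -> float:
    """Convert a relative QPS into a percent change versus the base."""
    return (rel - 1.0) * 100.0


def describe(rel: float) -> str:
    """Classify a relative QPS value the way the charts are read."""
    if rel > 1.0:
        return "$me is faster than $base"
    if rel < 1.0:
        return "$base is faster (perf regression)"
    return "no change"


# Hypothetical numbers, for illustration only.
base_qps = 100_000.0  # QPS for the base version
me_qps = 90_000.0     # QPS for the version being compared
rel = relative_qps(me_qps, base_qps)
print(f"relative QPS {rel:.2f}, {percent_change(rel):+.0f}%: {describe(rel)}")
```

With these made-up inputs the relative QPS is 0.90, which is how a statement like "QPS drops by 10%" would appear on the charts.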

Results: byrx with 9.6 variations

The byrx tests use a cached database. The performance summary is here. This has results for RocksDB 9.6.1 using my standard configuration and the variations are:

fv2 - uses format_version=2 instead of the default (=6)
fv3 - uses format_version=3
fv4 - uses format_version=4
fv5 - uses format_version=5
ahcc - uses auto_hyper_clock_cache instead of the default (LRU)
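
As a rough sketch, the variations listed above could map to extra db_bench options like this. The mapping is an assumption for illustration: it uses the db_bench flags --format_version and --cache_type, and the benchmark scripts used for this post may set these options differently. The helper name, the ./db_bench path and the readrandom example are hypothetical.

```python
# Assumed mapping from variation name to extra db_bench flags.
# The scripts used for this post may pass these options differently.
VARIATION_FLAGS = {
    "fv2": ["--format_version=2"],
    "fv3": ["--format_version=3"],
    "fv4": ["--format_version=4"],
    "fv5": ["--format_version=5"],
    "ahcc": ["--cache_type=auto_hyper_clock_cache"],
}


def flags_for(variation: str) -> list[str]:
    """Return the extra flags for a variation.

    Any name not in the table (for example the standard configuration)
    yields no extra flags, so db_bench keeps its defaults.
    """
    return list(VARIATION_FLAGS.get(variation, []))


# Example: build (but do not run) a command line for one variation.
# The binary path and benchmark name are placeholders.
cmd = ["./db_bench", "--benchmarks=readrandom"] + flags_for("ahcc")
print(" ".join(cmd))
```

Keeping the variations in one table makes it easy to confirm that each run differs from the standard configuration by exactly one option.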

This chart shows the relative QPS for RocksDB 9.6.1 with a given configuration relative to 9.6.1 with my standard configuration. The y-axis doesn't start at 0 to improve readability.

Summary:

Using different values of format_version doesn't have a large impact here
Using auto hyperclock instead of LRU improves read-heavy QPS by up to 15%

Results: byrx with all versions

The byrx tests use a cached database. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:

QPS drops by 5% to 15% from RocksDB 6.0.2 to 9.7.2
Performance has been stable since 8.0
For overwrite the excellent result in RocksDB 6.0.2 comes at the cost of bad write stalls (see pmax here)

Results: iobuf with all versions

The iobuf tests use a database larger than memory with buffered IO. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:

bug 12038 explains the regression for overwrite (fixed soon in 9.7)
QPS for fillseq has been stable
QPS for read-heavy tests is 15% to 20% better in RocksDB 9.7.2 vs 6.0.2

Results: iodir with all versions

The iodir tests use a database larger than memory with O_DIRECT. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:

QPS for fillseq is ~11% less in 9.7.2 vs 6.0.2 but has been stable since 7.0. My vague memory is that the issue is new CPU overhead from better error checking.
QPS for overwrite is stable
QPS for read-heavy tests is 16% to 38% better in RocksDB 9.7.2 vs 6.0.2


Thursday, October 24, 2024
