Small Datum: RocksDB benchmarks: large server, universal compaction

Monday, November 25, 2024


This post has results from a large server with universal compaction. It is the same server for which I recently shared leveled compaction results. The results are boring (no large regressions) but a bit more exciting than the ones for leveled compaction because there is more variance. My somewhat educated guess is that variance is more likely with universal compaction.

tl;dr

there are some small regressions for cached workloads (see byrx below)
there are some small to medium improvements for IO-bound workloads (see iodir and iobuf)
modern RocksDB would look better were I to use the Hyper Clock block cache, but I don't use it here so that similar code is tested across all versions

Hardware

The server is an ax162-s from Hetzner with an AMD EPYC 9454P processor, 48 cores, AMD SMT disabled and 128G RAM. The OS is Ubuntu 22.04. Storage is 2 NVMe devices with SW RAID 1 and ext4.

Builds

I compiled db_bench from source. I used versions:

6.x - 6.0.2, 6.10.4, 6.20.4, 6.29.5
7.x - 7.0.4, 7.3.2, 7.6.0, 7.10.2
8.x - 8.0.0, 8.3.3, 8.6.7, 8.9.2, 8.11.4
9.x - 9.0.1, 9.1.2, 9.2.2, 9.3.2, 9.4.1, 9.5.2, 9.6.1 and 9.7.3

Benchmark

All tests used the default value for compaction_readahead_size and the LRU block cache.
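For readers unfamiliar with the LRU policy, here is a toy Python sketch of least-recently-used eviction. It is only an illustration under simplifying assumptions: the class and block names are hypothetical, capacity is counted in entries, and RocksDB's own LRUCache differs (it is sharded and charges each entry by its size in bytes).

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that counts entries, not bytes (hypothetical illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self._entries:
            return None  # cache miss
        self._entries.move_to_end(key)  # a hit makes the entry most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Evict least recently used entries until back within capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("block-a", b"A")
cache.put("block-b", b"B")
cache.get("block-a")        # block-b is now the least recently used entry
cache.put("block-c", b"C")  # over capacity, so block-b is evicted
```

The usage lines show the defining behavior: reading block-a protects it, so the next insert evicts block-b instead.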

I used my fork of the RocksDB benchmark scripts that are wrappers to run db_bench. These run db_bench tests in a special sequence: load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was run using 40 threads. How I do benchmarks for RocksDB is explained here and here. The command line to run the tests is:

bash x3.sh 40 no 1800 c48r128 100000000 2000000000 byrx iobuf iodir

The tests on the charts are named as:

fillseq -- load in key order with the WAL disabled
revrangeww -- reverse range while writing, do short reverse range scans as fast as possible while another thread does writes (Put) at a fixed rate
fwdrangeww -- like revrangeww except do short forward range scans
readww -- like revrangeww except do point queries
overwrite -- do overwrites (Put) as fast as possible
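The revrangeww, fwdrangeww and readww tests share one pattern: reads run as fast as possible while a single writer is throttled to a fixed rate. The Python sketch below illustrates that pattern only. The function name and parameters are hypothetical, a sorted in-memory list stands in for the database, it uses one reader (the calling thread) rather than the 40 threads used here, and it is not how db_bench implements these tests.

```python
import bisect
import random
import threading
import time


def range_while_writing(duration_s=0.5, write_rate=200, scan_len=10, reverse=True):
    """Hypothetical sketch: short range scans as fast as possible in the calling
    thread while one writer thread does Puts at a fixed rate."""
    keys = list(range(0, 10_000, 2))  # sorted list standing in for the database
    lock = threading.Lock()
    stop = threading.Event()
    writes = 0

    def writer():
        nonlocal writes
        interval = 1.0 / write_rate
        next_put = time.monotonic()
        while not stop.is_set():
            with lock:
                bisect.insort(keys, 2 * writes + 1)  # Put a new (odd) key
            writes += 1
            next_put += interval
            # Wait for the next slot so writes never exceed write_rate.
            stop.wait(max(0.0, next_put - time.monotonic()))

    t = threading.Thread(target=writer)
    t.start()
    scans = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        seek = random.randrange(10_000)  # random start key for the scan
        with lock:
            i = bisect.bisect_left(keys, seek)
            if reverse:
                rows = keys[max(0, i - scan_len):i][::-1]  # short reverse scan
            else:
                rows = keys[i:i + scan_len]  # short forward scan
        scans += 1
    stop.set()
    t.join()
    return scans, writes
```

Replacing the scan with a single bisect lookup would give the readww variant (point queries). Capping the writer means every run faces the same background write load, so differences show up on the read side.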

Workloads

There are three workloads, all of which use 40 threads:

byrx - the database is cached by RocksDB (100M KV pairs)
iobuf - the database is larger than memory and RocksDB uses buffered IO (2B KV pairs)
iodir - the database is larger than memory and RocksDB uses O_DIRECT (2B KV pairs)

A spreadsheet with all results is here and performance summaries with more details are here for byrx, iobuf and iodir.

Relative QPS

The numbers in the spreadsheet and on the y-axis in the charts that follow are the relative QPS which is (QPS for $me) / (QPS for $base). When the value is greater than 1.0 then $me is faster than $base. When it is less than 1.0 then $base is faster (perf regression!).

The base version is RocksDB 6.0.2.
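The relative QPS definition can be written directly as code. A minimal Python sketch follows; the function names are hypothetical and the QPS numbers in the usage line are made up for illustration, not results from this post.

```python
def relative_qps(qps_me, qps_base):
    """Relative QPS = (QPS for $me) / (QPS for $base)."""
    return qps_me / qps_base


def classify(rel):
    """Greater than 1.0: $me is faster. Less than 1.0: perf regression."""
    if rel > 1.0:
        return "faster than base"
    if rel < 1.0:
        return "regression"
    return "same as base"


# Hypothetical QPS values, not measured results.
rel = relative_qps(90_000, 100_000)
print(round(rel, 2), classify(rel))  # prints: 0.9 regression
```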

Results: byrx

The byrx tests use a cached database. The performance summary is here.

The charts show the QPS for each version relative to RocksDB 6.0.2. There are two charts, and the second narrows the y-axis range to make regressions easier to see.

Summary:

fillseq has new CPU overhead in 7.0 from code added for correctness checks, and QPS has been stable since then
QPS for other tests has been stable, with some variance, since late 6.x
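Statements such as "stable since 7.0" compare a version's relative QPS with the versions that follow it. One way to make that idea concrete is sketched in Python below. The function name, the 0.05 band and the series values are all hypothetical, and this is not necessarily how these summaries were produced; they may simply be judged from the charts.

```python
def stable_since(series, band=0.05):
    """Return the earliest version from which all later relative-QPS values
    stay within `band` of each other (max - min <= band).

    `series` is a list of (version, relative_qps) in release order. The 0.05
    band is an arbitrary, hypothetical threshold for "stable".
    """
    if not series:
        return None
    lo = hi = series[-1][1]
    start = len(series) - 1
    # Walk backwards, widening the [lo, hi] range until it exceeds the band.
    for i in range(len(series) - 2, -1, -1):
        value = series[i][1]
        lo, hi = min(lo, value), max(hi, value)
        if hi - lo > band:
            break
        start = i
    return series[start][0]


# Made-up numbers: a drop at v2, then flat afterwards.
example = [("v1", 1.00), ("v2", 0.80), ("v3", 0.82), ("v4", 0.81)]
print(stable_since(example))  # prints: v2
```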

Results: iobuf

The iobuf tests use an IO-bound database with buffered IO. The performance summary is here.

The charts show the QPS for each version relative to RocksDB 6.0.2. There are two charts, and the second narrows the y-axis range to make regressions easier to see.

Summary:

fillseq has been stable since 7.6
readww has always been stable
overwrite improved in 7.6 and has been stable since then
fwdrangeww and revrangeww improved in late 6.x and have been stable since then

Results: iodir

The iodir tests use an IO-bound database with O_DIRECT. The performance summary is here.

The charts show the QPS for each version relative to RocksDB 6.0.2. There are two charts, and the second narrows the y-axis range to make regressions easier to see.

Summary: