Thursday, October 24, 2024

RocksDB benchmarks: small server, leveled compaction

I shared benchmark results for RocksDB a few weeks ago and got a suggestion to repeat the tests using different (older) values for format_version. Then, while replacing a failed SSD, I also updated the OS and changed a few kernel-related config options, so I ended up repeating all of the tests.

This post has results from a small server with leveled compaction. Results from a large server and from universal compaction are in progress.

tl;dr - on a small server with a low-concurrency workload

  • older values of format_version (2 through 5) don't impact QPS
  • auto hyperclock cache makes read-heavy tests up to 15% faster
  • for a cached database
    • QPS drops by 5% to 15% from RocksDB 6.0.2 to 9.7.2
    • QPS has been stable since 8.0
  • for an IO-bound database with buffered IO
    • bug 12038 hurts QPS for overwrite (will be fixed soon in 9.7)
    • QPS for fillseq has been stable
    • QPS for read-heavy tests is 15% to 20% better in RocksDB 9.7.2 vs 6.0.2
  • for an IO-bound database with O_DIRECT
    • QPS for fillseq is ~11% less in 9.7.2 vs 6.0.2 but has been stable since 7.0. My vague memory is that the issue is new CPU overhead from better error checking.
    • QPS for overwrite is stable
    • QPS for read-heavy tests is 16% to 38% better in RocksDB 9.7.2 vs 6.0.2

Hardware

The small server is named SER7 and is a Beelink SER7 7840HS (see here) with an 8-core AMD Ryzen 7 7840HS CPU (SMT disabled) running Ubuntu 22.04. Storage is one NVMe device with ext4 and data=writeback.

The storage device has 128 for both max_hw_sectors_kb and max_sectors_kb. This is relevant for bug 12038, which will soon be fixed in a 9.7 patch release.
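
Those limits can be read from sysfs. A minimal sketch, assuming the device is nvme0n1 (the device name is an assumption, not something stated in this post):

    # Show the request-size limits for the NVMe device (nvme0n1 is an assumed name)
    cat /sys/block/nvme0n1/queue/max_hw_sectors_kb
    cat /sys/block/nvme0n1/queue/max_sectors_kb
    # max_hw_sectors_kb is a read-only hardware limit; max_sectors_kb can be
    # changed, but only up to that limit
    echo 128 | sudo tee /sys/block/nvme0n1/queue/max_sectors_kb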

Builds

I compiled db_bench from source on all servers; a sample build command is sketched after the list of versions. I used these versions:
  • 6.x - 6.0.2, 6.10.4, 6.20.4, 6.29.5
  • 7.x - 7.0.4, 7.3.2, 7.6.0, 7.10.2
  • 8.x - 8.0.0, 8.3.3, 8.6.7, 8.9.2, 8.11.4
  • 9.x - 9.0.1, 9.1.2, 9.2.2, 9.3.2, 9.4.1, 9.5.2, 9.6.1 and 9.7.2 at git sha b5cde68b8a
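
As a sketch of how one of these builds is done (the tag, job count and flags here are my assumptions for illustration, not the exact commands used):

    # Build a release (non-debug) db_bench for one RocksDB version
    git clone https://github.com/facebook/rocksdb.git
    cd rocksdb
    git checkout v9.6.1        # or the tag/sha for the version being tested
    make -j8 DEBUG_LEVEL=0 db_bench
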
Benchmark

All tests used the default value for compaction_readahead_size. For all versions tested I used the default values for the block cache (LRU) and format_version. For 9.6.1 I repeated tests using the hyperclock cache (the default is LRU) and with format_version set to 2, 3, 4 and 5 (the default is 6).

I used my fork of the RocksDB benchmark scripts, which are wrappers that run db_bench. They run db_bench tests in a special sequence: load in key order, read-only, some overwrites, read-write and then write-only. The benchmark was run using 1 thread for the small server and 8 threads for the medium server; only the small server is covered in this post. How I do benchmarks for RocksDB is explained here and here. The command line to run the tests is:

    # Small server, SER7: use 1 thread, 20M KV pairs for cached, 400M for IO-bound
    bash x3.sh 1 no 1800 c8r32 20000000 400000000 byrx iobuf iodir

The tests on the charts are named as follows; a rough sketch of the underlying db_bench steps appears after the list:
  • fillseq -- load in key order with the WAL disabled
  • revrangeww -- reverse range while writing, do short reverse range scans as fast as possible while another thread does writes (Put) at a fixed rate
  • fwdrangeww -- like revrangeww except do short forward range scans
  • readww -- like revrangeww except do point queries
  • overwrite -- do overwrites (Put) as fast as possible
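
This is roughly what the sequence looks like as db_bench invocations. It is only a sketch: the real commands come from the x3.sh wrapper scripts, and the flag values below (key count, duration, write rate) are assumptions rather than the actual settings:

    # fillseq: load 20M KV pairs in key order with the WAL disabled
    ./db_bench --benchmarks=fillseq --num=20000000 --threads=1 --disable_wal=true
    # readww: point queries as fast as possible while another thread writes at a fixed rate
    ./db_bench --benchmarks=readwhilewriting --use_existing_db=true --threads=1 \
        --duration=1800 --benchmark_write_rate_limit=2097152
    # overwrite: writes (Put) as fast as possible
    ./db_bench --benchmarks=overwrite --use_existing_db=true --threads=1 --duration=1800
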
Workloads

There are three workloads, all of which use one client (thread); a sketch of the IO-related options follows the list:

  • byrx - the database is cached by RocksDB
  • iobuf - the database is larger than memory and RocksDB uses buffered IO
  • iodir - the database is larger than memory and RocksDB uses O_DIRECT
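
The difference between iobuf and iodir is whether RocksDB uses O_DIRECT. A minimal sketch with the stock db_bench flags (whatever else the wrapper scripts set is not shown here):

    # iobuf: database larger than memory, buffered IO (the direct IO flags stay false)
    # iodir: database larger than memory, O_DIRECT for user reads, flushes and compaction
    ./db_bench ... --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
    # byrx: the database fits in the RocksDB block cache, so the IO mode matters much less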

A spreadsheet with all results is here and performance summaries with more details are linked below.

Relative QPS

The numbers in the spreadsheet and on the y-axis in the charts that follow are the relative QPS, which is (QPS for $me) / (QPS for $base). When the value is greater than 1.0 then $me is faster than $base. When it is less than 1.0 then $base is faster (a perf regression!).

The base version is RocksDB 6.0.2 for the all-versions tests and 9.6.1 with my standard configuration for the 9.6-variations tests.
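
As a worked example (the numbers are made up, not taken from the spreadsheet):

    # If $me (9.7.2) gets 85000 QPS and $base (6.0.2) gets 100000 QPS on a test, then
    # relative QPS = 85000 / 100000 = 0.85, so $me is 15% slower than $base on that test
    echo "scale=2; 85000 / 100000" | bc    # prints .85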

Results: byrx with 9.6 variations

The byrx tests use a cached database. The performance summary is here. This has results for RocksDB 9.6.1 using my standard configuration and the variations (sketched with approximate db_bench flags below the list) are:
  • fv2 - uses format_version=2 instead of the default (=6)
  • fv3 - uses format_version=3
  • fv4 - uses format_version=4
  • fv5 - uses format_version=5
  • ahcc - uses auto_hyper_clock_cache instead of the default (LRU)
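
A minimal sketch of how these variations map to db_bench options. The flag names are the stock db_bench ones; the exact values and any other flags set by the wrapper scripts are assumptions:

    # fv2 .. fv5: override the SST format version (the default in 9.6 is 6)
    ./db_bench ... --format_version=2
    # ahcc: use the auto hyperclock cache instead of the default LRU block cache
    ./db_bench ... --cache_type=auto_hyper_clock_cache
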
This chart shows the relative QPS for RocksDB 9.6.1 with a given configuration relative to 9.6.1 with my standard configuration. The y-axis doesn't start at 0 to improve readability.

Summary:
  • Using different values of format_version doesn't have a large impact here
  • Using auto hyperclock instead of LRU improves read-heavy QPS by up to 15%

Results: byrx with all versions

The byrx tests use a cached database. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:
  • QPS drops by 5% to 15% from RocksDB 6.0.2 to 9.7.2
  • Performance has been stable since 8.0
  • For overwrite the excellent result in RocksDB 6.0.2 comes at the cost of bad write stalls (see pmax here)

Results: iobuf with all versions

The iobuf tests use a database larger than memory with buffered IO. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:
  • bug 12038 explains the regression for overwrite (fixed soon in 9.7)
  • QPS for fillseq has been stable
  • QPS for read-heavy tests is 15% to 20% better in RocksDB 9.7.2 vs 6.0.2

Results: iodir with all versions

The iodir tests use a database larger than memory with O_DIRECT. The performance summary is here.

This chart shows the relative QPS for a given version of RocksDB relative to RocksDB 6.0.2. The y-axis doesn't start at 0 to improve readability.

Summary:
  • QPS for fillseq is ~11% less in 9.7.2 vs 6.0.2 but has been stable since 7.0. My vague memory is that the issue is new CPU overhead from better error checking.
  • QPS for overwrite is stable
  • QPS for read-heavy tests is 16% to 38% better in RocksDB 9.7.2 vs 6.0.2
