Thursday, November 16, 2023

Checking RocksDB 8.x for performance regressions on a large server, part 2

This post looks for performance regressions across all RocksDB 8.x versions on a large server. In a previous post I shared results for RocksDB 7.x and 8.x on the same hardware. Here I add results for the new 8.7 and 8.8 releases.

tl;dr

  • Things mostly look good
  • There are a few known problems
  • There are a few possible regressions that will take more time to figure out

Builds

I compiled RocksDB 8.0.0, 8.1.1, 8.2.1, 8.3.3, 8.4.4, 8.5.4, 8.6.7, 8.7.3 and 8.8.0 with gcc. These are the latest patch releases for each minor version.

Benchmark

All tests used a server with 40 cores, 80 HW threads, 2 sockets, 256GB of RAM and many TB of fast NVMe SSD with Linux 5.1.2, XFS and SW RAID 0 across 6 devices.

Everything used the LRU block cache and the default value for compaction_readahead_size.

I used my fork of the RocksDB benchmark scripts that are wrappers to run db_bench. These run db_bench tests in a special sequence -- load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was repeated using 12 and 24 threads. How I do benchmarks for RocksDB is explained here and here.

The benchmark was repeated in three setups:
  • cached - database fits in the RocksDB block cache
  • iobuf - IO-bound, working set doesn't fit in memory, uses buffered IO
  • iodir - IO-bound, working set doesn't fit in memory, uses O_DIRECT

A spreadsheet with all results is here and performance summaries are here.
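
As a rough illustration of how the three setups differ, here is a minimal Python sketch that builds a db_bench command line for each one. It is not taken from my benchmark scripts. The `SETUPS` table, the `db_bench_args` helper and the `num` values are hypothetical placeholders; the post does not list the real database sizes. The flags themselves (`--use_direct_reads`, `--use_direct_io_for_flush_and_compaction`, `--cache_type`, `--threads`, `--num`, `--benchmarks`) are db_bench options.

```python
# Hypothetical mapping from setup name to db_bench options.
# The num values are placeholders, not the sizes used in this post.
SETUPS = {
    # cached: small enough database that it fits in the block cache
    "cached": {"num": 40_000_000,
               "use_direct_reads": False,
               "use_direct_io_for_flush_and_compaction": False},
    # iobuf: larger database, buffered IO
    "iobuf": {"num": 4_000_000_000,
              "use_direct_reads": False,
              "use_direct_io_for_flush_and_compaction": False},
    # iodir: larger database, O_DIRECT for reads, flush and compaction
    "iodir": {"num": 4_000_000_000,
              "use_direct_reads": True,
              "use_direct_io_for_flush_and_compaction": True},
}


def db_bench_args(setup, threads, benchmark):
    """Return a db_bench argument list for one setup, thread count and test."""
    args = ["./db_bench",
            f"--benchmarks={benchmark}",
            f"--threads={threads}",
            "--cache_type=lru_cache"]
    for name, value in SETUPS[setup].items():
        if isinstance(value, bool):
            value = str(value).lower()  # gflags booleans: true / false
        args.append(f"--{name}={value}")
    return args


if __name__ == "__main__":
    for setup in SETUPS:
        for threads in (12, 24):
            print(" ".join(db_bench_args(setup, threads, "readrandom")))
```

The only difference between iobuf and iodir in this sketch is the pair of direct IO flags, which matches the description above.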

Results: cached

The charts use relative QPS which is: (QPS for a given version / QPS for RocksDB 8.8.0). The y-axis usually doesn't start at zero to improve readability at the risk of improving hype-ability.
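
The relative QPS formula can be sketched in a few lines of Python. The `relative_qps` helper is a hypothetical name, and the numbers in `example` are placeholders for illustration only, not measurements from this post.

```python
def relative_qps(qps_by_version, base="8.8.0"):
    """Divide the QPS for each version by the QPS for the base version."""
    base_qps = qps_by_version[base]
    return {version: qps / base_qps for version, qps in qps_by_version.items()}


# Placeholder numbers for illustration only, not results from this post.
example = {"8.0.0": 105000.0, "8.7.3": 101000.0, "8.8.0": 100000.0}
rel = relative_qps(example)

for version, value in rel.items():
    print(f"{version}: {value:.2f}")
```

With 8.8.0 as the base, a value above 1.0 for an older version means that version had higher QPS than 8.8.0.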

From 8.0 to 8.8 
  • fillseq QPS is stable
  • fwdrange QPS has much variance. This is a known issue with the LRU block cache on multi-socket servers (hello NUMA).
  • read QPS for readrandom and multireadrandom is down by ~5%. This might be a regression.
  • read QPS for *whilewriting is down by ~2%. This might be a regression.
  • overwrite QPS is stable to up by ~3%

Results: IO-bound with buffered IO

The charts use relative QPS which is: (QPS for a given version / QPS for RocksDB 8.8.0). The y-axis usually doesn't start at zero to improve readability at the risk of improving hype-ability.

From 8.0 to 8.8 
  • fillseq QPS is stable
  • fwdrange QPS has much variance. This is a known issue with the LRU block cache on multi-socket servers (hello NUMA).
  • readrandom and multireadrandom QPS are stable
  • read QPS for *whilewriting might be down by 4% and might be a regression
  • overwrite QPS is down by ~5% and might be the result of compaction_readahead_size being larger than max_sectors_kb as explained here.
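
To check whether compaction_readahead_size is larger than max_sectors_kb on a given host, something like the following Python sketch can help. It assumes Linux sysfs, where the limit is exposed at /sys/block/<device>/queue/max_sectors_kb in units of KB. The helper names and the device name in the usage comment are hypothetical, and with SW RAID it may be worth checking the md device as well as its members.

```python
from pathlib import Path


def read_max_sectors_kb(device):
    """Read max_sectors_kb from sysfs for a block device such as 'nvme0n1'."""
    path = Path(f"/sys/block/{device}/queue/max_sectors_kb")
    return int(path.read_text().strip())


def readahead_exceeds_limit(compaction_readahead_size, max_sectors_kb):
    """True when the readahead size (bytes) is larger than the limit (KB)."""
    return compaction_readahead_size > max_sectors_kb * 1024


# Example usage on a Linux host (the device name is an assumption):
#   limit_kb = read_max_sectors_kb("nvme0n1")
#   print(readahead_exceeds_limit(2 * 1024 * 1024, limit_kb))
```

The comparison is kept in its own function so it can be run against values copied from another server.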

Results: IO-bound with O_DIRECT

From 8.0 to 8.8 
  • fillseq QPS is stable
  • fwdrange QPS has much variance. This is a known issue with the LRU block cache on multi-socket servers (hello NUMA).
  • readrandom and multireadrandom QPS are stable
  • read QPS for *whilewriting is stable
  • overwrite QPS is up by 3%
