My primary tool for RocksDB performance tests is db_bench, but as always there are layers of shell scripts to automate the process. First, I will define two terms. A benchmark is a sequence of benchmark steps. A common pattern for big data benchmarks is load and then query, where load and query are the steps. I do small data (OLTP), so the common pattern for me is load and then several read+write steps. To save time, the load step can be replaced by copying from a backup or snapshot.
The layers of shell scripts are:
- tools/benchmark.sh - runs one benchmark step by invoking db_bench
- tools/benchmark_compare.sh - runs a benchmark by invoking a sequence of benchmark steps
- x.sh - selects configuration options based on HW size then calls benchmark_compare.sh
- x3.sh - selects configuration options based on workload (IO-bound vs cached) then calls x.sh
  - byrx is short for cached by RocksDB and the database fits in the RocksDB block cache
  - byos is short for cached by OS and the database fits in the OS page cache but is larger than the RocksDB block cache. This simulates fast storage for reads and lets me increase stress on the RocksDB block cache code.
  - iobuf is short for IO-bound with buffered IO. The database is larger than RAM and RocksDB uses buffered IO.
  - iodir is short for IO-bound with O_DIRECT. The database is larger than RAM and RocksDB uses O_DIRECT for user reads and compaction.
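The difference between the buffered and O_DIRECT workloads can be sketched as a small lookup from workload name to db_bench flags. This is only an illustration, not what x3.sh actually does: the function name workload_io_flags is hypothetical, cache sizing is left out because it depends on the host and database size, and treating byrx as buffered IO is my assumption (the text only states the IO mode for byos, iobuf and iodir).

```sh
#!/bin/sh
# Hypothetical helper: map a workload name to db_bench IO flags.
# Block cache and database sizes are not set here; they depend on the host.
workload_io_flags() {
  case "$1" in
    byrx|byos|iobuf)
      # Buffered IO: reads and compaction go through the OS page cache.
      # (Assumption for byrx; the text does not state its IO mode.)
      echo "--use_direct_reads=false --use_direct_io_for_flush_and_compaction=false"
      ;;
    iodir)
      # O_DIRECT for user reads and for flush/compaction.
      echo "--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true"
      ;;
    *)
      echo "unknown workload: $1" >&2
      return 1
      ;;
  esac
}

# Print the flags chosen for each workload name.
for w in byrx byos iobuf iodir; do
  printf '%s: %s\n' "$w" "$(workload_io_flags "$w")"
done
```

What separates byrx, byos and iobuf from each other is not the IO mode but how the database size compares to the RocksDB block cache and to RAM, so a real script would also have to pick a cache size and key count per workload.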
Note that I have pending changes for benchmark.sh and benchmark_compare.sh that are not yet pushed upstream.
The benchmark steps are:
- fillseq - loads the database in key order. There is not much compaction debt when this finishes.
- revrange - reverse range scans. The real purpose of this step is to let compaction catch up.
- fwdrange - forward range scans. The results are noisier than I can yet explain.
- readrandom - point queries
- multireadrandom - more point queries, but enhanced by io_uring
- fragment the LSM tree - prior to this, the key ranges of SST files do not overlap
  - overwritesome - does overwrite with --num set to 10% of the keys
  - flush_mt_l0 - flushes the memtable, flushes the L0, then waits for compaction to catch up
- read+write - perf is reported for reads, the background writer has a 2MB/s rate limit
  - revrangewhilewriting - short, reverse range scans
  - fwdrangewhilewriting - short, forward range scans
  - readwhilewriting - point queries
- overwrite - the writer does not have a rate limit. If more write-only tests followed, I would use overwriteandwait, which waits for compaction to finish when overwrite ends.
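As a rough sketch of how the steps chain together, the dry run below prints the step names in the order listed above, along with the two derived values mentioned in the text: --num for overwritesome and the writer rate limit for the read+write steps. Nothing here invokes db_bench or benchmark.sh. The key count, the helper names and the assumption that 2MB/s means 2 * 1024 * 1024 bytes per second are mine. --benchmark_write_rate_limit is a db_bench flag that takes bytes per second, but attaching it to a step name like this is only an annotation, not a real command line.

```sh
#!/bin/sh
# Dry-run sketch: list the benchmark steps in order with derived values.
NUM_KEYS=${NUM_KEYS:-40000000}   # hypothetical number of keys in the database

# overwritesome uses --num set to 10% of the keys.
overwritesome_num() { echo $(( $1 / 10 )); }

# Convert a rate in MB/s to bytes/s (assumes 1MB = 1024 * 1024 bytes).
writer_rate_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Step order is assumed to follow the list in the text.
steps="fillseq revrange fwdrange readrandom multireadrandom
overwritesome flush_mt_l0
revrangewhilewriting fwdrangewhilewriting readwhilewriting
overwrite"

for s in $steps; do
  case "$s" in
    overwritesome)
      echo "$s --num=$(overwritesome_num "$NUM_KEYS")" ;;
    *whilewriting)
      # The background writer is limited to 2MB/s in these steps.
      echo "$s --benchmark_write_rate_limit=$(writer_rate_bytes 2)" ;;
    *)
      echo "$s" ;;
  esac
done
```

Setting NUM_KEYS in the environment before running the script changes the derived --num value. The final overwrite step prints no rate limit because its writer is not throttled.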