Sunday, April 3, 2022

Examples of the RocksDB trivial move optimization

I recently wrote two posts (here and here) on the trivial move optimization in RocksDB. This third post uses db_bench to show where the optimization works, how compressing only some levels affects it and the complexity introduced by dynamic leveled. All of the examples use the db_bench fillseq benchmark, which inserts keys in ascending order. The four examples are:

  • dynamic level enabled, compression disabled
  • dynamic level disabled, compression disabled
  • dynamic level enabled, compression enabled
  • dynamic level disabled, compression enabled

Configuration

Dynamic leveled is explained by this post, enabled by this configuration option and set on the db_bench command line by --level_compaction_dynamic_level_bytes.

Compression is enabled by the db_bench flags --min_level_to_compress=2 --compression_type=lz4.
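A minimal Python sketch of what these two flags ask for, assuming db_bench turns --min_level_to_compress into a per-level compression list where levels below the threshold are uncompressed and the rest use --compression_type. This is not RocksDB code. The function name is hypothetical, and the 7 levels are an assumption that matches the L0 to L6 names in the statistics below.

```python
def per_level_compression(num_levels, min_level_to_compress, compression_type):
    """Return one compression name per logical level, index 0 is L0.

    Assumes min_level_to_compress >= 0, as in the examples in this post.
    """
    return [
        "none" if level < min_level_to_compress else compression_type
        for level in range(num_levels)
    ]


# Hypothetical stand-in for --min_level_to_compress=2 --compression_type=lz4
levels = per_level_compression(
    num_levels=7, min_level_to_compress=2, compression_type="lz4"
)
for level, codec in enumerate(levels):
    print(f"logical L{level}: {codec}")
```

The list is indexed by logical level, which matters once dynamic leveled is enabled.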

Dynamic level enabled, compression disabled

The db_bench command line and output are here. This is the best case: compaction writes nothing to the levels beyond L0, so those levels show Write(GB)=0 and Moved(GB) > 0. Trivial move cannot be done for writes into L0 (memtable flush) because the LSM has to write data at least once.

Dynamic level disabled, compression disabled

The db_bench command line and output are here. This is also the best case: compaction writes nothing to the levels beyond L0, so those levels show Write(GB)=0 and Moved(GB) > 0. Trivial move cannot be done for writes into L0 (memtable flush) because the LSM has to write data at least once.

Dynamic level enabled, compression enabled

The db_bench command line and output are here. This shows the impact of compression. When Ln and Ln+1 don't have the same style of compression then trivial move cannot be done from Ln to Ln+1. This is complicated by dynamic leveled because the LSM tree starts with L0 then grows into L0,L6 then grows into L0,L5,L6 and finally reaches L0,L4,L5,L6. Note I am using the level names displayed in compaction IO statistics and I consider these to be physical names. The logical names would be L0,L1,L2,L3. This is one source of complexity created by a great feature (dynamic leveled).

When the tree has data in L0,L5,L6 then --min_level_to_compress=2 applies to the logical names, so only L6 (logical L2) should be compressed. When the tree reaches L0,L4,L5,L6 then L5 and L6 (logical L2 and L3) should be compressed. This is another source of complexity from dynamic leveled.
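The mapping from physical to logical names can be sketched in Python. This is a toy model, not RocksDB code. I assume the first non-empty level below L0 (called base_level here) is logical L1, and the function names are hypothetical.

```python
# Per-level compression indexed by logical level, as requested by
# --min_level_to_compress=2 --compression_type=lz4 (assumed 7 levels).
PER_LEVEL = ["none", "none", "lz4", "lz4", "lz4", "lz4", "lz4"]


def logical_index(physical_level, base_level):
    """Map a physical level name to its logical level number."""
    if physical_level == 0:
        return 0
    return physical_level - base_level + 1


def compression_for(physical_level, base_level, per_level):
    """Look up the compression for a physical level via its logical index."""
    idx = logical_index(physical_level, base_level)
    idx = max(0, min(idx, len(per_level) - 1))  # clamp to the list bounds
    return per_level[idx]


# The three tree shapes after the initial L0-only tree.
for base_level in (6, 5, 4):
    shape = [0] + list(range(base_level, 7))
    print(", ".join(
        f"L{p}={compression_for(p, base_level, PER_LEVEL)}" for p in shape
    ))
```

In this model the compression used for a physical level changes as the tree grows: L5 is uncompressed while it is logical L1 and uses lz4 once it becomes logical L2.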

This explains why both Write(GB) and Moved(GB) are greater than zero for L5 and L6. That doesn't occur with dynamic level disabled.

  • For L0 all bytes written are from memtable flush, Write(GB) > 0 & Moved(GB)=0.
  • For L4 only trivial move is done, Write(GB)=0 & Moved(GB) > 0
  • For L5 both Write(GB) > 0 & Moved(GB) > 0. 
    • With L0,L5,L6  trivial moves are done from L0 to L5 with uncompressed data
    • With L0,L4,L5,L6 writes are done from L4 to L5 because L4 is uncompressed and L5 uses lz4
  • For L6 both Write(GB) > 0 & Moved(GB) > 0. 
    • With L0,L6 trivial moves are done from L0 to L6 with uncompressed data
    • With L0,L5,L6 writes are done from L5 to L6 because L5 is uncompressed and L6 uses lz4
    • With L0,L4,L5,L6 trivial moves are done from L5 to L6 because both use lz4
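The cases in the list above can be reproduced with a small Python model. It assumes the only thing that blocks a trivial move is a compression mismatch between the source and destination levels, which is the one condition this post looks at. The names are hypothetical and this is a sketch, not the RocksDB implementation.

```python
# Compression by logical level for --min_level_to_compress=2 with lz4.
PER_LEVEL = ["none", "none", "lz4", "lz4"]
MAX_LEVEL = 6  # last physical level, assuming 7 levels


def codec(physical_level, base_level):
    """Compression for a physical level, looked up by logical level."""
    logical = 0 if physical_level == 0 else physical_level - base_level + 1
    return PER_LEVEL[min(logical, len(PER_LEVEL) - 1)]


def classify(src, dst, base_level):
    """A move needs the same compression on both sides, else data is rewritten."""
    same = codec(src, base_level) == codec(dst, base_level)
    return "trivial move" if same else "rewrite"


# Walk the tree shapes L0,L6 then L0,L5,L6 then L0,L4,L5,L6.
for base_level in (6, 5, 4):
    shape = [0] + list(range(base_level, MAX_LEVEL + 1))
    for src, dst in zip(shape, shape[1:]):
        print(f"base=L{base_level}: L{src} -> L{dst}: "
              f"{classify(src, dst, base_level)}")
```

L5 and L6 each receive data by both routes over the life of the tree, which is why both of their counters end up greater than zero.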


Dynamic level disabled, compression enabled

The db_bench command line and output are here. When Ln and Ln+1 don't have the same style of compression then trivial move cannot be done from Ln to Ln+1. However dynamic leveled is disabled so the compaction IO statistics are easier to read.

  • For L0 all bytes written are from memtable flush, Write(GB) > 0 & Moved(GB)=0.
  • For L1 only trivial move is done, Write(GB)=0 & Moved(GB) > 0
  • For L2 only writes are done, Write(GB) > 0 & Moved(GB)=0, because L1 is uncompressed & L2 uses lz4
  • For L3 only trivial move is done, Write(GB)=0 & Moved(GB) > 0, because L2 and L3 use lz4
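With dynamic level disabled the physical and logical names are the same, so the expected shape of the statistics follows directly from the per-level compression. A short Python sketch under the same assumption as this post, that a compression mismatch is the only thing that prevents a trivial move. The function name is hypothetical.

```python
# Compression for L0..L3 with --min_level_to_compress=2 and lz4.
PER_LEVEL = ["none", "none", "lz4", "lz4"]


def expected_stats(per_level):
    """Return (level, has_writes, has_moves) for each level."""
    rows = []
    for level, codec in enumerate(per_level):
        if level == 0:
            moved = False  # memtable flush always writes
        else:
            moved = per_level[level - 1] == codec  # same compression: move
        rows.append((f"L{level}", not moved, moved))
    return rows


for name, has_writes, has_moves in expected_stats(PER_LEVEL):
    write = "Write(GB) > 0" if has_writes else "Write(GB)=0"
    move = "Moved(GB) > 0" if has_moves else "Moved(GB)=0"
    print(f"{name}: {write} & {move}")
```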






