This is a review of Differentiated Key-Value Storage Management for Balanced I/O Performance, which was published at ATC 2021. This is a wonderful paper that resolves several of my concerns about key-value separation (here and here). The authors have experience with key-value separation via Titan, which is part of TiDB.
The key idea in the paper is to use leveled compaction for keys and tiered compaction for values. Write-amplification is reduced by not using leveled compaction for values. Read-amplification is reduced, relative to classic key-value separation (see WiscKey), by using tiered compaction for values rather than logs. With classic key-value separation the worst-case read-amp for a scan is one random read from the log for every qualifying key, and each of those random reads can include a block decompression and a storage IO. With DiffKV, the use of tiered compaction for values provides some ordering that avoids that worst case.
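To make that scan cost concrete, here is a minimal sketch of a range scan under classic key-value separation. This is my illustration, not DiffKV's or WiscKey's code, and the type and field names are invented: the LSM tree stores pointers into a value log, so every qualifying key costs one random log read.

```go
// Sketch: the LSM stores (key -> vLog pointer) and each value fetch is a
// random read into the log, which may imply decompression plus a storage IO.
package main

import "fmt"

// ValuePointer locates a value in the vLog; the field names are hypothetical.
type ValuePointer struct {
	FileID uint32
	Offset uint64
	Size   uint32
}

// scanRange sketches a range scan: iterating the keys is sequential and
// cheap, but each value fetch is a random log read.
func scanRange(index map[string]ValuePointer, keys []string,
	readLog func(ValuePointer) []byte) int {
	randomReads := 0
	for _, k := range keys {
		ptr := index[k]
		_ = readLog(ptr) // worst case: one random IO (+ decompression) per key
		randomReads++
	}
	return randomReads
}

func main() {
	index := map[string]ValuePointer{
		"a": {1, 0, 100}, "b": {7, 4096, 100}, "c": {3, 8192, 100},
	}
	n := scanRange(index, []string{"a", "b", "c"},
		func(p ValuePointer) []byte { return make([]byte, p.Size) })
	fmt.Printf("scan of 3 keys issued %d random log reads\n", n)
}
```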
While I enjoyed the paper, it didn't have performance results with compression enabled, and I am curious about the impact of decompression overhead on point and range read latency.
A summary of the approach:
- small values are stored inline in the LSM tree
- large values are stored in logs, called vLogs. The paper explains a few improvements that make GC faster in this case, but this is mostly classic key-value separation.
- medium values use the vTree (tiered compaction for values)
- the user defines the two size limits that determine whether a value is small, medium, or large (a routing sketch follows this list)
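Here is a minimal sketch of that size-based routing, assuming two user-configured thresholds. The names (`smallMax`, `mediumMax`, the `Placement` values) are hypothetical, not DiffKV's API:

```go
// Sketch: route a value by size. Small values stay inline in the LSM tree,
// medium values go to the vTree, and large values go to a vLog.
package main

import "fmt"

type Placement int

const (
	InlineLSM Placement = iota // small: value stored next to the key
	VTree                      // medium: tiered compaction for values
	VLog                       // large: classic key-value separation
)

// route picks a placement; smallMax and mediumMax are the two limits the
// user defines.
func route(valueLen, smallMax, mediumMax int) Placement {
	switch {
	case valueLen <= smallMax:
		return InlineLSM
	case valueLen <= mediumMax:
		return VTree
	default:
		return VLog
	}
}

func main() {
	for _, n := range []int{64, 1024, 1 << 20} {
		fmt.Println(n, route(n, 128, 8192))
	}
}
```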
The vTree is similar to tiered compaction for two reasons. First, there are multiple sorted groups per level. Second, when merges are done to move values from level N to N+1, the merge process only reads data from level N and then writes it to level N+1. It doesn't read and rewrite data already on level N+1.
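A minimal sketch of that merge property, again my illustration rather than DiffKV's implementation: merging level N reads only level N's sorted groups and appends one new sorted group to level N+1, leaving the existing groups on N+1 untouched, which is what keeps write-amplification low relative to leveled compaction.

```go
// Sketch: a tiered merge from level n to level n+1 that never rewrites
// data already on level n+1.
package main

import (
	"fmt"
	"sort"
)

type sortedGroup []string // a sorted run of keys (values elided)

// mergeDown merges every group on level n into one sorted group and
// appends it to level n+1; existing groups on n+1 are untouched.
func mergeDown(levels [][]sortedGroup, n int) {
	var merged sortedGroup
	for _, g := range levels[n] {
		merged = append(merged, g...)
	}
	sort.Strings(merged)
	levels[n+1] = append(levels[n+1], merged) // no rewrite of level n+1
	levels[n] = nil
}

func main() {
	levels := [][]sortedGroup{
		{{"b", "d"}, {"a", "c"}}, // level 0: two sorted groups
		{{"e", "f"}},             // level 1: one existing group, left as-is
	}
	mergeDown(levels, 0)
	fmt.Println(levels[1]) // [[e f] [a b c d]]
}
```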
I am curious whether there is a need to also trigger GC based on vTables that have excessive space-amplification.
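One way such a trigger could look, as a purely hypothetical sketch (the stats and the threshold are my invention, not anything from the paper): track dead bytes per vTable and schedule GC when the dead fraction, that vTable's contribution to space-amplification, crosses a limit.

```go
// Sketch: a space-amp-based GC trigger for a vTable.
package main

import "fmt"

type vTableStats struct {
	TotalBytes uint64 // bytes written to the vTable
	DeadBytes  uint64 // bytes owned by overwritten or deleted keys
}

// needsGC returns true when the fraction of dead bytes exceeds maxDeadRatio,
// i.e. when the table's space-amplification is excessive.
func needsGC(s vTableStats, maxDeadRatio float64) bool {
	if s.TotalBytes == 0 {
		return false
	}
	return float64(s.DeadBytes)/float64(s.TotalBytes) > maxDeadRatio
}

func main() {
	fmt.Println(needsGC(vTableStats{TotalBytes: 100 << 20, DeadBytes: 60 << 20}, 0.5)) // true
}
```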