Posts

Showing posts from November, 2018

Review of TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores

This is review of TRIAD which  was published in USENIX ATC 2017. It explains how to reduce write amplification for RocksDB leveled compaction although the ideas are useful for many LSM implementations. I share a review here because the paper has good ideas. It isn't easy to keep up with all of the LSM research, even when limiting the search to papers that reference RocksDB , and I didn't notice this paper until recently. TRIAD reduces write amplification for an LSM with leveled compaction and with a variety of workloads gets up to 193% more throughput, up to 4X less write amplification and spends up to 77% less time doing compaction and flush. Per the RUM Conjecture improvements usually come at a cost and the cost in this case is more cache amplification (more memory overhead/key) and possibly more read amplification . I assume this is a good tradeoff in many cases. The paper explains the improvements via 3 components -- TRIAD-MEM, TRIAD-DISK and TRIAD-LOG -- that combin

Converting an LSM to a B-Tree and back again

I wonder if it is possible to convert an LSM to a B-Tree. The goal is to do it online and in-place -- so I don't want two copies of the database while the conversion is in progress. I am interested in data structures for data management that adapt dynamically to improve performance and efficiency for a given workload.  Workloads change in the short and long term. I hope that data structures can be adapt to the change and converting between an LSM and a B-Tree is one way to adapt. This is more likely to be useful when the data structure supports some kind of partitioning in the hope that different workloads can be isolated to different partitions -- and then some can use an LSM while others use a B-Tree. LSM to B-Tree A B-Tree is one tree. An LSM is a sequence of trees. Each sorted run in the LSM is a tree. With leveled compaction in RocksDB there are a few sorted runs in level 0 (L0) and then one sorted run in each of L1, L2 up to the max level (Lmax).  A B-Tree persists