Monday, May 18, 2020

Avoiding reads in write-optimized index structures

This is a summary of features that avoid storage read IO in OSS write-optimized index structures. My scope is limited to MySQL and Tarantool -- I know there are similar features in other DBMS and am happy for comments on that:
  • RocksDB merge operator logically does read-modify-write but physically does a Put into the memtable and the read is deferred until user queries or compaction. I hope that MyRocks eventually takes advantage of the merge operator.
  • Read free replication in TokuDB and MyRocks.
  • Fast Updates in TokuDB do something similar to the merge operator for update and insert
  • Non-unique secondary index maintenance is read free in MyRocks
  • Deferred secondary index update in Tarantool is a new approach to avoiding reads for write-heavy workloads. It is worth reading. I wish it had been published by VLDB.
  • MyRocks also avoids some storage reads when validating unique constraints (for PK and unique secondary indexes) on inserts and some updates because the Get() that it does can use a bloom filter. This benefit is larger when there is a bloom filter on the max level of the LSM tree, but that has other costs is is frequently not done.
  • MyRocks uses the SingleDelete feature in RocksDB to allow tombstones to be removed earlier which decreases the CPU overhead for reads after a batch of writes.
  • A b-tree can also reduce index maintenance related storage read IO. See the InnoDB change buffer.

An incomplete list of relevant research:
  • DiffIndex - this is more about distributed systems, but still interesting
  • Techniques for reducing index maintenance overhead in an LSM by Luo & Carey (VLDB 2019)
  • Deferred Lightweight Indexing (DELI)


  1. The algorithm described by Tarantool is the nearly identical as the previously published EDBT paper at We also had a paper at VLDB 2019 that uses a similar idea but exploits a primary key index to cleanup secondary indexes more efficiently (

    1. Thanks for interesting links. I only glanced quickly, but isn't the EDBT14 paper a tradeoff: You can choose synchronously updated secondary indexes (requires a read) or you can avoid the read and use eventually consistent secondary indexes?

    2. Do we need a RUM Conjecture for dist sys?

  2. Sorry that I posted the wrong link for the first paper. The correct one should be Deferred Lightweight Indexing for Log-Structured Key-Value Stores (