Monday, May 18, 2020

Avoiding reads in write-optimized index structures

This is a summary of features that avoid storage read IO in OSS write-optimized index structures. My scope is limited to MySQL and Tarantool -- I know there are similar features in other DBMS and am happy for comments on that:
  • RocksDB merge operator logically does read-modify-write but physically does a Put into the memtable and the read is deferred until user queries or compaction. I hope that MyRocks eventually takes advantage of the merge operator.
  • Read free replication in TokuDB and MyRocks.
  • Fast Updates in TokuDB do something similar to the merge operator for update and insert
  • Non-unique secondary index maintenance is read free in MyRocks
  • Deferred secondary index update in Tarantool is a new approach to avoiding reads for write-heavy workloads. It is worth reading. I wish it had been published by VLDB.
  • MyRocks also avoids some storage reads when validating unique constraints (for PK and unique secondary indexes) on inserts and some updates because the Get() that it does can use a bloom filter. This benefit is larger when there is a bloom filter on the max level of the LSM tree, but that has other costs is is frequently not done.
  • MyRocks uses the SingleDelete feature in RocksDB to allow tombstones to be removed earlier which decreases the CPU overhead for reads after a batch of writes.
  • A b-tree can also reduce index maintenance related storage read IO. See the InnoDB change buffer.
Updates

An incomplete list of relevant research:
  • DiffIndex - this is more about distributed systems, but still interesting
  • Techniques for reducing index maintenance overhead in an LSM by Luo & Carey (VLDB 2019)
  • Deferred Lightweight Indexing (DELI)

5 comments:

  1. The algorithm described by Tarantool is the nearly identical as the previously published EDBT paper at https://tristartom.github.io/docs/edbt14.pdf. We also had a paper at VLDB 2019 that uses a similar idea but exploits a primary key index to cleanup secondary indexes more efficiently (http://www.vldb.org/pvldb/vol12/p531-luo.pdf).

    ReplyDelete
    Replies
    1. Thanks for interesting links. I only glanced quickly, but isn't the EDBT14 paper a tradeoff: You can choose synchronously updated secondary indexes (requires a read) or you can avoid the read and use eventually consistent secondary indexes?

      Delete
    2. Do we need a RUM Conjecture for dist sys?

      Delete
  2. Sorry that I posted the wrong link for the first paper. The correct one should be Deferred Lightweight Indexing for Log-Structured Key-Value Stores (https://ieeexplore.ieee.org/document/7152467)

    ReplyDelete

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...