Posts

Showing posts from May, 2019

index+log: implementations

My take on index+log systems like WiscKey is that they are neither better nor worse than an LSM - it all depends on your workload. But I am certain that we know much more about an LSM than about the index+log systems. Hopefully that changes over time as some of them are thriving. The first index+log system that I read about was Berkeley DB Java Edition . The design paper is worth reading. Since then there have been a few more implementations and papers that I describe here. This list is probably incomplete: Bitcask, ForestDB, WiscKey, HashKV, TitanDB and RocksDB BlobDB. At this point the systems that are getting into production, TitanDB and BadgerDB, use an LSM for the index. I wonder if an index structure that supports update-in-place would be better especially when the index must be cached  because I expect the CPU read-amp for an LSM to be  about 2X larger than for a b-tree and a b-tree supports update-in-place which makes it easier to relocate values during GC. While I like

index+log: open issues

This post is about open issues for the index+log family of index structures . See past posts here and here  for more details on index+log. The success of LSM has lead to several solutions that use an LSM for the index. I am curious if that is a good choice when implementing index+log from scratch. It is a great choice when starting with an LSM and then adding index+log. Open Issues The open issues for index+log include: Is GC multi-threaded? Does value log GC require index queries? Must the index be cached? Is block compression possible? With an LSM as the index how is relocating an entry in the value log supported? Is GC multi-threaded? I hope so. It hasn't been in some of the index+log papers/systems. Does value log GC require index queries? Yes, in most of the index+log papers/systems this is required to determine whether a value is live when scanning the value log during GC HashKV proposed a solution that doesn't require such queries. It is a wonde

Index+log, v2

I put most index structures into one of three categories -- page-based, LSM or index+log. My focus is on databases larger than memory and I might be ignoring categories used for main memory DBMS. Over the past decade index+log  has gotten more attention and this is my second attempt at explaining it. My definition of index+log is simple -- data is appended to a log on each write, index entries point into the log and GC scans the log to copy live values and discard others. The log can be one file or many log segments. GC might have to search the index to determine whether a value is live. The value written to the log might include the key. Bitcask was the first index+log system that I noticed but I assume there are earlier examples and just found one -- Berkeley DB Java Edition . While WiscKey made popular the term key-value separation and is presented as an LSM variant, I put it in the index+log category. Other interesting index+log systems include RocksDB BlobDB ,  TitanDB ,  F

CRUM conjecture - read, write, space and cache amplification

The RUM Conjecture asserts that an index structure can't be optimal for all of read, write and space. I will ignore whether optimal is about performance or efficiency (faster is better vs efficient-er is better). I want to use CRUM in place of RUM where C stands for database cache. The C in CRUM is the amount of memory per key-value pair (or row) the DBMS needs so that either a point query or the first row from a range query can be retrieved with at most X storage reads. The C can also be reported as the minimal database : memory ratio to achieve at most X storage reads per point query. My points here are: There are 4 amplification factors - read, write, space and cache CRUM is for comparing index structure efficiency and performance Read and write amplification have CPU and IO parts Write amplification has immediate and deferred parts Many X is faster than Y papers and articles neglect to quantify the tradeoffs made in pursuit of performance. I hope that changes and