Monday, May 14, 2018

Geek code for database algorithms

I like to read academic papers on database systems but I usually don't have time to do more than browse. If only there were a geek code for this. Part of the geek code would explain the performance vs efficiency tradeoff. While it helps to know that something new is faster, I want to know the cost of faster. Does it require more storage (tiered vs leveled compaction)? Does it hurt SSD endurance (update-in-place vs write-optimized)? Read, write, space and cache amplification are a framework for explaining the tradeoffs.

The next part of the geek code is to group algorithms into one of page-based, LSM, index+log or something else. I suspect that few will go into the something else group. These groups can be used for both tree-based and hash-based algorithms, so I am redefining LSM to mean log structured merge rather than log structured merge tree.


  1. Your redefinition isn't even a redefinition, since it should be "LSM Tree"

    1. I agree but wrote this because some people (maybe including me) assume the "tree" part