Monday, May 14, 2018

Geek code for database algorithms

I like to read academic papers on database systems but I usually don't have time to do more than browse. If only there were a geek code for this. Part of the geek code would explain the performance vs efficiency tradeoff. While it helps to know that something new is faster, I want to know the cost of faster. Does it require more storage (tiered vs leveled compaction)? Does it hurt SSD endurance (update-in-place vs write-optimized)? Read, write, space and cache amplification are a framework for explaining the tradeoffs.
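As a rough illustration of the framework, the sketch below turns a handful of counters into the four amplification factors. The `Counters` fields and the `amplification` helper are hypothetical names invented for this example. The cache ratio in particular is only one possible way to express cache amplification, not a definition taken from this post.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical counters collected over one benchmark run."""
    user_bytes_written: int     # bytes the application asked to write
    storage_bytes_written: int  # bytes actually written to the device
    user_bytes_read: int        # bytes the application asked to read
    storage_bytes_read: int     # bytes actually read from the device
    logical_db_bytes: int       # size of the live data
    storage_db_bytes: int       # size of the database on the device
    cache_bytes: int            # memory needed to reach the target hit rate


def amplification(c: Counters) -> dict:
    """Return each amplification factor as a physical-to-logical ratio."""
    return {
        "write": c.storage_bytes_written / c.user_bytes_written,
        "read": c.storage_bytes_read / c.user_bytes_read,
        "space": c.storage_db_bytes / c.logical_db_bytes,
        # Assumption: cache-amp expressed as cache size relative to data size.
        "cache": c.cache_bytes / c.logical_db_bytes,
    }


example = Counters(
    user_bytes_written=100, storage_bytes_written=1000,
    user_bytes_read=50, storage_bytes_read=200,
    logical_db_bytes=1000, storage_db_bytes=1100,
    cache_bytes=100,
)
factors = amplification(example)
```

With these made-up numbers the write factor is 10 and the space factor is 1.1. Comparing two algorithms on all four factors, rather than on speed alone, is what makes the cost of a speedup visible.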

The next part of the geek code is to group algorithms into one of page-based, LSM, index+log or something else. I suspect that few will go into the something else group. These groups can be used for both tree-based and hash-based algorithms, so I am redefining LSM to mean log structured merge rather than log structured merge tree.
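The grouping could be written down as a two-part tag: the algorithm family plus whether the structure is tree-based or hash-based. The sketch below is one hypothetical encoding. The enum names, the string values and the `geek_code` helper are all assumptions made for illustration, not an established format.

```python
from enum import Enum


class Family(Enum):
    """The groups named in the post."""
    PAGE_BASED = "page"
    LSM = "lsm"            # log structured merge, tree or not
    INDEX_LOG = "index+log"
    OTHER = "other"


class Structure(Enum):
    TREE = "tree"
    HASH = "hash"


def geek_code(family: Family, structure: Structure) -> str:
    """Join the two parts into a short tag such as 'lsm/hash'."""
    return f"{family.value}/{structure.value}"


tag = geek_code(Family.LSM, Structure.HASH)
```

Keeping the family and the structure as separate fields is what lets LSM cover hash-based designs as well as trees.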

3 comments:

  1. Your redefinition isn't even a redefinition, since it should be "LSM Tree"

    1. I agree but wrote this because some people (maybe including me) assume the "tree" part

  2. I have a framework for checking SSD endurance from inside the SSD. It uses the S.M.A.R.T. info from the SSD, which is provided by vendors. If you want, I can share it with you.

