I like to read academic papers on database systems but I usually don't have time to do more than browse. If only there were a geek code for this. Part of the geek code would explain the performance vs efficiency tradeoff. While it helps to know that something new is faster, I want to know the cost of faster. Does it require more storage (tiered vs leveled compaction)? Does it hurt SSD endurance (update-in-place vs write-optimized)? Read, write, space and cache amplification are a framework for explaining the tradeoffs.
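As a rough sketch of how the amplification framework turns into numbers, the snippet below computes three of the four factors as simple ratios. The `Counters` fields and the exact definitions are assumptions made for illustration (a commonly used reading of the terms, not definitions taken from a specific paper), and cache amplification is left out because it needs a definition this sketch does not try to supply.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical counters gathered while running a workload."""
    user_bytes_written: int     # bytes the application asked to write
    storage_bytes_written: int  # bytes actually written to the device
    logical_data_bytes: int     # size of the live data set
    storage_bytes_used: int     # space the database occupies on storage
    queries: int                # number of point queries issued
    storage_reads: int          # storage reads done to answer those queries


def write_amp(c: Counters) -> float:
    # Assumed definition: device writes per byte of application writes.
    return c.storage_bytes_written / c.user_bytes_written


def space_amp(c: Counters) -> float:
    # Assumed definition: storage used per byte of live data.
    return c.storage_bytes_used / c.logical_data_bytes


def read_amp(c: Counters) -> float:
    # Assumed definition: storage reads per point query.
    return c.storage_reads / c.queries


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    c = Counters(100, 1000, 50, 75, 10, 30)
    print(f"write-amp={write_amp(c):.1f} "
          f"space-amp={space_amp(c):.1f} "
          f"read-amp={read_amp(c):.1f}")
```

Reporting the ratios side by side makes the cost of faster visible: an algorithm that lowers one factor usually pays for it in another.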
The next part of the geek code is to group algorithms into one of page-based, LSM, index+log or something else. I suspect that few algorithms will land in the "something else" group. These groups apply to both tree-based and hash-based algorithms, so I am redefining LSM to mean log-structured merge rather than log-structured merge tree.
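One way to picture this part of the geek code is as a pair of labels: the group and whether the algorithm is tree-based or hash-based. The sketch below is only an illustration. The names `Family`, `Structure` and `GeekCode`, and the `group/structure` string format, are hypothetical and not an established notation.

```python
from dataclasses import dataclass
from enum import Enum


class Family(Enum):
    # The four groups named in the text.
    PAGE_BASED = "page"
    LSM = "lsm"              # log-structured merge, tree or hash
    INDEX_LOG = "index+log"
    OTHER = "other"


class Structure(Enum):
    TREE = "tree"
    HASH = "hash"


@dataclass(frozen=True)
class GeekCode:
    """Hypothetical two-part label for an index algorithm."""
    family: Family
    structure: Structure

    def __str__(self) -> str:
        return f"{self.family.value}/{self.structure.value}"


if __name__ == "__main__":
    # Illustrative classifications, chosen for this sketch.
    print("update-in-place B-tree:", GeekCode(Family.PAGE_BASED, Structure.TREE))
    print("LSM tree:", GeekCode(Family.LSM, Structure.TREE))
    print("hash index over a log:", GeekCode(Family.INDEX_LOG, Structure.HASH))
```

Keeping the group and the structure as separate fields is what lets LSM cover hash-based designs as well as trees.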