Thursday, March 12, 2020

Review of Benchmarking RocksDB Key-Value Workloads at Facebook

At some level all database engines have a key-value API even if they aren't exposed to the user. One name for this in a SQL DBMS is the data layer. The data layer is RocksDB for MyRocks & MongoRocks and then WiredTiger for MongoDB.

FAST 20 has a paper on characterizing RocksDB key-value workloads at Facebook. One of the workloads is UDB and UDB uses MyRocks, a SQL DBMS.

Benchmarks are useful to me but there is always room for improvement. Two common problems are the lack of load variance and the lack of complexity in access patterns. By the lack of load variance I mean that the workload doesn't change over time which fails to capture the daily spikes that occur for web-scale workloads as different parts of the world wake and sleep. They also fail to include daily and intermittent operational tasks like backup, schema change, data migration and archiving old data.

This paper explains the complexity in access patterns for three RocksDB use cases and shows how to reproduce them in benchmarks. UDB was one of the workloads so this work includes an update of the analysis shared in the Linkbench paper from 2013.

The primary contributions of the paper are the analysis of production workloads and the discovery that hot key ranges, rather than hot keys, is required to model production workloads. I appreciate that some of the work behind this paper has already been shared via RocksDB and that more might be shared as enhancements to YCSB.

Things I liked about the paper:
  • When generating skew in a benchmark use the concept of hot key ranges rather than hot keys. Linkbench and YCSB use hot keys distributed across the key space.
  • Explains the UDB workload which was last explained in the Linkbench paper
  • Explains two other production workloads
  • Provides some detail on the diurnal workload spikes in UDB
  • Explains the distribution of key sizes. While it states that UDB keys are small, by small it means a typical value is 16 to 30 bytes. Of course, the covering secondary index used by UDB is an outlier at ~64 bytes.
  • Explains that values are usually small, <= 128 bytes
  • Documents temporal access patterns
  • Demonstrates that using hot key ranges rather than hot keys can more accurately model production workloads

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...