Wednesday, March 9, 2022

The db_bench mixgraph benchmark

The mixgraph workload for db_bench was first described in this paper. The paper's first author was on the RocksDB team and is now a professor at ASU (see here and here). I am happy that he spent time making RocksDB better and is now doing research. A few years spent near production and in R&D is a great education for researchers.

Mixgraph

The mixgraph workload is interesting because it is more complex than the other db_bench workloads. By workload I mean the value of foo in db_bench --benchmarks=$foo. Most, or perhaps all, of the workloads are listed here. By perhaps all, I acknowledge that there is some tech debt in db_bench and I hope to reduce it this year. Use --benchmarks=mixgraph to run the mixgraph benchmark.
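
A minimal sketch of an invocation, assuming a database loaded by an earlier run (the --db path and --num value here are illustrative, not taken from this post):

  db_bench --benchmarks=mixgraph --use_existing_db=1 --db=/data/rocksdb --num=100000000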

The mixgraph workload mimics social graph OLTP. It is implemented as part of db_bench and uses the RocksDB API directly. This is a simpler version of LinkBench. The workload uses a configurable mix of Put, Get and Seek+Next operations (see here). It doesn't use transactions (yet). The access pattern distributions are configurable. The value size distributions are configurable. There is much that is configurable via options, which is good but adds complexity. The default value for most of these options is zero and I suspect that isn't a good default for several of them. For example, if --iter_k and --iter_sigma aren't explicitly set then scan_length will be zero and Next won't get called after Seek. In other cases, understanding the implication of changing an option requires math and stats. While the default for the max value length is 1024, the value length per Put is determined by a Pareto distribution and averages ~35 bytes with --value_k=0.2615 --value_sigma=25.45. I created issue 9672 to improve monitoring for mixgraph to make it easier to learn the implications of such changes. Regardless, start by reading the appendix in this paper if you want to use mixgraph. All of the db_bench options for mixgraph are listed here.
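
To see where ~35 comes from, here is a sketch assuming the generalized Pareto form with location theta (which, like most of these options, defaults to zero), scale sigma and shape k. For k < 1 the mean is theta + sigma / (1 - k), so:

  mean value length = 0 + 25.45 / (1 - 0.2615) ≈ 34.5 bytes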

Another problem, which I will try to fix soon, is that mixgraph shouldn't be run with more than one thread, so use db_bench --benchmarks=mixgraph --threads=1 or don't specify --threads, as the default value is 1. See issue 9651 for details.

Using mixgraph

I created shell scripts to make it easy for me to run mixgraph and collect performance data. They are here. The scripts provide the four configurations listed in the appendix (all_dist, prefix_dist, all_random, prefix_random) and then one more, hwperf, from elsewhere. With these scripts the average value length is ~35 bytes. The average scan length, the number of Next calls per Seek, varies by configuration. It is zero for hwperf and ~560 for the others when mix_max_scan_len is 10000. The scan length is zero for hwperf because it doesn't set --iter_k or --iter_sigma. For this reason I suggest you use the other configurations.

I was able to get between 10,000 and 20,000 operations/s from a single thread, depending on the configuration, for an IO-bound benchmark on a server with a fast SSD.

I spend (too) much time running benchmarks that use uniform distributions for access patterns. With mixgraph you can get a workload with a non-uniform distribution and I was curious how that impacted the block cache hit rate. This table shows the block cache hit rate as a function of block cache size. The block cache size varies from 1 to 128 GB. The database size is 574 GB. That table also has many other metrics. The results make it clear that all_random and prefix_random use access distributions that are close to uniform random while all_dist and prefix_dist do not.

I was curious about the IO overhead for Get vs Seek+Next and repeated the tests limited to either Get or Seek+Next operations. The former is called Get-only and the latter is called Seek-only in the results. The results are here and interesting details are:

  • block cache hit rate for Get-only is much lower than for Seek-only. I assume that each Next call is a block cache access and thus the many Next calls per Seek inflate the hit rate for Seek-only (see the sketch after this list).
  • operations/second for Get-only is much higher than for Seek-only because Seek-only does ~560 Next calls per Seek, except for hwperf.
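
A back-of-the-envelope sketch of that inflation, assuming (as in the first point above) that each Next counts as one block cache access: even in the extreme case where every Seek misses but every Next hits, the overall hit rate is about

  560 / (560 + 1) ≈ 99.8%

so Seek misses barely move the Seek-only number.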
