Monday, January 30, 2017

Compaction stalls: something to make better in RocksDB

In previous results that I shared for the insert benchmark, it was obvious that MyRocks throughput is steady when the workload transitions from in-memory to IO-bound. The reason is that non-unique secondary index maintenance is read-free for MyRocks, so there are no stalls for storage reads of secondary index pages. Even with the change buffer, InnoDB is eventually slowed by storage reads and by page writeback.

It was less obvious that MyRocks has more variance on both the in-memory and IO-bound insert benchmark tests. I try to be fair when explaining storage engine performance, so I provide a few more details here and results for InnoDB in MySQL 5.7.10 & 5.6.26 along with MyRocks from our fork of MySQL 5.6. The binlog was enabled for all tests, fsync-on-commit was disabled, and 16 clients inserted 500 million or 2 billion rows into 16 tables in PK order. Each table has 3 secondary indexes. For MyRocks I made one configuration change from the earlier result: I changed level0_slowdown_writes_trigger from 10 to 20 to reduce compaction stalls. This has a potential bad side effect of making queries slower, but this test was insert-only.

For both graphs the y-axis is the average insert rate per 5-second interval and the x-axis is the interval number. I used mstat to collect the data.
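To make the y-axis concrete, here is a minimal Python sketch of how per-interval insert rates and a simple variance summary could be derived. It assumes the collector reports a cumulative count of inserted rows once per 5-second interval; that input shape, the function names, and the sample numbers are all hypothetical and are not mstat's actual output or data from these tests.

```python
from statistics import mean, pstdev


def interval_rates(cumulative_rows, interval_seconds=5):
    """Turn cumulative row counters, sampled once per interval, into rows/second."""
    return [
        (later - earlier) / interval_seconds
        for earlier, later in zip(cumulative_rows, cumulative_rows[1:])
    ]


def coefficient_of_variation(rates):
    """Population standard deviation divided by the mean (unitless spread)."""
    avg = mean(rates)
    return pstdev(rates) / avg if avg else 0.0


# Made-up counter samples for illustration only, not benchmark data.
samples = [0, 500_000, 1_000_000, 1_100_000, 1_600_000]
rates = interval_rates(samples)
print(rates)
print(round(coefficient_of_variation(rates), 3))
```

A larger coefficient of variation corresponds to a thicker line on the graph: the third interval in the made-up samples is the kind of dip that a stall would produce.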

The goal is to match InnoDB in MySQL 5.7 in quality of service while providing much better throughput. We have some work to do. On the bright side, this is an opportunity for someone to make RocksDB better.


The first graph is for the in-memory workload where all data is cached by the storage engine, nothing is read from storage, but many storage writes are done to persist the changes. InnoDB with MySQL 5.7 is much faster than with 5.6. It also has the least variance. MyRocks has the most variance and that is largely from compaction stalls when there are too many files in level 0 of the LSM tree.
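The effect of the level 0 file count on write stalls can be illustrated with a toy model. The Python sketch below is not RocksDB's implementation: it assumes one memtable flush adds one level 0 file, that a compaction clears all of level 0 at once, and that writes are simply labeled "slow" or "stop" once the file count reaches a trigger. The function name, the tick-based schedule, and the default values are assumptions for illustration; only the idea of slowdown and stop triggers on the level 0 file count comes from RocksDB.

```python
def simulate_l0(ticks, flush_every, compact_every, compaction_trigger=4,
                slowdown_trigger=20, stop_trigger=36):
    """Toy model: return the write state ("ok", "slow", "stop") for each tick."""
    l0_files = 0
    states = []
    for tick in range(1, ticks + 1):
        # Decide the write state from the current number of level 0 files.
        if l0_files >= stop_trigger:
            state = "stop"
        elif l0_files >= slowdown_trigger:
            state = "slow"
        else:
            state = "ok"
        states.append(state)
        # A flush adds one level 0 file, unless writes are stopped.
        if state != "stop" and tick % flush_every == 0:
            l0_files += 1
        # Simplification: a compaction empties level 0 in one step.
        if tick % compact_every == 0 and l0_files >= compaction_trigger:
            l0_files = 0
    return states


# Same hypothetical workload, two settings of the slowdown trigger.
for trigger in (10, 20):
    states = simulate_l0(100, flush_every=1, compact_every=15,
                         slowdown_trigger=trigger)
    print(trigger, states.count("slow"))
```

In this made-up schedule the lower trigger marks some ticks as slowed and the higher one marks none, which is the intent behind raising level0_slowdown_writes_trigger. The cost, as noted above, is that more files in level 0 can make queries slower.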


The second graph is for the IO-bound workload. The graph ends first for MyRocks because it sustains the highest average insert rate. But it also has a thick line because of variance. InnoDB from MySQL 5.6 also has a lot of variance.


  1. Great post. I love to see charts presented this way, as seeing the bands of expected IOPS is useful. From the look of things for MyRocks, there are two distinct areas where consistency suffers. I agree that the behavior looks like stalls from compaction. I imagine that 99th (and other) percentile latency spikes around the same time.

    As a DB for things that do a lot of analytics or churn through larger, write-heavy workloads via a lot of workers, this is OK. I am not sure if the read latency and QPS are affected while a compaction is happening. That would be an interesting question, as it is more representative of the workload of a large job kicking off while production user-facing reads are happening.

    I am somewhat impressed with the burst performance for 5.7: with memory available, 3m inserts can happen in 100s. As per your previous post, of course, in a longer running job MyRocks does things very consistently.

    1. I have results for reads concurrent with insert benchmark writes. I have yet to publish them. The summary is that MyRocks does much better than InnoDB. The problem for InnoDB is that dirty page write-back falls behind, so a query might need to read a page, and to get a free page from the buffer pool a dirty page must first be written back. So the read is sometimes stalled by the writes.