Thursday, December 10, 2015

IO-bound linkbench for MongoDB 3.2

I previously shared Linkbench results for MongoDB 3.2.0 with a cached database. Here I provide results for a database larger than cache, using both SSD and a disk array, to compare RocksDB with the WiredTiger B-Tree. The performance summary is:
  • the peak load rate is 2X better with WiredTiger in 3.2 vs 3.0
  • the peak load rate for WiredTiger is more than 3X better than for RocksDB
  • the load rate for WiredTiger and RocksDB is not slower with disk than with SSD, nor with an uncached database than with a cached one. For RocksDB this is because secondary index maintenance doesn't require page reads. For WiredTiger this might be true only because the secondary index pages fit in cache.
  • the peak query rate for RocksDB is about 3X better than for WiredTiger with the disk array and about 1.7X better with SSD

Configuration

The previous post explains the benchmark and test hardware. The test was repeated for 1, 4, 8, 16 and 24 concurrent clients for the disk array test and 1, 4, 8, 12, 16, 20 and 24 concurrent clients for the SSD test.

Load performance

I only show the insert rate graph for SSD. The results with the disk array are similar. The insert rate is better for WiredTiger because it supports more internal concurrency, courtesy of extremely impressive engineering. We have work in progress to make this much better for RocksDB.

Query performance

The graphs display the query rates for 1, 4, 8, 16 and 24 concurrent clients using a disk array and then SSD. RocksDB does better than WiredTiger on both disk and SSD. RocksDB uses less random IO when writing changes back to storage. The benefit from this is larger with disk than with SSD, so the speedup for RocksDB is larger with the disk array.


Efficiency

This section includes absolute and relative efficiency metrics from the tests run with 16 concurrent clients and SSD. The values are from vmstat and iostat, run for the duration of the test. The absolute metrics are per-second rates. The relative metrics are the per-second rates divided by the operation rate, so they measure the hardware consumed per insert or per query. The operation rate is either the load rate (IPS) or the query rate (QPS).

The columns are:
  • cs.sec - average context switch rate
  • cpu.sec - average CPU load (system + user CPU time)
  • cs.op - context switch rate / operation rate
  • cpu.Kop - (CPU load / operation rate) X 1000
  • r.sec - average rate for iostat r/s
  • rkb.sec - average rate for iostat rKB/s
  • wkb.sec - average rate for iostat wKB/s
  • r.op - r.sec / operation rate
  • rkb.op - rkb.sec / operation rate
  • wkb.op - wkb.sec / operation rate
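The relative columns are simple ratios of the absolute ones. This is a minimal sketch of that arithmetic, assuming the per-second averages and the operation rate have already been extracted from vmstat, iostat and Linkbench. The `Rates` class and `relative_metrics` function are hypothetical names, and the input values in the example are made up, not measurements from this post.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Absolute per-second averages for one test run (hypothetical container)."""
    cs_sec: float   # vmstat context switches per second (cs)
    cpu_sec: float  # vmstat user + system CPU (us + sy)
    r_sec: float    # iostat r/s
    rkb_sec: float  # iostat rKB/s
    wkb_sec: float  # iostat wKB/s


def relative_metrics(rates: Rates, op_rate: float) -> dict:
    """Divide each absolute rate by the operation rate (IPS or QPS).

    cpu.Kop is scaled by 1000 so the value is per thousand operations.
    """
    if op_rate <= 0:
        raise ValueError("operation rate must be positive")
    return {
        "cs.op": rates.cs_sec / op_rate,
        "cpu.Kop": rates.cpu_sec * 1000 / op_rate,
        "r.op": rates.r_sec / op_rate,
        "rkb.op": rates.rkb_sec / op_rate,
        "wkb.op": rates.wkb_sec / op_rate,
    }


# Made-up example inputs, not measurements from this post.
example = relative_metrics(
    Rates(cs_sec=50000, cpu_sec=40, r_sec=5000, rkb_sec=40000, wkb_sec=20000),
    op_rate=10000,
)
print(example)
```

Multiplying cpu.sec by 1000 before dividing keeps the scaled value readable when the CPU cost per operation is a small fraction of a percent.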


Load Efficiency

Conclusions from efficiency on the load step:
  • The context switch and CPU overheads are larger with RocksDB. This might be from mutex contention.
  • I need more precision to show this, but the relative write rate is much better for WiredTiger.
  • The relative read rate is much better for WiredTiger. I suspect that some data is read by RocksDB during compaction.

cs.sec  cpu.sec cs.op   cpu.Kop   r.sec   rkb.sec wkb.sec r.op    rkb.op  wkb.op  engine
182210  43      21.8    5.176     7.7     41      100149  0.001   0.005   0.000   RocksDB
108230  68      4.3     2.721     0.5     4       58327   0.000   0.000   0.000   WiredTiger

Query Efficiency

Conclusions from efficiency on the query step:
  • The CPU overheads are similar.
  • The read and write overheads per query are larger for WiredTiger: it reads about 2.7X more KB and writes about 2.9X more KB per query. RocksDB sustains more QPS because it does less IO for an IO-bound workload.

cs.sec  cpu.sec cs.op   cpu.Kop   r.sec   rkb.sec wkb.sec r.op    rkb.op  wkb.op  engine
87075   53      4.6     2.772     6394.2  52516   27295   0.338   2.774   1.442   RocksDB
65889   40      5.5     3.320     6309.8  88675   50559   0.529   7.437   4.240   WiredTiger
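The conclusions above compare the engines metric by metric. A small sketch of that comparison, dividing one engine's relative metrics by the other's, using values copied from the two efficiency tables; the `engine_ratios` helper is a hypothetical name, and the load-step read and write columns are left out because some of their values round to 0.000.

```python
# Relative (per-operation) metrics copied from the load and query efficiency tables.
# Load-step read/write columns are omitted because some values round to 0.000.
LOAD = {
    "RocksDB": {"cs.op": 21.8, "cpu.Kop": 5.176},
    "WiredTiger": {"cs.op": 4.3, "cpu.Kop": 2.721},
}
QUERY = {
    "RocksDB": {"cs.op": 4.6, "cpu.Kop": 2.772, "r.op": 0.338,
                "rkb.op": 2.774, "wkb.op": 1.442},
    "WiredTiger": {"cs.op": 5.5, "cpu.Kop": 3.320, "r.op": 0.529,
                   "rkb.op": 7.437, "wkb.op": 4.240},
}


def engine_ratios(table, numerator, denominator):
    """Return numerator/denominator for each metric reported for both engines."""
    num, den = table[numerator], table[denominator]
    return {metric: num[metric] / den[metric] for metric in num if metric in den}


load_ratios = engine_ratios(LOAD, "RocksDB", "WiredTiger")
query_ratios = engine_ratios(QUERY, "WiredTiger", "RocksDB")
for metric, ratio in load_ratios.items():
    print(f"load {metric}: RocksDB / WiredTiger = {ratio:.2f}")
for metric, ratio in query_ratios.items():
    print(f"query {metric}: WiredTiger / RocksDB = {ratio:.2f}")
```

With these inputs RocksDB does about 5X more context switches and uses about 1.9X more CPU per insert during the load, while during the query step WiredTiger reads about 2.7X and writes about 2.9X more KB per query.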

Results for disk

Results at 1, 4, 8, 16 and 24 concurrent clients for RocksDB and WiredTiger. IPS is the average insert rate during the load phase and QPS is the average query rate during the query phase.

clients IPS     QPS   RocksDB
1       3781    281
4       12397   1244
8       16717   1946
16      19116   2259
24      17627   2458

clients IPS     QPS   WiredTiger
1       5080    227
4       18369   726
8       35272   843
16      55341   808
24      64577   813
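The peak rates for each engine do not occur at the same concurrency. A short sketch that finds the peak IPS and QPS per engine in the disk-array tables and divides them; the rows are copied from the tables above and the `peak` helper is a hypothetical name.

```python
# (clients, IPS, QPS) rows copied from the disk-array tables above.
DISK = {
    "RocksDB": [(1, 3781, 281), (4, 12397, 1244), (8, 16717, 1946),
                (16, 19116, 2259), (24, 17627, 2458)],
    "WiredTiger": [(1, 5080, 227), (4, 18369, 726), (8, 35272, 843),
                   (16, 55341, 808), (24, 64577, 813)],
}
IPS, QPS = 1, 2  # column indexes within each row


def peak(rows, column):
    """Return (clients, rate) for the row with the largest value in `column`."""
    best = max(rows, key=lambda row: row[column])
    return best[0], best[column]


for name, column in (("IPS", IPS), ("QPS", QPS)):
    rocks = peak(DISK["RocksDB"], column)
    wt = peak(DISK["WiredTiger"], column)
    print(f"peak {name}: RocksDB {rocks}, WiredTiger {wt}, "
          f"RocksDB/WiredTiger = {rocks[1] / wt[1]:.2f}")
```

On the disk array RocksDB reaches its peak QPS at 24 clients and WiredTiger at 8 clients, a ratio of about 2.9X in favor of RocksDB, while WiredTiger's peak IPS is about 3.4X that of RocksDB.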

Results for SSD

Results at 1, 4, 8, 12, 16, 20 and 24 concurrent clients for RocksDB and WiredTiger. IPS is the average insert rate during the load phase and QPS is the average query rate during the query phase.

clients IPS     QPS   RocksDB
1       3772    945
4       12475   3171
8       16689   6023
12      18079   8075
16      18248   9632
20      18328   10440
24      17327   10500

clients IPS     QPS   WiredTiger
1       5077    843
4       18511   2627
8       35471   4374
12      43105   5435
16      55108   6067
20      62380   5928
24      64190   5762
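The summary says the load rate is not slower with the disk array than with SSD. One way to check that part of the claim is to divide the disk IPS by the SSD IPS at each client count measured on both devices. This sketch re-keys the IPS columns of the tables above by client count; the dictionary and function names are hypothetical.

```python
# clients -> IPS, copied from the load columns of the disk and SSD tables above.
DISK_IPS = {
    "RocksDB": {1: 3781, 4: 12397, 8: 16717, 16: 19116, 24: 17627},
    "WiredTiger": {1: 5080, 4: 18369, 8: 35272, 16: 55341, 24: 64577},
}
SSD_IPS = {
    "RocksDB": {1: 3772, 4: 12475, 8: 16689, 12: 18079, 16: 18248,
                20: 18328, 24: 17327},
    "WiredTiger": {1: 5077, 4: 18511, 8: 35471, 12: 43105, 16: 55108,
                   20: 62380, 24: 64190},
}


def disk_vs_ssd(engine):
    """Ratio of disk IPS to SSD IPS at each client count measured on both devices."""
    shared = sorted(DISK_IPS[engine].keys() & SSD_IPS[engine].keys())
    return {c: DISK_IPS[engine][c] / SSD_IPS[engine][c] for c in shared}


for engine in DISK_IPS:
    per_client = disk_vs_ssd(engine)
    print(engine, {c: round(r, 3) for c, r in per_client.items()})
```

Every ratio falls between 0.99 and 1.05, so in this data the load rate is essentially the same on both devices for both engines.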

Response time

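This section has per-operation response time metrics that Linkbench prints at the end of a test run. These are from the SSD test with 16 clients. While the throughput is about 1.6X better for RocksDB (9632 vs 6067 requests/second), the p99 latencies tend to be 2X better with it. The max latencies are also much larger for WiredTiger for most operations, often more than 1 second vs less than 300 ms for RocksDB. It isn't clear whether these stalls are from WiredTiger or from storage.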

For RocksDB:

ADD_NODE count = 893692  p50 = [0.2,0.3]ms  p99 = [3,4]ms  max = 262.96ms  mean = 0.37ms
UPDATE_NODE count = 2556755  p50 = [0.8,0.9]ms  p99 = [10,11]ms  max = 280.701ms  mean = 1.199ms
DELETE_NODE count = 351389  p50 = [0.8,0.9]ms  p99 = [11,12]ms  max = 242.851ms  mean = 1.303ms
GET_NODE count = 4484357  p50 = [0.5,0.6]ms  p99 = [9,10]ms  max = 262.863ms  mean = 0.798ms
ADD_LINK count = 3119609  p50 = [1,2]ms  p99 = [13,14]ms  max = 271.504ms  mean = 2.211ms
DELETE_LINK count = 1038625  p50 = [0.6,0.7]ms  p99 = [13,14]ms  max = 274.327ms  mean = 1.789ms
UPDATE_LINK count = 2779251  p50 = [1,2]ms  p99 = [13,14]ms  max = 265.854ms  mean = 2.354ms
COUNT_LINK count = 1696924  p50 = [0.3,0.4]ms  p99 = [3,4]ms  max = 262.514ms  mean = 0.455ms
MULTIGET_LINK count = 182741  p50 = [0.7,0.8]ms  p99 = [6,7]ms  max = 237.901ms  mean = 1.023ms
GET_LINKS_LIST count = 17592675  p50 = [0.8,0.9]ms  p99 = [11,12]ms  max = 26278.336ms  mean = 1.631ms
REQUEST PHASE COMPLETED. 34696018 requests done in 3601 seconds. Requests/second = 9632

For WiredTiger:

ADD_NODE count = 562034  p50 = [0.2,0.3]ms  p99 = [0.6,0.7]ms  max = 687.348ms  mean = 0.322ms
UPDATE_NODE count = 1609307  p50 = [1,2]ms  p99 = [20,21]ms  max = 1331.321ms  mean = 1.761ms
DELETE_NODE count = 222067  p50 = [1,2]ms  p99 = [20,21]ms  max = 1116.159ms  mean = 1.813ms
GET_NODE count = 2827037  p50 = [0.8,0.9]ms  p99 = [19,20]ms  max = 1119.06ms  mean = 1.51ms
ADD_LINK count = 1963502  p50 = [2,3]ms  p99 = [27,28]ms  max = 1176.684ms  mean = 3.324ms
DELETE_LINK count = 654387  p50 = [1,2]ms  p99 = [21,22]ms  max = 1292.405ms  mean = 2.761ms
UPDATE_LINK count = 1752325  p50 = [2,3]ms  p99 = [30,31]ms  max = 4783.055ms  mean = 3.623ms
COUNT_LINK count = 1068844  p50 = [0.3,0.4]ms  p99 = [4,5]ms  max = 1264.399ms  mean = 0.705ms
MULTIGET_LINK count = 114870  p50 = [1,2]ms  p99 = [17,18]ms  max = 466.058ms  mean = 1.717ms
GET_LINKS_LIST count = 11081761  p50 = [1,2]ms  p99 = [21,22]ms  max = 19840.669ms  mean = 2.624ms

REQUEST PHASE COMPLETED. 21856135 requests done in 3602 seconds. Requests/second = 6067 
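Comparing the two outputs by hand is tedious. A minimal sketch that parses the per-operation lines and computes the WiredTiger/RocksDB ratio of the upper p99 bucket bound. The regular expression is inferred only from the lines shown above and may not match the output of other Linkbench versions; `parse` and `p99_ratio` are hypothetical names, and the sample input is two lines copied from each run.

```python
import re

# Pattern inferred from the per-operation lines printed above.
LINE = re.compile(
    r"(?P<op>[A-Z_]+) count = (?P<count>\d+)\s+"
    r"p50 = \[(?P<p50_lo>[\d.]+),(?P<p50_hi>[\d.]+)\]ms\s+"
    r"p99 = \[(?P<p99_lo>[\d.]+),(?P<p99_hi>[\d.]+)\]ms\s+"
    r"max = (?P<max>[\d.]+)ms\s+mean = (?P<mean>[\d.]+)ms"
)


def parse(text):
    """Map operation name -> numeric fields for each matching line."""
    stats = {}
    for m in LINE.finditer(text):
        fields = m.groupdict()
        op = fields.pop("op")
        stats[op] = {k: float(v) for k, v in fields.items()}
    return stats


def p99_ratio(base, other):
    """Ratio other/base of the upper p99 bucket bound for shared operations."""
    return {op: other[op]["p99_hi"] / base[op]["p99_hi"] for op in base if op in other}


ROCKSDB = """
UPDATE_NODE count = 2556755  p50 = [0.8,0.9]ms  p99 = [10,11]ms  max = 280.701ms  mean = 1.199ms
ADD_LINK count = 3119609  p50 = [1,2]ms  p99 = [13,14]ms  max = 271.504ms  mean = 2.211ms
"""
WIREDTIGER = """
UPDATE_NODE count = 1609307  p50 = [1,2]ms  p99 = [20,21]ms  max = 1331.321ms  mean = 1.761ms
ADD_LINK count = 1963502  p50 = [2,3]ms  p99 = [27,28]ms  max = 1176.684ms  mean = 3.324ms
"""

ratios = p99_ratio(parse(ROCKSDB), parse(WIREDTIGER))
print({op: round(r, 2) for op, r in ratios.items()})
```

Using the upper bound of each p99 bucket is a simplification, since Linkbench reports percentiles as ranges rather than exact values.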
