Thursday, July 9, 2015

Linkbench, MongoDB and a disk array

This post has results for Linkbench with MongoDB on a server with a disk array. The Linkbench configurations are similar to those used in the previous tests for cached and uncached databases, but the server here uses a disk array with 6 10K RPM SAS disks and HW RAID 0 while the previous tests used PCIe flash. The disk-array server has 24 hyperthread cores and is at least one generation older than the PCIe flash server, which has 40 hyperthread cores. Both have 144G of RAM.

Results for cached database

The test client is LinkbenchX and the configuration is described in a previous post. The test used 10 threads for loading and 20 for queries. After loading there were 12 1-hour runs of the query test and results are reported for the 2nd and 12th hour.
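
For reference, those thread counts map to settings in the Linkbench client configuration. This is a minimal sketch using the stock Linkbench property names (loaders, requesters, maxtime), not the exact file used for these tests:

# Linkbench client properties (sketch)
loaders = 10       # client threads for the load phase
requesters = 20    # client threads for the query (request) phase
maxtime = 3600     # stop each query run after one hour; 12 runs were done back to back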

The results below are for the test with maxid1=20M set in FBWorkload.properties for both servers (disk array here, PCIe flash from a previous post). The load rate is similar between the disk array and PCIe flash servers. The query rate is better for the server with PCIe flash, but that might be due more to its extra cores and newer CPUs than to storage performance. The load and query rates are better for WiredTiger than for RocksDB on both servers.
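
The maxid1 setting controls how many nodes are loaded, and so whether the database fits in memory. This is a hedged sketch of the relevant line in FBWorkload.properties, assuming startid1=1 so that about 20M nodes are loaded; the uncached tests below set maxid1=1B in the same way:

# FBWorkload.properties (sketch) -- ids from startid1 to maxid1-1 are loaded
maxid1 = 20000001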

The server names are explained in a previous post. The oplog was enabled for all tests.

--- results for disk array
load    load    load    2h      2h      12h     12h     24h     24h
time    rate    size    qps     size    qps     size    qps     size    server
5333    16519   14g     9649    17g     9689    22g                     mongo.rocks.log
3015    29223   16g     17715   30g     15654   40g                     mongo.wt.log
35253   2499    65g     4918    68g     4642    78g                     mongo.mmap.log

--- results for PCIe flash
5015    17565   14g     14442   17g     13925   24g     13506   29g     mongo.rocks.log
3601    28020   16g     25488   34g     22992   45g                     mongo.wt.log

Results for uncached database

The results below are for the test with maxid1=1B for both servers (disk array here, PCIe flash from a previous post). The load rate is similar between disk and flash, and is also similar to the rates above for the cached database. Multiple secondary indexes are maintained during the load, but IO latency does not have a significant impact on the load rate, even for WiredTiger, which does more random IO than RocksDB.

The query rates are significantly lower for the disk array than for PCIe flash, so IO latency has a significant impact on queries. However, RocksDB does better than WiredTiger on the disk array, possibly because it uses less random IO for writes, which leaves more random IO capacity to serve reads.

Update - I repeated tests for the disk-array server with different configurations and the results are better. For WiredTiger I set storage.syncPeriodSecs=600 which changes the default checkpoint interval from 60 to 600 seconds. The benefit should be fewer disk writes and QPS improved by more than 30% with that change. For RocksDB I used the default configuration and QPS improved by more than 20% compared to the non-default configuration I had been using (Igor did a good job choosing the defaults). For all engines I used a smaller block cache -- 32G rather than 70G -- to save more space for compressed blocks in the OS filesystem cache. Results for all engines improved with a smaller block cache.
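
For reference, both changes can be made in the mongod config file. This is a minimal sketch in the YAML config format; storage.syncPeriodSecs and the WiredTiger cache size are standard mongod options, while the RocksDB cache option belongs to the mongo-rocks engine and the name shown is an assumption:

# mongod.conf (sketch) -- not the exact files used for these tests
storage:
  syncPeriodSecs: 600        # checkpoint interval, raised from the 60 second default
  wiredTiger:
    engineConfig:
      cacheSizeGB: 32        # WiredTiger cache, reduced from 70
#  rocksdb:                  # for mongo-rocks; option name assumed, may differ by version
#    cacheSizeGB: 32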

--- results for disk array
load    load    load    2h      2h      12h     12h     24h     24h     48h     48h
time    rate    size    qps     size    qps     size    qps     size    qps     size    server
298901  14625   606g    580     585g    611     588g    604     590g    600     596g    mongo.tokumx.log, 70G
297159  14711   597g    731     585g    786     588g    782     592g    736     598g    mongo.tokumx.log, 32G
178923  24432   694g    343     704g    333     728g                                    mongo.wt.log, default, 70G
176432  24777   696g    449     709g    434     738g    423     749g    418     757g    mongo.wt.log, non-default, 32G
271569  16097   631g    448     631g    477     632g    452     633g    471     635g    mongo.rocks.log, non-default, 70G
274780  15909   628g    458     628g    592     629g    574     631g    569     633g    mongo.rocks.log, default, 32G

--- results for PCIe flash
251688  17368   630g    9670    633g    6762    644g    6768    656g    mongo.rocks.log
175740  24874   695g    8533    766g    8019    791g    7718    806g    mongo.wt.log

The table below has the mean response time in milliseconds for each of the query types in Linkbench. The data is from the 12th 1-hour run for the disk array (the first pair of columns) and from the 2nd and 12th 1-hour runs for PCIe flash (the next two pairs of columns). The most frequent operation is GET_LINKS_LIST followed by MULTIGET_LINK. On the disk array the response time for these two operations is better for RocksDB, which explains why it gets more QPS than WiredTiger. On PCIe flash the response time for GET_LINKS_LIST is lower for WiredTiger by the 12th run, which explains its better QPS there. The response time for all of the write operations is better for RocksDB than for WiredTiger on both disk and flash, but those operations are less frequent. WiredTiger does more reads from disk and flash during writes because b-tree leaf pages must be read before they can be written.

The QPS for RocksDB is higher than for WiredTiger in the 2nd 1-hour run on the PCIe server but lower by the 12th 1-hour run. The mean response time for the GET_LINKS_LIST operation almost doubles between those runs, and the cause might be the range read penalty from an LSM.




                12th 1-hour run         2nd 1-hour run  12th 1-hour run
                disk    disk    -       flash   flash   flash   flash   
                wired   rocks   -       wired   rocks   wired   rocks   
ADD_NODE        0.717   0.444           0.327   0.300   0.324   0.276
UPDATE_NODE     23.0    21.5            1.217   1.083   1.240   0.995
DELETE_NODE     22.9    21.6            1.260   0.675   1.285   1.018   
GET_NODE        22.7    20.9            0.913   2.355   0.941   0.622
ADD_LINK        47.6    23.5            2.988   1.610   3.142   2.255
DELETE_LINK     31.9    21.6            2.063   1.610   2.407   1.701
UPDATE_LINK     51.1    25.8            3.238   2.507   3.407   2.395   
COUNT_LINK      16.7    10.9            0.686   0.571   0.739   0.547
MULTIGET_LINK   22.6    18.9            1.599   1.195   1.603   1.136

GET_LINKS_LIST  35.6    27.6            1.910   2.181   2.056   3.945

And the data below is example output from the end of one test, RocksDB on flash for the 12th 1-hour run.

ADD_NODE count = 627235  p25 = [0.2,0.3]ms  p50 = [0.2,0.3]ms  p75 = [0.2,0.3]ms  p95 = [0.3,0.4]ms  p99 = [1,2]ms  max = 263.83ms  mean = 0.276ms
UPDATE_NODE count = 1793589  p25 = [0.7,0.8]ms  p50 = [0.8,0.9]ms  p75 = [0.9,1]ms  p95 = [1,2]ms  p99 = [4,5]ms  max = 301.453ms  mean = 0.995ms
DELETE_NODE count = 246225  p25 = [0.7,0.8]ms  p50 = [0.8,0.9]ms  p75 = [0.9,1]ms  p95 = [1,2]ms  p99 = [4,5]ms  max = 265.012ms  mean = 1.018ms
GET_NODE count = 3150740  p25 = [0.4,0.5]ms  p50 = [0.5,0.6]ms  p75 = [0.5,0.6]ms  p95 = [0.9,1]ms  p99 = [3,4]ms  max = 301.078ms  mean = 0.622ms
ADD_LINK count = 2189319  p25 = [1,2]ms  p50 = [2,3]ms  p75 = [2,3]ms  p95 = [3,4]ms  p99 = [7,8]ms  max = 317.292ms  mean = 2.255ms
DELETE_LINK count = 727942  p25 = [0.4,0.5]ms  p50 = [0.6,0.7]ms  p75 = [2,3]ms  p95 = [3,4]ms  p99 = [6,7]ms  max = 320.13ms  mean = 1.701ms
UPDATE_LINK count = 1949970  p25 = [1,2]ms  p50 = [2,3]ms  p75 = [2,3]ms  p95 = [3,4]ms  p99 = [7,8]ms  max = 393.483ms  mean = 2.395ms
COUNT_LINK count = 1190142  p25 = [0.3,0.4]ms  p50 = [0.4,0.5]ms  p75 = [0.5,0.6]ms  p95 = [0.8,0.9]ms  p99 = [2,3]ms  max = 296.65ms  mean = 0.547ms
MULTIGET_LINK count = 127871  p25 = [0.7,0.8]ms  p50 = [0.9,1]ms  p75 = [1,2]ms  p95 = [1,2]ms  p99 = [4,5]ms  max = 272.272ms  mean = 1.136ms
GET_LINKS_LIST count = 12353781  p25 = [0.5,0.6]ms  p50 = [1,2]ms  p75 = [1,2]ms  p95 = [3,4]ms  p99 = [38,39]ms  max = 2360.432ms  mean = 3.945ms
REQUEST PHASE COMPLETED. 24356814 requests done in 3601 seconds. Requests/second = 6762
