Cached database
The first test is with a cached database. The test pattern is to load the database and then do 12 1-hour query tests in a loop. The database always fit in RAM. At the end of the 12th hour the database size was 40 GB for WiredTiger, 22 GB for RocksDB, 30 GB for TokuMX and 78 GB for mmapv1. I used Snappy compression for WiredTiger/RocksDB and QuickLZ for TokuMX. The graph below has the average QPS per 1-hour interval.
This is the data for the graph:
wiredtiger,18279,17715,16740,16585,16472,15924,15703,15632,15783,15401,15872,15654
rocksdb,10892,9649,9580,9639,9860,9981,9316,9535,9578,9682,9437,9689
tokumx,11078,6881,5832,5132,5864,5434,5495,5340,5168,5505,4763,4924
mmapv1,5066,4918,4821,4758,4629,4666,4589,4613,4663,4626,4563,4642
This table has the response time in milliseconds per operation type:
             add_link  update_link  get_links_list
wiredtiger      1.361        1.422           0.768
rocksdb         1.702        1.789           1.460
tokumx          1.538        1.674           3.929
mmapv1          4.788        5.230           2.657
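For reference, this is a minimal sketch of the test loop described above: one load phase followed by 12 1-hour query intervals, with the QPS recorded per interval. It assumes a LinkBench-style client (add_link, update_link and get_links_list are LinkBench operation types); the binary path, properties file and flags below are placeholders, not the exact commands used for these results.

#!/bin/bash
# Sketch only: drive one load phase and then 12 1-hour query intervals.
# Paths, the properties file name and the log format are assumptions.
ENGINE=$1                                  # wiredtiger, rocksdb, tokumx or mmapv1
PROPS=config/LinkConfigMongo.properties    # placeholder properties file
HOURS=12

# load phase, run once per engine
./bin/linkbench -c $PROPS -l -L results/${ENGINE}.load.log

# query phase, 12 1-hour intervals
for hour in $(seq 1 $HOURS); do
  ./bin/linkbench -c $PROPS -r -Dmaxtime=3600 \
      -L results/${ENGINE}.h${hour}.log
  # per-interval QPS comes from the requests/second line in the client log
  grep -i "requests/second" results/${ENGINE}.h${hour}.log
done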
Database larger than RAM
The test was repeated with a database that does not fit in RAM. The test was not run for mmapv1 because I didn't have enough disk space or patience to wait for the load to finish. At the end of the 12th hour the database size was 728 GB for WiredTiger, 632 GB for RocksDB and 588 GB for TokuMX. It is interesting that the TokuMX database was smaller than the RocksDB database here but larger than it for the cached test. The graph below has the average QPS per 1-hour interval.
This is the data for the graph:
tokumx,439,580,622,625,638,617,598,613,631,609,610,611
rocksdb,387,448,479,468,468,477,471,483,475,473,471,477
wiredtiger,297,343,345,333,333,331,320,335,326,339,324,333
And this table has the response time in milliseconds per operation type:
             add_link  update_link  get_links_list
tokumx         23.499       25.903          22.987
rocksdb        21.704       23.883          25.835
wiredtiger     47.557       51.122          35.648
TokuMX is the most IO-efficient engine for this workload based on the data below, which explains why it sustains the highest QPS: disk IO is the bottleneck. I used data from iostat (r/s, w/s, rKB/s and wKB/s) and divided those rates by the average QPS, with all data taken from the 12th 1-hour run. I assume that disk reads done by queries dominate reads done for compaction. TokuMX does less IO per query than RocksDB and WiredTiger, and both TokuMX and RocksDB write much less data per query than WiredTiger.
             read/query  read-KB/query  write-KB/query
tokumx            1.612         14.588           2.495
rocksdb           2.135         20.234           2.512
wiredtiger        2.087         26.675          12.110
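As a minimal sketch of that arithmetic, the script below divides average iostat rates by the average QPS. The arguments are supplied by the caller; the raw iostat rates are not included in this post, so nothing here is the measured data.

#!/bin/bash
# io_per_query.sh: divide average iostat rates for the 12th 1-hour run
# by the average QPS for that run.
# Usage: ./io_per_query.sh <r/s> <w/s> <rKB/s> <wKB/s> <QPS>
r=$1; w=$2; rkb=$3; wkb=$4; qps=$5
awk -v r="$r" -v w="$w" -v rkb="$rkb" -v wkb="$wkb" -v qps="$qps" 'BEGIN {
  printf "reads/query=%.3f writes/query=%.3f ", r / qps, w / qps
  printf "read-KB/query=%.3f write-KB/query=%.3f\n", rkb / qps, wkb / qps
}'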
Configuration
This section has a few more details on the MongoDB configuration I used. The oplog was enabled and the block cache was ~70G for all engines. This is the configuration file and startup script for TokuMX:
dbpath = /home/mongo/data
logpath = /home/mongo/log
logappend = true
fork = true
slowms = 2000
oplogSize = 2000
expireOplogHours = 2
numactl --interleave=all \
bin/mongod \
--config $PWD/mongo.conf \
--setParameter="defaultCompression=quicklz" \
--setParameter="defaultFanout=128" \
--setParameter="defaultReadPageSize=16384" \
--setParameter="fastUpdates=true" \
--cacheSize=$1 \
--replSet foobar \
--checkpointPeriod=900
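The cache size is passed to that script as the first argument ($1 becomes --cacheSize). Starting TokuMX with the ~70G cache used here would look something like the line below; the wrapper file name is an assumption, and check which size formats your TokuMX build accepts for --cacheSize.

# assumed invocation of the startup script above with a ~70G cache
bash start_tokumx.sh 70G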
And this is the configuration file for other engines:
processManagement:
  fork: true
systemLog:
  destination: file
  path: /home/mongo/log
  logAppend: true
storage:
  syncPeriodSecs: 60
  dbPath: /home/mongo/data
  journal:
    enabled: true
  mmapv1:
    journal:
      commitIntervalMs: 100
operationProfiling.slowOpThresholdMs: 2000
replication.oplogSizeMB: 2000
storage.wiredTiger.collectionConfig.blockCompressor: snappy
storage.wiredTiger.engineConfig.journalCompressor: none
storage.rocksdb.compression: snappy
storage.rocksdb.configString: "write_buffer_size=16m;max_write_buffer_number=4;max_background_compactions=6;max_background_flushes=3;target_file_size_base=16m;soft_rate_limit=2.9;hard_rate_limit=3;max_bytes_for_level_base=128m;stats_dump_period_sec=60;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=12;level0_stop_writes_trigger=20;max_grandparent_overlap_factor=8;max_bytes_for_level_multiplier=8"
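The config files above don't show how the ~70G block cache was set for WiredTiger or RocksDB. One way to do it for WiredTiger is the cacheSizeGB option, shown below as a mongod startup flag; this is an assumption about the mechanism, not the exact setting used for these results. mmapv1 has no block cache and relies on the OS page cache.

# assumed example: start mongod with WiredTiger and a ~70 GB block cache
numactl --interleave=all \
bin/mongod \
    --config $PWD/mongo.conf \
    --storageEngine wiredTiger \
    --wiredTigerCacheSizeGB 70 \
    --replSet foobar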