Cached database
The first test is with a cached database. The test pattern is to load the database and then run twelve 1-hour query tests in a loop; the database always fit in RAM. At the end of the 12th hour the database size was 40 GB for WiredTiger, 22 GB for RocksDB, 30 GB for TokuMX and 78 GB for mmapv1. I used Snappy compression for WiredTiger and RocksDB, and QuickLZ for TokuMX. The graph below has the average QPS per 1-hour interval.
This is the data for the graph:
wiredtiger,18279,17715,16740,16585,16472,15924,15703,15632,15783,15401,15872,15654
rocksdb,10892,9649,9580,9639,9860,9981,9316,9535,9578,9682,9437,9689
tokumx,11078,6881,5832,5132,5864,5434,5495,5340,5168,5505,4763,4924
mmapv1,5066,4918,4821,4758,4629,4666,4589,4613,4663,4626,4563,4642
This table has the average response time per operation type, in milliseconds:
                add_link        update_link     get_links_list
wiredtiger      1.361           1.422           0.768
rocksdb         1.702           1.789           1.460
tokumx          1.538           1.674           3.929
mmapv1          4.788           5.230           2.657
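The per-hour rows above can be summarized with a short script. This is a sketch that computes the hour-1 to hour-12 QPS decline for each engine, with the numbers copied from the data rows above:

```python
# Compute the QPS decline from the 1st to the 12th hour for each engine,
# using the CSV rows from the cached-database test.
rows = """\
wiredtiger,18279,17715,16740,16585,16472,15924,15703,15632,15783,15401,15872,15654
rocksdb,10892,9649,9580,9639,9860,9981,9316,9535,9578,9682,9437,9689
tokumx,11078,6881,5832,5132,5864,5434,5495,5340,5168,5505,4763,4924
mmapv1,5066,4918,4821,4758,4629,4666,4589,4613,4663,4626,4563,4642
""".splitlines()

for row in rows:
    engine, *qps = row.split(",")
    qps = [int(x) for x in qps]
    decline = 100.0 * (qps[0] - qps[-1]) / qps[0]
    print(f"{engine:12s} hour1={qps[0]:6d} hour12={qps[-1]:6d} decline={decline:.1f}%")
```

The decline is largest for TokuMX (more than half of the hour-1 rate) and smallest for mmapv1, which starts from the lowest rate.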
Database larger than RAM
The test was repeated with a database that does not fit in RAM. The test was not run for mmapv1 because I didn't have enough disk space, or the patience to wait for the load to finish. At the end of the 12th hour the database size was 728 GB for WiredTiger, 632 GB for RocksDB and 588 GB for TokuMX. It is interesting that the TokuMX database was smaller than RocksDB here but larger than RocksDB in the cached test. The graph below has the average QPS per 1-hour interval.
This is the data for the graph:
tokumx,439,580,622,625,638,617,598,613,631,609,610,611
rocksdb,387,448,479,468,468,477,471,483,475,473,471,477
wiredtiger,297,343,345,333,333,331,320,335,326,339,324,333
This table has the average response time per operation type, in milliseconds:
                add_link        update_link     get_links_list
tokumx          23.499          25.903          22.987
rocksdb         21.704          23.883          25.835
wiredtiger      47.557          51.122          35.648
TokuMX is the most IO efficient for this workload based on the data below. Disk IO is the bottleneck here, which explains why TokuMX sustains the highest QPS. I used data from iostat (r/s, w/s, rKB/s and wKB/s) and divided those rates by the average QPS, with all data taken from the 12th 1-hour run. I assume that disk reads done by queries dominate reads done for compaction. TokuMX does less IO per query than RocksDB and WiredTiger, and both TokuMX and RocksDB write much less data per query than WiredTiger.
                read/query      read-KB/query   write-KB/query
tokumx          1.612           14.588          2.495
rocksdb         2.135           20.234          2.512
wiredtiger      2.087           26.675          12.110
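The derivation of the table above is simple enough to sketch in code: divide the iostat rates by the average QPS from the same interval. The iostat numbers in this sketch are hypothetical inputs, not the measured values behind the table:

```python
# Per-query IO efficiency: divide iostat rates by the average QPS
# measured over the same interval (here, the 12th 1-hour run).
def io_per_query(r_s, rkb_s, wkb_s, qps):
    """Return (reads/query, read-KB/query, write-KB/query)."""
    return (r_s / qps, rkb_s / qps, wkb_s / qps)

# Hypothetical iostat averages for one engine:
reads, rkb, wkb = io_per_query(r_s=1000.0, rkb_s=9000.0, wkb_s=1500.0, qps=600)
print(f"read/query={reads:.3f} read-KB/query={rkb:.3f} write-KB/query={wkb:.3f}")
```

Note that w/s is less interesting than wKB/s for write-optimized engines because they do large sequential writes, so the table reports write-KB/query rather than writes/query.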
Configuration
This section has a few more details on the MongoDB configuration I used. The oplog was enabled and the block cache was ~70 GB for all engines. This is the configuration file and startup script for TokuMX:
dbpath = /home/mongo/data
logpath = /home/mongo/log
logappend = true
fork = true
slowms = 2000
oplogSize = 2000
expireOplogHours = 2
numactl --interleave=all \
bin/mongod \
    --config $PWD/mongo.conf \
    --setParameter="defaultCompression=quicklz" \
    --setParameter="defaultFanout=128" \
    --setParameter="defaultReadPageSize=16384" \
    --setParameter="fastUpdates=true" \
    --cacheSize=$1 \
    --replSet foobar \
    --checkpointPeriod=900
And this is the configuration file for other engines:
processManagement:
  fork: true
systemLog:
  destination: file
  path: /home/mongo/log
  logAppend: true
operationProfiling:
  slowOpThresholdMs: 2000
replication:
  oplogSizeMB: 2000
storage:
  syncPeriodSecs: 60
  dbPath: /home/mongo/data
  journal:
    enabled: true
  mmapv1:
    journal:
      commitIntervalMs: 100
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy
    engineConfig:
      journalCompressor: none
  rocksdb:
    compression: snappy
    configString: "write_buffer_size=16m;max_write_buffer_number=4;max_background_compactions=6;max_background_flushes=3;target_file_size_base=16m;soft_rate_limit=2.9;hard_rate_limit=3;max_bytes_for_level_base=128m;stats_dump_period_sec=60;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=12;level0_stop_writes_trigger=20;max_grandparent_overlap_factor=8;max_bytes_for_level_multiplier=8"
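For reference, this is a sketch of how mongod might be started for the other engines, assuming the configuration above is saved as mongo.conf. The --storageEngine value (wiredTiger, rocksdb or mmapv1 in MongoDB 3.0 builds) selects the engine; the exact flags here are an assumption, not a copy of the command I used:

```shell
numactl --interleave=all \
bin/mongod \
    --config $PWD/mongo.conf \
    --storageEngine wiredTiger \
    --replSet foobar
```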