Friday, August 14, 2015

Reducing write amplification in MongoRocks

After evaluating results for MongoRocks and TokuMX, I noticed that the disk write rate per insert was more than 3X larger for MongoRocks than for TokuMX. I ran tests with different configurations to determine whether I could reduce the MongoRocks write rate during the insert benchmark on a server with slow storage. Using Snappy compression on all levels of the LSM gave a minor gain in throughput and a large reduction in write amplification.


The test loaded 500M documents with a 256-byte pad column using 10 threads, and 3 secondary indexes were maintained. In the default configuration, RocksDB uses leveled compaction with no compression for L0 and L1 and Snappy compression for L2 and beyond. I used QuickLZ for TokuMX 2.0.1 and assume that all database file writes were compressed by it. From other testing, I learned that I need to increase the default value of bytes_per_sync with MongoRocks so that sync_file_range is called less frequently; both configurations below use bytes_per_sync=16m. Here I tried two configurations for MongoRocks:
  • config1 - Use a larger memtable (200M versus 64M) and a larger L1 (1500M versus 512M).
  • config2 - Use a larger memtable (400M versus 64M), a larger L1 (1500M versus 512M) and Snappy compression for all levels.
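The effect of raising bytes_per_sync can be illustrated with a small simulation. This is a sketch, not RocksDB code: it assumes a simplified model where the writer issues one range sync (standing in for a sync_file_range call) each time bytes_per_sync unsynced bytes have accumulated. The 1 MB comparison value and the 64 KB append size are hypothetical, not necessarily the MongoRocks defaults.

```python
MB = 1024 * 1024


def count_range_syncs(write_sizes, bytes_per_sync):
    """Count range-sync calls for a sequence of appends.

    Simplified model (an assumption, not RocksDB's exact logic): after each
    append, if at least bytes_per_sync bytes are unsynced, sync them all.
    """
    unsynced = 0
    syncs = 0
    for size in write_sizes:
        unsynced += size
        if unsynced >= bytes_per_sync:
            syncs += 1  # stands in for one sync_file_range call
            unsynced = 0
    return syncs


# Hypothetical workload: 1 GB written as 64 KB appends.
appends = [64 * 1024] * (1024 * MB // (64 * 1024))
for setting in (1 * MB, 16 * MB):
    print(setting // MB, "MB ->", count_range_syncs(appends, setting), "syncs")
```

Under this model a 16X larger bytes_per_sync means 16X fewer sync calls for the same amount of data written, with more data flushed per call.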


This describes the configurations that I tested:
  • tokumx - TokuMX 2.0.1
  • rocksdb.def - MongoRocks with default configuration
  • rocksdb.config1 - MongoRocks with larger L1 & memtable
  • rocksdb.config2 - MongoRocks with larger L1 & memtable and Snappy compression for all levels
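To make the LSM layout of these configurations concrete, the sketch below computes the byte targets for L1 through L3 and the compression used per level. It assumes static leveled sizing (level_compaction_dynamic_level_bytes off) where each level is max_bytes_for_level_multiplier = 10 times larger than the previous one, which is the RocksDB default, and it takes the default memtable, L1 and compression settings from this post. LsmConfig is a hypothetical helper for illustration, not a MongoRocks API.

```python
from dataclasses import dataclass

MULTIPLIER = 10  # assumed: RocksDB default max_bytes_for_level_multiplier


@dataclass
class LsmConfig:
    name: str
    memtable_mb: int
    l1_mb: int
    compress_all_levels: bool

    def level_target_mb(self, level):
        """Byte target for L1..Ln; L0 is bounded by file count, not bytes."""
        if level < 1:
            raise ValueError("L0 has no byte target")
        return self.l1_mb * MULTIPLIER ** (level - 1)

    def compression(self, level):
        """Unless all levels are compressed, follow the default described in
        the post: L0 and L1 uncompressed, Snappy for L2 and beyond."""
        if self.compress_all_levels or level >= 2:
            return "snappy"
        return "none"


CONFIGS = [
    LsmConfig("rocksdb.def", 64, 512, False),
    LsmConfig("rocksdb.config1", 200, 1500, False),
    LsmConfig("rocksdb.config2", 400, 1500, True),
]

for cfg in CONFIGS:
    levels = ", ".join(
        f"L{n}={cfg.level_target_mb(n)}M/{cfg.compression(n)}" for n in range(1, 4)
    )
    print(f"{cfg.name}: memtable={cfg.memtable_mb}M L0={cfg.compression(0)} {levels}")
```

The printout shows that config1 and config2 share the same level targets, so the remaining difference in bytes written comes mostly from compressing L0 and L1 (and the larger memtable) in config2.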


From the results below, the value for wkb/i (KB written per insert) dropped by almost half for MongoRocks, from 7.254 to 4.099, when Snappy compression was used for all levels. The reduction was smaller, from 7.254 to 6.101, for config1, which only used a larger L1 and memtable. Both changes caused a small improvement in the insert rate. The insert rate and write amplification were better for TokuMX. The table below has these columns:
  • r/s - average rate for iostat r/s
  • rmb/s, wmb/s - average rate for iostat rMB/s, wMB/s
  • r/i - iostat reads per document inserted
  • rkb/i, wkb/i - iostat rKB, wKB per document inserted
  • us+sy - average value for vmstat us + sy (CPU utilization)
  • cs/i - vmstat cs (context switches) per document inserted
  • (us+sy)/i - us+sy divided by the insert rate
  • ips - average insert rate
r/s     rmb/s   wmb/s   r/i        rkb/i   wkb/i   us+sy   cs/i   (us+sy)/i   ips     engine
728.4   9.9      87.4   0.016343   0.228   2.007   40.1    4      0.000899    44571   tokumx
77.8    0.9     176.6   0.003123   0.036   7.254   22.3    7      0.000895    24923   rocksdb.def
556.3   7.6     149.8   0.022126   0.308   6.101   20.8    7      0.000826    25143   rocksdb.config1
285.3   3.6     105.1   0.010865   0.142   4.099   20.4    6      0.000775    26258   rocksdb.config2
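The per-insert columns are the rate columns divided by the insert rate. As a sanity check, this sketch recomputes r/i, wkb/i and (us+sy)/i from the r/s, wmb/s, us+sy and ips averages in the table, assuming 1 MB = 1024 KB, and compares wkb/i with TokuMX. Because the table averages are rounded, the recomputed values can differ from the table in the third decimal place.

```python
# Averages copied from the table: (r/s, wmb/s, us+sy, ips)
ROWS = {
    "tokumx": (728.4, 87.4, 40.1, 44571),
    "rocksdb.def": (77.8, 176.6, 22.3, 24923),
    "rocksdb.config1": (556.3, 149.8, 20.8, 25143),
    "rocksdb.config2": (285.3, 105.1, 20.4, 26258),
}


def per_insert(reads_per_sec, write_mb_per_sec, cpu_pct, inserts_per_sec):
    """Normalize iostat/vmstat rates by the insert rate."""
    return {
        "r/i": reads_per_sec / inserts_per_sec,
        "wkb/i": write_mb_per_sec * 1024 / inserts_per_sec,  # assumes 1 MB = 1024 KB
        "(us+sy)/i": cpu_pct / inserts_per_sec,
    }


metrics = {engine: per_insert(*row) for engine, row in ROWS.items()}
base = metrics["tokumx"]["wkb/i"]
for engine, m in metrics.items():
    print(f"{engine:16} wkb/i={m['wkb/i']:.3f}  vs tokumx={m['wkb/i'] / base:.1f}x")
```

With these numbers the default MongoRocks configuration writes about 3.6X more per insert than TokuMX, and config2 brings that down to about 2X.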

Configuration options

This is the change for mongo.conf with rocksdb.config2:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=400m;compression_per_level=kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression"

This is the change for mongo.conf with rocksdb.config1:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=200m"
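The two configString values differ only in write_buffer_size and compression_per_level. This Python sketch splits each string on ';' and '=' and diffs the results to make that explicit; the strings are reconstructed from the two lines above. parse_config_string and diff_configs are hypothetical helpers for reading the options, not the parser that RocksDB or MongoRocks uses.

```python
def parse_config_string(config):
    """Split a 'key=value;key=value' options string into a dict."""
    options = {}
    for item in config.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


COMMON = ("bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;"
          "max_write_buffer_number=4;max_bytes_for_level_base=1500m;"
          "target_file_size_base=200m;level0_slowdown_writes_trigger=12;")
CONFIG1 = COMMON + "write_buffer_size=200m"
CONFIG2 = (COMMON + "write_buffer_size=400m;compression_per_level="
           + ":".join(["kSnappyCompression"] * 7))

diff = diff_configs(parse_config_string(CONFIG1), parse_config_string(CONFIG2))
for key, (v1, v2) in diff.items():
    print(f"{key}: config1={v1} config2={v2}")

# compression_per_level is a ':'-separated list with one entry per level.
levels = parse_config_string(CONFIG2)["compression_per_level"].split(":")
print(len(levels), "levels, all Snappy:", all(c == "kSnappyCompression" for c in levels))
```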


  1. Quick clarification: rocksdb.config2 also uses a larger L1 & memtable, correct? In fact, its memtable is even larger than config1's, but compression probably brings it down to the same size or smaller.

    I ask because the post later says "The reduction was smaller for the config that used a larger L1." but both config2 and config1 have a larger L1. Should it say the reduction was smaller with uncompressed L0 and L1?


    1. Probably. I will return to MongoRocks performance tests real soon.