Friday, August 14, 2015

Reducing write amplification in MongoRocks

After evaluating results for MongoRocks and TokuMX I noticed that the disk write rate per insert was ~3X larger for MongoRocks than for TokuMX. I ran tests with different configurations to determine whether I could reduce the write rate for MongoRocks during the insert benchmark on a server with slow storage. I got a minor gain in throughput and a large reduction in write amplification by using Snappy compression on all levels of the LSM.

Configuration

The test was to load 500M documents with a 256-byte pad column using 10 threads. There were 3 secondary indexes to be maintained. In the default configuration RocksDB uses leveled compaction with no compression for L0 and L1 and Snappy compression for L2 and beyond. I used QuickLZ for TokuMX 2.0.1 and assume that all database file writes were compressed by it. From other testing I learned that I need to increase the default value of bytes_per_sync for MongoRocks so that sync_file_range is called less frequently. Here I tried two configurations for MongoRocks, and a sketch of the load follows the list:
  • config1 - Use a larger memtable (200M versus 64M) and larger L1 (1500M versus 512M).
  • config2 - Use a larger memtable (400M versus 64M), a larger L1 (1500M versus 512M) and Snappy compression for all levels.
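
For reference, this is a rough sketch of the load phase rather than the real benchmark client. The collection name, field names and secondary key values are hypothetical; the real test used 10 client threads, a 256-byte pad column and 3 secondary indexes:

import os
import threading
from pymongo import MongoClient, ASCENDING

DOCS_PER_THREAD = 50 * 1000 * 1000   # 500M documents split across 10 threads
BATCH_SIZE = 1000
PAD = 'x' * 256                      # the 256-byte pad column

def load(thread_id):
    coll = MongoClient('localhost', 27017).iibench.purchases
    base = thread_id * DOCS_PER_THREAD
    for start in range(0, DOCS_PER_THREAD, BATCH_SIZE):
        docs = [{'_id': base + start + i,
                 'k1': os.urandom(4).hex(),   # hypothetical secondary key 1
                 'k2': os.urandom(4).hex(),   # hypothetical secondary key 2
                 'k3': os.urandom(4).hex(),   # hypothetical secondary key 3
                 'pad': PAD}
                for i in range(BATCH_SIZE)]
        coll.insert_many(docs, ordered=False)

if __name__ == '__main__':
    # the 3 secondary indexes are maintained during the load
    coll = MongoClient('localhost', 27017).iibench.purchases
    for key in ('k1', 'k2', 'k3'):
        coll.create_index([(key, ASCENDING)])
    threads = [threading.Thread(target=load, args=(t,)) for t in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()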

Binaries

These are the binaries and configurations that I tested:
  • tokumx - TokuMX 2.0.1
  • rocksdb.def - MongoRocks with default configuration
  • rocksdb.config1 - MongoRocks with larger L1 & memtable
  • rocksdb.config2 - MongoRocks with larger L1 & memtable and Snappy compression for all levels

Results

From the results below, the value for wkb/i (KB written per insert) dropped by almost half for MongoRocks when Snappy compression was used for all levels. The reduction was smaller for config1, which used a larger L1 and memtable but kept the default compression (none for L0 and L1). Both changes gave a small improvement in the insert rate. The insert rate and write amplification were still better for TokuMX. The table below has these columns:
  • r/s - average rate for iostat r/s
  • rmb/s, wmb/s - average rate for iostat rMB/s, wMB/s
  • r/i - iostat reads per document inserted
  • rkb/i, wkb/i - iostat rKB, wKB per document inserted
  • us+sy - average rate for vmstat us + sy
  • cs/i - vmstat cs (context switches) divided by the insert rate
  • (us+sy)/i - us+sy divided by the insert rate
  • ips - average insert rate
r/s     rmb/s   wmb/s   r/i        rkb/i   wkb/i   us+sy   cs/i   (us+sy)/i   ips     engine
728.4   9.9     87.4    0.016343   0.228   2.007   40.1    4      0.000899    44571   tokumx
77.8    0.9     176.6   0.003123   0.036   7.254   22.3    7      0.000895    24923   rocksdb.def
556.3   7.6     149.8   0.022126   0.308   6.101   20.8    7      0.000826    25143   rocksdb.config1
285.3   3.6     105.1   0.010865   0.142   4.099   20.4    6      0.000775    26258   rocksdb.config2
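
The per-insert columns are just the average iostat and vmstat rates divided by the average insert rate. This small helper, which is not from the original test scripts, shows the arithmetic and reproduces the rocksdb.def row:

def per_insert(r_s, rmb_s, wmb_s, us_sy, ips):
    # divide the average device and CPU rates by the average insert rate
    return {'r/i': r_s / ips,               # reads per insert
            'rkb/i': rmb_s * 1024 / ips,    # KB read per insert
            'wkb/i': wmb_s * 1024 / ips,    # KB written per insert
            '(us+sy)/i': us_sy / ips}       # CPU per insert

# rocksdb.def row: wkb/i comes out to about 7.25 and r/i to about 0.0031
print(per_insert(r_s=77.8, rmb_s=0.9, wmb_s=176.6, us_sy=22.3, ips=24923))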

Configuration options


This is the change for mongo.conf with rocksdb.config2:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=400m;compression_per_level=kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression"

This is the change for mongo.conf with rocksdb.config1:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=200m"
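
To make the difference between the two configurations explicit, this small script, which is not part of the original post, parses the two configString values above and prints what changed: a 400M write buffer and Snappy compression on all 7 levels.

config1 = ("bytes_per_sync=16m;max_background_flushes=3;"
           "max_background_compactions=12;max_write_buffer_number=4;"
           "max_bytes_for_level_base=1500m;target_file_size_base=200m;"
           "level0_slowdown_writes_trigger=12;write_buffer_size=200m")
config2 = (config1.replace("write_buffer_size=200m", "write_buffer_size=400m")
           + ";compression_per_level=" + ":".join(["kSnappyCompression"] * 7))

def parse(s):
    # configString is a semicolon-separated list of name=value pairs
    return dict(kv.split("=", 1) for kv in s.split(";"))

c1, c2 = parse(config1), parse(config2)
for name in sorted(set(c1) | set(c2)):
    if c1.get(name) != c2.get(name):
        print(name, c1.get(name, "(engine default)"), "->", c2.get(name))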
