Friday, August 14, 2015

Reducing write amplification in MongoRocks

After evaluating results for MongoRocks and TokuMX I noticed that the disk write rate per insert was ~3X larger for MongoRocks than for TokuMX. I ran tests with different configurations to determine whether I could reduce the write rate for MongoRocks during the insert benchmark on a server with slow storage. I got a minor gain in throughput and a large reduction in write amplification by using Snappy compression on all levels of the LSM.

Configuration

The test was to load 500M documents with a 256-byte pad column using 10 threads. There were 3 secondary indexes to be maintained. In the default configuration RocksDB uses leveled compaction with no compression for L0 and L1 and Snappy compression for L2 and beyond. I used QuickLZ for TokuMX 2.0.1 and assume that all database file writes were compressed by it. From other testing I learned that I needed to increase the default value used for bytes_per_sync with MongoRocks to trigger calls to sync_file_range less frequently. Here I tried two configurations for MongoRocks:
  • config1 - Use a larger memtable (200M versus 64M) and larger L1 (1500M versus 512M).
  • config2 - Use a larger memtable (400M versus 64M), larger L1 (1500M versus 512M) and Snappy compression for all levels (see the compression_per_level sketch after this list).
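
For reference, per-level compression is set with the compression_per_level option in the RocksDB configString, one entry per LSM level. A sketch of the two schemes compared here, assuming a 7-level LSM (the string for the default scheme is an illustration, not copied from the MongoRocks source):

default scheme - no compression for L0 and L1, Snappy from L2 on:
  compression_per_level=kNoCompression:kNoCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression

config2 scheme - Snappy on every level:
  compression_per_level=kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression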

Binaries

These are the binaries and configurations that I tested:
  • tokumx - TokuMX 2.0.1
  • rocksdb.def - MongoRocks with default configuration
  • rocksdb.config1 - MongoRocks with larger L1 & memtable
  • rocksdb.config2 - MongoRocks with larger L1 & memtable and Snappy compression for all levels

Results

From the results below, the value for wkb/i (KB written per document inserted) dropped almost in half for MongoRocks when Snappy compression was used for all levels. The reduction was smaller for config1, which used a larger L1 but did not compress L0 and L1. Both changes gave a small improvement in the insert rate. The insert rate and write amplification were still better for TokuMX. The table below has these columns:
  • r/s - average rate for iostat r/s
  • rmb/s, wmb/s - average rate for iostat rMB/s, wMB/s
  • r/i - iostat reads per document inserted
  • rkb/i, wkb/i - iostat rKB, wKB per document inserted
  • us+sy - average rate for vmstat us + sy
  • cs/i - vmstat cs (context switches) per document inserted
  • (us+sy)/i - us+sy divided by the insert rate
  • ips - average insert rate
r/s     rmb/s   wmb/s   r/i        rkb/i    wkb/i    us+sy   cs/i   (us+sy)/i   ips     engine
728.4   9.9      87.4   0.016343   0.228    2.007    40.1    4      0.000899    44571   tokumx
77.8    0.9     176.6   0.003123   0.036    7.254    22.3    7      0.000895    24923   rocksdb.def
556.3   7.6     149.8   0.022126   0.308    6.101    20.8    7      0.000826    25143   rocksdb.config1
285.3   3.6     105.1   0.010865   0.142    4.099    20.4    6      0.000775    26258   rocksdb.config2
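
As a sanity check, wkb/i can be derived from the other columns, assuming it is computed as wMB/s * 1024 divided by the insert rate:

tokumx:          87.4 * 1024 = 89497.6 KB/s; 89497.6 / 44571 ≈ 2.008 KB written per insert
rocksdb.config2: 105.1 * 1024 = 107622.4 KB/s; 107622.4 / 26258 ≈ 4.099 KB written per insert

Any small differences from the table values are rounding.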

Configuration options


This is the change for mongo.conf with rocksdb.config2:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=400m;compression_per_level=kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression:kSnappyCompression"

This is the change for mongo.conf with rocksdb.config1:
storage.rocksdb.configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=200m"
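
For context, this is a minimal sketch of how the option fits into the YAML config file for mongod, assuming the MongoRocks build registers the engine as rocksdb; the dbPath is a placeholder and the configString is the config1 string from above:

storage:
  dbPath: /path/to/data
  engine: rocksdb
  rocksdb:
    configString: "bytes_per_sync=16m;max_background_flushes=3;max_background_compactions=12;max_write_buffer_number=4;max_bytes_for_level_base=1500m;target_file_size_base=200m;level0_slowdown_writes_trigger=12;write_buffer_size=200m"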

2 comments:

  1. Quick clarification - rocksdb.config2 is also with larger L1 & memtable, correct? In fact, even larger than config1, but compression probably brings it down to same or smaller.

    I ask because later on it is referred that "The reduction was smaller for the config that used a larger L1." but both config2 and config1 have larger L1. Should it say reduction was smaller for uncompressed L0 and L1?

    Thanks!

    Reply: Probably. I will return to MongoRocks performance tests real soon.
