Tuesday, July 7, 2015

LinkBenchX and MongoDB on a small server

I previously reported results for Linkbench and LinkbenchX running on high-end commodity servers.
Here I report results for much lower-end hardware. I have two Intel NUC systems, each with 8G of RAM, 1 disk and 1 SSD. They are small, quiet and were easy to set up. There is an active NUC community hosted by Intel with useful answers to many questions. I managed to get the systems running without asking for help from Domas. That is rare.

I ran LinkbenchX as described in the previous post with a few minor changes because this hardware is smaller. First, I used 4 load threads and 4 request threads compared to 10 load threads and 20 request threads on the larger hardware (set loaders=4 and requesters=4 in LinkConfigMongoDBV2.properties). Then I ran tests for smaller databases using maxid1=2M for the cached database and maxid1=20M for the uncached database (set in FBWorkload.properties). Finally, I added one option to storage.rocksdb.configString to put SST index and bloom filter data in the RocksDB block cache so that it is subject to the block cache limit. While this can hurt performance, it also gets RocksDB to respect storage.rocksdb.cacheSizeGB. Without this option, index and filter data stays in memory for as long as the SST file is open, and as the database grows this can use a lot of memory, especially on a server with 8GB of RAM. It can also look like a memory leak in RocksDB until you realize what is going on (yes, I have wasted too much time on this problem). The extra option is:
block_based_table_factory={cache_index_and_filter_blocks=1}
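
For reference, a sketch of where those settings live. The properties values are the ones described above, and the mongod YAML nesting follows from the dotted option names (storage.rocksdb.cacheSizeGB and storage.rocksdb.configString), though the exact layout may vary by build:

# LinkConfigMongoDBV2.properties
loaders = 4
requesters = 4

# FBWorkload.properties: 2M for the cached test, 20M (20000000) for the uncached test
maxid1 = 2000000

# mongod config file
storage:
  engine: rocksdb
  rocksdb:
    cacheSizeGB: 4
    configString: "block_based_table_factory={cache_index_and_filter_blocks=1}"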

Results

The tests here were run with the database on the single disk, not the SSD. The oplog was enabled for the test but sync-on-commit was disabled. Tests were done with maxid1=2M for the cached database and maxid1=20M for the uncached database. Unfortunately, the database was never cached for mmapv1 because it uses much more space for the same data compared to WiredTiger and RocksDB.

The results below include:
  • load time - the number of seconds for the load test
  • load rate - the average insert rate during the load test
  • load size - the database size in GB when the load ended
  • 2h qps - the average QPS during the 2nd 1-hour query test
  • 2h size - the database size in GB after the 2nd 1-hour query test
  • 12h qps - the average QPS during the 12th 1-hour query test
  • 12h size - the database size in GB after the 12th 1-hour query test

The QPS for RocksDB is a lot better than for WiredTiger in the cached database test. After looking at the iostat data from the test I see that WiredTiger didn't cache the database for the 12h result below. The WiredTiger database was 5G, the test server has 8G of RAM and the WiredTiger block cache gets 4G of that. Assuming the database compresses by 2X, the uncompressed data is about 10G, so the 4G block cache can store roughly 40% of it and the OS filesystem cache gets at most 4G of the compressed data. From the vmstat data I see that the cache column under memory grows to ~4G.
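
Spelled out, with the 2X compression assumption:

on-disk (compressed) WiredTiger database   5G
uncompressed data, 5G * 2                  10G
block cache, 50% of 8G RAM                 4G   -> holds ~40% of the uncompressed data
OS filesystem cache                        <= ~4G of compressed blocks (matches the ~4G seen in vmstat)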

Sizing a cache for InnoDB with direct IO is easy: give it as much memory as possible and hope that the background tasks sharing the hardware don't use too much. But then InnoDB added compression and we had the problem of figuring out how to share the InnoDB buffer pool between compressed and uncompressed pages. There is some clever code in InnoDB that tries to figure this out based on whether a workload is CPU or IO bound. We have the same problem with WiredTiger and RocksDB. Because they use buffered IO, the OS filesystem cache is the cache for compressed pages and the WiredTiger/RocksDB block cache is the cache for uncompressed pages. Neither WiredTiger nor RocksDB has code yet to dynamically adjust the amount of memory used for compressed versus uncompressed pages, but I am certain that it is easier to dynamically resize the block cache in them than in InnoDB.

For now RocksDB and WiredTiger default to using 50% of system RAM for the block cache. I suspect that in many cases, like when the database is larger than RAM, it is better to use much less than 50% of system RAM for the block cache. I will save my hand-waving math for another post and will leave myself a note to repeat the tests below with the cache set to use 20% of RAM.
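
A sketch of what that retest might use on this 8G box. The WiredTiger option is the stock storage.wiredTiger.engineConfig.cacheSizeGB; the RocksDB one is the storage.rocksdb.cacheSizeGB option mentioned above. Some builds only accept whole GB values, so 2 (25% of RAM) may be the closest you can get to 20%:

# WiredTiger: shrink the block cache to leave more RAM for the OS filesystem cache
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 2

# RocksDB storage engine: same idea
storage:
  rocksdb:
    cacheSizeGB: 2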

uncached database, maxid1=20M, loaders=4, requesters=4

load    load    load    2h      2h      12h     12h
time    rate    size    qps     size    qps     size    server
16252   5421    14g     113     14g     129     14g     mongo.rocks.log
12037   7319    15g     105     16g     97      17g     mongo.wt.log
19062   4494    69g     50      68g     46      68g     mongo.mmap.log

cached database, maxid1=2M, loaders=4, requesters=4

load    load    load    2h      2h      12h     12h
time    rate    size    qps     size    qps     size    server
1629    5886    2.2g    3405    3.0g    3147    3.9g    mongo.rocks.log
1274    7530    2.5g    2966    4.1g    1996    5.0g    mongo.wt.log
2058    4659    14g     1491    14g     627     18g     mongo.mmap.log

Hardware

The Intel NUC systems have:
  • 8G of RAM
  • 1 disk (used for the database in these tests)
  • 1 SSD
