Tuesday, March 11, 2014

Insert benchmark for InnoDB, MongoDB and TokuMX and flash storage

This work is introduced with a few disclaimers in an earlier post. For these tests I ran the insert benchmark client in two steps: first to load 100M documents/rows into an empty database and then to load another 400M documents/rows. The test used 1 client thread and 1 collection/table, and the query threads were disabled. The replication log (oplog/binlog) was disabled for all tests. I assume that all of the products would suffer with it enabled.
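
To make the workload concrete, this is a minimal sketch of that two-phase load in Python, assuming a pymongo connection and illustrative field names; the real clients are iibench-mongodb (Java) and the Python iibench described below.

    # Minimal sketch of the insert benchmark client: one thread, one collection,
    # query threads disabled. Field names are illustrative, not the exact schema.
    import random
    import time
    import pymongo

    coll = pymongo.MongoClient("localhost", 27017).iibench.purchases

    def load(num_docs, batch_size=1000):
        start = time.time()
        for _ in range(num_docs // batch_size):
            batch = [{
                "dateandtime": time.time(),
                "cashregisterid": random.randint(0, 1000),
                "customerid": random.randint(0, 100000),
                "productid": random.randint(0, 10000),
                "price": random.uniform(0, 500),
            } for _ in range(batch_size)]
            coll.insert_many(batch)
        return num_docs / (time.time() - start)

    # Step 1: load 100M documents into an empty database.
    print("load rate:", load(100 * 1000 * 1000))
    # Step 2: load another 400M documents.
    print("load rate:", load(400 * 1000 * 1000))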

Note that the database in this test was fully cached for InnoDB and TokuMX and almost fully cached for MongoDB. This means there were no disk reads for InnoDB during index maintenance, no disk reads for TokuMX during compaction and few disk reads for MongoDB during index maintenance. Future posts will have results for databases much larger than RAM.

Performance summary:

  • database size - InnoDB matches TokuMX when using 2X compression, but this comes at a cost in the insert rate. Without compression TokuMX is better than InnoDB and both are much better than MongoDB. I ran tests using the readable attribute names in iibench-mongodb like "cashregisterid" and then repeated the tests after changing the code to use names that were between 2 and 5 bytes. Using shorter names did not have a big impact on database size because most of the space is used by indexes, not document storage. After one test with 500M documents and shorter names the space used was 38.5% for document storage and 62% for the indexes (see the sketch after this list for one way to check that split).
  • write efficiency - MongoDB wrote about 10X more bytes to storage than InnoDB and about 20X more than TokuMX, even though the MongoDB database was only 2X to 4X larger, because it rewrites the same pages to disk much more frequently than InnoDB. The tests used the default value for syncdelay (60 seconds), which makes MongoDB try to flush all dirty database files to disk every 60 seconds via calls to msync, while InnoDB can keep dirty pages in the buffer pool for much longer. Increasing syncdelay to 300 did not have a significant impact on the bytes-written rate; it only reduced the rate by 10% to 20%. I encountered intermittent stalls on writes with MongoDB 2.4.9 that occur when an msync is in progress and an insert needs to allocate a new file. That is fixed in 2.6 and PMP helped me debug it.
  • insert rate - uncompressed InnoDB did better than TokuMX, compressed InnoDB roughly matched TokuMX, and all did better than MongoDB. The problem for MongoDB was write efficiency. The problem for TokuMX is that a fast storage device greatly reduces the random-write penalty that InnoDB would otherwise pay. That advantage for InnoDB shrinks when the database gets larger than RAM and is completely lost when slower storage is used, which will be described in a future post.
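
One way to check the document-storage versus index split mentioned above is the collStats command; a sketch using pymongo, with an assumed database and collection name:

    # Sketch: report how much of the database is document storage vs. indexes.
    # Uses the collStats command; database and collection names are assumptions.
    import pymongo

    db = pymongo.MongoClient("localhost", 27017).iibench
    stats = db.command("collstats", "purchases")

    data_bytes = stats["size"]             # bytes used by documents
    index_bytes = stats["totalIndexSize"]  # bytes used by all indexes
    total = data_bytes + index_bytes
    print("documents: %.1f%%, indexes: %.1f%%" %
          (100.0 * data_bytes / total, 100.0 * index_bytes / total))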

I enabled fsync on commit for all tests. For MongoDB and TokuMX that means inserts used j:1, with journalCommitInterval=2 for MongoDB and logFlushPeriod=0 for TokuMX. This means there can be a 2/3 * 1/2 millisecond wait for group commit on average with MongoDB. But tests run with j:0 (not reported here) showed that forcing the journal to disk has little impact on performance. For InnoDB I used innodb_flush_log_at_trx_commit=1. I disabled the InnoDB doublewrite buffer for most of the tests, which sacrifices protection from torn page writes. MongoDB doesn't have an option to protect against torn page writes and TokuMX isn't at risk from them. Otherwise I used the default options for MongoDB and TokuMX; they are much easier to configure than MySQL. I won't describe all of the MySQL settings but I use direct IO. The test server has 144G of RAM, 24 cores with HT enabled and fast flash storage that can do more than 50,000 16kb random IOPs.
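
As a rough illustration, requesting j:1 from a client looks like this with pymongo (collection name assumed), and the server-side options named above are repeated as comments:

    # Sketch of the durability settings used here.
    import pymongo
    from pymongo.write_concern import WriteConcern

    db = pymongo.MongoClient("localhost", 27017).iibench
    coll = db.get_collection("purchases", write_concern=WriteConcern(j=True))
    coll.insert_one({"probe": 1})  # returns only after the journal is forced to disk

    # Server-side settings described above, for reference:
    #   mongod: --journalCommitInterval 2 --syncdelay 60
    #   tokumx: logFlushPeriod=0
    #   mysqld: innodb_flush_log_at_trx_commit=1, innodb_doublewrite=0 (most tests),
    #           innodb_flush_method=O_DIRECT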

The powerOf2Sizes option was not enabled for this test. From reading source code, that should only change how space is allocated for documents in the heap; it won't help or hurt with index fragmentation. I will share results for powerOf2Sizes in future blog posts.
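
For reference, this is how powerOf2Sizes can be toggled for an existing collection in 2.4, sketched with pymongo and an assumed collection name:

    # Sketch: enable powerOf2Sizes for an existing collection
    # (it was not enabled for this test).
    import pymongo

    db = pymongo.MongoClient("localhost", 27017).iibench
    db.command("collMod", "purchases", usePowerOf2Sizes=True)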

The tests used iibench-mongodb in Java for TokuMX and MongoDB, and the Python version of iibench for InnoDB. I tested MySQL 5.6.12, TokuMX 1.4.0 and MongoDB 2.4.9 in the following configurations:
  • innodb-16k-dw0 - InnoDB with 16kb pages and the doublewrite buffer disabled
  • innodb-16k-dw1 - InnoDB with 16kb pages and the doublewrite buffer enabled
  • innodb-8k-dw0 - InnoDB with 8kb pages and the doublewrite buffer disabled
  • innodb-8k-dw0-zlib - InnoDB with 8kb pages, the doublewrite buffer disabled, and 2X compression enabled (a configuration sketch follows this list)
  • tokumx - TokuMX with logFlushPeriod=0 and writes done with j:1
  • mongo24 - MongoDB 2.4.9 with journalCommitInterval=2 and writes done with j:1
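
As one example of how these configurations map to MySQL, here is a sketch of the innodb-8k-dw0-zlib setup. The table and column names are illustrative, and KEY_BLOCK_SIZE=4 is an assumption about how 2X compression was obtained relative to the 8kb page size.

    # Sketch of the innodb-8k-dw0-zlib configuration. Server side (my.cnf):
    #   innodb_page_size = 8192
    #   innodb_doublewrite = 0
    #   innodb_flush_method = O_DIRECT
    #   innodb_file_format = Barracuda   # required for compressed tables in 5.6
    # Table side, via any MySQL client; the schema below is illustrative.
    import pymysql

    conn = pymysql.connect(host="localhost", user="root", db="iibench")
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE purchases (
              transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              dateandtime DATETIME,
              cashregisterid INT,
              customerid INT,
              productid INT,
              price FLOAT,
              KEY marketsegment (price, customerid),
              KEY registersegment (cashregisterid, price, customerid)
            ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
        """)
    conn.commit()
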
The table below summarizes the results after inserting 100M documents/rows.

DB-size  Bytes-per-doc Write-rate  Bytes-written Test-seconds  Inserts/sec Config
17.0 GB      182        20.8 MB/s    80.7 GB       3895          25674     innodb-16k-dw0
17.0 GB      182        35.4 MB/s   140.1 GB       3963          25231     innodb-16k-dw1
16.0 GB      171        34.9 MB/s   118.0 GB       3375          29622     innodb-8k-dw0
 8.5 GB       91        21.3 MB/s   162.8 GB       7661          13048     innodb-8k-dw0-zlib
 9.3 GB       99        15.0 MB/s    99.0 GB       6586          15182     tokumx
42.0 GB      450       171.1 MB/s  1825.4 GB      10679           9364     mongo24

The table below summarizes the results after inserting another 400M documents/rows.

DB-size  Bytes-per-doc Write-rate  Bytes-written Test-seconds  Inserts/sec Config
 75 GB     161          42.2 MB/s    667 GB       15846          25244     innodb-16k-dw0
 75 GB     161          81.7 MB/s   1300 GB       15926          25117     innodb-16k-dw1
 75 GB     161          72.9 MB/s    988 GB       13562          29492     innodb-8k-dw0
 40 GB      85          24.2 MB/s    730 GB       30130          13276     innodb-8k-dw0-zlib
 46 GB      98          15.3 MB/s    417 GB       27241          14684     tokumx
154 GB     330         188.6 MB/s   9838 GB       52233           7658     mongo24
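
To put the write-efficiency numbers in per-document terms, the bytes-written column can be divided by the number of inserts; a quick computation from the table above, assuming GB means 10^9 bytes:

    # Bytes written to storage per inserted document during the 100M -> 500M load
    # (400M inserts), from the table above. Assumes GB = 10**9 bytes.
    inserts = 400 * 1000 * 1000
    bytes_written_gb = {
        "innodb-16k-dw0": 667, "innodb-16k-dw1": 1300, "innodb-8k-dw0": 988,
        "innodb-8k-dw0-zlib": 730, "tokumx": 417, "mongo24": 9838,
    }
    for name, gb in bytes_written_gb.items():
        print("%-20s %6.0f bytes written per insert" % (name, gb * 1e9 / inserts))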

5 comments:

  1. Any idea as to why the bytes-written for InnoDB with 2x compression is so much higher than uncompressed?

  2. I assume you are asking about the results for 0 -> 100M rows. If yes, then my educated guess is that InnoDB had yet to settle down and choose the right ratio in the buffer pool between uncompressed and compressed pages. At the 100M -> 500M measurement, bytes-written is less for 2X compression. InnoDB has clever logic to figure out how much RAM to use for uncompressed copies of pages from compressed tables. Any page from a compressed table that is in the buffer pool must be stored in compressed form and can optionally also have an uncompressed copy. A few metrics are used to compute the ratio (decompression cost, IO rate).

  3. Unrelated, but will you be adding more compressible data to your test schema in a future experiment? The majority of compressible data in this experiment is the field names themselves, which MongoDB and TokuMX store but InnoDB does not.

  4. Mark, are you planning to also include TokuDB in this benchmark? It seems to me it could write the fewest bytes to disk of all these engines.

  5. I assume that TokuDB won't be very different from TokuMX. I am not testing it here.

