Note that the database in this test was fully cached by InnoDB and TokuMX and almost fully cached by MongoDB. This means there were no disk reads for InnoDB during index maintenance, none for TokuMX during compaction, and few for MongoDB during index maintenance. Future posts will have results for databases much larger than RAM.
Performance summary:
- database size - InnoDB matches TokuMX when using 2X compression, but this comes at a cost in the insert rate. Without compression TokuMX is better than InnoDB and both are much better than MongoDB. I ran tests using the readable attribute names in iibench-mongodb, like "cashregisterid", and then repeated the tests after changing the code to use names between 2 and 5 bytes. Using shorter names did not have a big impact on database size because most of the space is used by the indexes, not for document storage. After one test with 500M documents and shorter names, 38.5% of the space was used for document storage and 62% for the indexes. A sketch of the field-name overhead follows this list.
- write efficiency - MongoDB wrote about 10X more bytes to storage than InnoDB and about 20X more than TokuMX, yet the MongoDB database was only 2X to 4X larger than the others, so it rewrites the same page to disk much more frequently than InnoDB. The tests used the default for syncdelay (60), which tries to flush all dirty database files to disk every 60 seconds via calls to msync; InnoDB is able to keep dirty pages in the buffer pool for a much longer time. Increasing syncdelay to 300 did not have a significant impact on the bytes-written rate -- it reduced the rate by between 10% and 20% (see the second sketch after this list). I encountered intermittent stalls on writes with MongoDB 2.4.9 that occur when an msync is in progress and an insert needs to allocate a new file. That is fixed in 2.6, and PMP helped me debug it.
- insert rate - uncompressed InnoDB did better than TokuMX and compressed InnoDB matched it. All did better than MongoDB. The problem for MongoDB was write efficiency. The problem for TokuMX is that a fast storage device greatly reduces the random-write penalty that InnoDB would otherwise pay. That advantage for InnoDB shrinks when the database gets larger than RAM and is completely lost when slower storage is used, which will be described in a future post.
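To illustrate the field-name overhead mentioned in the database-size item above: MongoDB and TokuMX store field names in every document, unlike InnoDB, which stores column names once in the schema. A minimal sketch using the bson module that ships with pymongo; the documents are iibench-style but the values are made up, and only the name lengths matter:

```python
# Minimal sketch: BSON embeds field names in every document, so long
# names inflate per-document size. Values here are made up.
import bson  # ships with the pymongo package

long_doc = {"dateandtime": 1393257600000, "cashregisterid": 42,
            "customerid": 12345, "productid": 678, "price": 250.0}
short_doc = {"dt": 1393257600000, "cr": 42,
             "cu": 12345, "pr": 678, "pc": 250.0}

print(len(bson.encode(long_doc)))   # larger: long names stored per document
print(len(bson.encode(short_doc)))  # smaller, same values
```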
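And for the syncdelay change mentioned in the write-efficiency item, a minimal pymongo sketch, assuming a mongod on the default port and permission to run setParameter:

```python
# Minimal sketch: raise syncdelay from the default (60) to 300 at runtime.
# Assumes a mongod on localhost:27017 and privileges for setParameter.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
client.admin.command({"setParameter": 1, "syncdelay": 300})
```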
I enabled fsync on commit for all tests. That meant j:1 on inserts for MongoDB and TokuMX, along with journalCommitInterval=2 for MongoDB and logFlushPeriod=0 for TokuMX. MongoDB reduces the journal flush interval to one third of journalCommitInterval when j:1 writes are pending, so the average wait for group commit is 2/3 * 1/2 = 1/3 of a millisecond. But tests run with j:0 (not reported here) showed that forcing the journal to disk has little impact on performance. For InnoDB I used innodb_flush_log_at_trx_commit=1. I disabled the InnoDB doublewrite buffer for most of the tests, which sacrifices protection from torn page writes. MongoDB doesn't have an option to protect against torn page writes and TokuMX isn't at risk from them. Otherwise I used the default options for MongoDB and TokuMX; they are much easier to configure than MySQL. I won't describe all of the MySQL settings but I use direct IO. The test server has 144G of RAM, 24 cores with HT enabled and fast flash storage that can do more than 50,000 16kb random IOPs. A minimal sketch of a j:1 insert follows.
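This is a hedged sketch of an insert that waits for the journal flush; the database and collection names are placeholders, not the benchmark's exact names:

```python
# Minimal sketch: an insert that waits for the journal flush (j:1).
# Database and collection names are placeholders.
from pymongo import MongoClient, WriteConcern

client = MongoClient("localhost", 27017)
coll = client.get_database("iibench").get_collection(
    "purchases_index", write_concern=WriteConcern(j=True))
coll.insert_one({"cashregisterid": 42, "productid": 678, "price": 250.0})
```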
The powerOf2Sizes option was not enabled for this test. From reading the source code, it should only change how space is allocated for documents in the heap; it won't help or hurt with index fragmentation. I have reports to share for powerOf2Sizes in future blog posts.
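For anyone who wants to repeat this with powerOf2Sizes enabled, it can be turned on per collection via collMod in MongoDB 2.4. A minimal pymongo sketch; the collection name is a placeholder:

```python
# Minimal sketch: enable powerOf2Sizes for one collection via collMod,
# a MongoDB 2.4-era option. Collection name is a placeholder.
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["iibench"]
db.command({"collMod": "purchases_index", "usePowerOf2Sizes": True})
```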
The tests used iibench-mongodb (Java) for TokuMX and MongoDB and the Python version of iibench for InnoDB. I tested MySQL 5.6.12, TokuMX 1.4.0 and MongoDB 2.4.9 in the following configurations:
- innodb-16k-dw0 - InnoDB with 16kb pages and the doublewrite buffer disabled
- innodb-16k-dw1 - InnoDB with 16kb pages and the doublewrite buffer enabled
- innodb-8k-dw0 - InnoDB with 8kb pages and the doublewrite buffer disabled
- innodb-8k-dw0-zlib - InnoDB with 8kb pages, the doublewrite buffer disabled, and 2X compression enabled
- tokumx - TokuMX with logFlushPeriod=0 and writes done with j:1
- mongo24 - MongoDB 2.4.9 with journalCommitInterval=2 and writes done with j:1
The table below summarizes the results after inserting 100M documents/rows.
DB-size   Bytes-per-doc   Write-rate   Bytes-written   Test-seconds   Inserts/sec   Server
17.0 GB   182             20.8 MB/s    80.7 GB         3895           25674         innodb-16k-dw0
17.0 GB   182             35.4 MB/s    140.1 GB        3963           25231         innodb-16k-dw1
16.0 GB   171             34.9 MB/s    118.0 GB        3375           29622         innodb-8k-dw0
8.5 GB    91              21.3 MB/s    162.8 GB        7661           13048         innodb-8k-dw0-zlib
9.3 GB    99              15.0 MB/s    99.0 GB         6586           15182         tokumx
42.0 GB   450             171.1 MB/s   1825.4 GB       10679          9364          mongo24
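As a quick cross-check of how the derived columns relate, here is the innodb-16k-dw0 row. This assumes DB-size is reported in binary GB and Bytes-written in decimal GB, which is how the published numbers appear to line up:

```python
# Rough check of the derived columns for the innodb-16k-dw0 row.
# Assumes DB-size in binary GB (GiB) and Bytes-written in decimal GB.
docs = 100_000_000

print(17.0 * 2**30 / docs)    # ~182  -> Bytes-per-doc
print(80.7e9 / 3895 / 1e6)    # ~20.7 MB/s -> Write-rate (table: 20.8)
print(docs / 3895)            # ~25674 -> Inserts/sec
```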
The table below summarizes the results after inserting another 400M documents/rows.
DB-size   Bytes-per-doc   Write-rate   Bytes-written   Test-seconds   Inserts/sec   Server
75 GB     161             42.2 MB/s    667 GB          15846          25244         innodb-16k-dw0
75 GB     161             81.7 MB/s    1300 GB         15926          25117         innodb-16k-dw1
75 GB     161             72.9 MB/s    988 GB          13562          29492         innodb-8k-dw0
40 GB     85              24.2 MB/s    730 GB          30130          13276         innodb-8k-dw0-zlib
46 GB     98              15.3 MB/s    417 GB          27241          14684         tokumx
154 GB    330             188.6 MB/s   9838 GB         52233          7658          mongo24
Comments:

Q: Any idea as to why the bytes-written for InnoDB with 2X compression is so much higher than uncompressed?

A: I assume you are asking about the results for 0 -> 100M rows. If yes, then my educated guess is that InnoDB has yet to settle down and choose the right ratio in the buffer pool between uncompressed and compressed pages. At the 100M -> 500M measurement, bytes-written is less with 2X compression. InnoDB has clever logic to figure out how much RAM to use for uncompressed copies of pages from compressed tables. Any page from a compressed table that is in the buffer pool must be stored in compressed form and optionally can also have an uncompressed copy. A few metrics are used to compute the ratio (decompression cost, IO rate).
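For anyone curious about those costs, one place to observe InnoDB's compression activity is the INNODB_CMP table in information_schema (MySQL 5.6). A minimal sketch; the connection parameters are placeholders:

```python
# Minimal sketch: read InnoDB's compression counters (ops and time),
# which relate to the decompression cost mentioned above.
# Connection parameters are placeholders.
import mysql.connector

cnx = mysql.connector.connect(user="root", password="secret",
                              host="127.0.0.1",
                              database="information_schema")
cur = cnx.cursor()
cur.execute("SELECT page_size, compress_ops, compress_ops_ok, "
            "compress_time, uncompress_ops, uncompress_time "
            "FROM INNODB_CMP")
for row in cur.fetchall():
    print(row)
cnx.close()
```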
Q: Unrelated, will you be adding more compressible data to your test schema in a future experiment? The majority of the compressible data in this experiment is the field names themselves, which MongoDB and TokuMX store but InnoDB does not.
Q: Mark, are you planning to also include TokuDB in this benchmark? It seems to me it could write the fewest bytes to disk of all these engines.

A: I assume that TokuDB won't be very different from TokuMX. I am not testing it here.