Thursday, April 10, 2014

MongoDB, TokuMX and InnoDB for concurrent inserts

I used the insert benchmark with concurrent insert threads to understand performance limits in MongoDB, TokuMX and InnoDB. The database started empty and eventually became much larger than RAM. The benchmark requires many random writes for secondary index maintenance with the update-in-place b-trees used by MongoDB and InnoDB. The test server has fast flash storage. Each transaction in this test inserts 1000 documents/rows, where each document/row is small (100 bytes) and has 3 secondary indexes to maintain. The test used 10 client connections to run these transactions concurrently and each client uses a separate collection/table. The performance summaries listed below are based on the context for this test -- fast storage, insert heavy with secondary index maintenance. My conclusion from running many insert benchmark tests is that I don't want to load big databases with MongoDB when secondary index maintenance must be done during the load. Sometimes creating the indexes after the load is an option, but performance in this area must be improved.
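The workload shape can be sketched in a few lines of Python. This is not the real benchmark client (the Java and Python clients are separate programs); names like make_doc are mine, and the exact field contents are an assumption -- it only illustrates the pattern of batches of 1000 small documents with 3 indexed attributes, one collection per client.

```python
# Sketch of one insert-benchmark client thread. The make_doc and load
# names are hypothetical, not from the real client.
import random
import string

def make_doc(seq):
    # ~100 byte document; k1, k2 and k3 each get a secondary index
    return {
        "_id": seq,
        "k1": random.randint(0, 1 << 30),
        "k2": random.randint(0, 1 << 30),
        "k3": "".join(random.choice(string.ascii_lowercase) for _ in range(60)),
    }

def load(collection, num_docs, batch_size=1000):
    # each "transaction" inserts a batch of 1000 documents
    for start in range(0, num_docs, batch_size):
        batch = [make_doc(start + i) for i in range(batch_size)]
        collection.insert(batch)  # pymongo 2.x accepts a list for bulk insert
```

Each of the 10 clients runs this loop against its own collection/table, which is what makes the write lock and secondary index maintenance the interesting bottlenecks.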

The performance summary for the workload when the database is cached (smaller than RAM).
  • InnoDB and TokuMX are always much faster than MongoDB, except when database-per-collection is used and the MongoDB journal is disabled. I did not attempt to run InnoDB or TokuMX with the journal disabled, or even on tmpfs, so I am not comparing the same thing in that case. Reasons for better performance from InnoDB include the insert buffer, less bloat in the database, more internal concurrency and a more mature b-tree (sometimes older is better). Reasons for better performance from TokuMX include fractal trees, compression and more internal concurrency. AFAIK, the MongoDB write lock is not released when there is a disk read (page fault) during secondary index maintenance. Even when there are no page faults, at most one client at a time searches the b-tree indexes.
  • InnoDB is faster than TokuMX
  • MongoDB inserts/second are doubled by using a database-per-collection compared to one database for all collections
  • MongoDB inserts/second are doubled by disabling the journal
  • MongoDB performance doesn't change much between using j:1 (fsync-on-commit) and w:1,j:0 (fsync a few times per second). 
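The j:1 versus w:1,j:0 distinction above maps to pymongo 2.x connection options. A minimal sketch of that mapping, assuming the standard w and j kwargs (the write_concern helper and the mode names are mine):

```python
# Sketch: write-concern settings for the tested MongoDB configurations,
# expressed as pymongo 2.x MongoClient kwargs. The mode names mirror the
# -sync/-lazy/-noj suffixes used in this post.
def write_concern(mode):
    if mode == "sync":   # j:1, wait for the journal fsync on every commit
        return {"w": 1, "j": True}
    if mode == "lazy":   # w:1,j:0, journal is fsynced a few times per second
        return {"w": 1, "j": False}
    if mode == "noj":    # journaling disabled server-side via --nojournal
        return {"w": 1}
    raise ValueError(mode)

# usage: MongoClient(**write_concern("sync"))
```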
The performance summary for the workload when the database is much larger than RAM. 
  • Eventually TokuMX is much faster than InnoDB. This is expected for the insert benchmark.
  • TokuMX and InnoDB are much faster than MongoDB. TPS degrades as the database grows: slowly for TokuMX, faster for InnoDB, and fastest for MongoDB.
  • Disabling the journal doesn't help MongoDB. The bottleneck is elsewhere.
  • Not using fsync-on-commit doesn't help MongoDB. The bottleneck is elsewhere.
  • Using database-per-collection doesn't do much to help MongoDB. The bottleneck is elsewhere.

Configuration

The Java and Python insert benchmark clients were used to load up to 3B documents/rows into 10 collections/tables, which is also up to 300M documents/rows per collection/table. For MongoDB the tests were run with all collections in one database and then again with a database-per-collection. Tests were run as a sequence of rounds where 100M documents/rows were loaded per round (10M per client) and performance metrics were computed per round. The test hosts have 144GB of RAM, PCIe flash and 24 CPU cores with HT enabled. The client and DBMS software ran on the same host. Several configurations were tested:
  • inno-lazy - MySQL 5.6.12, InnoDB, doublewrite=0, page_size=8k, flush_log_at_trx_commit=2
  • inno-sync - MySQL 5.6.12, InnoDB, doublewrite=0, page_size=8k, flush_log_at_trx_commit=1
  • toku-lazy - TokuMX 1.4.1, logFlushPeriod=300, w:1,j:0
  • toku-sync - TokuMX 1.4.1, logFlushPeriod=0, j:1
  • mongo24-1db-noj - MongoDB 2.4.9, nojournal=true, 10 collections in 1 database
  • mongo26-1db-noj - MongoDB 2.6.0rc2, nojournal=true, 10 collections in 1 database
  • mongo24-10db-noj - MongoDB 2.4.9, nojournal=true, database per collection
  • mongo26-10db-noj - MongoDB 2.6.0rc2, nojournal=true, database per collection
  • mongo24-1db-lazy - MongoDB 2.4.9, journalCommitInterval=300, w:1,j:0, 10 collections in 1 database
  • mongo26-1db-lazy - MongoDB 2.6.0rc2, journalCommitInterval=300, w:1,j:0, 10 collections in 1 database
  • mongo24-10db-lazy - MongoDB 2.4.9, journalCommitInterval=300, w:1,j:0, database per collection
  • mongo26-10db-lazy - MongoDB 2.6.0rc2, journalCommitInterval=300, w:1,j:0, database per collection
  • mongo24-1db-sync - MongoDB 2.4.9, journalCommitInterval=2, j:1, 10 collections in 1 database
  • mongo26-1db-sync - MongoDB 2.6.0rc2, journalCommitInterval=2, j:1, 10 collections in 1 database
  • mongo24-10db-sync - MongoDB 2.4.9, journalCommitInterval=2, j:1, database per collection
  • mongo26-10db-sync - MongoDB 2.6.0rc2, journalCommitInterval=2, j:1, database per collection
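For reference, the configuration names above correspond to server settings like these (a sketch showing only the options named in the list; exact spellings are the standard option names, and all other tuning settings are omitted):

```shell
# inno-sync: MySQL 5.6.12
# (inno-lazy differs only in --innodb-flush-log-at-trx-commit=2)
mysqld --innodb-doublewrite=0 --innodb-page-size=8192 \
       --innodb-flush-log-at-trx-commit=1

# mongo26-1db-sync: journal fsync interval of 2 ms, clients use j:1
mongod --journal --journalCommitInterval=2

# mongo26-1db-lazy: journal fsync interval of 300 ms, clients use w:1,j:0
mongod --journal --journalCommitInterval=300

# mongo26-10db-noj: journaling disabled
mongod --nojournal
```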

Results @100M rows

Legend for the columns:
  • DB-size - size of the database at the end of the test round
  • Bytes-per-doc - DB-size divided by the count of documents/rows
  • Write-rate - average rate of bytes written to storage during the test round measured by iostat
  • Bytes-written - total bytes written to storage during the test round
  • Test-secs - number of seconds to complete the test round
  • TPS - average rate of documents/rows inserted per second during the test round, where inserts are done in transactions of 1000 documents/rows
  • Server - the tested configuration
Notes:
  • I don't know why inno-sync is a bit faster than inno-lazy here; maybe HW variance was the cause. The key point is that fsync-on-commit isn't significant for this test. It also has only a small impact for TokuMX.
  • MongoDB TPS is only close to InnoDB & TokuMX when the journal is disabled and database-per-collection is used
  • There is up to a 2X benefit with MongoDB from using a database per collection
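The derived columns can be reproduced from the raw numbers. A quick check for the inno-sync row at 100M rows, assuming the table's GB means GiB (small differences from the published values come from rounding):

```python
# Sanity-check the derived columns for the inno-sync row at 100M rows.
# Assumption: DB-size "GB" is GiB (2**30 bytes).
docs_per_round = 100 * 10**6     # 100M documents/rows inserted per round
db_size_bytes = 17 * 2**30       # DB-size: 17 GB
test_secs = 494                  # Test-secs for inno-sync

bytes_per_doc = db_size_bytes // docs_per_round   # -> 182, matches the table
insert_rate = docs_per_round // test_secs         # -> 202429, close to the
                                                  #    published 202672
```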

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
  17 GB     182          164 MB/s     77 GB           494    202672   inno-sync
  17 GB     182          159 MB/s     81 GB           540    185742   inno-lazy
  15 GB     161           60 MB/s     42 GB           686    145720   toku-sync
  14 GB     150           61 MB/s     41 GB           663    150718   toku-lazy
   X GB       X          140 MB/s   1606 GB         11451      8743   mongo24-1db-sync
  38 GB     400          148 MB/s   1566 GB         10586      9463   mongo26-1db-sync
  60 GB     644          205 MB/s   1337 GB          6485     15921   mongo24-10db-sync
  40 GB     429          216 MB/s   1310 GB          6069     16933   mongo26-10db-sync
  36 GB     387          138 MB/s   1611 GB         11640      8593   mongo24-1db-lazy
   X GB       X          147 MB/s   1574 GB         10702      9352   mongo26-1db-lazy
  60 GB     644          181 MB/s   1267 GB          6989     14684   mongo24-10db-lazy
  40 GB     429          189 MB/s   1251 GB          6629     15610   mongo26-10db-lazy
  37 GB     397          206 MB/s    995 GB          4819     20752   mongo24-1db-noj
   X GB       X          234 MB/s    878 GB          3742     26736   mongo26-1db-noj
  60 GB     644          370 MB/s    456 GB          1226     81663   mongo24-10db-noj
  40 GB     429          600 MB/s    348 GB           577    179224   mongo26-10db-noj

Results @500M rows

MongoDB TPS starts to drop as the database becomes larger than RAM.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
  83 GB     178          269 MB/s    183 GB           708    141290   inno-sync
  83 GB     178          269 MB/s    180 GB           710    141611   inno-lazy
  44 GB      94           62 MB/s     59 GB           948    105421   toku-sync
  46 GB      99           61 MB/s     56 GB           918    108930   toku-lazy
 180 GB     387          147 MB/s   2284 GB         15519      6454   mongo24-1db-sync
 200 GB     429          158 MB/s   2277 GB         14394      6964   mongo26-1db-sync
 220 GB     472          249 MB/s   2230 GB          8938     11886   mongo24-10db-sync
 220 GB     472          262 MB/s   2232 GB          8528     12041   mongo26-10db-sync
 180 GB     387          146 MB/s   2287 GB         15696      6374   mongo24-1db-lazy
 199 GB     427          154 MB/s   2278 GB         14791      6773   mongo26-1db-lazy
 219 GB     470          226 MB/s   2197 GB          9718     10667   mongo24-10db-lazy
 219 GB     470          233 MB/s   2190 GB          9389     10817   mongo26-10db-lazy
 180 GB     387          261 MB/s   1997 GB          7640     13094   mongo24-1db-noj
   X GB       X          291 MB/s   1980 GB          6800     14736   mongo26-1db-noj
 220 GB     472          758 MB/s   1721 GB          2271     44067   mongo24-10db-noj
 220 GB     472          741 MB/s   1438 GB          1703     60036   mongo26-10db-noj

Results @1B rows

The gap widens between InnoDB/TokuMX and MongoDB. Another result that I saw is uneven durations for the test clients with MongoDB. With InnoDB/TokuMX the clients usually finish within a few seconds. With MongoDB I frequently see test runs where a few clients take hundreds of seconds more than other clients.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 159 GB     170          292 MB/s    292 GB          1051     95067   inno-sync
 159 GB     170          291 MB/s    291 GB          1057     94994   inno-lazy
  74 GB      79           63 MB/s     67 GB          1053     95443   toku-sync
  74 GB      79           67 MB/s     69 GB          1030     97071   toku-lazy
 400 GB     430           83 MB/s   2328 GB         28151      3557   mongo24-1db-sync
 380 GB     408           87 MB/s   2323 GB         26771      3746   mongo26-1db-sync
 419 GB     450          146 MB/s   2300 GB         15821      6880   mongo24-10db-sync
 400 GB     430          148 MB/s   2293 GB         15520      6989   mongo26-10db-sync
 400 GB     430           83 MB/s   2327 GB         28461      3517   mongo24-1db-lazy
 380 GB     408           86 MB/s   2322 GB         26937      3718   mongo26-1db-lazy
 420 GB     451          143 MB/s   2280 GB         15949      6555   mongo24-10db-lazy
 400 GB     430          148 MB/s   2267 GB         15361      6880   mongo26-10db-lazy
 400 GB     430          104 MB/s   2100 GB         20291      4938   mongo24-1db-noj
   X GB       X          110 MB/s   2098 GB         19147      5246   mongo26-1db-noj
 420 GB     451          206 MB/s   2068 GB         10039     10037   mongo24-10db-noj
 400 GB     430          214 MB/s   2063 GB          9623     10662   mongo26-10db-noj

Results @1.5B rows

Tests for mongo24-1db-lazy and mongo26-1db-lazy were stopped. I wasn't willing to wait. TPS for MongoDB continues to degrade faster than it does for InnoDB & TokuMX.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 229 GB     163          304 MB/s    371 GB          1290     77491   inno-sync
 229 GB     163          303 MB/s    372 GB          1300     77108   inno-lazy
 110 GB      78           75 MB/s     80 GB          1072     93240   toku-sync
 110 GB      78           75 MB/s     81 GB          1068     93575   toku-lazy
 540 GB     387           54 MB/s   2336 GB         43013      2327   mongo24-1db-sync
 580 GB     415           56 MB/s   2330 GB         41803      2400   mongo26-1db-sync
 560 GB     401           84 MB/s   2322 GB         27778      3790   mongo24-10db-sync
 600 GB     429           82 MB/s   2315 GB         28257      3708   mongo26-10db-sync
 560 GB     401           93 MB/s   2316 GB         24887      4218   mongo24-10db-lazy
 600 GB     429           94 MB/s   2312 GB         24638      4185   mongo26-10db-lazy
 544 GB     389           60 MB/s   2116 GB         35192      2845   mongo24-1db-noj
   X GB       X           62 MB/s   2118 GB         34123      2951   mongo26-1db-noj
 560 GB     401           96 MB/s   2104 GB         21903      4837   mongo24-10db-noj
 600 GB     429           98 MB/s   2090 GB         21276      5115   mongo26-10db-noj

Results @1.7B rows

Tests for mongo2?-*db-sync were stopped at 1.7B rows. I wasn't willing to wait.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 620 GB     392           50 MB/s   2336 GB         46506      2153   mongo24-1db-sync
 620 GB     392           52 MB/s   2333 GB         45273      2218   mongo26-1db-sync
 639 GB     404           74 MB/s   2318 GB         31264      3348   mongo24-10db-sync
 640 GB     404           73 MB/s   2317 GB         31960      3266   mongo26-10db-sync

Results @2B rows

More of the same.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 305 GB     163          267 MB/s    449 GB          1777     56930   inno-sync
 305 GB     163          265 MB/s    445 GB          1774     56466   inno-lazy
 141 GB      76           77 MB/s     80 GB          1037     96401   toku-sync
 141 GB      76           78 MB/s     79 GB          1019     97993   toku-lazy
 704 GB     378           50 MB/s   2125 GB         42575      2352   mongo24-1db-noj
   X GB       X           51 MB/s   2124 GB         41401      2422   mongo26-1db-noj
 720 GB     387           69 MB/s   2114 GB         30779      3269   mongo24-10db-noj
 740 GB     397           65 MB/s   2111 GB         32501      3102   mongo26-10db-noj

Results @2.5B rows

More of the same. The inno-sync test was stopped. I was impatient but TPS was still pretty good.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 376 GB     161          228 MB/s    617 GB          2863     35489   inno-lazy
 180 GB      76           82 MB/s     87 GB          1044     94770   toku-sync
 180 GB      76           83 MB/s     85 GB          1023     97588   toku-lazy
 840 GB     361           47 MB/s   2202 GB         46782      2139   mongo24-1db-noj
   X GB       X           49 MB/s   2207 GB         45456      2209   mongo26-1db-noj
 860 GB     369           62 MB/s   2198 GB         35462      2824   mongo24-10db-noj
 920 GB     395           58 MB/s   2199 GB         37755      2652   mongo26-10db-noj

Results @3B rows

More of the same.  A few tests are still running and I will update this in a few days if they finish.

DB-size  Bytes-per-doc Write-rate  Bytes-written  Test-secs     TPS   Server
 448 GB     160          220 MB/s    665 GB          3186     31831   inno-lazy
 212 GB      76           84 MB/s     93 GB          1091     91657   toku-sync
 214 GB      77           86 MB/s     92 GB          1066     93724   toku-lazy
 960 GB     344           43 MB/s   2105 GB         48793      2052   mongo24-1db-noj
1100 GB     394           45 MB/s   2107 GB         47366      2123   mongo26-1db-noj
 980 GB     351           53 MB/s   2529 GB         39545      2530   mongo24-10db-noj
1100 GB     394           50 MB/s   2109 GB         42416      2362   mongo26-10db-noj

1 comment:

  1. One of the issues with InnoDB, I suspect, is the IO layer code. It acquires too many mutexes. Once things become IO heavy, my current hypothesis is that we pay a heavy price for that. It is an area that I would like to investigate (and fix) if I have time before 5.7 goes GA.

