Tuesday, April 22, 2014

Concurrent, read-only & cached: MongoDB, TokuMX, MySQL

This has results for a read-only workload where all data is cached. The test query fetches all columns in one doucment/row by PK. For InnoDB all data is in the buffer pool. For TokuMX and MongoDB all data is in the OS filesystem cache and accessed via mmap'd files. The test server has 40 CPU cores with HT enabled and the test clients share the host with mysqld/mongod to reduce variance from network latency. This was similar to a previous test, except the database is in cache and the test host has more CPU cores. The summary of my results is:
  • MongoDB 2.6 has a performance regression from using more CPU per query. The regression might be limited to simple queries that do single row lookups on the _id index. I spent a bit of time rediscovering how to get hierarchical CPU profile data from gperftools to explain this. JIRAs 13663 and 13685 are open for this.
  • MySQL gets much more QPS at high concurrency than MongoDB and TokuMX
  • MySQL gets more QPS using the HANDLER interface than SELECT. I expect the InnoDB memcached API to be even faster than HANDLER but did not test it.
  • MySQL uses more CPU per query in 5.7.4 than in 5.6.12 but this didn't have an impact on QPS

Setup

The test was repeated for 1, 2, 4, 8, 16, 32 and 40 concurrent clients. It uses my forked versions of the MongoDB and C clients for sysbench. There are 8 collections/tables in one database. Each table has 400M rows but queries are limited to the first 1M. I don't know yet whether using a database per collection would improve the MongoDB results. Each query fetches all columns in one document/row by PK. I have yet to push my changes to the MongoDB sysbench client to make it fetch all columns. I tested these binaries:
  • fb56.handler - MySQL 5.6.12 with the Facebook patch and 8k pages. Uses HANDLER to fetch data.
  • fb56.sql - MySQL 5.6.12 with the Facebook patch and 8k pages. Uses SELECT to fetch data.
  • orig57.handler - MySQL 5.7.4 without the Facebook patch and 8k pages. Uses HANDLER to fetch data.
  • orig57.sql - MySQL 5.7.4 without the Facebook patch and 8k pages. Uses SELECT to fetch data.
  • tokumx - TokuMX 1.4.1 using quicklz and 32kb pages. There should be no decompression during the test as all data used by the test (1M documents) is much smaller than 50% of RAM.
  • mongo24 - MongoDB 2.4.9
  • mongo26 - MongoDB 2.6.0

Results

At last I included a graph. I have been reluctant to include graphs on previous posts comparing MongoDB, TokuMX and MySQL because I want to avoid benchmarketing and drive-by analysis. These tests have been time consuming to run and document and I don't want to make it too easy to misinterpret the results. Results for MySQL 5.7.4 are not in the graph to make it easier to read. The top two bars (blue & red) are for MySQL and you can see that QPS increases with more concurrency. QPS for MongoDB and TokuMX saturates at a lower level of concurrency.
Numbers used for the graph above.

point queries per second
    1      2      4      8     16     32     40  clients
17864  32397  60294 106374 184566 298276 350665  fb56.handler
11730  22884  39646  73485 131533 215487 249402  fb56.sql
18161  33262  59413 107505 185894 306084 371045  orig57.handler
11775  21838  40528  75322 135331 227450 266917  orig57.sql
14298  25219  45743  83214 142489 168498 161840  tokumx
17203  30158  52476  94705 161922 174453 170177  mongo24
10705  19502  34318  61977 109684 152667 151555  mongo26

Analysis

I used vmstat to measure the average CPU utilization (user + system) during the test. The numbers below are: (CPU_utilization / QPS) * 1,000,000. There are some interesting details.
  • the values are larger for MySQL 5.7 than for 5.6 at low concurrency. Note that in both cases the performance schema was disabled at compile time.
  • the values are much larger for MongoDB 2.6 than for 2.4 and hopefully this can be fixed via JIRAs 13663 and 13685.
(CPU_utilization / QPS) * 1,000,000
  1      2      4      8     16     32     40  clients
218    197    197    208    216    251    268  fb56.handler
323    310    287    298    304    352    372  fb56.sql
357    279    240    216    215    248    250  orig57.handler
407    380    313    288    302    342    359  orig57.sql
272    269    251    254    266    302    296  tokumx
232    215    219    225    234    257    252  mongo24
373    333    340    342    355    425    422  mongo26

I also used vmstat to measure the context switch rate and the table below lists the number of context switches per query. Note that the rate decreases with concurrency for MySQL but not for MongoDB and TokuMX. I don't know enough about Linux internals to interpret this.

vmstat.cs / QPS

context switch per query
     1      2      4      8     16     32     40  clients
  4.44   4.14   4.01   3.79   3.47   3.05   2.19  fb56.handler
  4.61   4.32   4.03   3.84   3.59   3.23   2.65  fb56.sql
  4.53   4.27   4.07   3.88   3.52   3.08   2.20  orig57.handler
  4.81   4.48   4.19   3.96   3.63   3.07   2.19  orig57.sql
  4.59   4.30   4.08   3.87   3.77   4.32   4.32  tokumx
  4.54   4.23   4.03   3.84   3.79   4.29   4.30  mongo24
  4.80   4.43   4.21   3.99   3.93   4.58   4.63  mongo26

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...