Wednesday, April 16, 2014

TokuMX, MongoDB and InnoDB, IO-bound update-only with fast storage

I repeated the update-only IO-bound tests using pure-flash servers to compare TokuMX, MongoDB and InnoDB. The test setup was the same as on the pure-disk servers except for the hardware: in this case the servers have fast flash storage, 144G of RAM and 24 CPU cores with HT enabled. As a reminder, the InnoDB change buffer and the TokuMX fractal tree don't help on this workload because there are no secondary indexes to maintain. Note that all collections/tables are in one database for this workload, so this is the worst case for the MongoDB per-database RW-lock. The result summary:
  • InnoDB is much faster than MongoDB and TokuMX. This test requires a high rate of dirty page writeback and thanks to a lot of work from the InnoDB team at MySQL with help from Percona and Facebook (and others) the InnoDB engine is now very good at that. Relative to MongoDB, InnoDB also benefits from a clustered PK index.
  • MongoDB is much slower than InnoDB for two reasons. First, it doesn't have a clustered PK index, so it might do storage reads both for the index search and then for reading the document. Second, there is the per-database RW-lock. As I described previously, this lock appears to be held during disk reads done while searching the index, so at most one thread searches the index at a time even when there are concurrent update requests. I created JIRA 3177 to make that obvious in the documentation. Because of this the peak rate for MongoDB is approximately the number of reads per second that one thread can do from the flash device. The device can sustain many more reads/second with concurrency, but MongoDB doesn't get much benefit from that. I think there will be at most 2 concurrent flash/disk reads at any time -- one while searching the index and the other while prefetching the document into RAM after the per-database RW-lock has been released in Record::touch. The sketch after this list illustrates the effect of holding the lock across the read.
  • TokuMX also benefits from the clustered PK index but it suffers from other problems that I was unable to debug. I think it can do much better once a Toku expert reproduces the problem on their hardware.
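
To illustrate the RW-lock effect, here is a minimal Python sketch (not MongoDB code) in which a per-database lock is held across a simulated storage read, as appears to happen during the index search. The 5 ms read latency and the thread count are made up; the point is that adding threads does not raise throughput when the lock is held across the read.

import threading, time

READ_LATENCY = 0.005          # stand-in for one index-search read from storage
db_lock = threading.Lock()    # stand-in for the per-database RW-lock
counter_lock = threading.Lock()
updates = 0

def updater(stop_at):
    global updates
    while time.time() < stop_at:
        with db_lock:              # lock held while "reading from storage"
            time.sleep(READ_LATENCY)
        with counter_lock:
            updates += 1

stop_at = time.time() + 2
threads = [threading.Thread(target=updater, args=(stop_at,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# 8 threads complete about 1/READ_LATENCY updates/second in total,
# which is the same rate that 1 thread would get.
print("updates/second ~= %.0f" % (updates / 2.0))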

Configuration

This test used the sysbench clients as described previously. Tests were run for 8, 16, 32 and 64 concurrent clients. There were 8 collections/tables in one database with 400M documents/rows per collection/table. The test server has fast flash storage that can do more than 5000 reads/second from one thread and more than 50,000 reads/second from many threads. The server also has 24 CPU cores with HT enabled and 144G of RAM. The sysbench clients ran on the same host as mysqld/mongod. Tests were first run for 30 minutes at each concurrency level to warm up the DBMS and then for either 60 or 120 minutes when measurements were taken. I tested the configurations listed below, and a sketch of the update done by the test follows the list. I ran tests for more configurations but forgot to adjust read_ahead_kb, so I won't publish results from those hosts.
  • mongo-p2y - 874 GB database, MongoDB 2.6.0rc2, powerOf2Sizes=1, journalCommitInterval=300, w:1,j:0
  • mysql-4k - 698 GB database, MySQL 5.6.12, InnoDB, no compression, flush_log_at_trx_commit=2, buffer_pool_size=120G, flush_method=O_DIRECT, page_size=4k, doublewrite=0, io_capacity=16000, lru_scan_depth=2000, buffer_pool_instances=8, write_io_threads=32, flush_neighbors=0
  • mysql-8k - 698 GB database, MySQL 5.6.12, InnoDB, no compression, flush_log_at_trx_commit=2, buffer_pool_size=120G, flush_method=O_DIRECT, page_size=8k, doublewrite=0, io_capacity=16000, lru_scan_depth=2000, buffer_pool_instances=8, write_io_threads=32, flush_neighbors=0
  • tokumx-quicklz - 513 GB database, TokuMX 1.4.1 with quicklz compression, logFlushPeriod=300, w:1,j:0
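
To make the update concrete, below is a minimal Python sketch of the workload as I understand it: select a random document by _id (the PK) in one of the 8 collections and increment a non-indexed field, so there is no secondary index maintenance. This is not the sysbench client; the collection and field names, the pymongo calls (pymongo 3.x) and the loop are assumptions for illustration, and a roughly equivalent SQL statement for the InnoDB tests is shown in a comment.

import random
from pymongo import MongoClient

DOCS_PER_COLLECTION = 400 * 1000 * 1000      # 400M documents per collection
client = MongoClient("localhost", 27017)
db = client["sbtest"]                        # all 8 collections in one database

def one_update():
    coll = db["sbtest%d" % random.randint(1, 8)]
    _id = random.randint(1, DOCS_PER_COLLECTION)
    # Roughly equivalent SQL for the InnoDB tests:
    #   UPDATE sbtest1 SET c = c + 1 WHERE id = <_id>
    coll.update_one({"_id": _id}, {"$inc": {"c": 1}})

for _ in range(1000):
    one_update()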

Results

The gap below probably isn't a typo: InnoDB sustained about 5 to 10 times more updates/second than TokuMX and MongoDB. MongoDB does many more disk reads per update, which is similar to the pure-disk results. I don't have the expertise to explain why the TokuMX results weren't better, but I shared information with the Tokutek team. KB written to storage per update is listed for InnoDB to show the impact of a smaller page size on the write rate, which can be important when flash endurance must be improved. The disk reads per update table is computed as iostat r/s divided by TPS; a quick check follows the tables.

TPS
configuration  8 clients  16 clients  32 clients  64 clients
mysql-8k           24834       33886       37573       40198
mysql-4k           24826       31704       34644       34987
tokumx-quicklz      3706        3950        3552        3357
mongo-p2y           5194        5167        5173        5102

disk reads/second from iostat r/s
configuration  8 clients  16 clients  32 clients  64 clients
mysql-8k           20995       28371       31397       33537
mysql-4k           22016       27985       30553       30972
tokumx-quicklz      4943        5641        4962        4783
mongo-p2y           8960        8921        8951        8859

disk reads per update
configuration  8 clients  16 clients  32 clients  64 clients
mysql-8k            0.85        0.84        0.84        0.83
mysql-4k            0.89        0.88        0.88        0.89
tokumx-quicklz      1.33        1.43        1.40        1.42
mongo-p2y           1.73        1.73        1.73        1.74 

KB written per update
configuration  8 clients  16 clients  32 clients  64 clients
mysql-8k            6.56        6.40        5.36        5.36
mysql-4k            3.86        3.72        3.76        3.78
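
As a sanity check, the disk reads per update table is just iostat r/s divided by TPS. A quick computation with the 8-client values from the tables above reproduces it:

# reads per update = iostat r/s divided by TPS, using the 8-client columns above
rs  = {"mysql-8k": 20995, "mysql-4k": 22016, "tokumx-quicklz": 4943, "mongo-p2y": 8960}
tps = {"mysql-8k": 24834, "mysql-4k": 24826, "tokumx-quicklz": 3706, "mongo-p2y": 5194}
for cfg in rs:
    print("%-15s %.2f reads/update" % (cfg, rs[cfg] / float(tps[cfg])))
# mysql-8k 0.85, mysql-4k 0.89, tokumx-quicklz 1.33, mongo-p2y 1.73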
