The workload is a large database where all updates are done to a small number of rows. A large database is important here because it includes the overhead of searching multiple levels of a B-tree. The inspiration is maintaining counts for popular entities like Justin Bieber and One Direction, which comes up when serving the social graph. For more on that, read about TAO and LinkBench.
The most popular benchmark for MySQL is sysbench, and it is usually run with a uniform distribution so that all rows are equally likely to be queried or modified. But real workloads have skew, which can cause stress in unexpected places, and I describe one such place within InnoDB from this microbenchmark. YCSB and LinkBench are benchmarks that have skew and can be run for MySQL. I hope that more of the MySQL benchmark results in the future include skew.
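To make the skew concrete, here is a minimal sketch of a Zipf-distributed key generator that could replace the uniform key choice in a sysbench-style client. This is my illustration, not code from any of those benchmarks; the skew parameter and the use of numpy are assumptions.

import numpy as np

N_ROWS = 400 * 1000 * 1000  # rows per table in this test
rng = np.random.default_rng()

def zipf_key(a=1.5):
    # Rejection-sample a Zipf-distributed key in [1, N_ROWS]. Low keys
    # (hot rows) are chosen far more often than high keys, unlike the
    # uniform distribution sysbench uses by default.
    while True:
        k = int(rng.zipf(a))
        if k <= N_ROWS:
            return k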
Configuration
See a previous post for more details. Eight collections/tables with 400M documents/rows per collection/table were created. All collections/tables are in one database, so MongoDB suffers from the per-database RW-lock. But MySQL and TokuMX also suffer from a similar issue when all clients are trying to update the same row. Tests were run for 1, 2, 4 and 8 tables where one row per table was updated, so when the test used 4 tables there were 4 rows getting updated. For each number of tables, tests were run for up to 64 concurrent clients/threads. The result tables listed in the next section should make that clear.
The workload updates the non-indexed column of one document/row by PK per transaction. There are no secondary indexes on the table. In this case the document/row with ID=1 is chosen for every update. For MySQL and TokuMX an auto-commit transaction is used. The journal (redo log) is used, but the update does not wait for the journal/log to be forced to disk. The updates should not require disk reads as all relevant index and data blocks remain in cache. TokuMX might do reads in the background to maintain fractal trees, but I don't understand their algorithm well enough to be certain.
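For reference, a minimal sketch in Python of what one client's update loop might look like. This is my illustration, not the actual test clients (the MySQL client is a Java sysbench client); the table/collection name "t0" and the counter column "c" are hypothetical.

import mysql.connector                      # assumes MySQL Connector/Python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

conn = mysql.connector.connect(user="bench", database="test", autocommit=True)
cur = conn.cursor()
# w:1,j:0 as in the test configs: acknowledge the write, don't wait on the journal.
coll = MongoClient().test.get_collection(
    "t0", write_concern=WriteConcern(w=1, j=False))

while True:
    # Every iteration updates the same hot row/document (ID=1) by PK, so all
    # clients contend on one row lock (InnoDB/TokuMX) or on the per-database
    # write lock (MongoDB).
    cur.execute("UPDATE t0 SET c = c + 1 WHERE id = 1")
    coll.update_one({"_id": 1}, {"$inc": {"c": 1}})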
The database was loaded in PK order and about 8 hours of concurrent & random updates were done to warm up the database prior to this test. The warmup was the same workload as described in a previous post.
The MySQL test client limits each client to one table, so when there are 64 clients and 8 tables there are 8 clients updating the one hot row in each table. The MongoDB/TokuMX client does not do that. It lets all clients update all tables, so in this case up to 64 clients may be updating a given table's row at once, with 8 on average.
The test server has 40 CPU cores with HT enabled, fast flash storage and 144G of RAM. The benchmark client and database servers shared the host. Tests were run for several configurations:
- mongo26 - MongoDB 2.6.0rc2, powerOf2Sizes=1, journalCommitInterval=300, w:1,j:0
- mongo24 - MongoDB 2.4.9, powerOf2Sizes=0, journalCommitInterval=300, w:1,j:0
- mysql - MySQL 5.6.12, InnoDB, no compression, flush_log_at_trx_commit=2, buffer_pool_size=120G, flush_method=O_DIRECT, page_size=8k, doublewrite=0, io_capacity=16000, lru_scan_depth=2000, buffer_pool_instances=8, write_io_threads=32, flush_neighbors=0 (see the my.cnf sketch after this list)
- toku-32 - TokuMX 1.4.1, readPageSize=32k, quicklz compression, logFlushPeriod=300, w:1,j:0. I don't have results for toku-32 yet.
- toku-64 - TokuMX 1.4.1, readPageSize=64k, quicklz compression, logFlushPeriod=300, w:1,j:0
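For reference, the InnoDB settings in the mysql entry above map to a my.cnf fragment like the sketch below. This is my reconstruction, with the innodb_ prefix that the list omits; it is not the complete config that was used.

[mysqld]
innodb_flush_log_at_trx_commit=2  # commit does not wait for a redo log fsync
innodb_buffer_pool_size=120G
innodb_flush_method=O_DIRECT
innodb_page_size=8192             # 8k pages
innodb_doublewrite=0
innodb_io_capacity=16000
innodb_lru_scan_depth=2000
innodb_buffer_pool_instances=8
innodb_write_io_threads=32
innodb_flush_neighbors=0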
Results per DBMS
I first list the results by DBMS to show the impact from spreading the workload over more rows/tables. The numbers below are the updates-per-second rates. I use "DOP=X" to indicate the number of concurrent clients, where "DOP" stands for Degree Of Parallelism (it is an Oracle thing). A few conclusions from the results below:
- MySQL/InnoDB does much better with more tables for two reasons. The first is that more tables allow for more concurrency. The second is that it avoids some of the overhead in the code that maintains row locks and the list of threads waiting for row locks. I describe that in more detail at the end of this post.
- MongoDB 2.4.9 is faster than 2.6.0rc2, by about 1.2X at higher concurrency. I think the problem is that mongod requires more CPU per update in 2.6 versus 2.4, and this looks like a performance regression in 2.6 (at least in 2.6.0rc2). I am still profiling to figure out where. More details on this are at the end of the post. I filed JIRA 13663 for this.
- MongoDB doesn't benefit from spreading the load over more collections when all collections are in the same database. This is expected given the per-database RW-lock.
Updates per second
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mysql 1 8360 15992 30182 24932 23924 23191 21048
mysql 2 X 16527 30824 49999 41045 40506 38357
mysql 4 X X 32351 51791 67423 62116 59137
mysql 8 X X X 54826 80409 73782 68128
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mongo24 1 10212 17844 30204 34003 33895 33564 33451
mongo24 2 X 10256 17698 30547 34125 33717 33573
mongo24 4 X X 10670 17690 30903 34027 33586
mongo24 8 X X X 10379 17702 30920 33758
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mongo26 1 9187 16131 27648 28506 27784 27437 27021
mongo26 2 X 9367 16035 27490 28326 27746 27354
mongo26 4 X X 9179 16028 27666 28330 27647
mongo26 8 X X X 9125 16038 27275 27858
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
toku-64 1 7327 12804 16179 12154 11021 9990 8344
toku-64 2 X 7173 12690 20483 23064 22354 20349
toku-64 4 X X 7191 12943 21399 33485 40124
toku-64 8 X X X 7121 12727 22096 38207
Results per number of tables
This reorders the results from above to show them for all configurations at the same number of tables. You are welcome to draw conclusions about which is faster.
Updates per second
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mysql 1 8360 15992 30182 24932 23924 23191 21048
mongo24 1 10212 17844 30204 34003 33895 33564 33451
mongo26 1 9187 16131 27648 28506 27784 27437 27021
toku-64 1 7327 12804 16179 12154 11021 9990 8344
mysql 2 X 16527 30824 49999 41045 40506 38357
mongo24 2 X 10256 17698 30547 34125 33717 33573
mongo26 2 X 9367 16035 27490 28326 27746 27354
toku-64 2 X 7173 12690 20483 23064 22354 20349
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mysql 4 X X 32351 51791 67423 62116 59137
mongo24 4 X X 10670 17690 30903 34027 33586
mongo26 4 X X 9179 16028 27666 28330 27647
toku-64 4 X X 7191 12943 21399 33485 40124
config #tables DOP=1 DOP=2 DOP=4 DOP=8 DOP=16 DOP=32 DOP=64
mysql 8 X X X 54826 80409 73782 68128
mongo24 8 X X X 10379 17702 30920 33758
mongo26 8 X X X 9125 16038 27275 27858
toku-64 8 X X X 7121 12727 22096 38207
Row locks for InnoDB
I used PMP (the Poor Man's Profiler) to understand MySQL/InnoDB on this workload. I frequently saw all user threads blocked on a condition variable with the stack trace below. It seems odd that all threads are sleeping. I think the problem is that one thread can run but has yet to be scheduled by Linux. My memory of the row lock code is that it wakes threads in FIFO order, and when N threads wait for a lock on the same row each thread waits on a separate condition variable. I am not sure whether this code has been improved in MySQL 5.7. A quick reading of some of the 5.6.12 row lock code showed many mutex operations. Problems in this code have escaped scrutiny in the past because much of our public benchmark activity has used workloads with uniform distributions.
pthread_cond_wait@@GLIBC_2.3.2,os_cond_wait,os_event_wait_low2,lock_wait_suspend_thread,row_mysql_handle_errors,row_search_for_mysql,ha_innobase::index_read,handler::read_range_first,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_update,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection
This was a less frequent stack trace from the test ...
lock_get_mode,lock_table_other_has_incompatible,lock_table,row_search_for_mysql,ha_innobase::index_read,handler::read_range_first,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_update,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection
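The wakeup pattern described above, one condition variable per waiting thread with FIFO handoff, can be sketched as follows. This is my approximation of the behavior in Python, not the actual InnoDB lock code.

import threading
from collections import deque

class FifoRowLock:
    # Sketch of a row lock that wakes waiters in FIFO order, with one
    # condition variable per waiting thread.
    def __init__(self):
        self._mutex = threading.Lock()
        self._held = False
        self._waiters = deque()  # one Condition per sleeping thread

    def acquire(self):
        with self._mutex:
            if not self._held:
                self._held = True
                return
            # Each waiter sleeps on its own condition variable and is
            # signaled individually, like the per-thread event in InnoDB.
            cv = threading.Condition(self._mutex)
            self._waiters.append(cv)
            cv.wait()  # release() hands ownership directly to this thread

    def release(self):
        with self._mutex:
            if self._waiters:
                # FIFO handoff: the lock stays held and the oldest waiter
                # is woken. The woken thread still must be scheduled by
                # the OS before it runs, adding latency to every handoff.
                self._waiters.popleft().notify()
            else:
                self._held = False

Note that every handoff pays for mutex operations plus a wakeup, and the woken thread must be scheduled before it runs, which matches the PMP output above where all user threads are sleeping.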
Row locks for TokuMX
TokuMX has a similar point at which all threads wait. That isn't a big surprise: both provide fine-grained concurrency control, but when every update hits the same row there is no granularity finer than a row lock to exploit.
pthread_cond_timedwait@@GLIBC_2.3.2,toku_cond_timedwait,toku::lock_request::wait,toku_db_wait_range_lock,toku_c_getf_set(__toku_dbc*,,db_getf_set,autotxn_db_getf_set(__toku_db*,,mongo::CollectionBase::findByPK(mongo::BSONObj,mongo::queryByPKHack(mongo::Collection*,,mongo::updateObjects(char,mongo::lockedReceivedUpdate(char,mongo::receivedUpdate(mongo::Message&,,mongo::assembleResponse(mongo::Message&,,mongo::MyMessageHandler::process(mongo::Message&,,mongo::PortMessageServer::handleIncomingMsg(void*)
MongoDB 2.4 versus 2.6
I get about 1.2X more updates/second with MongoDB 2.4.9 compared to 2.6.0rc2. I think the problem is that 2.6 uses more CPU per update. I filed JIRA 13663 for this but am still trying to profile the code. So far I know the following, all of which indicates that the 2.4.9 test runs 1.2X faster than 2.6.0rc2 with 32 client threads and 1 table:
- I get ~1.2X more updates/second with 2.4.9
- the Java sysbench client uses ~1.2X more CPU per "top" with 2.4.9
- the context switch rate is ~1.2X higher with 2.4.9
The interesting point is that mongod for 2.4.9 uses only ~1.03X more CPU than 2.6.0rc2 per "top" during this test even though it is doing 1.2X more updates/second. So 2.6.0rc2 uses more CPU per update. I will look at "perf" output next, and I can repeat this with the GA version of 2.6.
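For that profiling step, a perf invocation like the sketch below is one way to collect per-function CPU data from a running mongod; the flags and the 30-second window are my choices, not from the original test.

perf record -g -p $(pidof mongod) -- sleep 30
perf report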