Friday, September 4, 2015

Linkbench for MySQL 5.7.8 with an IO-bound database

I wanted to try InnoDB transparent page compression, which is new in the MySQL 5.7.8 RC. That didn't work out, so I limited my tests to old-style compression. I compared MyRocks with InnoDB from the Facebook patch for 5.6, upstream 5.6.26 and upstream 5.7.8. My performance summary is:

  • MyRocks loads data faster than InnoDB. This isn't a new result. Non-unique secondary index maintenance doesn't require a read before the write (unlike a B-Tree), and the load also benefits from less random IO on writes and better compression.
  • MyRocks compression is much better than compressed InnoDB. After 24 hours it used between 56% and 64% of the space compared to the compressed InnoDB configurations.
  • MyRocks QPS degrades over time. This will be fixed real soon.
  • Partitioning improves InnoDB load performance in MySQL 5.6 for compressed and non-compressed tables. It reduces stalls on the per-index mutex that InnoDB takes when inserts cause, or might cause, a page split (the pessimistic code path), because with partitioning there is one mutex per partition. With MySQL 5.7 partitioning doesn't help in the non-compressed table case; there has been work in 5.7 to reduce contention on the per-index mutex and I think it helped. I suspect partitioning is still needed with old-style compression because compressed page splits are more expensive, as they include recompression.
  • The Facebook patch for MySQL 5.6 is faster than upstream 5.6 and competitive with upstream 5.7.8. Too bad that patches might not reach upstream.

Configuration

My test server has 144G of RAM, 40 HW threads with HT enabled and fast PCIe flash storage. I configured linkbench with loaders=10, requesters=20 and maxid1=1B. This uses 10 clients for the load, 20 clients for the query runs and about 1B rows in the node table after the load. The linkbench clients share the server with mysqld. The my.cnf settings are explained in a previous post. The load was done with the binlog disabled. After the load there were 12 1-hour runs of the query test, and I report results for hours 2 and 12. Then mysqld was restarted with the binlog enabled, 12 more 1-hour runs of the query test were done, and I report results for hours 14 and 24. Fsync for the binlog was disabled. Fsync for the InnoDB redo log was done by a background thread (innodb_flush_log_at_trx_commit=2). Note that the InnoDB page size was 8kb so I used 2X compression for the link and count tables. The node table is not compressed for InnoDB because it is unlikely to compress by 50%.
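For reference, a my.cnf fragment like the one below captures the settings called out above. This is a sketch, not the full configuration (that is in the previous post), and only the options named in this paragraph are shown.

  innodb_page_size=8192                # 8kb pages, so KEY_BLOCK_SIZE=4 gives 2X compression
  innodb_flush_log_at_trx_commit=2     # redo log fsync done by a background thread
  sync_binlog=0                        # fsync for the binlog is disabled
  # skip_log_bin was set for the load and the first 12 query hours, then removed when the binlog was enabled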

I tested the following binaries:
  • myrocks - RocksDB storage engine for MySQL using the Facebook patch for MySQL 5.6
  • fb56 - InnoDB using the Facebook patch for MySQL 5.6
  • orig56 - upstream MySQL 5.6.26
  • orig57 - upstream MySQL 5.7.8
The partitioning and compression options are listed below. For partitioning I use 32 partitions and transactions/queries don't span partitions. All of the DDL is here, and a simplified example follows the list.
  • p0 - no partitioning for RocksDB
  • p1 - partitioning for RocksDB
  • p0.c0 - no partitioning, no compression for InnoDB
  • p0.c1 - no partitioning, old-style compression for InnoDB
  • p1.c0 - partitioning, no compression for InnoDB
  • p1.c1 - partitioning, old-style compression for InnoDB
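As an illustration of the p1.c1 flavor for InnoDB, a table might be declared as below. This is a simplified sketch with an abbreviated column list, not the exact DDL from the linked file: KEY_BLOCK_SIZE=4 gives 2X compression with the 8kb page size, and PARTITION BY ... PARTITIONS 32 gives the 32 partitions.

  CREATE TABLE linktable (
    id1       BIGINT UNSIGNED NOT NULL,
    id2       BIGINT UNSIGNED NOT NULL,
    link_type BIGINT UNSIGNED NOT NULL,
    `time`    BIGINT UNSIGNED NOT NULL,
    data      VARCHAR(255) NOT NULL,
    PRIMARY KEY (link_type, id1, id2)
  ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4  -- c1: old-style compression
    PARTITION BY KEY (id1) PARTITIONS 32;                 -- p1: 32 partitions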

Results

This lists the database size in GB after the load and query tests at the 2nd, 12th, 14th and 24th hours. I don't have sufficient granularity in my measurement script for databases larger than 1T. I am not sure why compression with upstream 5.6 and 5.7 uses more space than with the Facebook patch.

Update - I removed the results for myrocks, p1 because my measurements were wrong.
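For anyone who wants to approximate these sizes from within mysqld, a query like the one below works. It is a sketch: the schema name linkdb is an assumption, and the information_schema numbers are estimates rather than the output of the measurement script used here.

  SELECT ROUND(SUM(data_length + index_length) / POW(1024, 3)) AS gb
    FROM information_schema.TABLES
   WHERE table_schema = 'linkdb';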

load    2h      12h     14h     24h
gb      gb      gb      gb      gb      config
 487     493     512     514     523    myrocks, p0
.
11XX    11XX    12XX    12XX    13XX    fb56, p0.c0
 666     697     779     787     814    fb56, p0.c1
11XX    12XX    12XX    13XX    13XX    fb56, p1.c0
 707     745     803     808     826    fb56, p1.c1
.
12XX    12XX    13XX    14XX    14XX    orig56, p0.c0
 756     790     879     889     920    orig56, p0.c1
13XX    13XX    14XX    14XX    14XX    orig56, p1.c0
 803     838     901     907     930    orig56, p1.c1
.
12XX    13XX    14XX    14XX    15XX    orig57, p0.c0
 756     796     892     902     931    orig57, p0.c1
13XX    13XX    14XX    14XX    15XX    orig57, p1.c0
 803     844     844     916     940    orig57, p1.c1


This lists the insert rate during the load (load ips) and the average query rates for the 2nd, 12th, 14th and 24th hours. Note that the query rate for p0.c1 is lousy immediately after the load. The problem is that the b-tree pages are almost full after the load and over time many of them get split. Page splits cause stalls with compression, and over time the page split rate drops.
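One way to watch this behavior from the server is sketched below. The index_page_splits counter in INNODB_METRICS is disabled by default and must be enabled first, and INNODB_CMP only reports on compressed tables; a low compress_ops_ok to compress_ops ratio is a sign of expensive recompression during splits.

  -- most INNODB_METRICS counters are off by default
  SET GLOBAL innodb_monitor_enable = 'index_page_splits';

  -- page split counter
  SELECT NAME, COUNT
    FROM information_schema.INNODB_METRICS
   WHERE NAME = 'index_page_splits';

  -- compression attempts and successes per page size
  SELECT page_size, compress_ops, compress_ops_ok
    FROM information_schema.INNODB_CMP;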


load    2h      12h     14h     24h
ips     qps     qps     qps     qps     config
165210  31826   22347   21293   17888   myrocks, p0
103145  30045   22376   21325   18387   myrocks, p1
.
109355  21151   23733   23478   24865   fb56, p0.c0
 74210   8261   13928   14706   18656   fb56, p0.c1
104900  26953   26029   25161   25479   fb56, p1.c0
 90162  19888   24431   22596   22811   fb56, p1.c1
.
105356  16472   16873   16575   17073   orig56, p0.c0
 45966   7638   12492   13178   16516   orig56, p0.c1
 98104  18797   18273   17625   17702   orig56, p1.c0
 66738  17731   19854   19159   19418   orig56, p1.c1
.
122454  31009   30260   29905   29751   orig57, p0.c0
 49101   9217   17552   18448   22092   orig57, p0.c1
114400  28191   26797   25820   25832   orig57, p1.c0
 69746  22028   25204   23882   23983   orig57, p1.c1

This is the same data as above, but grouped by configuration.

load    2h      12h     14h     24h
ips     qps     qps     qps     qps     config
109355  21151   23733   23478   24865   fb56, p0.c0
105356  16472   16873   16575   17073   orig56, p0.c0
122454  31009   30260   29905   29751   orig57, p0.c0
.
165210  31826   22347   21293   17888   myrocks, p0
 74210   8261   13928   14706   18656   fb56, p0.c1
 45966   7638   12492   13178   16516   orig56, p0.c1
 49101   9217   17552   18448   22092   orig57, p0.c1
.
104900  26953   26029   25161   25479   fb56, p1.c0
 98104  18797   18273   17625   17702   orig56, p1.c0
114400  28191   26797   25820   25832   orig57, p1.c0
.
103145  30045   22376   21325   18387   myrocks, p1
 90162  19888   24431   22596   22811   fb56, p1.c1
 66738  17731   19854   19159   19418   orig56, p1.c1
 69746  22028   25204   23882   23983   orig57, p1.c1

Graphs

For people who prefer graphs I include one for the load rates and another for the QPS from the configurations that use partitioning.

