Tuesday, September 8, 2015

Third day with InnoDB transparent page compression

My first two days with InnoDB transparent page compression didn't turn out well. Transparent page compression can make InnoDB source code simpler and InnoDB more performant on insert heavy workloads. Unfortunately the versions of XFS that I use are not happy after doing a hole-punch on write. The performance summary is that with transparent compression:

  • Database load is slightly faster
  • Transaction processing is slightly slower
  • DROP TABLE is 43X slower

MySQL 5.6 vs 5.7

I used a host with 24 HW threads, 144G of RAM and a 400G Intel s3700 SSD. The server uses Fedora 19, XFS and Linux kernel 3.14.27-100.fc19. The benchmark application is linkbench and was run with maxid1=100M, loaders=10 and requesters=20 (10 clients for the load, 20 for queries). I compared the Facebook patch for MySQL 5.6 with upstream MySQL 5.7. For 5.6 I used old-style compression for linktable and counttable. For 5.7 I used transparent compression for all tables. I also used 32 partitions for 5.6 and no partitions for 5.7. After the database was loaded I ran the query test for 140 loops with 1 hour per loop.

The table below has data for the Facebook patch for MySQL 5.6 (fb56) and upstream 5.7.8 (orig578). The results are the row insert rate during the load (ips) and the average QPS of the Nth hourly run (qps@N). The QPS is better for MySQL 5.6 and the load rate is better for 5.7 but I don't know how much of that is due to the use of partitions for 5.6. I ran DROP TABLE after the test and that took ~8 minutes for MySQL 5.7.8. More details are in a previous post.


          fb56      orig578
ips       55199     81970
qps@20    13731     10581
qps@40    12172      9874
qps@60    11353      8875
qps@80    10977      8234
qps@100   10793      8021
qps@120   10691      7946
qps@140   10636      7949

Transparent vs old-style compression

I then ran a test for 24 hours to compare MySQL 5.7.8 in two setups and both used partitions for all tables. They differed in that one used old-style compression and the other used transparent compression. The results were similar to the comparison with MySQL 5.6 as the load was faster with transparent compression. transaction processing was faster with old-style compression and DROP TABLE was ~30X slower with transparent compression.

After the linkbench load I ran the query test for 24 1-hour loops. At test end the database with old-style compression was 4% larger than transparent compression, but it also had more data as it sustained a higher QPS rate. I didn't count the number of rows to determine whether it had 4% more data.

The table below displays the row insert rate during the load (ips) and the average QPS from 1-hour runs at the 2nd, 12th and 24th hours (qps@N). The load rate is better with transparent compression and the QPS is better with old-style compression.


           578, old-style    578, transparent
ips        72566             79518
qps@2      16542             15504
qps@12     16079             15136
qps@24     15506             14383

Transparent compression doesn't have to provide better compression or performance to be a win, but it needs to be stable. I ran DROP DATABASE at test end and that took 5 seconds for old-style compression vs 216 seconds for transparent. The database was ~100G when dropped.

This paste has the output from the 24th 1-hour run of the linkbench query test. There are two sections, the first is from old-style compression and the second from transparent compression. For most of the linkbench operations old-style is slightly faster. But the max times for operations is much worse (~2X) with transparent.

Friday, September 4, 2015

Linkbench for MySQL 5.7.8 with an IO-bound database

I wanted to try InnoDB transparent page compression that is new in the MySQL 5.7.8 RC. That didn't work out, so I limited my tests to old-style compression. I compared MyRocks with InnoDB from the Facebook patch for 5.6, upstream 5.6.26 and upstream 5.7.8. My performance summary is:

  • MyRocks loads data faster than InnoDB. This isn't a new result. Non-unique secondary index maintenance doesn't require a read before the write (unlike a B-Tree). This is also helped by less random IO on writes and better compression.
  • MyRocks compression is much better than compressed InnoDB. After 24 hours it used between 56% and 64% of the space compared to the compressed InnoDB configurations.
  • MyRocks QPS degrades over time. This will be fixed real soon.
  • Partitioning improves InnoDB load performance in MySQL 5.6 for compressed and non-compressed tables. This reduces stalls from the per-index mutex used by InnoDB when inserts cause or might cause a page split (pessimistic code path) because there is one mutex per partition. With MySQL 5.7 partitioning doesn't help in the non-compressed table case. There has been work in 5.7 to reduce contention on the per-index mutex and I think it helped. I suspect it is still needed with old-style compression because compressed page splits are more expensive as they include recompression.
  • The Facebook patch for MySQL 5.6 is faster than upstream 5.6 and competitive with upstream 5.7.8. Too bad that patches might not reach upstream.

Configuration

My test server has 144G of RAM, 40 HW threads with HT enabled and fast PCIe flash storage. I configured linkbench with loaders=10, requesters=20 and maxid1=1B. This uses 10 clients for the load, 20 clients for the query runs and about 1B rows in the node table after the load. The linkbench clients share the server with mysqld. The my.cnf settings are explained in a previous post.  The load was done with the binlog disabled. After the load there were 12 1-hour runs of the query test and I report results for hours 2 and 12. Then mysqld was restarted with the binlog enabled and 12 more 1-hour runs of the query test were done and I report results for hours 14 and 24. Fsync for the binlog was disabled. Fsync for the InnoDB redo log was done by a background thread (innodb_flush_log_at_trx_commit=2). Note that the InnoDB page size was 8kb so I used 2X compression for the link and count tables. The node table is not compressed for InnoDB because it is unlikely to compression by 50%.

I tested the following binaries:
  • myrocks - RocksDB storage engine for MySQL using the Facebook patch for MySQL 5.6
  • fb56 - InnoDB using the Facebook patch for MySQL 5.6
  • orig56 - upstream MySQL 5.6.26
  • orig57 - upstream MySQL 5.7.8
The partitioning and compression options are described by the following.  For partitioning I use 32 partitions and transactions/queries don't span partitions. All of the DDL is here.
  • p0 - no partitioning for RocksDB
  • p1 - partitioning for RocksDB
  • p0.c0 - no partitioning, no compression for InnoDB
  • p0.c1 - no partitioning, old-style compression for InnoDB
  • p1.c0 - partitioning, no compression for InnoDB
  • p1.c1 - partitioning, old-style compression for InnoDB

Results

This lists the database size in GB after the load and query tests at the 2nd, 12th, 14th and 24th hours. I don't have sufficient granularity in my measurement script for databases larger than 1T. I am not sure why compression with upstream 5.6 and 5.7 uses more space than with the Facebook patch.

Update - I removed the results for myrocks, p1 because my measurements were wrong.

load    2h      12h     14h     24h
gb      gb      gb      gb      gb      config
 487     493     512     514     523    myrocks, p0
.
11XX    11XX    12XX    12XX    13XX    fb56, p0.c0
 666     697     779     787     814    fb56, p0.c1
11XX    12XX    12XX    13XX    13XX    fb56, p1.c0
 707     745     803     808     826    fb56, p1.c1
.
12XX    12XX    13XX    14XX    14XX    orig56, p0.c0
 756     790     879     889     920    orig56, p0.c1
13XX    13XX    14XX    14XX    14XX    orig56, p1.c0
 803     838     901     907     930    orig56, p1.c1
.
12XX    13XX    14XX    14XX    15XX    orig57, p0.c0
 756     796     892     902     931    orig57, p0.c1
13XX    13XX    14XX    14XX    15XX    orig57, p1.c0
 803     844     844     916     940    orig57, p1.c1


This lists the insert rate during the load (load ips) and the average query rates for the 2nd, 12th, 14th and 24th hours. Note that the query rate is lousy for p0.c1 immediately after the load. The problem is that the b-tree pages are almost full after the load and then over time many of them get split. There are stalls from page splits with compression and over time the page split rate drops.


load    2h      12h     14h     24h
ips     qps     qps     qps     qps     config
165210  31826   22347   21293   17888   myrocks, p0
103145  30045   22376   21325   18387   myrocks, p1
.
109355  21151   23733   23478   24865   fb56, p0.c0
 74210   8261   13928   14706   18656   fb56, p0.c1
104900  26953   26029   25161   25479   fb56, p1.c0
 90162  19888   24431   22596   22811   fb56, p1.c1
.
105356  16472   16873   16575   17073   orig56, p0.c0
 45966   7638   12492   13178   16516   orig56, p0.c1
 98104  18797   18273   17625   17702   orig56, p1.c0
 66738  17731   19854   19159   19418   orig56, p1.c1
.
122454  31009   30260   29905   29751   orig57, p0.c0
 49101   9217   17552   18448   22092   orig57, p0.c1
114400  28191   26797   25820   25832   orig57, p1.c0
 69746  22028   25204   23882   23983   orig57, p1.c1

This is the same data as above, but grouped by configuration.

load    2h      12h     14h     24h
ips     qps     qps     qps     qps     config
109355  21151   23733   23478   24865   fb56, p0.c0
105356  16472   16873   16575   17073   orig56, p0.c0
122454  31009   30260   29905   29751   orig57, p0.c0
.
165210  31826   22347   21293   17888   myrocks, p0
 74210   8261   13928   14706   18656   fb56, p0.c1
 45966   7638   12492   13178   16516   orig56, p0.c1
 49101   9217   17552   18448   22092   orig57, p0.c1
.
104900  26953   26029   25161   25479   fb56, p1.c0
 98104  18797   18273   17625   17702   orig56, p1.c0
114400  28191   26797   25820   25832   orig57, p1.c0
.
103145  30045   22376   21325   18387   myrocks, p1
 90162  19888   24431   22596   22811   fb56, p1.c1
 66738  17731   19854   19159   19418   orig56, p1.c1
 69746  22028   25204   23882   23983   orig57, p1.c1

Graphs

For people who prefer graphs I include one for the load rates and another for the QPS from the configurations that use partitioning.


Thursday, September 3, 2015

Second day with InnoDB transparent page compression

My first day with InnoDB transparent page compression didn't turn out OK, but I am an optimist so onto day 2. I gave up trying to use it on a host with CentOS 6.6, XFS and a 3.10.53 kernel. I had a bit more luck with a Fedora 19 host using XFS and a 3.14.27 kernel and I ran linkbench.

You can follow along here or at bug 78277.

I configured linkbench with maxid1=100M and the database was about 100GB after the load. I used MySQL 5.7.8 with transparent page compression. I won't share the performance results just yet, but at the and of 7 days of the query test I shutdown mysqld and my storage device (400G Intel s3700) had 180G free space. I confirmed that I was able to use at least 176 of the 180 GB, which was a good sign. However, the device was only able to write at ~60 MB/second while making copies of a 2G test file. File copies were writing at ~200 MB/second on a second host that has the same hardware, the same amount of free space, but didn't use hole-punch for compression. So this is one more cost of using hole-punch, it makes the filesystem much slower. But I don't blame XFS. I don't think that doing a hole-punch on every small write was the intended use for that feature.

Then I ran DROP DATABASE for the database used by linkbench. It contained 3 unpartitioned tables. Two were ~60G and one was ~10G. The DROP TABLE took 8 minutes. While that ran there were many warnings like this:
[Warning] InnoDB: Trying to close/delete/truncate tablespace 'linkdb/linktable' but there are 1 pending operations on it.
[Warning] InnoDB: Trying to close/delete/truncate tablespace 'linkdb/linktable' but there are 1 pending operations on it.

Then it got more exciting -- a long semaphore wait courtesy of a slow unlink. Alas the DROP TABLE finished before the background thread killed mysqld.
InnoDB: A long semaphore wait:--Thread 139616032126720 has waited at row0purge.cc line 830 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0x327b74a8 created in file dict0dict.cc line 1153
a writer (thread id 139616238388992) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 830
Last time write locked in file /home/mdcallag/b/mysql-5.7.8-rc/storage/innobase/row/row0mysql.cc line 4202
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:


RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...