Wednesday, November 8, 2017

IO-bound insert benchmark for InnoDB on a small server: MySQL 5.0 to 8.0

This post explains performance for the insert benchmark with InnoDB from MySQL versions 5.0 to 8.0. The goal is to understand how performance has changed across releases. This uses an IO-bound workload with small servers. The previous post covered an in-memory workload for InnoDB.

tl;dr - from 5.0.96 to 8.0.3
  • Insert performance improved by ~3X after MySQL 5.5 when I switching from built-in InnoDB to modern InnoDB.
  • Write stalls are a problem but I need better metrics for reporting this.
  • Secondary index scans are 1.09X faster on the i5 NUC and 1.64X faster on the i3 NUC
  • PK index scans are 1.88X faster on the i5 NUC and 1.96X faster on the i3 NUC
  • QPS increased by 5% on the i5 NUC and decreased by 25% on the i3 NUC in the read-write test that does 100 writes/second
tl;dr - from 5.6.35 to 8.0.3
  • The insert rate decreased by 8% on the i5 NUC and increased by 19% on the i3 NUC
  • There is a big improvement to range scan performance starting in MySQL 5.7
  • Secondary index scans are 1.24X faster on the i5 NUC and 1.16X faster on the i3 NUC
  • PK index scans are 1.35X faster on the i5 NUC and 1.25X faster on the i3 NUC.
  • QPS didn't change on the i5 NUC and increased by 6% on the i3 NUC in the read-write test that does 100 writes/second

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. The built-in InnoDB (remember that?) was used for 5.0, 5.1 and 5.5 and then I switched to modern InnoDB starting with 5.6. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. The insert benchmark is described here. The my.cnf files are here for the i3 NUC and i5 NUC. The i5 NUC has more RAM, faster CPUs and faster storage than the i3 NUC. I tried to tune my.cnf for all engines including: disabled SSL for MySQL 5.7, used the same charset and collation and set innodb_purge_threads=1 to reduce mutex contention. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream.

The database is larger than RAM as the test table has ~250M rows for the i3 NUC and ~500M rows for the i5 NUC. The i3 NUC has 8gb of RAM and the i5 NUC has 16gb. The insert benchmark loaded the table with 250M or 500M rows rows, then did a full scan of each index on the table (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second with one client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second.

Results

All of the data for the tests is here.

Results: load

The graph below has the insert rate for each release relative to the rate for InnoDB in MySQL 5.0.96. Performance improves dramatically in 5.6 thanks to switching from the built-in to modern InnoDB.

Additional metrics help to explain performance. The metrics are explained here. With the switch from built-in to modern InnoDB in MySQL 5.6 there are big improvements to the insert rate, write-amplification (wKB/i) and the CPU overhead per insert (Mcpu/i). From 5.6 to 8.0 on the i5 NUC the insert rate decreased by 8% while the CPU overhead/insert increased by 9%. I assume the new overhead is from code above InnoDB. From 5.6 to 8.0 on the i3 NUC the insert rate increased by 19% while the CPU overhead/insert decreased by 25%. I assume the change in CPU overhead explains the change in performance. While write-amplification decreased with modern InnoDB, the storage read overhead per insert (rKB/i) has increased. I haven't tried to explain that. Write stalls might be a problem because the 99th percentile insert rate is much worse than the average, but I need a better metric for reporting this.

i3 NUC
        ips.av  ips.99  rKB/i   wKB/i   Mcpu/i  size(GB)
5.0.96   864     502    0.38    106.56  10545    52
5.1.72   883     548    0.36    106.75  10850    52
5.5.51  1246     741    3.52     78.61   7791    47
5.6.35  2665    1185    6.10     39.08   4934    46
5.7.17  3242    1814    2.54     34.24   3526    46
8.0.3   3162    1771    2.63     35.08   3663    46

i5 NUC
        ips.av  ips.99  rKB/i   wKB/i   Mcpu/i  size(GB)
5.0.96  1892    1088    0.28    114.11   8533   102
5.1.72  1877    1116    0.29    114.33   8850   102
5.5.51  2516    1452    1.55     95.17   7754    94
5.6.35  5978    2431    2.10     44.50   4265    90
5.7.17  5634    3131    1.26     44.92   4690    90
8.0.3   5482    3017    1.29     45.93   4657    90

Results: scan

Below are tables that show the number of seconds for each full index scan: 1 is the PK, 2/3/4 are the secondary indexes and 5 is the PK again. The scanned index has 250M entries on the i3 NUC and 500M entries on the i5 NUC. The first PK index scan (#1) is slower than the second PK scan (#5) because writeback is in progress during the first scan and competes for storage IO. Writeback for a dirty page can also delay a read waiting for a clean page. Something was done to make range scans faster starting in MySQL 5.7 (thanks to the InnoDB team).

Comparing InnoDB between 5.0.96 and 8.0.3: secondary index scans are 1.09X faster on the i5 NUC and 1.64X faster on the i3 NUC, PK index scans are 1.88X faster on the i5 NUC and 1.96X faster on the i3 NUC.

Comparing InnoDB between 5.6.35 and 8.0.3: secondary index scans are 1.24X faster on the i5 NUC and 1.16X faster on the i3 NUC, PK index scans are 1.35X faster on the i5 NUC and 1.25X faster on the i3 NUC. Range scans are much faster for InnoDB starting in MySQL 5.7.

#seconds to scan an index, i3 NUC
1       2       3       4       5       2+3+4   engine
-       -       -       -       -       -----   ------
1349    422     198     208     200      828    5.0.96
1161    601     200     216     134     1017    5.1.72
 319    184     197     223     101      604    5.5.51
 292    181     194     212     127      587    5.6.35
 419    148     157     203      99      508    5.7.17
 403    147     154     204     102      505    8.0.3

#seconds to scan an index, i5 NUC
1       2       3       4       5       2+3+4   engine
-       -       -       -       -       -----   ------
2377    255     271     273     313     799     5.0.96
2477    266     283     286     399     835     5.1.72
 425    296     317     307     174     920     5.5.51
 475    290     302     314     225     906     5.6.35
 417    250     267     249     164     766     5.7.17
 420    227     260     245     167     732     8.0.3

Hardware efficiency metrics help to understand the performance improvements between MySQL 5.6.35 and 5.7.17. First are results for index scan #4, which is a secondary index. In MySQL 5.7.17 the CPU overhead per row read (Mcpu/o) is lower and that might explain why the storage read IO rate (rMB/s) is larger.

i3 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
212     35.2    0.031   7       13.584  5.6.35
203     37.8    0.031   7       12.552  5.7.17

i5 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
314     44.2    0.028   13      10.748  5.6.35
249     56.0    0.028   14       7.592  5.7.17

And then metrics for index scan #5, which is the second scan of the PK. The improvement in HW efficiency here is similar to the i3 NUC results above.

i3 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
127     130.5   0.067   16      16.560  5.6.35
 99     161.7   0.065   15       9.728  5.7.17

i5 NUC
secs    rMB/s   rKB/o   rGB     Mcpu/o  engine
225     147.0   0.067   32      14.732  5.6.35
164     198.1   0.067   32       8.900  5.7.17

Results: read-write, 1000 inserts/second

This section has results for the read-write tests where the writer does 1000 inserts/second. The graph has the query rate relative to the rate for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 QPS increased by 24% on the i5 NUC and decreased by 22% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS increased by 2% on the i5 NUC and 18% on the i3 NUC. The results for MySQL 5.1.72 are an outlier in this test and in the test below with 100 inserts/second. It looks like InnoDB is able to cache the secondary indexes for 5.1.72, so it does less storage reads per query and the QPS is larger.
Additional metrics help explain the performance, although the results here are more confusing than for other tests. The metrics are explained here. The built-in InnoDB with MySQL 5.5.51 was unable to sustain the target insert rate on the i3 NUC. Write stalls were a problem for all releases based on the values in IPS.99 that are less than 999 and MySQL 5.5.51 was the worst. The CPU overhead per query (CPU/q) decreased starting with MySQL 5.6.35.

i3 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  999     972     65.75    389      98     31.45  31282
5.1.72  999     961     63.98   4562    2149      0.26   6488
5.5.51  948     761     81.24    161      33     78.49  63049
5.6.35  999     977     67.92    258     156    101.61  37834
5.7.17  999     977     65.64    311     238     98.68  26829
8.0.3   999     978     65.97    303     230     99.50  27868

i5 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  999     988     71.25    820     694    30.27   23455
5.1.72  999     994     65.48   5460    2619     0.25    5374
5.5.51  999     968     88.74    710     480    53.41   24597
5.6.35  999     987     73.09    992     626    59.45   16798
5.7.17  999     988     71.22   1042     738    61.07   16115
8.0.3   999     989     71.63   1019     722    61.36   16723

Results: read-write, 100 inserts/second

This section has results for the read-write tests where the writer does 100 inserts/second. The graph has the query rate relative to the rate for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 QPS increased by 5% on the i5 NUC and decreased by 25% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS didn't change on the i5 NUC and increased by 6% on the i3 NUC. The QPS for 5.1.72 is an outlier. Similar to above, it looks like InnoDB cached the indexes in that setup and QPS was much larger.
Additional metrics help explain the performance. The metrics are explained hereAll of the releases sustained the target insert rate and write stalls were not a problem. The CPU overhead per query (CPU/q) decreased by 26% on the i5 NUC and 25% on the i3 NUC and that might explain the increase in the query rate.

i3 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  100     100     122.04  1171     810    21.97   14084
5.1.72  100     100      95.08  4881    4707     0.01    5278
5.5.51  100     100     165.01   670     369    66.26   16998
5.6.35  100     100     126.74   827     534    66.49   13675
5.7.17  100     100     128.32   900     566    69.56   10269
8.0.3   100     100     129.58   883     547    69.86   10625

i5 NUC
        IPS.av  IPS.99  wKB/i   QPS.av  QPS.99  rKB/q   CPU/q
5.0.96  100     100     107.04  1267    1180    25.84   15237
5.1.72  100     100      91.49  5625    5455     0.01    4477
5.5.51  100     100     171.97  1139     918    51.67   14015
5.6.35  100     100     157.74  1334     951    50.42   11546
5.7.17  100     100     160.00  1383     996    52.50   10807
8.0.3   100     100     161.31  1334    1041    52.85   11333

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...