Friday, June 30, 2017

One more time with sysbench, a small server & MySQL 5.6, 5.7 and 8.0

 Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

The good news is that I hope to begin debugging this problem next week. After fixing a few problems that caused variance, I am repeating tests to document the performance regression from MySQL 5.6 to 8.0. The first problem was thermal throttling, fixed by disabling turbo boost on my Intel NUC servers. The other problem was mutex contention from InnoDB purge threads, so I repeated tests with innodb_purge_threads set to both 1 and 4. This is part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr for in-memory sysbench on a small server with a fast SSD
  • most of the regression is from 5.6.35 to 5.7.17, much less is from 5.7.17 to 8.0.1
  • innodb_purge_threads=4 costs 10% to 15% of the QPS for write-heavy tests
  • QPS is 30% less for 5.7.17 & 8.0.1 vs 5.6.35 on write-only tests
  • QPS is 30% to 40% less for 5.7.17 & 8.0.1 vs 5.6.35 on read-write tests
  • QPS is 40% to 50% less for 5.7.17 & 8.0.1 vs 5.6.35 on read-only tests
  • QPS is 40% less for 5.7.17 & 8.0.1 vs 5.6.35 for point-query
  • QPS is 30% less for 5.7.17 & 8.0.1 vs 5.6.35 for insert-only

Configuration

I tested MySQL with upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. I used the i5 NUC servers described here and the my.cnf files are here. I run mysqld and the sysbench client on the same server. The binlog is enabled but sync-on-commit is disabled.
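
The full settings are in the linked my.cnf files. As a rough sketch of the binlog-related options, assuming the usual way to enable the binlog while not syncing on commit (the exact values in my files may differ):

log_bin
sync_binlog=0                       # binlog enabled, not synced on commit
innodb_flush_log_at_trx_commit=2    # assumption: redo log also not synced on commit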

Sysbench is run with 4 tables and 1M rows per table. The database fits in the InnoDB buffer pool. My usage of sysbench is described here. That explains the helper scripts that invoke sysbench and collect performance metrics. When I return home I will update this with the sysbench command lines that are generated by my helper scripts.
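
Until I add the exact command lines, here is a hypothetical example of what the helper scripts generate, assuming modern sysbench (1.0) with the bundled Lua tests; the host, paths and durations shown are illustrative, not my real settings:

sysbench /usr/share/sysbench/oltp_update_index.lua \
  --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=root \
  --tables=4 --table_size=1000000 \
  --threads=1 --time=180 --report-interval=10 \
  run

The same pattern with prepare instead of run loads the tables, and the other tests swap the Lua script name (oltp_update_non_index, oltp_read_only, oltp_point_select, oltp_insert and so on).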

Results: write-only

Sorry, no graphs this time. I run sysbench for 1, 2 and 4 concurrent clients and share the QPS for each test, followed by the QPS for MySQL 5.7.17 and 8.0.1 relative to 5.6.35. The ratio is less than 1 when the QPS is larger for 5.6.35.

All of these tests are run with innodb_purge_threads=1 which is the default for 5.6.35. The default for 5.7.17 and 8.0.1 is 4.
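
For reference this is the only option that differs between the two configurations; a minimal my.cnf fragment, with everything else as in the linked files:

[mysqld]
innodb_purge_threads=1   # the 5.6.35 default; 5.7.17 and 8.0.1 default to 4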

The first batch of results is from write-only tests. Most of the QPS regression is from MySQL 5.6.35 to 5.7.17. Excluding the update-index test, going from 5.6 to 5.7 loses about 30% of QPS.

update-index : QPS
1       2       4       concurrency/engine
5806    9837    12354   inno5635
5270    8798    11677   inno5717
4909    8176    10917   inno801

update-index : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.91     .89     .95     inno5717
.85     .83     .88     inno801

update-nonindex : QPS
1       2       4       concurrency/engine
10435   15680   18487   inno5635
 7691   11497   14989   inno5717
 7179   10845   14186   inno801

update-nonindex : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.74     .73     .81     inno5717
.69     .69     .77     inno801

delete : QPS
1       2       4       concurrency/engine
19461   28797   35684   inno5635
13525   19937   25466   inno5717
12551   18810   24023   inno801

delete : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.69     .69     .71     inno5717
.64     .65     .67     inno801

write-only : QPS
1       2       4       concurrency/engine
16892   25376   30915   inno5635
11765   17239   22061   inno5717
10729   16108   20682   inno801

write-only : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.70     .68     .71     inno5717
.64     .63     .67     inno801

Results: read-write

The next batch of results is from the classic read-write OLTP sysbench test. But I repeat it using different sizes for the range query. The regression is larger here than for the write-only tests above perhaps because of the regression for range scans. Going from 5.6.35 to 5.7.17 loses between 30% and 40% of the QPS. The regression is worse for longer range scans.
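
The range size is the only parameter that changes between the read-write variants. With modern sysbench this would look roughly like the following hypothetical example, where --range_size sets the number of rows fetched by the range queries (100 here, 10,000 for the other variant):

sysbench /usr/share/sysbench/oltp_read_write.lua \
  --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=root \
  --tables=4 --table_size=1000000 --range_size=100 \
  --threads=1 --time=180 \
  run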

read-write.range100 : QPS
1       2       4       concurrency/engine
11653   18109   25325   inno5635
 7520   10871   14498   inno5717
 6965   10274   14098   inno801

read-write.range100 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.65     .60     .57     inno5717
.60     .57     .56     inno801

read-write.range10000 : QPS
1       2       4       concurrency/engine
337     604     849     inno5635
202     386     443     inno5717
200     378     436     inno801

read-write.range10000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.60     .64     .52     inno5717
.59     .63     .51     inno801

Results: read-only

The next batch of results is from the classic read-only OLTP sysbench test. But I repeat it using different sizes for the range query. Most of the regression is from 5.6.35 to 5.7.17. Going from 5.6 to 5.7 loses between 40% and 50% of the QPS so the regression here is larger than above for the read-write tests. There isn't a larger regression for larger range queries.

read-only.range10 : QPS
1       2       4       concurrency/engine
17372   30663   50570   inno5635
10829   19021   25874   inno5717
10171   18743   25713   inno801

read-only.range10 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.62     .62     .51     inno5717
.59     .61     .51     inno801

read-only.range100 : QPS
1       2       4       concurrency/engine
11247   20922   32930   inno5635
 6815   12823   16225   inno5717
 6475   12308   15834   inno801

read-only.range100 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .61     .49     inno5717
.58     .59     .48     inno801

read-only.range1000 : QPS
1       2       4       concurrency/engine
2590    4840    6816    inno5635
1591    2979    3408    inno5717
1552    2918    3363    inno801

read-only.range1000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .62     .50     inno5717
.60     .60     .49     inno801

read-only.range10000 : QPS
1       2       4       concurrency/engine
273     497     686     inno5635
161     304     355     inno5717
159     299     350     inno801

read-only.range10000 : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.59     .61     .52     inno5717
.58     .60     .51     inno801

Results: point-query and insert-only

Finally results for the last two tests -- point-query and insert-only. MySQL 5.7.17 loses about 40% of the QPS for point-query and 30% of the QPS for insert-only compared to 5.6.35.

point-query : QPS
1       2       4       concurrency/engine
19674   36269   55266   inno5635
11964   22941   29174   inno5717
11624   20679   29271   inno801

point-query : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.61     .63     .53     inno5717
.59     .57     .53     inno801

insert : QPS
1       2       4       concurrency/engine
11288   16268   19355   inno5635
 7951   12176   15660   inno5717
 7493   11277   14857   inno801

insert : QPS relative to MySQL 5.6.35
1       2       4       concurrency/engine
.70     .75     .81     inno5717
.66     .69     .77     inno801

innodb_purge_threads

Finally I repeated the tests with innodb_purge_threads=4 to show its impact. On a small server (2 cores, 4 HW threads) there is too much mutex contention with innodb_purge_threads=4. Because 4 is the default for 5.7.17 and 8.0.1, they suffer more than 5.6.35 when the default is used. The results above are for innodb_purge_threads=1 and I repeated the tests with it set to 4. The tables below show the ratio of QPS with innodb_purge_threads=4 to QPS with innodb_purge_threads=1. For the tests below QPS is reduced by 10% to 15% when innodb_purge_threads=4 on a small server. The insert-only test doesn't suffer, but an insert-only workload generates nothing to purge.

update-index
1       2       4       concurrency/engine
.85     .76     .75     inno5635
.76     .76     .77     inno5717
.89     .96     .89     inno801

update-nonindex
1       2       4       concurrency/engine
.82     .78     .88     inno5635
.77     .79     .86     inno5717
.86     .95     .91     inno801

delete
1       2       4       concurrency/engine
.84     .81     .82     inno5635
.84     .81     .87     inno5717
.87     .92     .94     inno801

write-only
1       2       4       concurrency/engine
.89     .85     .85     inno5635
.88     .86     .87     inno5717
.91     .95     .94     inno801

insert
1       2       4       concurrency/engine
.99     .99     .99     inno5635
.99     1.00    1.00    inno5717
1.01    1.01    1.00    inno801

Wednesday, June 28, 2017

MyISAM, small servers and sysbench at low concurrency

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I am a big fan of Intel NUC servers but recently noticed that CPU performance on my NUC7i5bnh varied frequently because of thermal throttling. To prevent this I disabled turbo boost for the CPUs. My old NUC servers (NUC5i3ryh) don't suffer from this because that CPU does not have turbo boost. Now I need to repeat many tests. Today I share results for a quick run of sysbench using MyISAM and InnoDB in MySQL 5.6, 5.7 and 8.0. This is part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr for in-memory sysbench:
  • the QPS regression is from MySQL 5.6 to 5.7
  • the QPS regression is similar for MyISAM and InnoDB
  • the default value for innodb_purge_threads, which is 4, can cause too much mutex contention and a loss in QPS on small servers. For sysbench update-only I lose 25% of updates/second with 5.7.17 and 15% with 8.0.1 when going from innodb_purge_threads=1 to =4.

Configuration

I tested MySQL with upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. I used the i5 NUC servers described here and the my.cnf files are here. I run mysqld and the sysbench client on the same server. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table.

My usage of sysbench is described here but in this case I ran a shorter version of the test and here are the command lines for MyISAM and for InnoDB. I used 1 table with 1M rows for an in-memory workload and ran update-index, update-nonindex, read-only with 10 to 10,000 rows and then point-query. Tests were run for 1 client and 3 minutes duration. I did this to quickly get results for MyISAM and InnoDB.
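
The linked command lines are the real source; as a hypothetical sketch of the shorter run (1 table, 1M rows, 1 client, 3 minutes per test) with modern sysbench:

sysbench /usr/share/sysbench/oltp_read_only.lua \
  --db-driver=mysql --mysql-host=127.0.0.1 --mysql-user=root \
  --tables=1 --table_size=1000000 --range_size=10000 \
  --threads=1 --time=180 \
  run

For the MyISAM runs the prepare step would also pass --mysql_storage_engine=myisam, assuming the stock oltp_common.lua option.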

I noticed mutex contention from InnoDB purge threads and had been using the default value (innodb_purge_threads=4) so I repeated tests with innodb_purge_threads=1.

Results

In the tables below for MyISAM: 5635 is 5.6.35, 5717 is 5.7.17, 801 is 8.0.1. In the tables below for InnoDB: 5635 is 5.6.35, 5717-pX is 5.7.17 with innodb_purge_threads set to X, 801-pX is 8.0.1 with innodb_purge_threads set to X.

The first table shows the QPS for MyISAM.

 5635    5717     801   release/test
11143    7995    7692   update-index
12572    8743    8106   update-nonindex
17177   10565   10403   read-only.range10
 9994    6287    6337   read-only.range100
 2088    1351    1299   read-only.range1000
  208     142     145   read-only.range10000
20369   12106   11177   point-query

The next table shows the QPS for MyISAM relative to MyISAM in MySQL 5.6.35. Most of the QPS regression is from MySQL 5.6 to 5.7.

5635   5717     801     release/test
1.00    .72     .69     update-index
1.00    .70     .64     update-nonindex
1.00    .62     .61     read-only.range10
1.00    .63     .63     read-only.range100
1.00    .65     .62     read-only.range1000
1.00    .68     .70     read-only.range10000
1.00    .59     .55     point-query

The next table shows the QPS for InnoDB.

5635    5717-p4 5717-p1 801-p4  801-p1  release/test
 5723    3949    5265    4434    4902   update-index
10565    5975    7770    6243    7287   update-nonindex
17408   11522   11020   10492   10717   read-only.range10
11262    6947    6887    6513    6726   read-only.range100
 2611    1578    1576    1577    1627   read-only.range1000
  277     163     166     161     164   read-only.range10000
20406   11671   11350   10978   11348   point-query

The last table shows the QPS for InnoDB relative to InnoDB in MySQL 5.6.35. Most of the QPS regression is from MySQL 5.6 to 5.7. For the update tests, with the default value of innodb_purge_threads (4), InnoDB loses about 25% of QPS in MySQL 5.7.17 and about 15% in MySQL 8.0.1 compared to the runs with innodb_purge_threads=1. Note that this server has 2 CPU cores and 4 hardware threads.

5635    5717-p4 5717-p1 801-p4  801-p1  release/test
1.00    .69     .92     .77     .86     update-index
1.00    .57     .74     .59     .69     update-nonindex
1.00    .66     .63     .60     .62     read-only.range10
1.00    .62     .61     .58     .60     read-only.range100
1.00    .60     .60     .60     .62     read-only.range1000
1.00    .59     .60     .58     .59     read-only.range10000


Wednesday, June 21, 2017

Linux perf and the CPU regression in MySQL 5.7

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I used Linux perf to get more details on system performance while running the point-query test with modern sysbench. This is for an in-memory workload and part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr
  • I have more explaining to do
  • The increase in instructions/query explains the decrease in queries/second from MySQL 5.6 to 5.7 to 8.0.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here and are named i3 NUC and i5 NUC.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table.

After loading the tables via the sysbench prepare option I ran the update-index test for a few minutes and then ran point-query with 1 client. After letting point-query warm up for 60 seconds I used this script to collect data from Linux perf. Output from Linux perf is here.
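
The linked script is the real source; a minimal sketch of the kind of collection it does, assuming a single mysqld process and a fixed measurement interval:

PID=$(pidof mysqld)
perf stat -p $PID -- sleep 60        # instructions, cycles and IPC over 60 seconds
perf record -g -p $PID -- sleep 60   # call graphs over the same interval
perf report --stdio > perf.txt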

Results

The charts below show the queries/second and instructions/query for each server/release with the point-query sysbench test and 1 client. For the i3 NUC the regression is from MySQL 5.7.17 to 8.0.1. For the i5 NUC it is from 5.6.35 to 5.7.17. Hopefully I will explain why the results are different between the i3 and i5 NUC servers. I am still collecting results from the CPUs I use at work and they don't match what I report here. So I have some explaining to do.

But it is clear to me that the increase in instructions/query explains the decrease in queries/second. Note that I run the sysbench client on the same host as mysqld, so the instructions/query measurement includes the sysbench client; the real regression from MySQL 5.6 to 8.0 would be larger were that excluded.

Monday, June 19, 2017

Impact of perf schema on sysbench at low concurrency

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

As I document the low-concurrency CPU regressions in MySQL 5.7 and 8, one of the questions is whether the performance schema is part of the problem. I don't think it is, as the worst-case overhead I measure is about 7% and the typical overhead is less than 5% for a single-threaded & in-memory workload.

Eventually I will repeat this test for workloads with more concurrency because there are overheads that won't be apparent at low concurrency. The overhead is relatively larger for simple queries, like sysbench point-query, and smaller for complex queries. The overhead is also larger for in-memory workloads.

tl;dr - it isn't the perf schema

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I enabled support for the performance schema at compile time. When tests were run I set performance_schema to 1 to enable it and to 0 to disable it. Nothing else was set in my.cnf for the perf schema.
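
Toggling it needs just one option; a minimal my.cnf fragment, with everything else unchanged from the linked files:

[mysqld]
performance_schema=1   # set to 0 to disable it for the -nops configurations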

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. The next table is QPS relative to MySQL 5.6.35 with the perf schema enabled for each engine/configuration. For each release there isn't much difference between enabling (*-ps) and disabling (*-nops) the perf schema.

legend:
* 56-ps, 56-nops - 5.6.35 with and without perf schema
* 57-ps, 57-nops - 5.7.17 with and without perf schema
* 80-ps, 80-nops - 8.0.1 with and without perf schema

56-ps   56-nops 57-ps   57-nops 80-ps   80-nops release/test
1.00    1.02    0.66    0.68    0.73    0.76    update-index
1.00    1.02    0.57    0.59    0.61    0.64    update-nonindex
1.00    1.03    0.59    0.61    0.58    0.60    read-write.range100
1.00    1.03    0.59    0.59    0.58    0.58    read-write.range10000
1.00    1.05    0.61    0.61    0.57    0.60    read-only.range10
1.00    1.03    0.60    0.60    0.60    0.60    read-only.range10000
1.00    1.05    0.60    0.59    0.56    0.60    point-query
1.00    1.02    0.69    0.73    0.67    0.68    insert

The next table shows the QPS for MySQL version X without the perf schema relative to the QPS for version X with the perf schema. It shows the relative gain in QPS from disabling the perf schema for a given release. The maximum gain is about 7% but in most cases it is less than 5%, so the perf schema overhead doesn't explain the CPU regression from MySQL 5.6 to 8. It isn't a surprise that the largest overhead occurs for the point-query test because that test has the least complex queries.

5.6     5.7     8.0     release/test
1.02    1.03    1.04    update-index
1.02    1.03    1.05    update-nonindex
1.03    1.02    1.03    read-write.range100
1.03    1.00    1.00    read-write.range10000
1.05    1.00    1.05    read-only.range10
1.03    1.01    1.00    read-only.range10000
1.05    0.99    1.07    point-query
1.02    1.06    1.02    insert

Charts

Finally I have charts of the QPS for all tests. I don't have commentary for each chart. For the results below I use -ps for configurations when the perf schema was enabled and -nops when it was disabled.

Friday, June 16, 2017

Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

After sharing results for in-memory sysbench with MySQL 5.6, 5.7 and 8 I was curious about older releases and here I have results for MySQL 5.0, 5.1 and 5.5 in addition to 5.6, 5.7 and 8. This is one more result in my series on low-concurrency performance regressions.

tl;dr
  • MySQL 4.1 and 5.5 weren't great releases for performance. I skipped both in production.
  • The biggest drop in QPS between releases is from 5.6 to 5.7 and that drop often exceeds the drop from 5.0 to 5.6. What happened? Bug 86215 is open for this.

Configuration

I tested MySQL using upstream 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here but I only share results for the i5 NUC.

I was able to compile and run MySQL 4.1.22 on the same server but have yet to share the results. The results weren't good and that matches my memory of 4.1 not being a great release. MySQL 4.0 was an awesome release but I have yet to get it running on Ubuntu 16.04 with gcc 4.7 or 4.8. There are segfaults soon after startup.

My usage of sysbench, including the tests run, and the my.cnf used for each database engine is described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.0, 5.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled. Sysbench is run with 4 tables and 1M rows per table for 1, 2 and 4 concurrent clients. The database fits in the InnoDB buffer pool.

Results

The QPS for all tests is here. Charts for some of the tests are below.

The table below lists the QPS relative to MySQL 5.0 for each test. When the value is 0.53 (see update-index for MySQL 8), MySQL 8 gets 53% of the QPS compared to MySQL 5.0, so 5.0 is almost 2X faster. As reported in previous posts, the regression from 5.6 to 5.7 is large. Fortunately that performance loss hasn't been repeated from 5.7 to 8.

This gives me hope. At first I thought the problem was a steady loss of performance in each major release as features are added and code paths get longer. But now it looks like most of the problem arrived with MySQL 5.7. Maybe we can fix this.

5.0     5.1     5.5     5.6     5.7     8       release/test
----    ----    ----    ----    ----    ----
1.00    0.89    0.73    0.73    0.49    0.53    update-index
1.00    0.89    0.59    0.72    0.41    0.44    update-nonindex
1.00    0.92    0.99    1.04    0.62    0.60    read-write.range100
1.00    0.95    0.92    0.85    0.50    0.49    read-write.range10000
1.00    0.92    0.89    0.92    0.56    0.52    read-only.range10
1.00    0.95    0.92    0.84    0.50    0.50    read-only.range10000
1.00    0.85    0.77    0.76    0.45    0.42    point-query
1.00    0.95    0.93    1.22    0.84    0.82    insert

Charts

For update-index the biggest drop in QPS is from 5.6 to 5.7.
For update-nonindex the biggest drop in QPS is from 5.6 to 5.7. There is also a big drop from 5.1 to 5.5, but that is fixed in 5.6. Looks like 5.5 was a lousy release for performance.
For read-write.range100 the biggest drop in QPS is from 5.6 to 5.7.
For read-write.range10000 the biggest drop in QPS was from 5.6 to 5.7.
For read-only.range10 the biggest drop in QPS is from 5.6 to 5.7.
For read-only.range10000 the biggest drop in QPS is from 5.6 to 5.7.
For point-query the biggest drop in QPS is from 5.6 to 5.7.
For insert the QPS regression is small.

Monday, June 12, 2017

Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

Next up in my series on low-concurrency performance is Linkbench on Intel NUC servers with an IO-bound workload. The working set is larger than RAM and there are many reads from and writes to storage. This is part of my work on bug 86215 and will be my last blog post on the topic for a few weeks. For more on Linkbench read the Facebook post and paper and my latest post.

tl;dr
  • For the i5 NUC at least 2/3 of the regression is from MySQL 5.6 to 5.7 and the remainder from 5.7 to 8. Most of this is explained by more CPU overhead in 5.7 and 8.
  • For the i3 NUC there is more variance, but that has a slower and older CPU that is less interesting to me. At this point I think the i3 NUC is useful to test builds and run MTR but not to test performance.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latin1/latin1_swedish_ci.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM, 2X more storage and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

For this test I used maxid1=40M in the Linkbench configuration for the i3 NUC and maxid1=80M for the i5 NUC. I use Linkbench from here and helper scripts from here. My usage of Linkbench is described here. A sample command line to run the tests is:
bash all.sh rx ~/bin/mysql /data/m/data 40000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
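
For reference, maxid1 lives in the LinkBench workload properties file; a sketch assuming the stock config/FBWorkload.properties layout (the 40000001 argument on the command line above corresponds to maxid1 for the i3 NUC):

# config/FBWorkload.properties - hypothetical fragment
maxid1 = 40000001    # i3 NUC; 80000001 for the i5 NUC
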
Results

The first pair of charts shows the absolute and relative insert rates during the load test. For the i3 NUC there is a regression from MySQL 5.6 to 5.7 but not to 8. Results for many of my tests have been less predictable on the i3 NUC perhaps because it has a much slower and older CPU. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.

The next pair of charts shows the absolute value for CPU per insert followed by the inverse of the relative rates. The absolute value comes from the Mcpu/t column in my performance data. The i3 NUC results continue to be odd for MySQL 8. For the i5 NUC most of the CPU regression is from MySQL 5.6 to 5.7. For the i5 NUC the inverse of the CPU per insert values is similar to the relative insert rates above and the CPU regression explains the drop in insert rates from MySQL 5.6 to 5.7 and 8.


The next pair of charts shows the absolute transactions/second rates and the rates relative to the value for MySQL 5.6. These are from the 24th hour of the transaction test. For the i3 NUC about 1/3 of the regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7.


The final pair of charts shows the absolute value for CPU per transaction followed by the inverse of the value relative to MySQL 5.6. These are from the 24th hour of the transaction test. The absolute value is from the Mcpu/t column in the performance data. For the i3 NUC about 1/3 of the CPU regression is from MySQL 5.6 to 5.7 and for the i5 NUC most of the regression is from MySQL 5.6 to 5.7. The second chart shows that the inverse of the relative CPU predicts the transaction rate and the CPU regression explains the transaction rate regression.


Metrics

All of the performance data is here and explained by a previous post.

Saturday, June 10, 2017

Summary of recent performance tests for MySQL 5.6, 5.7 and 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

I have been reporting on low-concurrency performance regressions in MySQL for a few years and recently published many reports to compare MySQL 5.6.35, 5.7.17 and 8.0.1 using Intel NUC servers. This is tracked by bug 86215. My summary of the recent tests is:
  • The problem is not that there is a CPU regression from MySQL 5.6 to 5.7 to 8, as that is expected. The problem is that the regression is too large. In the worst case MySQL 5.6 gets up to 2X more QPS than 5.7 and 8 with sysbench. In a typical case MySQL 5.6 gets 1.2X to 1.3X more QPS than 5.7 and 8.
  • Most of the slowdown is from MySQL 5.6 to 5.7 and less of the problem is from 5.7 to 8. I think this is good news. This is based on my results from the i5 NUC.
  • My team is committed to making this better. I hope that upstream is too. One day the big Percona Live presentation from upstream will include benchmark results for MySQL at low concurrency in addition to the results we always get for extremely high concurrency.
  • If you publish results from tests for N configurations, you will always get a request for testing one more configuration. So keep N small and save energy for the followup. But my hope is that we learn something from the tests that I did, rather than ask for more tests. A more clever person would run tests for N configurations, initially share results for N-2 and then when asked for a few more configs wait a day and share the final two. Alas, I am not clever.
Background reading

My recent performance reports are here:
  • May 2017 - Sysbench, in-memory & Intel NUC
  • May 2017 - Sysbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Insert benchmark, in-memory & Intel NUC for MySQL 5.6, 5.7 and 8
  • June 2017 - Insert benchmark, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Linkbench, in-memory & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Linkbench, IO-bound & Intel NUC for MySQL 5.6, 5.7 & 8
  • June 2017 - Sysbench for MySQL 5.0, 5.1, 5.5, 5.6, 5.7 and 8
  • June 2017 - Impact of perf schema on sysbench at low concurrency
  • June 2017 - Linux perf and the CPU regression in MySQL 5.7
  • June 2017 - MyISAM, small servers and sysbench at low concurrency
  • June 2017 - One more time with sysbench, a small server & MySQL 5.6, 5.7 and 8.0
  • July 2017 - I overstated the CPU regression in MySQL 5.7

Friday, June 9, 2017

Linkbench, in-memory & Intel NUC for MySQL 5.6, 5.7 & 8

Update - the regression isn't as bad as I have been reporting. Read this post to understand why.

Next up in my series on low-concurrency performance is Linkbench on Intel NUC servers with an in-memory workload. The database fits in RAM and there are no reads from but many writes to storage. This is part of my work on bug 86215. One day I will stop running tests and start working on code improvements but not yet. For more on Linkbench read the Facebook post and paper and my latest post.

tl;dr
  • MySQL 5.7 and 8 get about 80% of the insert rate and 60% of the transaction rate compared to MySQL 5.6 in the worst case at low concurrency. This is explained by an increase in the CPU cost per insert and per transaction. I assume more code & more features are the cause.
  • For the i3 NUC some of the regression is from MySQL 5.6 to 5.7 and some is from 5.7 to 8.
  • For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I repeated the test: first with the default charset/collation and then with latin1/latin1_bin for the i3 NUC and latin1/latin1_swedish_ci for the i5 NUC.

My servers are described here. The charts below refer to the servers as i3 NUC and i5 NUC. Compared to the i3 NUC, the i5 NUC is newer, has a CPU that is 2X faster, 2X more RAM and storage that is more than 2X faster.

The my.cnf used for each database engine is described here. The my.cnf files I share were for the i3 NUC. For the i5 NUC the InnoDB buffer pool and IO capacity options were increased using these values. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

For this test I used maxid1=2M in the Linkbench configuration; about 2M nodes are loaded and the InnoDB database is about 3GB. I use Linkbench from here and helper scripts from here. My usage of Linkbench is described here. A sample command line to run the tests is:
bash all.sh rx ~/bin/mysql /data/m/data 2000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
Results

The first pair of charts shows the absolute and relative insert rates during the load test. Insert rates drop from MySQL 5.6 to 5.7 and 8. For the i3 NUC about half of the regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the regression is from MySQL 5.6 to 5.7.

The next pair of charts shows the absolute value for CPU per insert followed by the inverse of the relative rates. The absolute value comes from the Mcpu/t column in my performance data. More CPU is used in MySQL 5.7 and 8 compared to 5.6. For the i3 NUC about 1/3 of the CPU regression is from MySQL 5.6 to 5.7. For the i5 NUC about 2/3 of the CPU regression is from MySQL 5.6 to 5.7. The inverse of the CPU per insert values is similar to the relative insert rates above and the CPU regression explains the drop in insert rates from MySQL 5.6 to 5.7 and 8.

The next pair of charts shows the absolute transactions/second rates and the rates relative to the value for MySQL 5.6. These are from the 24th hour of the transaction test. For the i3 NUC about 1/2 of the regression is from MySQL 5.6 to 5.7. For the i5 NUC most of the regression is from MySQL 5.6 to 5.7.


The final pair of charts shows the absolute value for CPU per transaction followed by the inverse of the value relative to MySQL 5.6. These are from the 24th hour of the transaction test. The absolute value is from the Mcpu/t column in the performance data. The first chart shows that for the i3 NUC about 1/2 of the CPU regression is from MySQL 5.6 to 5.7 and for the i5 NUC most of the regression is from MySQL 5.6 to 5.7. The second chart shows that the inverse of the relative CPU predicts the transaction rate and the CPU regression explains the transaction rate regression.

Metrics

All of the performance data is here and explained by a previous post.

All about Linkbench

Linkbench is a benchmark for social graph transaction processing. It led to a paper, a post, a benchmark client in Java and many blog posts from me. As a bonus, I also got to know the person who did the work for it - thanks Tim. The benchmark client has moved to my Github account because the upstream project has been archived (no comment). The repo is not active but someone recently added support for Postgres, someone else added support for multiple schemas and I fixed it to work with MySQL 8. Percona forked it to add support for MongoDB but that really needs proper support for multi-document transactions in MongoDB.

I run Linkbench via helper scripts. These include files to create tables for MyRocks and InnoDB. There are two phases in Linkbench: load and transaction. My helper scripts run the load phase and then the transaction phase. From past experience results become stable after a few hours of transactions, so I run it for 24 hours in 1-hour loops and compute metrics per hour to understand whether performance and efficiency change over time. My helper script all.sh uses load.sh and run.sh to do that while also computing hardware efficiency metrics. A typical command line for a small database is below. The last argument is the number of clients that concurrently load the link and count tables, while the load of the node table is always done by 1 thread. So when the last argument is 1 there will be 2 concurrent clients for most of the load (1 for link/count and 1 for node). Note that the load of the link and count tables can finish before the load of the node table.
bash all.sh rx ~/bin/mysql /data/m/data 2000001 sdb 1 3600 mysql lb.sql.inno 24 127.0.0.1 1
Load phase

The load.sh script creates a result summary file with the name l.r.$something and an example is here. I extract the most interesting numbers from that file into a one-line summary that I archive. An example summary line is listed below. The legend for the results is:
  • ips - average insert rate for the link table. This is less than the real insert rate during the test because it doesn't count inserts to the count and node tables. Don't forget this when looking at the results below that are divided by ips.
  • r/i, rkb/i, wkb/i - iostat r/s, iostat rKB/s and iostat wKB/s divided by ips.
  • Mcpu/i - vmstat us + sy columns divided by ips and then multiplied by a constant. Note that this value can only be compared between servers using the same CPU.
  • size - database size in GB at the end of the load.
  • rss - mysqld RSS size in GB at the end of the load.
  • wMB/s - average of iostat wMB/s during the load. I really need to update this script to compute r/s and rMB/s.
  • cpu - average of vmstat us + sy columns during the test
  • engine - notes on the storage engine and configuration

ips    r/i    rkb/i  wkb/i  Mcpu/i  size   rss    wMB/s  cpu    engine
29810  0      0      1.33   1311    2.8g   0.17   39.6   39.1   MyRocks
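
As a worked example from the line above: average cpu (vmstat us+sy) is 39.1 and ips is 29810, so 39.1 / 29810 * 1,000,000 is about 1311, which matches the Mcpu/i column and suggests the constant is 1,000,000.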

Transaction phase

The run.sh script runs transaction phases in a loop for a fixed amount of time per loop and creates results summary files with the name r.r.$something.L$n.P$x where $n is the loop number. When it runs with 16 clients for 24 1-hour loops then there will be files like r.r.$something.L1.P16, r.r.$something.L2.P16, etc. An example is here.

I extract interesting results from that file to create a one line summary as shown below. The legend for the results is:
  • tps - average transaction rate.
  • r/t, rkb/t, wkb/t - iostat r/s, iostat rKB/s and iostat wKB/s divided by tps.
  • Mcpu/t - vmstat us and sy columns divided by tps and then multiplied by a constant. Note that this value can only be compared between servers using the same CPU.
  • size - database size in GB at test end.
  • rss - mysqld RSS size in GB at test end.
  • un, gn, ul, gl - 99th percentile response time in milliseconds for the most frequent transactions: un is UPDATE_NODE, gn is GET_NODE, ul is UPDATE_LINK, gl is GET_LINKS_LIST. The types of transactions are explained in the paper and implemented for MySQL in LinkStoreMysql.java.
  • r/s, rMB/s, wMB/s - average for iostat r/s, rMB/s and wMB/s during the test.
  • cpu - average of vmstat us + sy columns during the test
  • engine - notes on the storage engine and configuration

tps   r/t   rkb/t wkb/t Mcpu/t  size  rss   un    gn   ul   gl   r/s   rMB/s wMB/s cpu   engine
3868  0      0    1.41  8154    3.0g  4.04  0.2   0.1  0.6  0.5  10.4  0     5.4   31.5  MyRocks
