I updated this blog post on Sunday, Oct 5 after first publishing it on Saturday, Oct 4. I added results for all versions with the InnoDB adaptive hash index disabled. I also corrected results for the setup that used innodb_thread_concurrency because my first round of tests didn't actually enable it. All graphs have been regenerated, but I don't write much about the new results here. That waits for another blog post.
I did similar testing for MySQL 5.6 in 2013 and opened bugs 68825 (Apr 2013) and 69236 (May 2013). They are still open and the regressions still exist in 5.6 and 5.7. Those bug reports are not specific enough to describe all of the sources of regression. A lot more work is required to explain the problem. I have yet to do that analysis and it isn't clear to me that others have done it. While I hope to do some of the debugging, I also hope to motivate upstream to expand their public benchmark effort beyond high-concurrency workloads (replication and low-concurrency are also critical). This can't be a one-time effort. Potential regressions arrive with each new commit to the upstream source repo, but my guess is that most regressions accumulate over many diffs rather than coming from one bad diff. On the bright side, I think it is possible to automate testing to search for the source of perf regressions.
I have 4 new bug reports so far for 5.7: 74198 for a bad query plan with the Heap engine, 74166 for a compiler error, 74167 for a problem with InnoDB and ext-3 (already fixed in 5.7.6) and 74216 for a documentation improvement. From these tests I think I found a few more performance regressions but will save that for the next blog post.
setup
I use mysqlslap as explained here and tested binaries for MySQL versions 5.0.85, 5.1.61, 5.5.40, 5.6.21 and 5.7.5 as explained here. I tested two additional configurations for each release - one to use innodb_thread_concurrency and the other to disable the adaptive hash index. The innodb_thread_concurrency tests are indicated by ${binary}-itc below and used innodb_thread_concurrency=32 and innodb_concurrency_tickets=500 to understand the impact on queries that need more than 500 tickets, like the LIMIT 1000 queries here. The other new setup used innodb_adaptive_hash_index=0 to disable the AHI and is indicated by ${binary}-noahi below.
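For reference, the two extra setups only changed the following options relative to the base configuration, expressed here as my.cnf lines (presenting them as my.cnf entries is my assumption; the values come from the description above):
# ${binary}-itc
innodb_thread_concurrency=32
innodb_concurrency_tickets=500
# ${binary}-noahi
innodb_adaptive_hash_index=0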
Three different queries were tested and for each type of query the test was repeated with a LIMIT 10, LIMIT 100 and LIMIT 1000 clause to understand the impact from increasing the scan length. I only report results for the LIMIT 10 and LIMIT 1000 cases. The queries were:
- PK - scan PK index
- nocover-force - scan non-covering secondary index, fetch missing columns from PK
- cover-force - scan covering secondary index
This is the text for the queries:
# PK
select j from foo where i >= 100 order by i limit $N;
# nocover-force
select k from foo FORCE INDEX(`xj`) where j >= 100 order by j limit $N;
# cover-force
select k from foo FORCE INDEX(`xjk`) where j >= 100 order by j limit $N;
This is an example command line to run a test with mysqlslap:
mysqlslap --concurrency=$C -h127.0.0.1 --create-schema=test1 --number-of-queries=$Q --create=ddl.sql --query="select j from foo where i >= 1000 order by i limit $N"
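The contents of ddl.sql are not shown here, but the queries imply a table roughly like the one below. Only the table name, the column names (i, j, k), the index names (xj, xjk) and the use of InnoDB follow from the post; the column types and NOT NULL constraints are my guesses.
# possible ddl.sql (a sketch, not the exact schema used for the tests)
CREATE TABLE foo (
  i INT NOT NULL,
  j INT NOT NULL,
  k INT NOT NULL,
  PRIMARY KEY (i),
  INDEX xj (j),
  INDEX xjk (j, k)
) ENGINE=InnoDB;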
results for LIMIT 10
This has results for queries that scan and return 10 rows. The many graphs that follow use the normalized response time (time for query / base case), where the base case is the response time for MySQL 5.0.85 at 1 thread; a worked example follows the list of goals. Tests were repeated for 1, 4 and 32 threads. There are several goals:
- Show that newer releases of MySQL have a normalized response time close to 1 for the 1-thread test or show the regression.
- Show that response time doesn't get a lot worse for 4 and 32 threads.
- Show that newer releases do better than 5.0.85 at 32 threads.
- Show the overhead from using the Performance Schema (PS) for short-running queries.
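As an example of the metric: if the PK query were to take 200 microseconds on 5.0.85 at 1 thread and 260 microseconds on another release, the normalized response time for that release would be 260 / 200 = 1.3, so values above 1 indicate a regression relative to the base case. These numbers are made up to illustrate the arithmetic and are not taken from the graphs.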
results for LIMIT 10 at 1-thread
These graphs show the normalized response times for the 10 row scan with 1 client. The two things that stand out for me are:
- The regression from 5.6 to 5.7 is much larger than from 5.5 to 5.6. But it isn't that different from the 5.0 to 5.1 regression.
- The regression from using the PS in 5.5.40 is large. Fortunately it is smaller in 5.6 and 5.7.
results for LIMIT 10 at 4-threads
Results here are similar to the 1-thread case but there is one difference: concurrency improvements in recent releases reduce the regression for the non-covering secondary query in the second slide below.
results for LIMIT 10 at 32-threads
Newer MySQL releases are much better than 5.0.85 at 32 threads. Here I use two graphs for each query; one truncates the response time axis to make it easier to see the differences between the newer releases. Note that problems at high concurrency exist in 5.0, 5.1 and 5.5. InnoDB took a huge step forward in 5.6 after taking a step backwards in 5.5.
results for LIMIT 1000 at 1-thread
Now the test switches to a range query that scans and fetches 1000 rows. For the PK and covering secondary scans there is a regression in 5.1, and newer releases show steady regressions. For the non-covering index scan, performance has not regressed much in newer releases.
results for LIMIT 1000 at 4-threads
The LIMIT 1000 query is more sensitive to mutex contention than the LIMIT 10 query so new releases tend to do better than older ones, but there is a regression from 5.6 to 5.7. The non-covering secondary query is even more sensitive to mutex contention.
results for LIMIT 1000 at 32-threads
There must be replication lag in the Google spreadsheet client because the itc binaries below don't have results in the saved images even though they exist on my browser. New releases tend to do better than old releases. But there is a regression in 5.7 for the non-covering secondary index scan. More on that in my next blog post.
Thank you for your continued work in benchmarking MySQL.
I see that for Scan 1000, non-covering secondary, 1 thread, performance is indeed 2.67X the baseline. But 5.0.85 is around 2.44X, so 5.7 is not that bad, right?
The problem is non-trivial, and I think you are absolutely right that there is not one push, nor even a few pushes, to blame for the regression.
Our strategy in the optimizer group is to streamline the query processing, clean up interfaces, avoid double work, and avoid work that has to be rolled back later. The main purpose of this work is to make the optimizer easier to maintain and extend, but there is also an important secondary goal of making the processing more efficient. I hope that we will see effects from that work soon, although it is challenging to retain the old efficiency when we add new features to the optimizer.
For the 1X tests, most regression is probably in the optimizer domain (which in this respect includes the parser, resolver, optimizer and executor), but for 1000X tests, the blame can probably be shared with InnoDB. But we need to run some perf tests to tell for sure...