Wednesday, December 30, 2020

Sysbench: in-memory InnoDB in MySQL 5.6, 5.7 and 8.0

This post has results for InnoDB in MySQL 5.6, 5.7 and 8.0 using an in-memory & low-concurrency workload via sysbench. The setup is the same as I described for MyRocks. The goal is to understand where performance changes across releases and the in-memory workloads make it easier to spot CPU overhead.

I will repeat a disclaimer from my previous post, these results are from microbenchmarks. They are useful but be careful to assume these results will reproduce for your workload, especially if your queries are more complex than those used by sysbench. Regardless, it is good for users and the environment to be careful about CPU efficiency in new versions of a popular DBMS.

Updated results with improved my.cnf options are here.

Summary:

  • Things that might motivate a repeat
    • Change my scripts so that InnoDB writeback is done before starting the tests.
    • Set innodb_purge_threads=1 as the default value (4) caused mutex contention in the past on my NUC servers that have 4 CPU cores.
    • Debug whether the new InnoDB redo log code wastes CPU on my small servers. It did in early 8.0 versions.
  • Most of the performance regression from 5.6 to 8.0 is from code above the storage engine layer
  • In most cases, 8.0 gets between 70% and 80% of the throughput vs 5.6
  • 8.0 is better than 5.6 at longer range scans but worse at shorter ones
  • 8.0 is a lot worse for point-queries on secondary indexes (5.6 gets 2X more QPS). Most of this arrived in 5.7, but 8.0 made it worse.
  • Filed bug 102037 because optimizer overhead is too large with large in-list SELECTs

Overview

I tested MySQL versions 5.6.49, 5.7.31, and 8.0.17 through 8.0.22. As much as possible, I used the same my.cnf options for all versions. The my.cnf files are in the per-version directories in github and here is my.cnf for 8.0.22. I used the my.cnf options that were good for the insert benchmark (the cx6d configuration). For 8.0 ssl is disabled and the same charset/collation is used for all versions. Fsync is not done on commit to avoid becoming an fsync latency benchmark.

Results

The tests are in 4 groups based on the sequence in which they are run: read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS/TPS that a test gets for 1, 2, 3 and 4 clients. Relative throughput is the QPS/TPS relative to the base case (MySQL 5.6.49). The HW efficiency report has absolute and relative results (base case is MySQL 5.6.49) for CPU and IO per operation.

I use ratios to explain performance. In this case MySQL 5.6.49 is the denominator and the numerator is the value for 5.7 or 8.0. A QPS ratio < 1 means the new version is slower. For HW efficiency I use CPU/operation and IO/operation (read & write). For CPU and IO per operation a ratio > 1 means the new version uses more CPU or IO per query.

The results are in github for absolute throughput, relative throughput and HW efficiency. I annotate the relative throughput and HW efficiency results below. Unless called out, I am explaining the results for 8.0.22 relative to 5.6.49. 

Read-only before write-heavy:
  • QPS ratios are 0.76, 0.71, 0.59, 0.24 for the first 4 tests (up to here)
    • These do point queries
    • CPU/query ratios are: 1.36, 1.44, 1.78, 4.08. Up to here.
    • The large regressions (0.59, 0.24) are from bug 102037 which is new in 8.0.22. For 8.0.21 the QPS ratios are 0.74, 0.70, 0.74, 0.89 and CPU/query ratios are 1.46, 1.51, 1.44, 1.16.
    • MySQL 5.6 was able to finish InnoDB write-back prior to these tests but it was still in progress for 5.7 until the fourth test (above) and for 8.0 until the update-only tests (below). I assume that 5.6 is more willing to do furious flushing while 5.7 and 8.0 are not. That is neither good nor bad but makes it harder to tune across versions. Given that writeback is in progress for 8.0 the CPU overhead in the Read-only before write-heavy section will be overstated for 8.0. Fortunately the results in the Read-only after write-heavy section should still be truthy.
  • QPS ratios are 0.74, 1.03, 1.26 for the next 3 tests (here to here
    • These have the range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • CPU/query ratios are 1.38, 1.02, 0.83 (here to here). Long scans are better in 8.0 vs 5.6, but worse in 8.0 vs 5.7.
  • QPS ratios are 0.71, 0.71 for the next 2 tests (here to here)
    • These do point queries via in-lists that are covering and then not covering for the primary index. 
    • CPU/query ratios are 1.45, 1.44 (here to here).
  • QPS ratios are 0.47, 0.49 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. The regression here is larger than for the PK index and CPU/query is the problem. The regression is in 5.7 and 8.0.
    • CPU/query ratios are 2.06, 1.98 (here to here)
  • QPS ratios are 0.74, 0.73 for the next 2 tests (here to here)
    • These do range queries that are covering and then not covering for the primary index
    • CPU/query ratios are 1.38, 1.39 (here to here)
  • QPS ratios are 0.75, 0.67 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. 
    • CPU/query ratios are 1.35, 1.52 (here to here)
Write-heavy
  • QPS ratios are 0.99, 1.61, 0.75, 0.70, 0.74 for the next 5 tests (here to here)
    • These are update-only
    • CPU/statement ratios are 2.01, 1.43, 2.08, 2.22, 2.21 (here to here). 
    • 8.0 is faster than 5.6 for update-index (QPS ratio is 1.61), but slower in the other tests except for update-inlist. 8.0.17 had problems but perf has improved since then. The wKB/o results would be easier to explain if each test ran longer than 90 seconds, otherwise the writes done during one test might have been triggered by a previous test. 8.0 is almost as fast as 5.6 (QPS ratio is 0.99) for update-inlist. I assume because the in-list is large enough that the overhead is dominated by the storage engine (and not code above the engine).
  • QPS ratio is 0.88 for the next test, write-only. See here.
    • This has the writes from oltp_read_write.lua. 
    • CPU/transaction ratio is 1.94. See here.
  • QPS ratios are 0.73, 0.85 for the next two tests, read-write (here to here)
    • These are the traditional sysbench tests (oltp_read_write.lua) with ranges of size 10 and 100
    • CPU/transaction ratios are 1.32, 1.22 (here to here)
Read-only after write-heavy includes tests that were run before write-heavy:
  • QPS ratio is 0.77, 1.02, 1.24 for the next 3 tests, read-only (here to here)
    • These have the queries (range scans) from oltp_read_write.lua with ranges of size 10, 100 and 10,000. Results here are similar to Read-only before write-heavy and 8.0 is better than 5.6 at long range scans, but worse at shorter ones. 
    • CPU/transaction ratios are 1.21, 1.03, 0.85 (here to here)
  • QPS ratios are 0.77, 0.71, 0.59, 0.24, 0.49 for the next 5 tests (here to here)
    • These do a variety of point queries. The first 4 were run in Read-only before write heavy, and results here are similar.
    • CPU/query ratios are 1.34, 1.44, 1.78, 4.07, 1.95 (here to here)
    • The large regressions in QPS ratios (0.59, 0.24, 0.49) are from bug 102037 which is new in 8.0.22. For 8.0.21 the QPS ratios are 0.74, 0.71, 0.74, 0.90, 0.62 and the CPU/query ratios are 1.43, 1.50, 1.45, 1.15, 1.57.
  • QPS ratios are 0.71, 0.71 for the next 2 tests (here to here)
    • These do point queries via in-lists that are covering and then not covering for the primary index. Results here are similar to Read-only before write-heavy.
    • CPU/query ratios are 1.45, 1.45 (here to here)
  • QPS ratios are 0.48, 0.50 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. Results here are similar to Read-only before write-heavy
    • CPU/query ratios are 2.05, 1.97 (here to here). 
  • QPS ratios are 0.73, 0.73 for the next 2 tests (here to here)
    • These do range queries that are covering and then not covering for the primary index
    • Results here are similar to Read-only before write-heavy.
    • CPU/query ratios are 1.42, 1.41 (here to here)
  • QPS ratios are 0.75, 0.67 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. Results here are similar to Read-only before write-heavy.
    • CPU/query ratios are 1.37, 1.51 (here to here)
    Insert/delete

    • QPS ratio is 0.69 for the delete test and 0.70 for the insert test
    • CPU/statement ratio is 2.32 for delete and 2.62 for insert


    No comments:

    Post a Comment

    RocksDB on a big server: LRU vs hyperclock, v2

    This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...