Monday, October 9, 2023

Perf regressions in MySQL from 5.6.21 to 8.0.34 using the Insert Benchmark and a small server, part 2

This post has results from the Insert Benchmark for some MySQL 5.6 releases, all 5.7 releases and all 8.0 releases using a small server and an IO-bound workload. The previous post has results from a cached workload.

tl;dr

  • There are significant regressions from 8.0.28 to 8.0.30 that I hope to explain in the future
  • There are significant regressions from 5.6 to 8.0 but they are smaller here for an IO-bound workload than for a cached workload
  • InnoDB perf would be easier to debug if there were counters for time (CPU seconds, wall clock) spent by InnoDB background threads, foreground threads doing single-page flushing, and foreground threads stalled on writeback while waiting for clean buffers.

Builds

It isn't easy to build older code on newer systems and compilers. Notes on that are here for 5.6, for 5.7 and for 8.0. A note on using cmake is here. The rel builds were used, meaning everything was compiled with CMAKE_BUILD_TYPE=Release.

Tests were done for:
  • 5.6 - 5.6.21, 5.6.31, 5.6.41, 5.6.51
  • 5.7 - all point releases from 5.7.10 to 5.7.43
  • 8.0 - all point releases from 8.0.13 to 8.0.34
I used the cy10a_bee config and it is here for 5.6, 5.7 and 8.0. For 8.0 releases older than 8.0.19 I changed innodb_idle_flush_pct=1 to loose_innodb_idle_flush_pct=1; the loose_ prefix makes mysqld ignore an option it doesn't recognize (with a warning) rather than fail on startup.

Benchmark

The Insert Benchmark was run in one setup: IO-bound, because the database was larger than memory, so user queries do many reads from storage.

The benchmark used the Beelink server explained here, which has 8 cores, 16G of RAM and a 1TB NVMe SSD with XFS and Ubuntu 22.04.

The benchmark is run with 1 client and 1 table, and is a sequence of steps; a rough SQL sketch of the access pattern follows the list.

  • l.i0
    • insert 800 million rows per table
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 5 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. 
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client, and the same rate of deletes/s, done in the background. Each step runs for 1200 seconds.
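
To make the benchmark steps concrete, here is a rough SQL sketch of the access pattern. It is only an illustration: the real schema and statements come from the Insert Benchmark client, and the table and index names below follow the classic iibench layout rather than anything confirmed by this post.

  -- Illustrative only: names follow the classic iibench schema
  CREATE TABLE pi1 (
    transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    dateandtime DATETIME,
    cashregisterid INT,
    customerid INT,
    productid INT,
    price FLOAT,
    data VARCHAR(4000)
  ) ENGINE=InnoDB;

  -- l.x: create 3 secondary indexes after the initial load
  ALTER TABLE pi1
    ADD INDEX marketsegment (price, customerid),
    ADD INDEX registersegment (cashregisterid, price, customerid),
    ADD INDEX pdc (price, dateandtime, customerid);

  -- l.i1: inserts go to the head of the table (new, larger PK values) ...
  INSERT INTO pi1 (dateandtime, cashregisterid, customerid, productid, price, data)
    VALUES (NOW(), 19, 72, 3, 9.99, 'abc');
  -- ... while deletes trim the tail (the oldest, smallest PK values)
  DELETE FROM pi1 ORDER BY transactionid ASC LIMIT 10;

  -- q100, q500, q1000: short range queries that can use a secondary index
  SELECT price, customerid FROM pi1 WHERE price >= 5.0 ORDER BY price, customerid LIMIT 10;
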
Results

The Summary sections linked below have tables for absolute and relative throughput. Absolute throughput is just the QPS per version. The relative throughput is (QPS for my version / QPS for base version). The base version is 5.6.21 for the 5.6-only report, 5.7.10 for the 5.7-only report, 8.0.13 for the 8.0-only report and 5.6.21 for the report that compares 5.6, 5.7 and 8.0. For example, a relative throughput of 0.94 means that version gets 94% of the base version's QPS, a ~6% regression.

Reports are here for
  • Some MySQL 5.6 releases
    • From the Summary - throughput drops by ~6% for the l.i0 benchmark step (initial load) from 5.6.21 to 5.6.51. Otherwise throughput doesn't change. 
    • The results here for IO-bound are better than the results for a cached workload, which had regressions for more benchmark steps.
    • Results from vmstat/iostat metrics are similar across versions
  • All MySQL 5.7 releases
    • From the Summary throughput in 5.7.43 drops by (6%, 8%, 4%) for write-heavy (l.i0, l.x, l.i1) and (2%, 2%, 2%) for read-heavy (q100, q500, q1000) benchmark steps relative to the base case (5.7.10). 
    • The results here for IO-bound are similar to the results for a cached workload.
    • Results from vmstat/iostat metrics show there is a small increase in CPU/operation for the write-heavy steps (l.i0, l.x, l.i1).
  • All MySQL 8.0 releases
    • From the Summary - throughput drops by ~17% for the l.i0 benchmark step (initial load) from 8.0.13 to 8.0.34. Otherwise throughput doesn't change.
    • Comparing these results to a cached workload - both have the ~20% regression for l.i0 but the cached workload results also have ~20% regressions for the read-write steps (q100, q500, q1000).
    • Results from the vmstat/iostat metrics are interesting. They might show an 8% increase in CPU/operation (see cpupq). They definitely show a large decrease (more than half) in context switches per operation (see cspq). Write IO per operation (see wkbpi) doesn't change, but the write rate in MB/s (see wmbps) drops by ~25%. So it is possible that writeback got slower over time and that slowdown is the root cause for the regression in l.i0. Also, for both cached and IO-bound workloads the l.i0 throughput drops by ~7% from 8.0.28 to 8.0.30 (there is no 8.0.29). I would like to see counters for the time (wall clock, CPU seconds) for which InnoDB background threads are active, but I have yet to find that in the Performance Schema or Information Schema (a partial workaround is sketched after this list).
  • Comparing MySQL 5.6, 5.7 and 8.0
    • From the Summary - relative to 5.6.21, throughput in 8.0.34 drops by ~50% for l.i0, increases by ~30% for l.x and l.i1, and drops by up to 10% for the read-write steps (q100, q500, q1000).
    • Comparing these results to a cached workload - both have the ~20% regression for l.i0 and ~20% improvement for l.x but they diverge for other benchmark steps. For l.i1 8.0.34 is ~30% faster than 5.6.21 with IO-bound vs ~20% slower with cached. For the read-write steps (q100, q500, q1000) 8.0.34 is up to 10% slower with IO-bound vs ~50% slower with cached.
    • Results from the vmstat/iostat metrics show
      • A ~1.5X increase in CPU/operation for l.i0. All of that is in 8.0.x.
      • A ~3X increase in CPU/operation for l.x that arrives between 8.0.20 and 8.0.27.
      • A ~1.5X increase in CPU/operation for q100 that arrives in 5.7.10. That drops to ~1.1X for q500 and then doesn't repeat in q1000.
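
The missing counters noted above don't seem to exist yet, but a partial workaround (my suggestion, not something from the reports) is to map InnoDB background threads to OS thread ids via the Performance Schema and then sample per-thread CPU time from /proc on Linux. It still doesn't show wall-clock time that foreground threads spend stalled on writeback.

  -- A workaround sketch, not something used in this post.
  -- THREAD_OS_ID exists in performance_schema.threads starting in 5.7;
  -- CPU seconds per thread can then be sampled from
  -- /proc/$(pidof mysqld)/task/<thread_os_id>/stat on Linux.
  SELECT name, thread_os_id
  FROM performance_schema.threads
  WHERE type = 'BACKGROUND' AND name LIKE 'thread/innodb/%';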
