This post has results from the Insert Benchmark for some MySQL 5.6 releases, all 5.7 releases and all 8.0 releases using a small server and an IO-bound workload. The previous post has results from a cached workload.
- There are significant regressions from 8.0.28 to 8.0.30 that I hope to explain in the future
- There are significant regressions from 5.6 to 8.0 but they are smaller here for an IO-bound workload than for a cached workload
- InnoDB perf would be easier to debug if there were counters for time (CPU seconds, wall clock) spent by InnoDB background threads, foreground threads doing single-page flushing, and foreground threads stalled on writeback while waiting for clean buffers.
The Insert Benchmark was run in one setup, IO0-bound as the database was larger than memory, there are many reads from storage for user queries.The benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04.
The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.
- insert 800 million rows per table
- create 3 secondary indexes. I usually ignore performance from this step.
- insert and delete another 5 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail.
- q100, q500, q1000
- do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1200 seconds.
- Some MySQL 5.6 releases
- From the Summary - throughput drops by ~6% for the l.i0 benchmark step (initial load) from 5.6.21 to 5.6.51. Otherwise throughput doesn't change.
- The results here for IO-bound are better than the results for a cached workload that had regressions for more benchmark steps.
- Results from vmstat/iostat metrics are similar across versions
- All MySQL 5.7 releases
- From the Summary throughput in 5.7.43 drops by (6%, 8%, 4%) for write-heavy (l.i0, l.x, l.i1) and (2%, 2%, 2%) for read-heavy (q100, q500, q1000) benchmark steps relative to the base case (5.7.10).
- The results here for IO-bound are similar to the results for a cached workload.
- Results from vmstat/iostat metrics show there is a small increase in CPU/operation for the write-heavy steps (l.i0, l.x, l.i1).
- All MySQL 8.0 releases
- From the Summary throughput drops by ~17% for the l.i0 benchmark step (initial load) from 8.0.10 to 8.0.34. Otherwise throughput doesn't change.
- Comparing these results to a cached workload - both have the ~20% regression for l.i0 but the cached workload results also have ~20 regressions for the read-write steps (q100, q500, q1000).
- Results from the vmstat/iostat metrics are interesting. They might show an 8% increase in CPU/operation (see cpupq). They definitely show a large decrease (more than half) in context switches per operation (see cspq). Write IO per operation (see wkbpi) doesn't change but MB written to IO/s (see wmbps) drops by ~25%. So it is possible that writeback got slower over time and that slowdown is the root cause for the regression in l.i0. Also, for both cached and IO-bound workloads the l.i0 throughput drops by ~7% from 8.0.28 to 8.0.30 (there is no 8.0.29). I would like to see counters for the time (wall clock, CPU seconds) for which InnoDB background threads are active but I have yet to find that in the PS or IS.
- Comparing MySQL 5.6, 5.7 and 8.0
- From the Summary throughput drops by ~50% for l.i0, increases by ~30% for l.x and l.i1 and drops by up to 10% for the read-write steps (q100, q500, q1000).
- Comparing these results to a cached workload - both have the ~20% regression for l.i0 and ~20% improvement for l.x but they diverge for other benchmark steps. For l.i1 8.0.34 is ~30% faster than 5.6.21 with IO-bound vs ~20% slower with cached. For the read-write steps (q100, q500, q1000) 8.0.34 is up to 10% slower with IO-bound vs ~50% slower with cached.
- Results from the vmstat/iostat metrics show
- A ~1.5X increase in CPU/operation for l.i0. All of that is in 8.0.x.
- A ~3X increase in CPU/operation for l.x that arrives between 8.0.20 and 8.0.27.
- A ~1.5X increase in CPU/operaton for q100 that arrives in 5.7.10. That drops to ~1.1X for q500 and then doesn't repeat in q1000.