This has results for an IO-bound, low-concurrency insert benchmark on a small server. The previous report for a CPU-bound workload is here. This used the --delete_per_insert option so that the write-heavy steps ran for a long time while the working set remained in memory.
This work was delayed because I had to figure out the memory demand for create index to avoid OOM.
tl;dr
- 5.6.35 has better perf than 8.0.28 courtesy of new CPU overheads in upstream MySQL 8. With 8.0.28 the throughput is up to 22% less than with 5.6.35.
- Variance is visible, but not horrible.
Benchmark
The insert benchmark was repeated for 1 client and 4 clients. For 1 client the l.i0 benchmark step loaded 800M rows into 1 table. For 4 clients it loaded 200M rows per table and there were 4 tables.
In both cases:
- the l.i1 benchmark step did 50M inserts matched by 50M deletes
- the q100, q500 and q1000 benchmark steps each ran for 1800 seconds
Reports
The first thing to acknowledge is the performance regression. There is more CPU overhead in MySQL 8.0 than 5.6 (in non-RocksDB code) so 8.0 gets less throughput than 5.6 in many cases. See the previous report for a description of the benchmark steps.
The following table shows the throughput in 8.0.28 relative to 5.6.35, where a value < 1.0 means 8.0 is slower. These results are from the Summary tables in the reports:
Throughput from 8.0.28 / 5.6.35
l.i0    l.x     l.i1    q100    q500    q1000
0.78    0.97    0.90    1.03    0.99    0.98    1 client
0.80    0.95    1.01    0.92    0.90    0.90    4 clients
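To make the ratio arithmetic concrete, here is a minimal sketch that recomputes the l.i0 and l.i1 columns from the IO-bound insert rates listed in the Write performance vs memory section of this post. The dictionary layout and the function name are illustrative and are not part of the benchmark scripts.

```python
# IO-bound insert rates copied from the "Write performance vs memory"
# tables in this post, keyed by client count, then MySQL version.
IO_BOUND = {
    1: {"5.6.35": {"l.i0": 77527, "l.i1": 26288},
        "8.0.28": {"l.i0": 60469, "l.i1": 23719}},
    4: {"5.6.35": {"l.i0": 212427, "l.i1": 36062},
        "8.0.28": {"l.i0": 169635, "l.i1": 36298}},
}

def relative_throughput(rates, new="8.0.28", base="5.6.35"):
    """Return {step: new/base} rounded to two decimals.

    A value < 1.0 means the new version is slower than the base.
    """
    return {step: round(rates[new][step] / rates[base][step], 2)
            for step in rates[base]}

for clients, rates in IO_BOUND.items():
    print(clients, relative_throughput(rates))
```

The rounded results agree with the l.i0 and l.i1 columns above: 0.78 and 0.90 for 1 client, 0.80 and 1.01 for 4 clients.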
Interesting things in the reports:
- The largest regression is in l.i0
- Response times are similar for 1 client and for 4 clients.
- Insert response time graphs show two stall levels (two horizontal lines) for 8.0.28 but only one for 5.6.35. An example is here (see 5.6 and 8.0)
- QPS graphs vs time have a lot more noise with 8.0.28. Perhaps this is a function of CPU overhead. See here for 5.6 and for 8.0.
- There are intermittent slow queries in 5.6 and 8.0 (response time in usecs jumps from < 1000 to > 10k). I have yet to explain this. See here for 5.6.
- For the read+write benchmark steps (q100, q500, q1000) the CPU/query (see the cpupq columns here and here) explains the regressions in the 4 clients results. But I didn't expect this change and will repeat the benchmark with a longer running time.
Write performance vs memory
The table below lists the insert rate for the l.i0 and l.i1 benchmark steps for the cached workload from the previous report and the IO-bound workload explained here. It might be a surprise that the rates are similar between cached and IO-bound. But this is expected thanks to the trivial move optimization (for l.i0) and read-free secondary index maintenance (for l.i1).
--- 1 client
Cached            IO-bound
l.i0     l.i1     l.i0     l.i1
76628    26667    77527    26288    5.6.35
61162    23866    60469    23719    8.0.28
--- 4 clients
Cached            IO-bound
l.i0     l.i1     l.i0     l.i1
194175   39877    212427   36062    5.6.35
158730   40530    169635   36298    8.0.28
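To quantify how similar the rates are, a short sketch can divide each IO-bound rate by the matching cached rate. The numbers are copied from the tables above. The tuple layout and the function name are assumptions made for illustration.

```python
# (clients, version, step, cached rate, IO-bound rate) from the tables above.
RATES = [
    (1, "5.6.35", "l.i0", 76628, 77527),
    (1, "5.6.35", "l.i1", 26667, 26288),
    (1, "8.0.28", "l.i0", 61162, 60469),
    (1, "8.0.28", "l.i1", 23866, 23719),
    (4, "5.6.35", "l.i0", 194175, 212427),
    (4, "5.6.35", "l.i1", 39877, 36062),
    (4, "8.0.28", "l.i0", 158730, 169635),
    (4, "8.0.28", "l.i1", 40530, 36298),
]

def io_vs_cached(rows):
    """Map (clients, version, step) -> IO-bound rate / cached rate, two decimals."""
    return {(clients, version, step): round(io_bound / cached, 2)
            for clients, version, step, cached, io_bound in rows}

for key, ratio in io_vs_cached(RATES).items():
    print(key, ratio)
```

With these inputs every 1 client ratio falls between 0.99 and 1.01. For 4 clients the l.i0 ratios are 1.09 and 1.07 while the l.i1 ratios are both 0.90, so the IO-bound rates stay within roughly 10% of the cached rates.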