Monday, June 12, 2023

Insert+delete benchmark: small server, MyRocks and IO-bound

This has results for an IO-bound, low-concurrency insert benchmark on a small server. The previous report for a CPU-bound workload is here. This used the --delete_per_insert option so that the write-heavy steps ran for a long time while the working set remained in memory.

This work was delayed because I had to figure out the memory demand for create index to avoid OOM.

tl;dr

  • 5.6.35 has better perf than 8.0.28 because of new CPU overheads in upstream MySQL 8. Throughput with 8.0.28 is up to 22% less than with 5.6.35.
  • Variance is visible, but not horrible.

Benchmark

See the previous report. MyRocks used the cy10a_bee config for 5.6.35 and for 8.0.28.

The insert benchmark was repeated for 1 client and 4 clients. For 1 client the l.i0 benchmark step loaded 800M rows into 1 table. For 4 clients it loaded 200M rows per table and there were 4 tables.

In both cases:
  • the l.i1 benchmark step did 50M inserts matched by 50M deletes
  • the q100, q500 and q1000 benchmark steps each ran for 1800 seconds

Reports

Reports are here for 1 client and 4 clients.

The first thing to acknowledge is the performance regression. There is more CPU overhead in MySQL 8.0 than in 5.6 (in non-RocksDB code), so 8.0 gets less throughput than 5.6 in many cases. See the previous report for a description of the benchmark steps.

The following table shows the throughput in 8.0.28 relative to 5.6.35; a value < 1.0 means 8.0 is slower. These results are from the Summary tables in the reports:

Throughput from 8.0.28 / 5.6.35

l.i0    l.x     l.i1    q100    q500    q1000
0.78    0.97    0.90    1.03    0.99    0.98    1 client
0.80    0.95    1.01    0.92    0.90    0.90    4 clients
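
To make the ratios easier to read, here is a minimal sketch that converts them to percent change and picks out the step with the largest regression. The numbers are copied from the table above. The names (STEPS, RELATIVE, percent_change, worst_step) are only for illustration and are not part of the benchmark scripts.

```python
# Relative throughput (8.0.28 / 5.6.35), copied from the table above.
STEPS = ["l.i0", "l.x", "l.i1", "q100", "q500", "q1000"]
RELATIVE = {
    "1 client": [0.78, 0.97, 0.90, 1.03, 0.99, 0.98],
    "4 clients": [0.80, 0.95, 1.01, 0.92, 0.90, 0.90],
}


def percent_change(ratio):
    """Percent change for 8.0.28 vs 5.6.35; negative means 8.0.28 is slower."""
    return round((ratio - 1.0) * 100)


def worst_step(ratios):
    """Return (step, ratio) for the smallest ratio, i.e. the largest regression."""
    return min(zip(STEPS, ratios), key=lambda pair: pair[1])


for label, ratios in RELATIVE.items():
    changes = ", ".join(
        f"{step} {percent_change(ratio):+d}%" for step, ratio in zip(STEPS, ratios)
    )
    print(f"{label}: {changes}")
    print(f"  largest regression: {worst_step(ratios)[0]}")
```

A ratio of 0.78 for l.i0 with 1 client is where the "up to 22% less" in the tl;dr comes from.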

Interesting things in the reports:
  • The largest regression is in l.i0
  • Response times are similar for 1 client and for 4 clients.
  • Insert response time graphs show two stall levels (two horizontal lines) for 8.0.28 but only one for 5.6.35. An example is here (see 5.6 and 8.0)
  • QPS graphs vs time have a lot more noise with 8.0.28. Perhaps this is a function of CPU overhead. See here for 5.6 and for 8.0.
  • There are intermittent slow queries in 5.6 and 8.0 (response time in usecs jumps from < 1000 to > 10k). I have yet to explain this. See here for 5.6.
  • For the read+write benchmark steps (q100, q500, q1000) the CPU/query (see the cpupq columns here and here) explains the regressions in the 4 clients results. But I didn't expect this change and will repeat the benchmark with a longer running time.

Write performance vs memory

The table below lists the insert rate for the l.i0 and l.i1 benchmark steps for the cached workload from the previous report and the IO-bound workload explained here. It might be a surprise that the rates are similar between cached and IO-bound. But this is expected thanks to the trivial move optimization (for l.i0) and read-free secondary index maintenance (for l.i1).

--- 1 client

Cached                  IO-bound
l.i0    l.i1            l.i0    l.i1
76628   26667           77527   26288   5.6.35
61162   23866           60469   23719   8.0.28

--- 4 clients

Cached                  IO-bound
l.i0    l.i1            l.i0    l.i1
194175  39877           212427  36062   5.6.35
158730  40530           169635  36298   8.0.28
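
A minimal sketch of the arithmetic behind these comparisons, using only the rates listed in the tables above. It computes the IO-bound rate relative to the cached rate, and the 8.0.28 rate relative to 5.6.35. The dictionary layout and function names are assumptions made for illustration; they are not from the benchmark scripts.

```python
# Insert rates copied from the tables above.
# Key: (clients, workload, version) -> rate per benchmark step.
RATES = {
    (1, "cached", "5.6.35"): {"l.i0": 76628, "l.i1": 26667},
    (1, "io-bound", "5.6.35"): {"l.i0": 77527, "l.i1": 26288},
    (1, "cached", "8.0.28"): {"l.i0": 61162, "l.i1": 23866},
    (1, "io-bound", "8.0.28"): {"l.i0": 60469, "l.i1": 23719},
    (4, "cached", "5.6.35"): {"l.i0": 194175, "l.i1": 39877},
    (4, "io-bound", "5.6.35"): {"l.i0": 212427, "l.i1": 36062},
    (4, "cached", "8.0.28"): {"l.i0": 158730, "l.i1": 40530},
    (4, "io-bound", "8.0.28"): {"l.i0": 169635, "l.i1": 36298},
}


def io_vs_cached(clients, version, step):
    """IO-bound insert rate divided by the cached insert rate."""
    io_rate = RATES[(clients, "io-bound", version)][step]
    cached_rate = RATES[(clients, "cached", version)][step]
    return io_rate / cached_rate


def version_ratio(clients, workload, step):
    """8.0.28 insert rate divided by the 5.6.35 insert rate."""
    new_rate = RATES[(clients, workload, "8.0.28")][step]
    old_rate = RATES[(clients, workload, "5.6.35")][step]
    return new_rate / old_rate


for clients in (1, 4):
    for version in ("5.6.35", "8.0.28"):
        for step in ("l.i0", "l.i1"):
            ratio = io_vs_cached(clients, version, step)
            print(f"{clients} client(s) {version} {step}: IO-bound/cached = {ratio:.2f}")
    for step in ("l.i0", "l.i1"):
        ratio = version_ratio(clients, "io-bound", step)
        print(f"{clients} client(s) {step}: 8.0.28/5.6.35 (IO-bound) = {ratio:.2f}")
```

The IO-bound 8.0.28/5.6.35 ratios computed this way match the l.i0 and l.i1 columns of the relative throughput table in the Reports section.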
