Tuesday, May 30, 2023

Insert+delete benchmark, small server and MyRocks

This has results for an in-memory, low-concurrency insert benchmark on a small server. The previous report is here. This used the --delete_per_insert option so that the write-heavy steps ran for a long time while the working set remained in memory.

tl;dr

  • 5.6.35 has better perf than 8.0.28 courtesy of new CPU overheads in upstream MySQL 8
  • Variance is visible, but not horrible.

Benchmarks

The small server is a Beelink SER 4700u with 8 AMD cores, 16G RAM and NVMe SSD and low-concurrency was 1 and 4 clients. 

An overview of the insert benchmark is herehere and here. The insert benchmark was run for a cached database with both 1 and 4 clients. For 1 client the benchmark used 1 table. For 4 clients the benchmark was used 4 tables with a client per table. The read+write steps (q100.1, q500.1, q1000.1) were run for 1800 seconds each. The --delete_per_insert option was set for l.i1, q100.1, q500.1 and q1000.1.

Benchmarks were repeated for two configurations:
  • cached by RocksDB - all data fits in the RocksDB block cache
  • cached by OS - all data fits in the OS page cache but not the RocksDB block cache. For the small server RocksDB block cache size was set to 1G.
The my.cnf files are here for:
The benchmark is a sequence of steps.

  • l.i0
    • insert 20 million rows across all tables without secondary indexes
  • l.x
    • create 3 secondary indexes. I usually ignore results from this step.
  • l.i1
    • insert another 50 million rows across all tables with secondary index maintenance. Also delete 50M rows across all tables. The number of rows/table at the end of the benchmark step matches the number at the start. The inserts are done to the table head and the deletes are done from the tail.
  • q100.1
    • do queries as fast as possible with 100 inserts/s/client and the same rate for deletes/s done in the background
  • q500.1
    • do queries as fast as possible with 500 inserts/s/client and the same rate for deletes/s done in the background
  • q1000.1
    • do queries as fast as possible with 1000 inserts/s/client and the same rate for deletes/s done in the background
MyRocks

Tests used MyRocks from FB MySQL 5.6.35 and 8.0.28 with different builds to test compiler optimizations. The builds are described in a previous post and I tested the rel build for MyRocks 5.6.35 and the rel_native_lto build for 8.0.28.

Reports

Reports for cached by RocksDB:
Reports for cached by OS:
The first thing to acknowledge is the performance regression. There is more CPU overhead in MySQL 8.0 than 5.6 (in non-RocksDB code) so 8.0 gets less throughput than 5.6 in many cases. The following table shows the throughput in 8.0.28 relative to 5.6.35 and a value < 1.0 means 8.0 is slower. These results are from the Summary tables in the reports linked below.
  • The largest regression is in l.i0
  • Regressions are slightly worse for the 1-client case than the 4-client case

Throughput from 8.0.28 / 5.6.35

l.i0    l.x     l.i1    q100    q500    q1000   cached
-- 1 client
0.80    0.95    0.89    0.93    0.92    0.92    by RocksDB
0.77    0.94    0.88    0.91    0.92    0.92    by OS
-- 4 clients
0.82    0.93    1.02    0.99    0.96    0.95    by RocksDB
0.82    0.94    1.02    0.95    0.97    0.91    by OS

Other performance artifacts:

  • 1 client, cached by RocksDB
    • l.i1 - for 8.0.28 but not for 5.6.35 the max insert/delete response time charts show two levels, one at ~10ms and the other at ~20ms. See here.
    • q1000 - the QPS vs time chart shows a sawtooth pattern. This might be from changes in the CPU overhead/query as memtable flushes and L0->L1 compaction are done. There is more variance in the 8.0.28 result. See here.
  • 1 client, cached by OS
    • l.i0 & l.i1 - for 8.0.28 but not for 5.6.35 the max insert response time chart has 2 levels (one at ~10ms, the other at ~20ms). See here for l.i0 and for l.i1.
    • q1000 - the QPS vs time chart shows a sawtooth parttern. See here.
  • 4 clients, cached by RocksDB
    • l.i0 - the insert/s rate tails off, but the test finishes in 90s and that is too short from which to draw strong conclusions. See here.
    • q100 - QPS has variance and perhaps a sawtooth pattern. See here.
    • q500 - there are extended (>= 100s) for delete response time but not for inserts. They are worse for 8.0.28 than 5.6.35. See here.
    • q1000 - there are extended periods during which delete and query response time jump from <= 5ms to ~10ms. See here.
  • 4 clients, cached by OS
    • l.i0 - similar to 4 clients, cached by RocksDB. See here.
    • l.i1 - insert and delete response times have two levels. See here.
    • q100 - similar to 4 clients, cached by RocksDB. See here.
    • q500 - similar to 4 clients, cached by RocksDB plus QPS has a sawtooth pattern. See here.
    • q1000 - delete and insert response times for 8.0.28 have a permanent step halfway into the run. See here.

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...