Sunday, December 31, 2023

Updated Insert benchmark: InnoDB, MySQL 5.6 to 8.0, large server, cached database

This has results for the Insert Benchmark using InnoDB from MySQL 5.6 to 8.0, a large server and a cached workload. 

tl;dr

  • MySQL 8.0 is faster than 5.6 for writes because the improvements for concurrent workloads are larger than the regressions from new CPU overheads. With 8.0 the throughput is ~2X larger than 5.6 for l.i0 (inserts with no secondary indexes) and ~3X larger for l.i1/l.i2 (random inserts, random deletes).
  • MySQL 8.0 is slower than 5.6 for reads because there are new CPU overheads. For range queries with 8.0 the throughput is ~13% to ~40% less than 5.6 and for point queries it is ~20% less.
  • With MySQL 5.7 the write throughput is not as large as with MySQL 8.0, but the read throughput is better.
  • Context matters as the results here for MySQL 8.0 are better than results on a small server. The difference is that I use a workload with high concurrency (24 clients) on the large server and low concurrency (1 client) on the small server. On both servers MySQL 8.0 pays a price from new CPU overheads. But on the large server it also gets a benefit from some of that new code.

Build + Configuration

I tested MySQL 5.6.51, 5.7.40 and 8.0.33, compiled from source. The builds use non-default compiler toolchains to run on production hardware, so the build process is non-trivial (the CMake input files have to set many things) and I am reluctant to build many versions from each major release because it can take hours to days to get a CMake input file right. All builds use CMAKE_BUILD_TYPE=Release, -march=native and -mtune=native. The 8.0 builds might have used -DWITH_LTO=ON.
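
To make the flags concrete, here is a minimal Python sketch of the build step. The directory layout (a ../src checkout and a build/ directory) and passing -march/-mtune via CMAKE_C_FLAGS/CMAKE_CXX_FLAGS are my assumptions, and it does not reproduce the non-default toolchain setup described above.

    import subprocess

    # a rough sketch, not the actual build script
    arch = "-march=native -mtune=native"
    subprocess.run(
        ["cmake", "../src",
         "-DCMAKE_BUILD_TYPE=Release",
         f"-DCMAKE_C_FLAGS={arch}",
         f"-DCMAKE_CXX_FLAGS={arch}",
         "-DWITH_LTO=ON"],  # perhaps used for the 8.0 builds only
        cwd="build", check=True)
    subprocess.run(["make", "-j"], cwd="build", check=True)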

The my.cnf files are here for 5.6, for 5.7 and for 8.0.

Benchmark

The benchmark is run with 24 clients to avoid over-subscribing the CPU.

I used the updated Insert Benchmark, which has more benchmark steps than the original. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. A sketch of the insert+delete pattern is after this list.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 3600 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. A sketch of the rate-limited writer is after this list.
  • qp100
    • like qr100 except it uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
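
The delete-at-the-same-rate pattern in l.i1 and l.i2 is the part that differs most from the original benchmark, so here is a minimal Python sketch of it for one client. This is not the actual benchmark client: the table name and schema, the use of mysql-connector-python and the semaphore that paces delete transactions against insert transactions are all assumptions.

    import os
    import threading
    import mysql.connector

    ROWS_PER_TXN = 50           # l.i1 uses big transactions; l.i2 uses 5
    TOTAL_INSERTS = 50_000_000  # the step runs for a fixed number of inserts

    def connect():
        # connection parameters are placeholders
        return mysql.connector.connect(
            host="127.0.0.1", user="root", password="pw", database="ib")

    def insert_thread(pace):
        conn = connect()
        cur = conn.cursor()
        next_k = 20_000_000  # l.i0 already loaded 20M rows per table
        for _ in range(TOTAL_INSERTS // ROWS_PER_TXN):
            rows = [(next_k + i, os.urandom(100)) for i in range(ROWS_PER_TXN)]
            cur.executemany("INSERT INTO t (k, v) VALUES (%s, %s)", rows)
            conn.commit()
            next_k += ROWS_PER_TXN
            pace.release()  # let the delete thread match this transaction

    def delete_thread(pace):
        conn = connect()
        cur = conn.cursor()
        for _ in range(TOTAL_INSERTS // ROWS_PER_TXN):
            pace.acquire()  # one delete transaction per insert transaction
            cur.execute("DELETE FROM t ORDER BY k ASC LIMIT %s",
                        (ROWS_PER_TXN,))
            conn.commit()

    pace = threading.Semaphore(0)
    threads = [threading.Thread(target=insert_thread, args=(pace,)),
               threading.Thread(target=delete_thread, args=(pace,))]
    for t in threads: t.start()
    for t in threads: t.join()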
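
The qr*/qp* steps depend on the background writers sustaining a fixed rate, so here is a sketch of one way to pace them. Pacing against a running deadline instead of sleeping a fixed interval per write avoids drift when a write occasionally stalls. The function name and structure are mine, not the benchmark's.

    import time

    def paced_writes(do_write, rate_per_sec=100, duration_sec=3600):
        interval = 1.0 / rate_per_sec
        deadline = time.monotonic()
        end = deadline + duration_sec
        done = 0
        while deadline < end:
            do_write()                 # one insert (or delete) per tick
            done += 1
            deadline += interval
            sleep_for = deadline - time.monotonic()
            if sleep_for > 0:
                time.sleep(sleep_for)  # ahead of schedule: wait for the tick
            # if behind schedule, loop immediately and try to catch up
        return done  # fixed duration => same insert count on every DBMS

    # example: a no-op writer paced at 100/s for 2 seconds does 200 writes
    print(paced_writes(lambda: None, rate_per_sec=100, duration_sec=2))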

Results

The performance report is here. It has a lot more detail including charts, tables and metrics from iostat and vmstat to help explain the performance differences.

The summary has 3 tables. The first shows absolute throughput for each DBMS tested and benchmark step. The second has throughput relative to the version on the first row of the table, which makes it easy to see how performance changes across versions. The third shows the background insert rate for the benchmark steps that have background inserts; all systems sustained the target rates.
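
To be explicit about how the second table is derived from the first, a small sketch (the numbers here are made up; only the arithmetic matters):

    # absolute throughput in ops/s per version and benchmark step;
    # illustrative values, not results from the report
    absolute = {
        "my5651": {"l.i0": 100_000, "qr100": 40_000},
        "my8033": {"l.i0": 200_000, "qr100": 30_000},
    }
    base = absolute["my5651"]  # the version on the first row of the table
    relative = {v: {step: ops / base[step] for step, ops in steps.items()}
                for v, steps in absolute.items()}
    print(relative["my8033"])  # {'l.i0': 2.0, 'qr100': 0.75}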

A brief overview from the summary

  • As in the tl;dr above: MySQL 8.0 is faster than 5.6 for writes (~2X for l.i0, ~3X for l.i1/l.i2) and slower for reads (~13% to ~40% less for range queries, ~20% less for point queries).
  • MySQL 5.7 does better than 8.0 for reads but worse for writes.

Most of the performance differences can be explained by CPU overhead. See the cpupq column in the tables here. This is the total CPU time per operation, where CPU time is the sum of the us and sy columns in vmstat and operations is inserts for l.i0, l.i1 and l.i2, indexed rows for l.x and queries for the qr* and qp* steps. Note that cpupq isn't perfect -- it measures too much, including the benchmark client CPU -- but it is still a useful tool.
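
A rough sketch of the cpupq arithmetic, assuming default vmstat output where us and sy are the 13th and 14th columns; the sample values and this function are illustrative, not the benchmark's reporting code:

    def cpupq(vmstat_lines, num_cpus, ops, interval_sec=1):
        # us and sy are percentages of total CPU; convert each sample to
        # CPU-seconds, sum across samples, then divide by operations
        cpu_secs = 0.0
        for line in vmstat_lines:
            cols = line.split()
            us, sy = int(cols[12]), int(cols[13])
            cpu_secs += (us + sy) / 100.0 * num_cpus * interval_sec
        return cpu_secs / ops

    # example: two 1-second samples at 50% us+sy on a 24-CPU server, over
    # a window with 1M inserts -> 2.4e-05 CPU seconds per insert
    samples = ["1 0 0 8 0 0 0 0 0 0 500 900 40 10 50 0 0",
               "2 0 0 8 0 0 0 0 0 0 500 900 35 15 50 0 0"]
    print(cpupq(samples, num_cpus=24, ops=1_000_000))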
