Sunday, December 31, 2023

Updated Insert benchmark: MyRocks 5.6 and 8.0, large server, cached database

This has results for the Insert Benchmark using MyRocks 5.6 and 8.0 using a large server and cached workload. 

tl;dr

  • performance is similar between MyRocks 5.6.35, 8.0.28 and 8.0.32, unlike what I see with InnoDB where there are large changes, both good (for writes) and bad (for reads), using the same setup

Build + Configuration

I tested MyRocks 5.6.35, 8.0.28 and 8.0.32. These were compiled from source. All builds use CMAKE_BUILD_TYPE=Release.
  • MyRocks 5.6.35
    • compiled with clang on Dec 21, 2023 at git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
  • MyRocks 8.0.28
    • compiled with clang on Dec 22, 2023 at git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b
  • MyRocks 8.0.32
    • compiled with clang on Dec 22, 2023 at git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b
The my.cnf files are here for 5.6.35 (d1), 8.0.28 (d1 and d2) and 8.0.32 (d1 and d2). For MyRocks 8.0 the d1 configs enable the performance_schema and the d2 configs disable it. The full names for the configs are my.cnf.cy9d1_u (d1) and my.cnf.cy9d2_u (d2).

Benchmark

The benchmark is run with 24 clients to avoid over-subscribing the CPU. The server has 40 cores, 80 HW threads (hyperthreads enabled), 256G of RAM and many TB of fast SSD.

I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. A sketch of this insert/delete pattern is after this list.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 3600 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except it uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
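To make the l.i1/l.i2 pattern concrete, here is a minimal Python sketch of one client doing a delete per insert. It is not the real benchmark client: the driver, table and column names are hypothetical, and in the real client the inserts and deletes run concurrently on separate connections rather than being interleaved in one loop.

# Minimal sketch of the l.i1/l.i2 pattern, not the real benchmark client.
# Assumes pymysql and a table t1(id BIGINT PRIMARY KEY, val VARCHAR) -- the
# driver, table and column names are hypothetical.
import pymysql

TOTAL_INSERTS = 50_000_000

def connect():
    return pymysql.connect(host="127.0.0.1", user="rw", password="pw",
                           database="ib", autocommit=False)

def run_step(rows_per_txn=50):          # l.i1 uses 50 rows/txn, l.i2 uses 5
    ins_conn, del_conn = connect(), connect()   # 2 connections per client
    next_pk, oldest_pk, done = 20_000_000, 0, 0
    while done < TOTAL_INSERTS:
        # Insert a batch at the head of the table (largest PK values).
        cur = ins_conn.cursor()
        for i in range(rows_per_txn):
            cur.execute("INSERT INTO t1 (id, val) VALUES (%s, %s)",
                        (next_pk + i, "x"))
        ins_conn.commit()
        next_pk += rows_per_txn
        # Delete the same number of rows from the tail (smallest PK values)
        # so the table does not grow.
        cur = del_conn.cursor()
        cur.execute("DELETE FROM t1 WHERE id >= %s AND id < %s",
                    (oldest_pk, oldest_pk + rows_per_txn))
        del_conn.commit()
        oldest_pk += rows_per_txn
        done += rows_per_txn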

Results

The performance report is here. It has a lot more detail including charts, tables and metrics from iostat and vmstat to help explain the performance differences.

From the summary, throughput is similar between MyRocks 5.6 and 8.0. The summary has 3 tables. The first shows absolute throughput by DBMS tested and benchmark step. The second has throughput relative to MyRocks 5.6.35. The third shows the background insert rate for benchmark steps with background inserts; all systems sustained the target rates. The second table makes it easy to see how performance changes across versions.

Another report is here and the difference is that I changed the benchmark to wait for X seconds after l.i2 to reduce writeback debt before the read-write steps start, where X is max(1200, 60 + #rows/1M). The goal is to reduce variance for the read-write benchmark steps.

From the HW efficiency metrics per benchmark step
  • l.i0, l.x
    • metrics are similar
  • l.i1
    • most metrics are similar but CPU overhead (cpupq) is ~13% larger in 8.0 and max response time (max_op) is more than 2X larger in 8.0
  • l.i2
    • most metrics are similar but CPU overhead (cpupq) is ~39% larger in 8.0 and max response time (max_op) is more than 2X larger in 8.0
  • qr100, qp100, qr500, qp500, qr1000, qp1000
    • metrics are similar
It is interesting that the CPU overhead is larger for MyRocks 8.0.28 and 8.0.32 during the l.i1 and l.i2 benchmark steps because throughput and response time distributions are similar to MyRocks 5.6.35. So the extra CPU in MyRocks 8.0 does not appear to occur in the foreground threads that process the insert and delete statements.

The CPU overhead from compaction is 3% to 6% larger for MyRocks 8.0 -- see the CompMergeCPU(sec) column on the Sum line for 5.6.35, 8.0.28 and 8.0.32. But that can only explain a few (<= 5) usecs/operation of CPU overhead -- not the 20 to 100+ that I want to explain. 

If I just look at the output from ps aux | grep mysqld I see that MyRocks 8.0 uses at least 1.2X more CPU than 5.6. But again, that can be from background activities and might not be on the path for foreground operations (queries).
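For reference, here is a rough sketch of how cumulative CPU can be compared from ps output; this is a hypothetical helper with assumptions about the TIME column format, not part of the benchmark.

# Rough sketch: sum cumulative CPU (the TIME column) for mysqld processes
# in `ps aux` output. A hypothetical helper, not part of the benchmark.
import subprocess

def mysqld_cpu_seconds():
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    total = 0.0
    for line in out.splitlines():
        if "mysqld" not in line:
            continue
        cols = line.split()
        t = cols[9]                      # cumulative CPU: [[DD-]HH:]MM:SS
        days = 0
        if "-" in t:
            d, t = t.split("-", 1)
            days = int(d)
        secs = 0.0
        for f in t.split(":"):
            secs = secs * 60 + float(f)
        total += days * 86400 + secs
    return total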

For now this is a mystery to which I will return later.


Updated Insert benchmark: InnoDB, MySQL 5.6 to 8.0, large server, cached database

This has results for the Insert Benchmark using InnoDB from MySQL 5.6 to 8.0, a large server and a cached workload. 

tl;dr

  • MySQL 8.0 is faster than 5.6 for writes because the improvements for concurrent workloads are larger than the regressions from new CPU overheads. With 8.0 the throughput is ~2X larger than 5.6 for l.i0 (inserts with no secondary indexes) and ~3X larger for l.i1/l.i2 (random inserts, random deletes).
  • MySQL 8.0 is slower than 5.6 for reads because there are new CPU overheads. For range queries with 8.0 the throughput is ~13% to ~40% less than 5.6 and for point queries it is ~20% less.
  • With MySQL 5.7 the write throughput is not as large as with MySQL 8.0 but the read throughput is better.
  • Context matters as the results here for MySQL 8.0 are better than results on a small server. The difference is that I use a workload with high concurrency (24 clients) on the large server and low concurrency (1 client) on the small server. On both servers MySQL 8.0 pays a price from new CPU overheads. But on the large server it also gets a benefit from some of that new code.

Build + Configuration

I tested MySQL 5.6.51, 5.7.40 and 8.0.33. These were compiled from source. The builds use non-default compiler toolchains to run on production hardware so the build process is non-trivial (CMake input files have to set many things) and I am reluctant to build many versions from each major release because it takes hours to days to figure out the CMake input file. All builds use CMAKE_BUILD_TYPE=Release, -march=native and -mtune=native. The 8.0 builds might have used -DWITH_LTO=ON.

The my.cnf files are here for 5.6, for 5.7 and for 8.0.

Benchmark

The benchmark is run with 24 clients to avoid over-subscribing the CPU.

I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 3600 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first (a sketch of the rate-limited writers is after this list). The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except it uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
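Here is the sketch of the rate-limited background writers mentioned for qr100: a minimal Python loop, assuming a do_one_op callable. The names are hypothetical and the real client also checks whether the target rate was sustained (the SLA check) and is more careful about timing.

# Minimal sketch of a rate-limited writer for the qr*/qp* steps; names are
# hypothetical and the real benchmark client is more careful about timing.
import time

def run_rate_limited(do_one_op, target_per_sec, duration_sec):
    # Call do_one_op() target_per_sec times per second for duration_sec and
    # return the achieved rate so the caller can check the SLA.
    interval = 1.0 / target_per_sec
    start = time.monotonic()
    next_at = start
    done = 0
    while time.monotonic() - start < duration_sec:
        do_one_op()                      # one insert or one delete
        done += 1
        next_at += interval
        sleep_for = next_at - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)        # on schedule, wait for the next slot
        # if sleep_for < 0 the writer is behind the target rate
    return done / (time.monotonic() - start)

# For qr100 the query connection runs with no rate limit while the other two
# connections run run_rate_limited(insert_one, 100, 3600) and
# run_rate_limited(delete_one, 100, 3600).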

Results

The performance report is here. It has a lot more detail including charts, tables and metrics from iostat and vmstat to help explain the performance differences.

The summary has 3 tables. The first shows absolute throughput by DBMS tested and benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts; all systems sustained the target rates. The second table makes it easy to see how performance changes across versions.

A brief overview from the summary

  • MySQL 8.0 is faster than 5.6 for writes because the improvements for concurrent workloads are larger than the regressions from new CPU overheads. With 8.0 the throughput is ~2X larger than 5.6 for l.i0 (inserts with no secondary indexes) and ~3X larger for l.i1/l.i2 (random inserts, random deletes).
  • MySQL 8.0 is slower than 5.6 for reads because there are new CPU overheads. For range queries with 8.0 the throughput is ~13% to ~40% less than 5.6 and for point queries it is ~20% less.
  • MySQL 5.7 does better than 8.0 for reads but worse for writes.
Most of the performance differences can be explained by CPU overhead. See the cpupq column in the tables here. This is the total CPU time per operation where CPU time is the sum of the us and sy columns in vmstat and operations is inserts for l.i0, l.i1 and l.i2, indexed rows for l.x and queries for the qr* and qp* steps. Note that cpupq isn't perfect -- it measures too much, including the benchmark client CPU -- but it is still a useful tool.
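As a rough illustration of how a cpupq-style metric can be computed, here is a sketch that assumes 1-second vmstat samples where us and sy are percentages of total CPU on a host with ncpu CPUs; my scripts may differ in the details and the numbers in the trailing comment are made up.

# Sketch: CPU microseconds per operation from vmstat samples. Assumes
# 1-second samples where us and sy are percentages of total CPU capacity
# on a host with ncpu CPUs; the real scripts may differ in the details.
def cpu_usecs_per_op(vmstat_samples, ncpu, operations):
    # vmstat_samples is a list of (us, sy) percentages, one per second
    cpu_secs = sum((us + sy) / 100.0 * ncpu for us, sy in vmstat_samples)
    return cpu_secs * 1_000_000 / operations

# Made-up example: 1000 samples averaging us+sy = 20% on 80 HW threads for
# 50M inserts -> 1000 * 0.20 * 80 = 16,000 CPU seconds -> 320 usecs/insert.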

Updates for the Insert Benchmark, December 2023

This describes the Insert Benchmark after another round of changes I recently made. Past descriptions are here and here. Source code for the benchmark is here. It is still written in Python and I hope that one day a Python JIT will arrive to reduce the overhead of the benchmark client.

The changes include:

  • switched from float to int for the price column
  • changed the marketsegment index from (price, customerid) to (productid, customerid). The other indexes are unchanged -- registersegment is on (cashregisterid, price, customerid) and pdc is on (price, dateandtime, customerid)
  • added a mode for the read-write benchmark step to do point queries on the PK index
  • changed some of the benchmark steps to do a delete per insert to avoid growing the table size. I made this change a few months ago.
  • added the l.i2 benchmark step that modifies fewer rows per transaction compared to l.i1
  • added code to reduce checkpoint (InnoDB) and compaction (RocksDB) debt that runs between the l.i2 and qr100 benchmark steps. The code for Postgres was already there.
Alas, I have yet to address coordinated omission.

Benchmark steps

The benchmark is run with X clients and usually with a client per table.

The benchmark is a sequence of steps that are run in order:
  • l.i0
    • insert Y million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One does inserts as fast as possible and the other does deletes at the same rate as the inserts to avoid changing the number of rows in the table. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
    • Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow, where X is max(1200, 60 + #nrows/1M). While waiting do things to reduce writeback debt (a sketch is after this list) where the things are:
      • Postgres (see here) - do a vacuum analyze for all tables concurrently. When that finishes do a checkpoint.
      • InnoDB (see here) - change innodb_max_dirty_pages_pct[_lwm] to 1, change innodb_idle_flush_pct to 100. When done waiting restore them to previous values.
      • MyRocks (see here) - set rocksdb_force_flush_memtable_now to flush the memtable, wait 20 seconds and then set rocksdb_compact_lzero_now to flush L0. Note that rocksdb_compact_lzero_now wasn't supported until mid-2023.
  • qr100
    • use 3 connections/client. One does range queries as fast as possible and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except it uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
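Here is the sketch of the post-l.i2 pause promised above. It issues the statements described for InnoDB and MyRocks from Python; this is a hedged sketch where the exact values, sequencing and restore logic in my scripts may differ, cur is an assumed DB-API cursor, and the Postgres case is only noted in a comment.

# Hedged sketch of the post-l.i2 pause that reduces writeback debt. The
# statements match the description above but the exact sequencing and the
# restore of previous values in my scripts may differ.
import time

def wait_seconds(total_rows):
    # X = max(1200, 60 + #rows / 1M)
    return max(1200, 60 + total_rows // 1_000_000)

def reduce_debt_innodb(cur, total_rows):
    cur.execute("SET GLOBAL innodb_max_dirty_pages_pct = 1")
    cur.execute("SET GLOBAL innodb_max_dirty_pages_pct_lwm = 1")
    cur.execute("SET GLOBAL innodb_idle_flush_pct = 100")
    time.sleep(wait_seconds(total_rows))
    # ... then restore the previous values (omitted)

def reduce_debt_myrocks(cur, total_rows):
    cur.execute("SET GLOBAL rocksdb_force_flush_memtable_now = 1")
    time.sleep(20)
    # rocksdb_compact_lzero_now is only in MyRocks builds from mid-2023
    cur.execute("SET GLOBAL rocksdb_compact_lzero_now = 1")
    time.sleep(wait_seconds(total_rows))

# For Postgres the script instead runs VACUUM ANALYZE on all tables
# concurrently and then a CHECKPOINT while waiting.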

Friday, December 29, 2023

RocksDB 8.x benchmarks: large server, cached database

This post has results for performance regressions in all versions of 8.x from 8.0.0 to 8.9.2 using a large server. In a previous post I shared results for RocksDB 8.0 to 8.8 on the same hardware.

tl;dr

  • The regressions from 8.0 to 8.9 are small. For the non read-only benchmark steps, QPS from 8.9.2 is at most 3% less than 8.0.0 with leveled and at most 1% less with universal.
I focus on the benchmark steps that aren't read-only because they suffer less from noise. These benchmark steps are fillseq, revrangewhilewriting, fwdrangewhilewriting, readwhilewriting and overwriteandwait. I also focus on leveled more than universal, in part because there is more noise with universal but also because the workloads I care about most use leveled.

Builds

I compiled RocksDB 8.0.0, 8.1.1, 8.2.1, 8.3.3, 8.4.4, 8.5.4, 8.6.7, 8.7.3, 8.8.1 and 8.9.2 with gcc. These are the latest patch releases.

Benchmark

All tests used a server with 40 cores, 80 HW threads, 2 sockets, 256GB of RAM and many TB of fast NVMe SSD with Linux 5.1.2, XFS and SW RAID 0 across 6 devices. For the results here, the database is cached by RocksDB. The benchmark was repeated for leveled and universal compaction.

Everything used the LRU block cache and the default value for compaction_readahead_size. Soon I will switch to using the hyper clock cache once RocksDB 9.0 arrives.

I used my fork of the RocksDB benchmark scripts that are wrappers to run db_bench. These run db_bench tests in a special sequence -- load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was run using 24 threads. How I do benchmarks for RocksDB is explained here and here. The command line to run them is: 
bash x3.sh 24 no 3600 c40r256bc180 40000000 4000000000 byrx
A spreadsheet with all results is here and performance summaries are here for leveled and for universal.

Results: leveled

For the non read-only benchmark steps, QPS from 8.9.2 is at most 3% less than from 8.0.0.

Results: universal

For the non read-only benchmark steps, QPS from 8.9.2 is at most 1% less than from 8.0.0.


Sunday, December 24, 2023

Create InnoDB indexes 2X faster with this simple trick

I made parallel create index for InnoDB up to 2.4X faster by doing one of:

  • disabling the performance schema (performance_schema=0)
  • removing the offending line of code (the fixed build described below)

I filed bug 113505 for this. The problem is that the parallel create index threads all increment the same performance schema counter and there is too much contention on that counter. The problem wasn't obvious from Linux perf when using the default metric (cycles). I then tried perf stat and IPC (instructions/cycle) was about 2X larger during create index with performance_schema=0. Finally, I repeated the perf measurements using memory system counters (cache-misses, etc) and with perf record, perf report and perf annotate the problem was obvious.

I am still deciding whether this 2X faster bug fix is more exciting than the one I found for Falcon.

Results

The results here are from a server with 2 sockets, 12 cores/socket, 64G of RAM running Ubuntu 22.04 and one m.2 device with 2TB and XFS. All tests use MySQL 8.0.35 and all of the my.cnf files are here.

The test is the create index benchmark step from the Insert Benchmark with one client, 80M rows and 3 secondary indexes created in one ALTER statement. With innodb_ddl_threads=N, if N=1 then only one thread handles the index create work, otherwise it is done in parallel over N threads -- this assumes innodb_parallel_read_threads >= innodb_ddl_threads, and I am not sure I want to remember how they interact.
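A hedged example of that step is below: the index definitions follow the updated Insert Benchmark (marketsegment, registersegment, pdc) but the table name and the exact DDL issued by the client are assumptions.

# Hedged sketch of the create index step: set the session DDL/read threads,
# then one ALTER that adds the 3 secondary indexes. The table name
# (purchases_index) and exact DDL are assumptions; cur is a DB-API cursor.
def create_indexes(cur, ddl_threads=8):
    cur.execute("SET SESSION innodb_ddl_threads = %s" % ddl_threads)
    cur.execute("SET SESSION innodb_parallel_read_threads = %s" % ddl_threads)
    cur.execute("""
        ALTER TABLE purchases_index
          ADD INDEX marketsegment (productid, customerid),
          ADD INDEX registersegment (cashregisterid, price, customerid),
          ADD INDEX pdc (price, dateandtime, customerid)
    """)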

I tested my.cnf configs with innodb_ddl_threads and innodb_parallel_read_threads set to =1, =4, =8 and =16. Note that the default is =4. Tests were repeated with performance_schema=1 and =0; for =1, nothing else was enabled for the perf schema.

The following configurations are tested:
  • ddl1 - innodb_ddl_threads=1, innodb_parallel_read_threads=1, performance_schema=1
  • ddl1+ps0 - like ddl1 but with performance_schema=0
  • ddl4 - innodb_ddl_threads=4, innodb_parallel_read_threads=4, performance_schema=1
  • ddl4+ps0 - like ddl4 but with performance_schema=0
  • ddl8 - innodb_ddl_threads=8, innodb_parallel_read_threads=8, performance_schema=1
  • ddl8+ps0 - like ddl8 but with performance_schema=0
  • ddl16 - innodb_ddl_threads=16, innodb_parallel_read_threads=16, performance_schema=1
  • ddl16+ps0 - like ddl16 but with performance_schema=0
The chart shows the time to create indexes for MySQL 8.0.35 using two builds. Note that shorter bars are better here. The as-is build is 8.0.35 unchanged and the fixed build is 8.0.35 with this line removed.
  • When performance_schema=0 then there is no difference between the as-is and fixed builds
  • When performance_schema=1 then the fixed build is ~2X faster than the as-is build


Monday, December 18, 2023

What is the cost of the performance schema in MySQL 5.7 and 8.0: small server, cached database, simple queries

This post has results for MySQL 5.7.44 and 8.0.35 with InnoDB using the Insert Benchmark, a small server and a cached workload. The goal is to document the overhead of the performance schema.

tl;dr

  • The perf schema in 5.7 costs <= 4% of QPS
  • The perf schema in 8.0 costs ~10% of QPS, except for index create where the cost is much larger. I will try to explain that later. 

Build + configuration

I used InnoDB with MySQL 8.0.35 and 5.7.44.

The cmake files for each of the builds are here for 5.7.44 and for 8.0.35

By default the my.cnf files I use for MySQL 5.7 and 8.0 have performance_schema=1 but I don't otherwise enable instruments. So the tests here document the overhead from using whatever is enabled by default with the perf schema.
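To see what that default amounts to, a query of the perf schema setup tables works; this is a hedged sketch assuming a DB-API cursor, not something the benchmark itself runs.

# Hedged sketch: report what the perf schema enables by default. Assumes a
# DB-API cursor cur; not something the benchmark itself runs.
def perf_schema_defaults(cur):
    cur.execute("SELECT @@performance_schema")
    print("performance_schema =", cur.fetchone()[0])
    cur.execute("SELECT enabled, COUNT(*) FROM"
                " performance_schema.setup_instruments GROUP BY enabled")
    print("instruments by enabled:", cur.fetchall())
    cur.execute("SELECT name, enabled FROM"
                " performance_schema.setup_consumers ORDER BY name")
    for name, enabled in cur.fetchall():
        print(name, enabled)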

For 8.0.35 there were 5 variants of the build and 2 my.cnf files. I didn't test the full cross product, but there are 7 different combinations I tried that are listed below.

The 8.0.35 builds are: 

  • my8035_rel
    • MySQL 8.0.35 and the rel build with CMAKE_BUILD_TYPE=Release
  • my8035_rel_native
    • MySQL 8.0.35 and the rel_native build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native
  • my8035_rel_native_lto
    • MySQL 8.0.35 and the rel_native_lto build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native WITH_LTO=ON
  • my8035_rel_less
    • MySQL 8.0.35 and the rel_less build with CMAKE_BUILD_TYPE=Release ENABLED_PROFILING=OFF WITH_RAPID=OFF
  • my8035_rel_lessps
    • MySQL 8.0.35 and the rel build with CMAKE_BUILD_TYPE=Release and as much as possible of the perf schema code disabled at compile time. See the cmake files linked above.
The 8.0.35 my.cnf files:
  • cz10a_bee - enables the performance schema (performance_schema=1)
  • cz10aps0_bee - disables the performance schema (performance_schema=0)

I tried 7 combinations of build + configuration for 8.0.35:

  • my8035_rel.cz10a_bee
  • my8035_rel_native.cz10a_bee
  • my8035_rel_native_lto.cz10a_bee
  • my8035_rel_less.cz10a_bee
  • my8035_rel_lessps.cz10a_bee
  • my8035_rel.cz10aps0_bee
  • my8035_rel_lessps.cz10aps0_bee
For 5.7.44 I tried two combinations of build + configuration
  • my5744_rel.cz10a_bee - uses the rel build and cz10a_bee my.cnf
  • my5744_rel.cz10aps0_bee - uses the rel build and cz10aps0_bee my.cnf that disables the perf schema

Benchmark

The Insert Benchmark was run in one setup - a cached workload.

The benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. 

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

  • l.i0
    • insert 20 million rows per table
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. 
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.

Results

Reports are here for 5.7.44 and for 8.0.35.

The Summary sections linked below have tables for absolute and relative throughput. The relative throughput is (QPS for my version / QPS for base version). The base version is my5744_rel.cz10a_bee for the 5.7.44 report and my8035_rel.cz10a_bee for the 8.0.35 report.

The summaries linked below have three tables. The first table shows the absolute throughput (queries or writes/s). The second table shows the throughput for a given version relative to the base version (a value less than 1.0 means the version is slower). The third table shows the average background insert rate during q100, q500 and q1000. All versions sustained the targets, so that table can be ignored here.

By write-heavy I mean the l.i0, l.x and l.i1 benchmark steps. By read-heavy I mean the q100, q500 and q1000 benchmark steps. Below I use red and green to indicate when the relative QPS is down or up by more than 5% and grey otherwise.

From the Summary for 5.7.44
  • relative QPS is relative to my5744_rel.cz10a_bee
  • relative QPS for my5744_rel.cz10aps0_bee is (1.03, 1.12, 1.03) and (1.02, 1.03, 1.04) for write- and read-heavy. So the throughput benefit from disabling the performance schema is <= 4%, except for the l.x benchmark step that creates indexes.
From the Summary for 8.0.35
  • relative QPS is relative to my8035_rel.cz10a_bee
  • my8035_rel_native.cz10a_bee and my8035_rel_less.cz10a_bee have performance similar to the base case
  • my8035_rel_native_lto.cz10a_bee gets between 4% and 9% more QPS than the base case
  • builds that disable the perf schema in my.cnf or at compile time get ~1.07X more QPS for l.i0 and l.i1, ~1.5X more throughput for index create and between 1.11X and 1.15X more QPS for read-heavy. I will soon get flamegraphs to explain some of this.

Perf regressions in MySQL from 5.6 to 8.0 using the Insert Benchmark and a small server, Dec23: part 1

This post has results from the Insert Benchmark for some MySQL 5.6 releases, all 5.7 releases and all 8.0 releases using a small server and a cached workload. It follows up on work I published in October. The differences are that I updated the my.cnf files to make them more similar across releases and I added results for pre-GA releases from 5.7 and 8.0.

tl;dr

  • For 4 of the 6 benchmark steps, MySQL 8.0.35 gets about 60% of the throughput relative to 5.6.51 with simple queries and a cached database. The regression for random writes (l.i1) is smaller and index create (l.x) is faster in 8.0 than in 5.6.
  • For MySQL 5.7 about half of the perf regressions occur between the last 5.6 release (5.6.51) and the first 5.7 release that I was able to compile (5.7.5). The rest are spread evenly across 5.7.
  • For MySQL 8.0 almost all of the perf regressions are spread across 8.0 releases from 8.0.0 to 8.0.35. 
  • For InnoDB there is at least one significant regression arriving in 8.0.30+ related to which code gets inlined by the compiler. This is somewhat obfuscated because two perf bugs that hurt this benchmark were fixed after 8.0.28. For more details see PS-8822.

Builds

It isn't easy to build older code on newer systems, compilers, etc. Notes on that are here for 5.6, for 5.7 and for 8.0. A note on using cmake is here. The rel builds were used as everything was compiled using CMAKE_BUILD_TYPE=Release.

Tests were done for:
  • 5.6 - 5.6.21, 5.6.51
  • 5.7 - 5.7.5 thru 5.7.10, 5.7.20, 5.7.30, 5.7.44
  • 8.0 - 8.0.0 thru 8.0.5, 8.0.11 thru 8.0.14, 8.0.20, 8.0.22, 8.0.28, 8.0.30, 8.0.34, 8.0.35
I used the cz10a_bee config and it is here for 5.6, for 5.7 and for 8.0.

Benchmark

The Insert Benchmark was run in one setup - a cached workload.

The benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. 

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

  • l.i0
    • insert 20 million rows per table
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. 
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.

Results

The Summary sections linked below have tables for absolute and relative throughput. Absolute throughput is just the QPS per version. The relative throughput is (QPS for my version / QPS for base version). The base version is 5.6.21 for the 5.6-only report, 5.7.5 for the 5.7-only report, 8.0.0 for the 8.0-only report and 5.6.21 for the report that compares 5.6, 5.7 and 8.0.

Reports are here for 5.6, 5.7, 8.0, 5.6 thru 8.0.

The summaries linked below have three tables. The first table shows the absolute throughput (queries or writes/s). The second table shows the throughput for a given version relative to the base version (a value less than 1.0 means the version is slower). The third table shows the average background insert rate during q100, q500 and q1000. All versions sustained the targets, so that table can be ignored here.

By write-heavy I mean the l.i0, l.x and l.i1 benchmark steps. By read-heavy I mean the q100, q500 and q1000 benchmark steps. Below I use red and green to indicate when the relative QPS is down or up by more than 5% and grey otherwise.

From the Summary for 5.6, 5.7 and 8.0
  • relative QPS is relative to MySQL 5.6.21
  • relative QPS in 5.7.44 is (0.82, 1.35, 1.10) and (0.73, 0.72, 0.72) for write- and read-heavy
    • There are perf regressions for the l.i0, q100, q500, and q1000 benchmark steps. About half of that is from the 5.6 to 5.7 transition and the rest is a gradual loss across 5.7.
    • For the l.x benchmark step perf improves from 5.6 to 5.7 and then gradually declines across 5.7.
    • For the l.i1 benchmark step perf improves from 5.6 to 5.7 and is steady across 5.7
  • relative QPS in 8.0.35 is (0.55, 1.24, 0.89) and (0.62, 0.61, 0.62) for write- and read-heavy
    • Perf in 8.0.0 is close to 5.7.44 and 8.0.0 is the first (pre-GA) 8.0 release
    • There are perf regressions for all benchmark steps. These are gradual across 8.0.
From the Summary for 8.0
  • relative QPS is relative to MySQL 8.0.0
  • relative QPS in 8.0.35 is (0.69, 0.92, 0.84) and (0.84, 0.85, 0.85) for write- and read-heavy
From the Summary for 5.7
  • relative QPS is relative to MySQL 5.7.5
  • relative QPS in 5.7.44 is (0.92, 0.83, 1.02) and (0.88, 0.87, 0.88) for write- and read-heavy
From the Summary for 5.6
  • relative QPS is relative to MySQL 5.6.21
  • relative QPS in 5.6.51 is (0.96, 1.02, 0.97) and (0.98, 0.97, 0.98) for write- and read-heavy