Monday, December 18, 2023

What is the cost of the performance schema in MySQL 5.7 and 8.0: small server, cached database, simple queries

This post has results from the Insert Benchmark for some MySQL 8.0.35 with InnoDB using the Insert Benchmark, a small server and a cached workload. The goal is to document the overhead of the performance schema.

tl;dr

  • The perf schema in 5.7 costs <= 4% of QPS
  • The perf schema in 8.0 costs ~10% of QPS, except for index create where the cost is much larger. I will try to explain that later. 
Build + configuration

I used InnoDB with MySQL 8.0.35 and 5.7.44.

The cmake files for each of the builds are here for 5.7.44 and for 8.0.35

By default the my.cnf files I use for MySQL 5.7 and 8.0 have performance_schema=1 but I don't otherwise enable instruments. So the tests here document the overhead from using whatever is enabled by default with the perf schema.

For 8.0.35 there were 5 variants of the build and 2 my.cnf files. I didn't test the full cross product, but there are 7 different combinations I tried that are listed below.

The 8.0.35 builds are: 

  • my8035_rel
    • MySQL 8.0.35 and the rel build with CMAKE_BUILD_TYPE=Release
  • my8035_rel_native
    • MySQL 8.0.35 and the rel_native build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native
  • my8035_rel_native_lto
    • MySQL 8.0.35 and the rel_native_lto build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native WITH_LTO=ON
  • my8035_rel_less
    • MySQL 8.0.35 and the rel_less build with CMAKE_BUILD_TYPE=Release ENABLED_PROFILING=OFF WITH_RAPID=OFF
  • my8035_rel_lessps.cz10a_bee
    • MySQL 8.0.35 and the rel build with CMAKE_BUILD_TYPE=Release and as much as possible of the perf schema code disabled at compile time. See the cmake files linked above.
The 8.0.35 my.cnf files:

I tried 7 combinations of build + configuration for 8.0.35:

  • my8035_rel.cz10a_bee
  • my8035_rel_native.cz10a_bee
  • my8035_rel_native_lto.cz10a_bee
  • my8035_rel_less.cz10a_bee
  • my8035_rel_lessps.cz10a_bee
  • my8035_rel.cz10aps0_bee
  • my8035_rel_lessps.cz10aps0_bee
For 5.7.44 I tried two combinations of build + configuration
  • my5744_rel.cz10a_bee - uses the rel build and cz10a_bee my.cnf
  • my5744_rel.cz10aps0_bee - uses the rel build and cz10aps0_bee my.cnf that disables the perf schema

Benchmark

The Insert Benchmark was run in one setup - a cached workload.

The benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. 

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

  • l.i0
    • insert 20 million rows per table
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. 
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.
Results

Reports are here for 5.7.44 and for 8.0.35.

The Summary sections linked below have tables for absolute and relative throughput. The relative throughput is (QPS for my version / QPS for base version). The base version is my5744_rel.cz10a_bee for the 5.7.44 report and my8035_rel.cz10a_bee for the 8.0.35 report.

The summaries linked below have three tables. The first table shows the absolute throughput (queries or writes/s). The second table shows the throughput for a given version relative to MySQL 5.6.21 (a value less than 1.0 means the version is slower). The third table shows the average background insert rate during q100, q500 and q1000. All versions sustained the targets, so that table can be ignored here.

By write-heavy I mean the l.i0, l.x and l.i1 benchmark steps. By read-heavy I mean the q100, q500 and q1000 benchmark steps. Below I use red and green to indicate when the relative QPS is down or up by more than 5% and grey otherwise.

From the Summary for 5.7.44
  • relative QPS is relative to my5744_rel.cz10a_bee
  • relative QPS for my5744_rel.cz10aps0_bee is (1.03, 1.12, 1.03) and (1.02, 1.03, 1.04) for write- and read-heavy. So throughput benefit from disabling the performance schema is <= 4% except for the l.x benchmark step that creates indexes.
From the Summary for 8.0.35
  • relative QPS is relative to my8035_rel.cz10a_bee
  • my8035_rel_native.cz10a_bee and my8035_rel_less.cz10a_bee have performance similar to the base case
  • my8035_rel_native_lto.cz10a_bee gets between 4% and 9% more QPS than the base case
  • builds that disable the perf schema in my.cnf or at compile time get ~1.07X more QPS for l.i0 and l.i1, ~1.5X more throughput for index create and between 1.11X and 1.15X more QPS for read-heavy. I will soon get flamegraphs to explain some of this.

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...