This post has results from the Insert Benchmark for MySQL 5.7.44 and 8.0.35 with InnoDB, using a small server and a cached workload. The goal is to document the overhead of the performance schema.
tl;dr
- The perf schema in 5.7 costs <= 4% of QPS
- The perf schema in 8.0 costs ~10% of QPS, except for index create where the cost is much larger. I will try to explain that later.
The cmake files for each of the builds are here for 5.7.44 and for 8.0.35.
By default the my.cnf files I use for MySQL 5.7 and 8.0 have performance_schema=1 but I don't otherwise enable instruments. So the tests here document the overhead from using whatever is enabled by default with the perf schema.
For 8.0.35 there were 5 variants of the build and 2 my.cnf files. I didn't test the full cross product, but there are 7 different combinations I tried that are listed below.
The 8.0.35 builds are:
- my8035_rel
- MySQL 8.0.35 and the rel build with CMAKE_BUILD_TYPE=Release
- my8035_rel_native
- MySQL 8.0.35 and the rel_native build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native
- my8035_rel_native_lto
- MySQL 8.0.35 and the rel_native_lto build with CMAKE_BUILD_TYPE=Release -march=native -mtune=native WITH_LTO=ON
- my8035_rel_less
- MySQL 8.0.35 and the rel_less build with CMAKE_BUILD_TYPE=Release ENABLED_PROFILING=OFF WITH_RAPID=OFF
- my8035_rel_lessps
- MySQL 8.0.35 and the rel_lessps build with CMAKE_BUILD_TYPE=Release and as much as possible of the perf schema code disabled at compile time. See the cmake files linked above.
The my.cnf files are:
- my.cnf.cz10a_bee - my new default my.cnf for the small server
- my.cnf.cz10aps0_bee - same as cz10a_bee except the perf schema is disabled
I tried 7 combinations of build + configuration for 8.0.35:
- my8035_rel.cz10a_bee
- my8035_rel_native.cz10a_bee
- my8035_rel_native_lto.cz10a_bee
- my8035_rel_less.cz10a_bee
- my8035_rel_lessps.cz10a_bee
- my8035_rel.cz10aps0_bee
- my8035_rel_lessps.cz10aps0_bee
For 5.7.44 I tried 2 combinations of build + configuration:
- my5744_rel.cz10a_bee - uses the rel build and cz10a_bee my.cnf
- my5744_rel.cz10aps0_bee - uses the rel build and cz10aps0_bee my.cnf that disables the perf schema
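The 8.0.35 combinations listed above cover 7 of the 10 possible build + my.cnf pairs. A minimal Python sketch, using only the names from this post, shows which pairs were left out. The variable names (BUILDS, CONFIGS, TESTED) are mine and are not part of the benchmark scripts.

```python
from itertools import product

# The 5 builds and 2 my.cnf variants for 8.0.35, as named in this post.
BUILDS = [
    "my8035_rel",
    "my8035_rel_native",
    "my8035_rel_native_lto",
    "my8035_rel_less",
    "my8035_rel_lessps",
]
CONFIGS = ["cz10a_bee", "cz10aps0_bee"]

# The 7 build + configuration combinations that were tested.
TESTED = {
    "my8035_rel.cz10a_bee",
    "my8035_rel_native.cz10a_bee",
    "my8035_rel_native_lto.cz10a_bee",
    "my8035_rel_less.cz10a_bee",
    "my8035_rel_lessps.cz10a_bee",
    "my8035_rel.cz10aps0_bee",
    "my8035_rel_lessps.cz10aps0_bee",
}

# Full cross product: 5 builds x 2 configs = 10 combinations.
all_combos = [f"{build}.{cnf}" for build, cnf in product(BUILDS, CONFIGS)]
untested = [name for name in all_combos if name not in TESTED]

if __name__ == "__main__":
    print(f"{len(TESTED)} of {len(all_combos)} combinations tested")
    for name in untested:
        print("not tested:", name)
```

The untested pairs are the ones that combine the native, native_lto and less builds with the my.cnf that disables the perf schema.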
The Insert Benchmark was run in one setup - a cached workload.
The benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.
- l.i0
- insert 20 million rows per table
- l.x
- create 3 secondary indexes. I usually ignore performance from this step.
- l.i1
- insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail.
- q100, q500, q1000
- do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.
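The sequence of steps above can be written down as data. This is a rough Python sketch for illustration only and it is not the benchmark client. The Step class and the background_writes helper are hypothetical names, and the sketch assumes that each of q100, q500 and q1000 runs for 1800 seconds with 1 client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    kind: str         # "write-heavy" or "read-heavy"
    description: str


# Steps in the order they run, with descriptions taken from this post.
STEPS = [
    Step("l.i0", "write-heavy", "insert 20 million rows per table"),
    Step("l.x", "write-heavy", "create 3 secondary indexes"),
    Step("l.i1", "write-heavy", "insert and delete another 50 million rows per table"),
    Step("q100", "read-heavy", "queries with 100 inserts/s and 100 deletes/s in the background"),
    Step("q500", "read-heavy", "queries with 500 inserts/s and 500 deletes/s in the background"),
    Step("q1000", "read-heavy", "queries with 1000 inserts/s and 1000 deletes/s in the background"),
]


def background_writes(rate_per_client: int, clients: int = 1, seconds: int = 1800) -> int:
    """Background inserts (and, at the same rate, deletes) done during a read-heavy step.

    Assumes the rate is sustained for the whole step.
    """
    return rate_per_client * clients * seconds


if __name__ == "__main__":
    for step in STEPS:
        print(f"{step.name:6s} {step.kind:12s} {step.description}")
    for rate in (100, 500, 1000):
        print(f"q{rate}: {background_writes(rate)} background inserts in 1800 seconds")
```

With 1 client that is 180,000 background inserts for q100, 900,000 for q500 and 1,800,000 for q1000, assuming the target rates are met.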
- For 5.7.44, relative QPS is relative to my5744_rel.cz10a_bee
- relative QPS for my5744_rel.cz10aps0_bee is (1.03, 1.12, 1.03) for the write-heavy steps (l.i0, l.x, l.i1) and (1.02, 1.03, 1.04) for the read-heavy steps (q100, q500, q1000). So the throughput benefit from disabling the performance schema is <= 4%, except for the l.x benchmark step that creates indexes, where it is 12%.
- For 8.0.35, relative QPS is relative to my8035_rel.cz10a_bee, the base case
- my8035_rel_native.cz10a_bee and my8035_rel_less.cz10a_bee have performance similar to the base case
- my8035_rel_native_lto.cz10a_bee gets between 4% and 9% more QPS than the base case
- builds that disable the perf schema in my.cnf or at compile time get ~1.07X more QPS for l.i0 and l.i1, ~1.5X more throughput for index create and between 1.11X and 1.15X more QPS for read-heavy. I will soon get flamegraphs to explain some of this.
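To make the relative QPS numbers concrete, here is a small Python sketch that converts them into a percentage gain from disabling the perf schema. It uses the 5.7.44 values reported above. The mapping of the three write-heavy values to l.i0, l.x and l.i1 in that order is my reading of the results, and the function names are hypothetical.

```python
def relative_qps(qps: float, base_qps: float) -> float:
    """QPS for a configuration divided by QPS for the base case."""
    return qps / base_qps


def pct_gain(rel: float) -> float:
    """Percentage change in throughput implied by a relative QPS value."""
    return round((rel - 1.0) * 100.0, 1)


# Relative QPS for my5744_rel.cz10aps0_bee versus my5744_rel.cz10a_bee.
# Assumption: values are listed in benchmark step order.
MY5744_PS_OFF = {
    "l.i0": 1.03,
    "l.x": 1.12,
    "l.i1": 1.03,
    "q100": 1.02,
    "q500": 1.03,
    "q1000": 1.04,
}

gains = {step: pct_gain(rel) for step, rel in MY5744_PS_OFF.items()}
max_gain_without_index_create = max(g for step, g in gains.items() if step != "l.x")

if __name__ == "__main__":
    for step, gain in gains.items():
        print(f"{step:6s} +{gain}% QPS with the perf schema disabled")
    print("max gain excluding l.x:", max_gain_without_index_create)
```

The same conversion applies to the 8.0.35 results, where ~1.07X is about 7% more QPS and ~1.5X is about 50% more throughput for index create.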
 
 