Small Datum: Perf regressions in MyRocks, a small server & sysbench

I used sysbench to search for performance regressions from old MyRocks (5.6.35) to modern MyRocks (8.0.28), and to determine the impact of compiler optimizations, since I build MyRocks from source. The context for the results is short-running queries, in-memory (cached by MyRocks), with low concurrency (1 & 4 clients) on a small server (8-core AMD).

tl;dr:

For MyRocks 5.6.35 the rel build has the best performance
For MyRocks 8.0.28 the rel_native_lto build has the best performance. The largest improvement is from link time optimization.
MyRocks 8.0.28 gets ~10% less throughput than 5.6.35 for short-running queries. The cause is more CPU/query. Much of the regression appears to be above the MySQL storage engine layer because the regressions from 5.6 to 8.0 are even larger for InnoDB than for MyRocks -- 25% or more with upstream MySQL/InnoDB vs 10% here.
The microbenchmarks with the largest regressions from 5.6 to 8.0 are random-points (select 1000 rows via in-list), insert and scan. Explaining these has been added to my TODO list although the problem with random-points is probably bug 102037 (fixed upstream in 8.0.31). See the Results: all versions section for more detail.

Benchmark

A description of how I run sysbench is here. Tests use the Beelink server (8-core AMD, 16G RAM, NVMe SSD). The sysbench tests were run for 600 seconds per microbenchmark using 1 table with 20M rows. All tests use the MyRocks storage engine. The test database fits in the MyRocks block cache. The benchmark was repeated for 1 and 4 clients.
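A minimal Python sketch of how the run parameters above map onto a sysbench invocation. This is not the harness used for these results: `sysbench_cmd` is a hypothetical helper, the option names (`--tables`, `--table_size`, `--threads`, `--time`, `--mysql_storage_engine`) assume stock sysbench 1.0 and its oltp_common.lua, and the custom Lua scripts used here may accept different options.

```python
import shlex

# Run parameters from the text: 1 table with 20M rows, 600 seconds per
# microbenchmark, repeated for 1 and 4 clients, MyRocks storage engine.
TABLES = 1
TABLE_SIZE = 20_000_000
SECONDS = 600
CLIENTS = (1, 4)


def sysbench_cmd(lua_script: str, threads: int) -> list[str]:
    """Return argv for one hypothetical sysbench 'run' of a microbenchmark.

    Option names assume stock sysbench 1.0 (oltp_common.lua); this is an
    illustration, not the harness used for the results in this post.
    """
    return [
        "sysbench",
        lua_script,
        f"--tables={TABLES}",
        f"--table_size={TABLE_SIZE}",
        f"--threads={threads}",
        f"--time={SECONDS}",
        "--mysql_storage_engine=rocksdb",  # MyRocks registers as ENGINE=ROCKSDB
        "run",
    ]


if __name__ == "__main__":
    # One command line per client count for the insert microbenchmark.
    for clients in CLIENTS:
        print(shlex.join(sysbench_cmd("oltp_insert.lua", clients)))
```

With stock sysbench, the same options with `prepare` in place of `run` load the table before the measured run.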

I used a similar configuration (my.cnf) for all versions which is here for 5.6.35 and 8.0.28.

Builds

I tested MyRocks in FB MySQL versions 5.6.35 and 8.0.28 using multiple builds for each version. For each build+version the full set of sysbench microbenchmarks was repeated.

Compiler options tested by the builds include:

-O2 vs -O3
link time optimization via -flto
CPU specific tuning via -march=native -mtune=native
CMAKE_BUILD_TYPE set to RelWithDebInfo vs Release (see here)

The possible builds are:

rel_withdbg

CMAKE_BUILD_TYPE=RelWithDebInfo which implies -O2 -flto (this gets link time optimization by default, unlike Release)

rel

CMAKE_BUILD_TYPE=Release which implies -O3

rel_o2

CMAKE_BUILD_TYPE=Release, forces -O2

rel_native

CMAKE_BUILD_TYPE=Release which implies -O3, adds -march=native -mtune=native

rel_o2_lto

CMAKE_BUILD_TYPE=Release, forces -O2, adds -flto for link time optimization

rel_native_lto

CMAKE_BUILD_TYPE=Release which implies -O3, adds -march=native -mtune=native, adds -flto for link time optimization

rel_lto

CMAKE_BUILD_TYPE=Release which implies -O3, adds -flto for link time optimization
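The build list can be summarized as a small table of flags. A minimal Python sketch, assuming only what the build descriptions above state; it is not the cmake invocation used for these builds, and other flags a CMAKE_BUILD_TYPE implies (such as -g for RelWithDebInfo) are left out.

```python
from typing import NamedTuple


class Build(NamedTuple):
    cmake_build_type: str
    opt_level: str
    extra_flags: tuple[str, ...]


NATIVE = ("-march=native", "-mtune=native")
LTO = ("-flto",)

# One entry per build, using only what the build descriptions state.
BUILDS = {
    "rel_withdbg": Build("RelWithDebInfo", "-O2", LTO),  # -flto implied, per the text
    "rel": Build("Release", "-O3", ()),
    "rel_o2": Build("Release", "-O2", ()),
    "rel_native": Build("Release", "-O3", NATIVE),
    "rel_o2_lto": Build("Release", "-O2", LTO),
    "rel_native_lto": Build("Release", "-O3", NATIVE + LTO),
    "rel_lto": Build("Release", "-O3", LTO),
}


def compiler_flags(name: str) -> list[str]:
    """Optimization-related flags for a build: -O level plus any extra flags."""
    build = BUILDS[name]
    return [build.opt_level, *build.extra_flags]


def builds_with(flag: str) -> list[str]:
    """Names of builds whose flags include `flag`, in the order listed above."""
    return [name for name in BUILDS if flag in compiler_flags(name)]


if __name__ == "__main__":
    for name, build in BUILDS.items():
        print(f"{name:15} {build.cmake_build_type:15} {' '.join(compiler_flags(name))}")
```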

For MyRocks 5.6.35 I tested these builds: rel, rel_o2, rel_withdbg. The command line for cmake, the output from cmake and the output from make are here.

For MyRocks 8.0.28 I tested these builds: rel_withdbg, rel_o2, rel_native, rel, rel_o2_lto, rel_native_lto, rel_lto. The command line for cmake, the output from cmake and the output from make are here.

Results: per-version

The result spreadsheet is here.

The graphs use relative throughput, which is throughput for a given build / throughput for the base case. When the relative throughput is > 1, that build does better than the base case; when it is 1.10, that build is ~10% better. The base case is the rel_withdbg build for both 5.6.35 and 8.0.28.
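The arithmetic is simple, but a sketch makes the convention explicit. This is a minimal Python illustration; the throughput numbers in it are hypothetical, not results from this benchmark.

```python
def relative_throughput(qps: float, base_qps: float) -> float:
    """Throughput for a build divided by throughput for the base case (rel_withdbg)."""
    if base_qps <= 0:
        raise ValueError("base throughput must be positive")
    return qps / base_qps


def describe(rel: float) -> str:
    """Turn a relative throughput into a percent better/worse than the base case."""
    pct = (rel - 1.0) * 100.0
    if abs(pct) < 0.5:
        return "about the same as the base case"
    direction = "better" if pct > 0 else "worse"
    return f"~{abs(pct):.0f}% {direction} than the base case"


if __name__ == "__main__":
    # Hypothetical numbers: a build doing 11,000 QPS vs 10,000 QPS for the base case.
    rel = relative_throughput(11_000, 10_000)
    print(f"{rel:.2f}: {describe(rel)}")
```

For those hypothetical numbers this prints `1.10: ~10% better than the base case`.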

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes.

Disclaimers:

Readability is much better via the spreadsheet so I did not make the graphs x-large here.
For most of the graphs, the axis with values doesn't start at 0, to improve readability.

For MyRocks 5.6.35 with 1 client, the median throughput for the rel build relative to rel_withdbg is 1.03 for point, 1.03 for range and 1.00 for writes.

For MyRocks 5.6.35 with 4 clients, the median throughput for the rel build relative to rel_withdbg is 1.01 for point, 1.03 for range and 1.01 for writes.

For MyRocks 8.0.28 with 1 client, the median throughput for the rel_native_lto build relative to rel_withdbg is 1.08 for point, 1.08 for range and 1.08 for writes.

For MyRocks 8.0.28 with 4 clients, the median throughput for the rel_native_lto build relative to rel_withdbg is 1.07 for point, 1.11 for range and 1.08 for writes.

Results: all versions

These have results for MyRocks versions 5.6.35 and 8.0.28 on one graph using the rel build for 5.6.35 and the rel_native_lto build for 8.0.28. The result spreadsheet is here.

There are regressions (more CPU/query) in MySQL releases from 5.6 to 8.0 and most appear to be above the storage engine level because the regressions here are not as bad as the results for upstream MySQL with InnoDB.

This table shows the median throughput for MyRocks 8.0.28 relative to 5.6.35 for the 1-client and 4-client benchmarks.

	1-client	4-clients
Point	0.85	0.90
Range	0.90	0.98
Write	0.90	0.91
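If a run is CPU-bound and both versions use the same amount of CPU per second (an assumption made for this sketch), then CPU/query is inversely proportional to throughput, so relative CPU/query is roughly 1 / relative throughput. A back-of-the-envelope Python sketch applied to the medians in the table above:

```python
# Median relative throughput (8.0.28 / 5.6.35) from the table above,
# keyed by (operation, clients).
MEDIANS = {
    ("Point", 1): 0.85, ("Point", 4): 0.90,
    ("Range", 1): 0.90, ("Range", 4): 0.98,
    ("Write", 1): 0.90, ("Write", 4): 0.91,
}


def implied_cpu_per_query(rel_throughput: float) -> float:
    """CPU/query for 8.0.28 relative to 5.6.35.

    Assumes a CPU-bound run where both versions use the same CPU per second,
    so CPU/query scales with 1 / throughput.
    """
    return 1.0 / rel_throughput


if __name__ == "__main__":
    for (op, clients), rel in MEDIANS.items():
        cpu = implied_cpu_per_query(rel)
        print(f"{op:5} {clients} client(s): throughput {rel:.2f} -> ~{cpu:.2f}x CPU/query")
```

For example, a median of 0.85 corresponds to roughly 1.18x the CPU per query, about 18% more.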

The microbenchmarks with the largest regressions from 5.6.35 to 8.0.28 are:

	1-client	4-clients
random-points.pre_range=1000	0.43	0.45
random-points_range=1000	0.45	0.50
scan_range=100	0.78	0.84
insert_range=100	0.70	0.73

I will try to explain the microbenchmarks with the largest regressions in a future post:

random-points - the Lua file is oltp_inlist_select.lua and the SQL is here. The query is a SELECT statement with 1000 values in the in-list to fetch rows by an exact match on an index. My first guess was that this comes from the optimizer doing more index dives in 8.0.28 than in 5.6.35, as I filed bug 91139 for that and blogged about it in 2017. However, the my.cnf files I use set eq_range_index_dive_limit=10, so with 1000 values in the in-list the optimizer is expected to use index statistics rather than index dives, and I have yet to explain this. Then I remembered that I reported another bug for the same microbenchmark that arrived around 8.0.22 and was fixed in 8.0.31 -- see bug 102037. I don't think MyRocks 8.0.28 has that fix yet.
scan - the Lua file is oltp_scan.lua and the SQL is here. The query is written to filter all rows via the WHERE clause (nothing matches). So it isn't clear whether the regression is from the storage engine or the MySQL code that evaluates the WHERE clause.
insert - the Lua file is oltp_insert.lua and the SQL is here.
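The reasoning about eq_range_index_dive_limit for random-points can be made concrete. The MySQL reference manual describes the rule roughly as follows (a paraphrase; it applies to IN-lists evaluated with a nonunique index): with a limit of 0 the optimizer always uses index dives, otherwise it switches to index statistics once the number of equality ranges reaches the limit. A minimal Python sketch of that rule; `uses_index_dives` is a hypothetical name, not a MySQL function.

```python
def uses_index_dives(equality_ranges: int, eq_range_index_dive_limit: int) -> bool:
    """Paraphrase of the documented rule for col IN (v1, ..., vN) row estimates.

    - limit == 0: always use index dives
    - otherwise: use index dives only while equality_ranges < limit,
      and switch to index statistics once it reaches the limit
    """
    if eq_range_index_dive_limit < 0:
        raise ValueError("eq_range_index_dive_limit cannot be negative")
    if eq_range_index_dive_limit == 0:
        return True
    return equality_ranges < eq_range_index_dive_limit


if __name__ == "__main__":
    # random-points uses 1000 values in the in-list; the my.cnf sets the limit to 10.
    for limit in (10, 200, 1001, 0):
        mode = "index dives" if uses_index_dives(1000, limit) else "index statistics"
        print(f"eq_range_index_dive_limit={limit:4}: {mode}")
```

Under this rule, 1000 values with a limit of 10 land on the index-statistics side.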

There are three graphs per client count, which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes.

First the graphs for 1 client (1 thread).

And then the graphs for 4 clients (4 threads).

Summary statistics: per version

These are computed for the throughput relative to the rel_withdbg build.
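A minimal Python sketch of how per-group summary statistics like these can be computed from per-microbenchmark relative throughput. The spreadsheet formulas aren't shown here, so two things are assumptions: that stddev is the sample standard deviation, and the (group, value) input shape. The sample values are hypothetical.

```python
import statistics
from collections import defaultdict


def summarize(values: list[float]) -> dict[str, float]:
    """avg, median, min, max and stddev for one group of relative throughputs.

    stddev is the sample standard deviation (an assumption); it needs >= 2 values.
    """
    return {
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stddev": statistics.stdev(values),
    }


def summarize_by_group(results: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Group (operation, relative throughput) pairs such as ("Point", 1.03) and summarize each."""
    groups: dict[str, list[float]] = defaultdict(list)
    for group, rel in results:
        groups[group].append(rel)
    return {group: summarize(vals) for group, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical per-microbenchmark values, not results from this post.
    sample = [("Point", 1.00), ("Point", 1.02), ("Point", 0.98), ("Point", 1.04),
              ("Write", 0.99), ("Write", 1.01)]
    for group, stats in summarize_by_group(sample).items():
        print(group, {k: round(v, 3) for k, v in stats.items()})
```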

For MyRocks 5.6.35 with 1 client

vs rel_withdbg	rel_o2	rel
Point: avg	1.01	1.09
Point: median	0.99	1.03
Point: min	0.97	0.99
Point: max	1.24	1.74
Point: stddev	0.071	0.193

Range: avg	0.99	1.04
Range: median	0.99	1.03
Range: min	0.93	1.00
Range: max	1.04	1.16
Range: stddev	0.024	0.038

Write: avg	0.99	1.00
Write: median	1.00	1.00
Write: min	0.96	0.96
Write: max	1.02	1.02
Write: stddev	0.018	0.019

For MyRocks 5.6.35 with 4 clients

vs rel_withdbg	rel_o2	rel
Point: avg	1.03	1.01
Point: median	1.00	1.01
Point: min	0.98	0.99
Point: max	1.29	1.04
Point: stddev	0.079	0.015

Range: avg	0.99	1.02
Range: median	0.99	1.03
Range: min	0.89	0.98
Range: max	1.06	1.06
Range: stddev	0.036	0.022

Write: avg	1.00	1.01
Write: median	1.00	1.01
Write: min	0.99	0.99
Write: max	1.01	1.02
Write: stddev	0.008	0.008

For MyRocks 8.0.28 with 1 client

vs rel_withdbg	rel_o2	rel_native	rel	rel_o2_lto	rel_native_lto	rel_lto
Point: avg	1.00	1.01	1.01	1.04	1.08	1.10
Point: median	1.00	1.01	1.01	1.03	1.08	1.09
Point: min	0.96	0.99	0.99	0.97	0.99	1.05
Point: max	1.03	1.03	1.04	1.10	1.18	1.26
Point: stddev	0.020	0.011	0.013	0.036	0.042	0.051

Range: avg	1.00	1.01	1.02	1.05	1.09	1.08
Range: median	1.00	1.01	1.02	1.04	1.08	1.08
Range: min	0.96	0.99	1.00	0.98	1.07	1.06
Range: max	1.02	1.04	1.05	1.11	1.12	1.12
Range: stddev	0.015	0.013	0.017	0.033	0.014	0.018

Write: avg	1.01	1.01	1.01	1.06	1.08	1.07
Write: median	1.00	1.01	1.02	1.06	1.08	1.07
Write: min	0.99	0.99	0.99	1.03	1.04	1.04
Write: max	1.03	1.02	1.03	1.08	1.10	1.10
Write: stddev	0.011	0.011	0.013	0.014	0.018	0.018

For MyRocks 8.0.28 with 4 clients

vs rel_withdbg	rel_o2	rel_native	rel	rel_o2_lto	rel_native_lto	rel_lto
Point: avg	0.99	1.00	0.99	1.02	1.06	1.05
Point: median	1.01	1.02	1.01	1.04	1.07	1.08
Point: min	0.80	0.77	0.75	0.76	0.82	0.78
Point: max	1.01	1.04	1.03	1.09	1.14	1.16
Point: stddev	0.057	0.071	0.074	0.083	0.077	0.089

Range: avg	1.00	1.02	1.02	1.04	1.10	1.08
Range: median	1.01	1.02	1.03	1.05	1.11	1.08
Range: min	0.96	0.95	0.95	0.97	1.06	1.03
Range: max	1.03	1.03	1.05	1.08	1.12	1.13
Range: stddev	0.016	0.021	0.026	0.029	0.024	0.030

Write: avg	1.01	1.00	1.01	1.05	1.08	1.07
Write: median	1.01	1.00	1.01	1.06	1.08	1.07
Write: min	1.00	0.99	0.99	1.02	1.04	1.04
Write: max	1.02	1.02	1.02	1.08	1.10	1.08
Write: stddev	0.007	0.009	0.010	0.017	0.018	0.014

Summary statistics: all versions

These are computed for the throughput from MyRocks 8.0.28 with the rel_native_lto build relative to the rel build in MyRocks 5.6.35.

1 client (1 thread)

vs 5635_rel	8028_rel_native_lto
Point: avg	0.79
Point: median	0.85
Point: min	0.43
Point: max	0.97
Point: stddev	0.167

Range: avg	0.91
Range: median	0.90
Range: min	0.78
Range: max	1.04
Range: stddev	0.073

Write: avg	0.87
Write: median	0.90
Write: min	0.70
Write: max	0.92
Write: stddev	0.064

4 clients (4 threads)

vs 5635_rel	8028_rel_native_lto
Point: avg	0.86
Point: median	0.90
Point: min	0.45
Point: max	1.01
Point: stddev	0.159

Range: avg	0.97
Range: median	0.98
Range: min	0.84
Range: max	1.08
Range: stddev	0.060

Write: avg	0.89
Write: median	0.91
Write: min	0.73
Write: max	0.99
Write: stddev	0.077

Wednesday, March 29, 2023
