Wednesday, March 29, 2023

Perf regressions in MyRocks, a small server & sysbench

I used sysbench to search for performance regressions from old MyRocks (5.6.35) to modern MyRocks (8.0.28) and to determine the impact of compiler optimizations because I build it from source. The context for the results is short-running queries, in-memory (cached by MyRocks) with low-concurrency (1 & 4 clients) on a small server (8-core AMD).

tl;dr:

  • For MyRocks 5.6.35 the rel build has the best performance
  • For MyRocks 8.0.28 the rel_native_lto build has the best performance. The largest improvement is from link time optimization.
  • MyRocks 8.0.28 gets ~10% less throughput than 5.6.35 for short-running queries. The cause is more CPU/query. Much of the regression appears to be above the MySQL storage engine layer because the regressions from 5.6 to 8.0 are even larger for InnoDB than for MyRocks -- 25% or more with upstream MySQL/InnoDB vs 10% here.
  • The microbenchmarks with the largest regressions from 5.6 to 8.0 are random-points (select 1000 rows via in-list), insert and scan. Explaining these has been added to my TODO list although the problem with random-points is probably bug 102037 (fixed upstream in 8.0.31). See the Results: all versions section for more detail. 

Benchmark

A description of how I run sysbench is here. Tests use the Beelink server (8-core AMD, 16G RAM, NVMe SSD). The sysbench tests were run for 600 seconds per microbenchmark using 1 table with 20M rows. All tests use the MyRocks storage engine. The test database fits in the MyRocks block cache. The benchmark was repeated for 1 and 4 clients.

I used a similar configuration (my.cnf) for all versions which is here for 5.6.35 and 8.0.28.

Builds

I tested MyRocks in FB MySQL versions 5.6.35 and 8.0.28 using multiple builds for each version. For each build+version the full set of sysbench microbenchmarks was repeated.

Compiler options tested by the builds include:
  • -O2 vs -O3
  • link time optimization via -flto
  • CPU specific tuning via -march=native -mtune=native
  • CMAKE_BUILD_TYPE set to RelWithDebInfo vs Release (see here)
The possible builds are:
  • rel_withdbg
    • CMAKE_BUILD_TYPE=RelWithDebInfo which implies -O2
  • rel
    • CMAKE_BUILD_TYPE=Release which implies -O3
  • rel_o2
    • CMAKE_BUILD_TYPE=Release, forces -O2
  • rel_native
    • CMAKE_BUILD_TYPE=Release which implies -O3, adds -march=native -mtune=native
  • rel_o2_lto
    • CMAKE_BUILD_TYPE=Release, forces -O2, adds -flto for link time optimization
  • rel_native_lto
    • CMAKE_BUILD_TYPE=Release which implies -O3, adds -march=native -mtune=native, adds -flto for link time optimization
  • rel_lto
    • CMAKE_BUILD_TYPE=Release which implies -O3, adds -flto for link time optimization
For MyRocks 5.6.35 I tested these builds: rel, rel_o2, rel_withdbg. The command line for cmake, output from cmake and output from make is here.

For MyRocks 8.0.28 I tested these builds: rel_withdbg, rel_o2, rel_native, rel, rel_o2_lto, rel_native_lto, rel_lto. The command line for cmake, output from cmake and output from make is here.
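The build matrix above can be written down as data, which makes it easier to see which builds differ by only one option. This is a minimal sketch: the dictionary layout and the helper names (`BUILDS`, `TESTED`, `compiler_flags`, `builds_with`) are hypothetical, and only the flag values and build names come from the lists above. It does not reproduce the real cmake command lines, which are linked above.

```python
# Build matrix as described in the post. The layout and helper names are
# hypothetical; only the build names and flag values come from the text.
BUILDS = {
    "rel_withdbg":    {"build_type": "RelWithDebInfo", "opt": "-O2", "extra": []},
    "rel":            {"build_type": "Release", "opt": "-O3", "extra": []},
    "rel_o2":         {"build_type": "Release", "opt": "-O2", "extra": []},
    "rel_native":     {"build_type": "Release", "opt": "-O3",
                       "extra": ["-march=native", "-mtune=native"]},
    "rel_o2_lto":     {"build_type": "Release", "opt": "-O2", "extra": ["-flto"]},
    "rel_native_lto": {"build_type": "Release", "opt": "-O3",
                       "extra": ["-march=native", "-mtune=native", "-flto"]},
    "rel_lto":        {"build_type": "Release", "opt": "-O3", "extra": ["-flto"]},
}

# Which builds were tested for each MyRocks version.
TESTED = {
    "5.6.35": ["rel", "rel_o2", "rel_withdbg"],
    "8.0.28": ["rel_withdbg", "rel_o2", "rel_native", "rel",
               "rel_o2_lto", "rel_native_lto", "rel_lto"],
}


def compiler_flags(build):
    """Return the optimization level followed by any extra flags."""
    spec = BUILDS[build]
    return [spec["opt"]] + spec["extra"]


def builds_with(flag):
    """Return the names of builds that add the given extra flag."""
    return [name for name, spec in BUILDS.items() if flag in spec["extra"]]


for version, builds in TESTED.items():
    for build in builds:
        print(version, build, BUILDS[build]["build_type"],
              " ".join(compiler_flags(build)))
```

For example, `builds_with("-flto")` lists the three builds that use link time optimization, and comparing `rel` with `rel_lto` isolates the effect of that one flag.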

Results: per-version

The result spreadsheet is here.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case is the rel_withdbg build for 5.6.35 and 8.0.28.
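As a small worked example of that definition, here is a sketch with made-up numbers. The QPS values and the function name `relative_throughput` are hypothetical and are not taken from the spreadsheet.

```python
def relative_throughput(qps_mine, qps_base):
    """Throughput for the build under test divided by the base case."""
    return qps_mine / qps_base


# Hypothetical queries/second for one microbenchmark.
base_qps = 10000.0   # rel_withdbg (the base case)
mine_qps = 11000.0   # some other build

rel = relative_throughput(mine_qps, base_qps)
print(f"relative throughput = {rel:.2f}")        # 1.10
print(f"change vs base = {(rel - 1) * 100:.0f}%")  # 10%
```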

There are three graphs per version which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes. 

Disclaimers:
  • Readability is much better via the spreadsheet so I did not make the graphs x-large here. 
  • For most of the graphs the axis with values doesn't start at 0 to improve readability
For MyRocks 5.6.35 with 1 client the throughput median for the rel build relative to rel_withdbg is 1.03 for point, 1.03 for range, 1.00 for writes.
For MyRocks 5.6.35 with 4 clients the throughput median for the rel build relative to rel_withdbg is 1.01 for point, 1.03 for range, 1.01 for writes.
For MyRocks 8.0.28 with 1 client the throughput median for the rel_native_lto build relative to rel_withdbg is 1.08 for point, 1.08 for range, 1.08 for writes.
For MyRocks 8.0.28 with 4 clients the throughput median for the rel_native_lto build relative to rel_withdbg is 1.07 for point, 1.11 for range, 1.08 for writes.

Results: all versions

These have results for MyRocks versions 5.6.35 and 8.0.28 on one graph using the rel build for 5.6.35 and the rel_native_lto build for 8.0.28. The result spreadsheet is here.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case is the rel build with MyRocks 5.6.35.

There are regressions (more CPU/query) in MySQL releases from 5.6 to 8.0 and most appear to be above the storage engine level because the regressions here are not as bad as the results for upstream MySQL with InnoDB.

This table shows the median throughput for MyRocks 8.0.28 relative to 5.6.35 for the 1-client and 4-client benchmarks.

        1-client  4-clients
Point   0.85      0.90
Range   0.90      0.98
Write   0.90      0.91

The microbenchmarks with the largest regressions from 5.6.35 to 8.0.28 are:

                              1-client  4-clients
random-points.pre_range=1000  0.43      0.45
random-points_range=1000      0.45      0.50
scan_range=100                0.78      0.84
insert_range=100              0.70      0.73

For the microbenchmarks with the largest regression, I will do more to explain these in a future post:
  • random-points - the Lua file is oltp_inlist_select.lua and the SQL is here. The query is a SELECT statement with 1000 values in the in-list to fetch rows by an exact match on an index. My first guess was that the optimizer does more index dives in 8.0.28 than in 5.6.35, as I filed bug 91139 and blogged about this in 2017. However, the my.cnf I use has eq_range_index_dive_limit=10, so index dives should be skipped for a 1000-value in-list and that does not explain the regression. Then I remembered that I reported another bug for the same microbenchmark that arrived around 8.0.22 and was fixed in 8.0.31 -- see bug 102037. I don't think MyRocks 8.0.28 has that fix yet.
  • scan - the Lua file is oltp_scan.lua and the SQL is here. The query is written to filter all rows via the WHERE clause (nothing matches). So it isn't clear whether the regression is from the storage engine or the MySQL code that evaluates the WHERE clause.
  • insert - the Lua file is oltp_insert.lua and the SQL is here.
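To make the shape of the random-points query concrete, here is a rough sketch that builds a SELECT with a 1000-value in-list. The real SQL is at the link above. The table name `sbtest1`, the column name `id`, the select list and the helper `inlist_select` are all assumptions for illustration.

```python
import random


def inlist_select(table, column, n_values, max_id, rng):
    """Build a SELECT that fetches n_values rows by exact match via an in-list.

    Table and column names are placeholders, not the ones used by
    oltp_inlist_select.lua.
    """
    ids = rng.sample(range(1, max_id + 1), n_values)  # distinct random keys
    in_list = ",".join(str(i) for i in ids)
    return f"SELECT {column} FROM {table} WHERE {column} IN ({in_list})"


rng = random.Random(42)
# 1 table with 20M rows and 1000 values per in-list, as in the benchmark.
sql = inlist_select("sbtest1", "id", 1000, 20_000_000, rng)
print(sql[:80] + " ...")
```

With eq_range_index_dive_limit=10, an in-list this long is well past the limit, which is why index dives are an unlikely explanation for the regression.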
There are three graphs per client count which group the microbenchmarks by the dominant operation: one for point queries, one for range queries, one for writes.

First the graphs for 1 client (1 thread).
And then the graphs for 4 clients (4 threads).

Summary statistics: per version

These are computed for the throughput relative to the rel_withdbg build. 
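The rows in the tables below (avg, median, min, max, stddev) can be computed from the per-microbenchmark relative throughput values along these lines. This is a sketch with made-up input values. Whether the spreadsheet uses the sample or the population standard deviation is not stated, so the use of `statistics.stdev` (sample) here is an assumption.

```python
import statistics


def summarize(values):
    """Summary statistics for a list of relative throughput values."""
    return {
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        # Assumption: sample standard deviation. Use statistics.pstdev
        # for the population version.
        "stddev": statistics.stdev(values),
    }


# Hypothetical relative throughput for the point-query microbenchmarks
# of one build.
point_rel = [0.97, 0.99, 1.03, 1.24]

for name, value in summarize(point_rel).items():
    print(f"Point: {name:<7} {value:.3f}")
```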

For MyRocks 5.6.35 with 1 client

(base: rel_withdbg)  rel_o2  rel
Point: avg           1.01    1.09
Point: median        0.99    1.03
Point: min           0.97    0.99
Point: max           1.24    1.74
Point: stddev        0.071   0.193
Range: avg           0.99    1.04
Range: median        0.99    1.03
Range: min           0.93    1.00
Range: max           1.04    1.16
Range: stddev        0.024   0.038
Write: avg           0.99    1.00
Write: median        1.00    1.00
Write: min           0.96    0.96
Write: max           1.02    1.02
Write: stddev        0.018   0.019

For MyRocks 5.6.35 with 4 clients

(base: rel_withdbg)  rel_o2  rel
Point: avg           1.03    1.01
Point: median        1.00    1.01
Point: min           0.98    0.99
Point: max           1.29    1.04
Point: stddev        0.079   0.015
Range: avg           0.99    1.02
Range: median        0.99    1.03
Range: min           0.89    0.98
Range: max           1.06    1.06
Range: stddev        0.036   0.022
Write: avg           1.00    1.01
Write: median        1.00    1.01
Write: min           0.99    0.99
Write: max           1.01    1.02
Write: stddev        0.008   0.008

For MyRocks 8.0.28 with 1 client

(base: rel_withdbg)  rel_o2  rel_native  rel    rel_o2_lto  rel_native_lto  rel_lto
Point: avg           1.00    1.01        1.01   1.04        1.08            1.10
Point: median        1.00    1.01        1.01   1.03        1.08            1.09
Point: min           0.96    0.99        0.99   0.97        0.99            1.05
Point: max           1.03    1.03        1.04   1.10        1.18            1.26
Point: stddev        0.020   0.011       0.013  0.036       0.042           0.051
Range: avg           1.00    1.01        1.02   1.05        1.09            1.08
Range: median        1.00    1.01        1.02   1.04        1.08            1.08
Range: min           0.96    0.99        1.00   0.98        1.07            1.06
Range: max           1.02    1.04        1.05   1.11        1.12            1.12
Range: stddev        0.015   0.013       0.017  0.033       0.014           0.018
Write: avg           1.01    1.01        1.01   1.06        1.08            1.07
Write: median        1.00    1.01        1.02   1.06        1.08            1.07
Write: min           0.99    0.99        0.99   1.03        1.04            1.04
Write: max           1.03    1.02        1.03   1.08        1.10            1.10
Write: stddev        0.011   0.011       0.013  0.014       0.018           0.018

For MyRocks 8.0.28 with 4 clients

(base: rel_withdbg)  rel_o2  rel_native  rel    rel_o2_lto  rel_native_lto  rel_lto
Point: avg           0.99    1.00        0.99   1.02        1.06            1.05
Point: median        1.01    1.02        1.01   1.04        1.07            1.08
Point: min           0.80    0.77        0.75   0.76        0.82            0.78
Point: max           1.01    1.04        1.03   1.09        1.14            1.16
Point: stddev        0.057   0.071       0.074  0.083       0.077           0.089
Range: avg           1.00    1.02        1.02   1.04        1.10            1.08
Range: median        1.01    1.02        1.03   1.05        1.11            1.08
Range: min           0.96    0.95        0.95   0.97        1.06            1.03
Range: max           1.03    1.03        1.05   1.08        1.12            1.13
Range: stddev        0.016   0.021       0.026  0.029       0.024           0.030
Write: avg           1.01    1.00        1.01   1.05        1.08            1.07
Write: median        1.01    1.00        1.01   1.06        1.08            1.07
Write: min           1.00    0.99        0.99   1.02        1.04            1.04
Write: max           1.02    1.02        1.02   1.08        1.10            1.08
Write: stddev        0.007   0.009       0.010  0.017       0.018           0.014

Summary statistics: all versions

These are computed for the throughput from MyRocks 8.0.28 with the rel_native_lto build relative to the rel build in MyRocks 5.6.35

1 client (1 thread)

(base: 5635_rel)     8028_rel_native_lto
Point: avg           0.79
Point: median        0.85
Point: min           0.43
Point: max           0.97
Point: stddev        0.167
Range: avg           0.91
Range: median        0.90
Range: min           0.78
Range: max           1.04
Range: stddev        0.073
Write: avg           0.87
Write: median        0.90
Write: min           0.70
Write: max           0.92
Write: stddev        0.064

4 clients (4 threads)

(base: 5635_rel)     8028_rel_native_lto
Point: avg           0.86
Point: median        0.90
Point: min           0.45
Point: max           1.01
Point: stddev        0.159
Range: avg           0.97
Range: median        0.98
Range: min           0.84
Range: max           1.08
Range: stddev        0.060
Write: avg           0.89
Write: median        0.91
Write: min           0.73
Write: max           0.99
Write: stddev        0.077
