Monday, July 8, 2024

MyRocks vs InnoDB on cached sysbench

This post compares MyRocks and InnoDB using sysbench with a cached database. The first goal is to understand the differences in CPU overhead between MyRocks and InnoDB. The second goal is to identify things that can be improved in MyRocks.

tl;dr

  • There are small regressions from MyRocks 5.6 to 8.0, usually <= 10%. I assume most of this is from changes in upstream MySQL that are in code above the storage engine layer.
  • MyRocks uses more CPU than InnoDB on cached sysbench, thus InnoDB gets more QPS. While many conference papers explain IO read amplification as an issue for an LSM, the more likely issue is CPU read amplification as shown in the results here.
  • There are a few yet to be explained issues that hurt MyRocks write throughput on a 2-socket server
  • There is one microbenchmark for which MyRocks was (almost) always much faster than InnoDB -- update-index. The reason is that non-unique secondary index maintenance is read free with MyRocks (all done via a few Put() operations).
Updates:
  • changing the CPU frequency governor makes QPS ~2X better on the worst-case results with the Socket2 server (see here)
Builds

I compiled both upstream and FB MySQL from source and tested the following builds:
  • MySQL 8.0.37 - I used InnoDB from upstream MySQL 8.0.37
  • MyRocks 5.6.35 - FB MyRocks 5.6.35 at git sha 4f3a57a1 (as of 23/10/16) with RocksDB 8.7.0
  • MyRocks 8.0.28 - FB MyRocks 8.0.28 at git sha 4edf1eec_(as of 23/12/02) with RocksDB 8.7.0
  • MyRocks 8.0.32 - FB MyRocks 8.0.32 at git sha 49b37dfe (as of 24/05/29) with RocksDB 9.2.1

Hardware

I tested on five servers:
  • Small servers
    • SER4 - Beelink SER 4700u (see here) with 8 cores and a Ryzen 7 4700u CPU 
    • PN53 - ASUS ExpertCenter PN53 (see here) with 8 cores and an AMD Ryzen 7 7735HS CPU. The CPU on the PN53 is newer than the CPU on the SER4.
  • Medium server
    • C2D - a c2d-highcpu-32 instance type on GCP (c2d high-CPU) with 32 vCPU and 16 cores
  • Big servers
    • Socket2 - SuperMicro SuperWorkstation (see here) with 2-sockets, 12 cores/socket, 64G RAM and ext4 (SW RAID 0 over 2 NVMe devices)
    • Dell32 - Dell Precision 7865 Tower (see here) with a 32-corAMD Ryzen Threadripper PRO 5975WX, 128G RAM and ext4 (SW RAID 0 over 2 NVMe devices)
All servers use Ubuntu 22.04 with ext4. AMD SMT and Intel HT are disabled in all cases.

Benchmark

I used sysbench and my usage is explained here. There are 42 microbenchmarks and most test only 1 type of SQL statement. The database is cached by MyRocks and InnoDB.

The benchmark is run with:
  • SER4, PN53 - 1 thread, 1 table and 30M rows
  • C2D - 12 threads, 8 tables and 10M rows per table
  • Socket2 - 16 threads, 8 tables and 10M rows per table
  • Dell32 - 24 threads, 8 tables and 10M rows per table
  • each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise
  • prepared statements were enabled
The command lines for my helper scripts were:
# PN53, SER4 -> 1 client
bash r.sh 1 30000000 300 600 nvme0n1 1 1 1
# C2D -> 12 clients
bash r.sh 8 10000000 300 600 md0 1 1 12
# Socket2 -> 16 clients
bash r.sh 8 10000000 300 600 md0 1 1 16
# Dell32 -> 24 clients
bash r.sh 8 10000000 300 600 md0 1 1 24

Results

For the results below I split the 42 microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. The spreadsheet with all data is here. For each microbenchmark group there is a table with summary statistics. I don't have charts because that would use too much space, but the results per microbenchmark are in the spreadsheets.

The numbers in the spreadsheets and the tables below are the relative QPS which is (QPS for my version) / (QPS for base case). When the relative throughput is > 1 then that version is faster than the base case.

For all results below the base case is InnoDB from MySQL 8.0.37.

Results: SER4

This server has 8 cores. The benchmark is run with 1 client (thread, connection). The numbers in the tables are the relative QPS as explained above.

Summary:
  • There are regressions in MyRocks from 5.6.35 to 8.0. I assume most are from upstream in code above the storage engine layer. Regressions are largest for writes.
  • The worst-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.53) points-covered-si_range=100
    • point-2 - (0.67) random-points_range=100
    • range-1 - (0.46) scan_range=100
    • range-2 - (0.63) read-only_range=100
    • writes - (0.59) update-inlist_range=100
  • The best-case for MyRocks 8.0.32 vs InnoDB
    • Note that all of the best-case results occur for the pre microbenchmarks that are run before the write-heavy tests which fragment the LSM tree.
    • point-1 - (0.77) point-query.pre_range=100
    • point-2 - (0.78) random-points.pre_range=10
    • range-1 - (0.65) range-notcovered-si.pre_range=100
    • range-2 - (0.71) read-only.pre_range=10
    • writes - (1.80) update-index_range=100
myrocks-5.6.35minmaxavgmedian
point-10.441.000.750.75
point-20.720.860.800.81
range-10.580.760.670.68
range-20.680.890.770.77
writes0.712.391.080.95
myrocks-8.0.28minmaxavgmedian
point-10.410.780.640.66
point-20.360.770.590.66
range-10.460.650.570.57
range-20.640.750.680.68
writes0.611.920.880.78
myrocks-8.0.32minmaxavgmedian
point-10.530.770.690.71
point-20.670.780.720.73
range-10.460.650.560.57
range-20.630.710.660.66
writes0.591.800.830.73

Results: PN53

This server has 8 cores. The benchmark is run with 1 client (thread, connection). The numbers in the tables are the relative QPS as explained above.

Summary:
  • There are regressions in MyRocks from 5.6.35 to 8.0. I assume most are from upstream in code above the storage engine layer. Regressions are largest for writes.
  • The worst-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.47) points-covered-si_range=100
    • point-2 - (0.67) random-points_range=100
    • range-1 - (0.45) scan_range=100 (0.45)
    • range-2 - (0.70) read-only_range=100
    • writes - (0.65) update-inlist_range=100
  • The best-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.79) point-query.pre_range=100, points-covered-pk.pre_range=100, points-notcovered-pk.pre_range=100
    • point-2 - (0.80) random-points.pre_range=1000
    • range-1 - (0.70) range-notcovered-si.pre_range=100
    • range-2 - (0.78) read-only.pre_range=10
    • writes - (1.54) update-index_range=100
myrocks-5.6.35minmaxavgmedian
point-10.660.950.810.80
point-20.740.870.800.78
range-10.590.730.680.71
range-20.730.870.790.78
writes0.781.831.010.90
myrocks-8.0.28minmaxavgmedian
point-10.380.830.610.63
point-20.350.780.560.59
range-10.480.710.620.62
range-20.710.800.750.75
writes0.661.600.870.79
myrocks-8.0.32minmaxavgmedian
point-10.470.790.700.73
point-20.670.800.740.74
range-10.450.700.600.61
range-20.700.780.730.73
writes0.651.540.840.76

Results: C2D

This server has 16 cores. The benchmark is run with 12 clients (threads, connections). The numbers in the tables are the relative QPS as explained above.

Summary:
  • Regressions from MyRocks 5.6.35 to MyRocks 8.0 are smaller here (with more concurrency) than for the small servers (with low concurrency)
  • The worst-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.46) points-covered-si_range=100
    • point-2 - (0.57) random-points_range=10
    • range-1 - (0.35) scan_range=100
    • range-2 - (0.60) read-only_range=100
    • writes - (0.54) delete_range=100
  • The best-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.77) point-query.pre_range=100, points-covered-pk.pre_range=100, points-notcovered-pk.pre_range=100
    • point-2 - (0.78) random-points.pre_range=10
    • range-1 - (0.70) range-notcovered-si.pre_range=100
    • range-2 - (0.72) read-only.pre_range=10
    • writes - (1.94) update-index_range=100
myrocks-5.6.35minmaxavgmedian
point-10.430.940.640.67
point-20.520.780.620.57
range-10.460.650.510.49
range-20.580.730.640.63
writes0.651.870.860.72
myrocks-8.0.28minmaxavgmedian
point-10.440.820.620.67
point-20.350.800.580.60
range-10.360.710.590.63
range-20.640.740.680.68
writes0.671.980.890.77
myrocks-8.0.32minmaxavgmedian
point-10.460.770.650.71
point-20.570.780.690.73
range-10.350.700.580.60
range-20.600.720.650.64
writes0.541.940.840.74

Results: Socket2

This server has 24 cores. The benchmark is run with 16 clients (threads, connections). The numbers in the tables are the relative QPS as explained above.

Updates:
  • changing the CPU frequency governor makes QPS ~2X better on the worst-case results with the Socket2 server (see here)
Summary:
  • Regressions from MyRocks 5.6.35 to MyRocks 8.0 are smaller here (with more concurrency) than for the small servers (with low concurrency)
  • MyRocks does much worse for writes relative to InnoDB here (large server) than above on the small and medium servers. The update-index microbenchmark is much faster in MyRocks than InnoDB above and below on the Dell32 server, but here no the only 2-socket server it is slower.That will be explained in a future post.
  • The worst-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.48) points-covered-si_range=100
    • point-2 - (0.62) random-points_range=100
    • range-1 - (0.48) scan_range=100
    • range-2 - (0.71) read-only_range=100
    • writes - (0.24) insert_range=100
  • The best-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.83) points-notcovered-pk.pre_range=100
    • point-2 - (0.86) random-points.pre_range=1000
    • range-1 - (0.77) range-notcovered-si.pre_range=100
    • range-2 - (0.81) read-only.pre_range=10000
    • writes - (0.78) write-only_range=10000
myrocks-5.6.35minmaxavgmedian
point-10.520.930.730.78
point-20.590.870.710.67
range-10.550.780.630.62
range-20.670.760.720.72
writes0.280.780.450.31
myrocks-8.0.28minmaxavgmedian
point-10.490.830.680.76
point-20.440.810.640.62
range-10.510.790.640.65
range-20.730.810.770.77
writes0.250.810.450.30
myrocks-8.0.32minmaxavgmedian
point-10.480.830.690.75
point-20.620.860.730.72
range-10.480.770.620.61
range-20.710.810.750.75
writes0.240.780.440.32

Results: Dell 32 

This server has 32 cores. The benchmark is run with 24 clients (threads, connections). The numbers in the tables are the relative QPS as explained above.

Summary:
  • Regressions from MyRocks 5.6.35 to MyRocks 8.0 are smaller here (with more concurrency) than for the small servers (with low concurrency)
  • MyRocks does much worse for writes relative to InnoDB here (large server) than above on the small and medium servers. That will be explained in a future post.
  • The worst-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.44) points-notcovered-si_range=100
    • point-2 - (0.49) random-points_range=100, random-points_range=1000
    • range-1 - (0.44) scan_range=100
    • range-2 - (0.67) read-only_range=100
    • writes - (0.41) update-inlist_range=100
  • The best-case for MyRocks 8.0.32 vs InnoDB
    • point-1 - (0.80) point-query.pre_range=100, points-covered-si.pre_range=100
    • point-2 - (0.80) random-points.pre_range=1000
    • range-1 - (0.73) range-notcovered-si.pre_range=100
    • range-2 - (0.78) read-only.pre_range=10000
    • writes - (1.21) update-index_range=100
myrocks-5.6.35minmaxavgmedian
point-10.440.910.650.73
point-20.470.800.620.60
range-10.430.720.570.56
range-20.650.820.720.70
writes0.481.360.680.58
myrocks-8.0.28minmaxavgmedian
point-10.430.830.620.71
point-20.360.790.580.55
range-10.440.750.610.63
range-20.690.790.740.73
writes0.401.230.620.52
myrocks-8.0.32minmaxavgmedian
point-10.440.800.640.74
point-20.490.800.660.70
range-10.440.730.590.60
range-20.670.780.720.71
writes0.411.210.610.51

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...