Tuesday, January 2, 2024

Updated Insert benchmark: MyRocks 5.6 and 8.0, small server, cached database

This post has results for the Insert Benchmark with MyRocks 5.6 and 8.0 on a small server with a cached workload. A recent writeup for the same benchmark on a medium server is here.

For old MyRocks 5.6.35 vs latest 5.6.35
  • There might be large regressions for the range query tests (qr*). These might also be noise. I have more work in progress to figure that out. I don't see such a large regression on a medium server.
For latest MyRocks 5.6.35 vs latest MyRocks 8.0.32
  • There might be large regressions for the range query tests (qr*). These might also be noise. I have more work in progress to figure that out. I don't see such a large regression on a medium server.
Build + Configuration

I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. I also repeated tests for older builds of MyRocks 5.6. All builds were compiled from source and use CMAKE_BUILD_TYPE=Release.

MyRocks 5.6.35
  • fbmy5635_rel_221222
    • compiled with gcc 11.4.0 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy5635_rel_clang14_221222
    • compiled with clang 14.0.0 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy5635_rel_clang15_221222
    • compiled with clang 15.0.7 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
MyRocks 8.0.28
  • fbmy8028_rel_221222
    • compiled with gcc 11.4.0 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy8028_rel_clang14_221222
    • compiled with clang 14.0.0 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy8028_rel_clang15_221222
    • compiled with clang 15.0.7 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b
MyRocks 8.0.32
  • fbmy8032_rel_221222
    • compiled with gcc 11.4.0 from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy8032_rel_clang14_221222
    • compiled with clang 14.0.0 from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b
  • fbmy8032_rel_clang15_221222
    • compiled with clang 15.0.7 from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b
The older MyRocks 5.6 builds are
  • fbmy5635_rel_202104072149
    • compiled from code as of 2021-04-07 at git hash f896415f with RocksDB 6.19.0
  • fbmy5635_rel_202203072101
    • compiled from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2
  • fbmy5635_rel_202205192101
    • compiled from code as of 2022-05-19 at git hash d503bd77 with RocksDB 7.2.2
  • fbmy5635_rel_202208092101
    • compiled from code as of 2022-08-09 at git hash 877a0e58 with RocksDB 7.3.1
  • fbmy5635_rel_202210112144
    • compiled from code as of 2022-10-11 at git hash c691c716 with RocksDB 7.3.1
  • fbmy5635_rel_202302162102
    • compiled from code as of 2023-02-16 at git hash 21a2b0aa with RocksDB 7.10.0
  • fbmy5635_rel_202304122154
    • compiled from code as of 2023-04-12 at git hash 205c31dd with RocksDB 7.10.2
  • fbmy5635_rel_202305292102
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.2.1
  • fbmy5635_rel_20230529_832
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.3.2
  • fbmy5635_rel_20230529_843
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.4.3
  • fbmy5635_rel_20230529_850
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.5.0
Most tests used the cza1_bee my.cnf files that are here for 5.6.35 and for 8.0. Some 8.0 tests used the cza1ps0_bee my.cnf file, which disables the perf schema and is here.

Benchmark
 
The test server is a Beelink SER 4700u with 8 cores, 16G RAM, Ubuntu 22.04, XFS and 1 m.2 device. The benchmark is run with 1 client to avoid over-subscribing the CPU.

I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
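The read+write steps above differ only in query type and background write rate, so they can be written down as a small table. The sketch below is illustrative and is not taken from the Insert Benchmark scripts: the names QueryStep, expected_inserts and meets_sla are hypothetical, and it assumes that "sustained" means the measured insert rate is at least the target rate (the real benchmark may apply a tolerance).

```python
from dataclasses import dataclass

RUN_SECONDS = 1800  # each query step runs for a fixed amount of time


@dataclass(frozen=True)
class QueryStep:
    name: str             # step name as used in this post
    query_type: str       # "range" (covering secondary index) or "point" (PK index)
    background_rate: int  # target inserts/s; deletes/s use the same target


QUERY_STEPS = [
    QueryStep("qr100", "range", 100),
    QueryStep("qp100", "point", 100),
    QueryStep("qr500", "range", 500),
    QueryStep("qp500", "point", 500),
    QueryStep("qr1000", "range", 1000),
    QueryStep("qp1000", "point", 1000),
]


def expected_inserts(step: QueryStep, seconds: int = RUN_SECONDS) -> int:
    """Inserts done by the step when the target rate is sustained."""
    return step.background_rate * seconds


def meets_sla(step: QueryStep, measured_inserts_per_sec: float) -> bool:
    """Assumption: the SLA holds when the measured rate reaches the target."""
    return measured_inserts_per_sec >= step.background_rate


for step in QUERY_STEPS:
    print(step.name, step.query_type, expected_inserts(step))
```

With these definitions, qr100 does 100 x 1800 = 180,000 background inserts when the target rate is sustained, which is why the number of inserts is the same for every system that meets the SLA.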
Results

The performance reports are here for
The summary has 3 tables. The first shows absolute throughput for each DBMS tested and each benchmark step. The second has throughput relative to the version on the first row of the table, which makes it easy to see how performance changes over time. The third shows the background insert rate for the benchmark steps that have background inserts, and all systems sustained the target rates.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
  • insert/s for l.i0, l.i1, l.i2
  • indexed rows/s for l.x
  • range queries/s for qr100, qr500, qr1000
  • point queries/s for qp100, qp500, qp1000
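The relative QPS definition above is a simple ratio. A minimal sketch follows; the function names and the input values in the usage lines are made up for illustration and are not measurements from this benchmark.

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """Return (QPS for $me / QPS for $base)."""
    if qps_base <= 0:
        raise ValueError("base QPS must be positive")
    return qps_me / qps_base


def interpret(rel: float) -> str:
    """Classify a relative QPS value using the rule from the text."""
    if rel > 1.0:
        return "improvement"
    if rel < 1.0:
        return "regression"
    return "unchanged"


# Hypothetical throughput values, only to show the calculation.
rel = relative_qps(qps_me=500.0, qps_base=1000.0)
print(rel, interpret(rel))
```

The unit of the inputs depends on the benchmark step (inserts/s, indexed rows/s or queries/s), but the ratio is unitless, so values can be compared across steps.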
From the summary for 5.6.35
  • On l.x (index create) the clang 14/15 builds are slower, probably because there is a codegen perf bug in clang that is fixed in more recent releases.
  • Not much changes for most benchmark steps, except for the qr* steps that do range queries. I don't know yet whether this is a real regression or noise.
  • Throughput in fbmy5635_rel_221222 relative to fbmy5635_rel_202104072149
    • l.i0 - relative QPS is 0.96
    • l.x - relative QPS is 0.97
    • l.i1, l.i2 - relative QPS is 0.98, 0.96
    • qr100, qr500, qr1000 - relative QPS is 0.59, 0.53, 0.51
    • qp100, qp500, qp1000 - relative QPS is 0.96, 0.99, 0.99
From the summary for 8.0.28
  • On l.x (index create) the clang 14/15 builds are slower, probably because there is a codegen perf bug in clang that is fixed in more recent releases.
  • Results are mixed from the cza1ps0_bee my.cnf that disables the perf schema
From the summary for 8.0.32
  • On l.x (index create) the clang 14/15 builds are slower, probably because there is a codegen perf bug in clang that is fixed in more recent releases.
  • Results are good from the cza1ps0_bee my.cnf that disables the perf schema
From the summary for 8.0
  • I need to figure out whether the differences in the qr* steps that do range queries are noise or regressions. I suspect this is noise.
  • Throughput in fbmy8032_rel_221222 relative to fbmy8028_rel_221222
    • l.i0 - relative QPS is 0.95
    • l.x - relative QPS is 0.99
    • l.i1, l.i2 - relative QPS is 0.97, 0.97
    • qr100, qr500, qr1000 - relative QPS is 0.80, 0.91, 1.16
    • qp100, qp500, qp1000 - relative QPS is 0.95, 0.96, 0.99
From the summary for 5.6 and 8.0
  • Throughput in fbmy8032_rel_221222 relative to fbmy5635_rel_221222
    • l.i0 - relative QPS is 0.66
    • l.x - relative QPS is 0.86
    • l.i1, l.i2 - relative QPS is 0.80, 0.78
    • qr100, qr500, qr1000 - relative QPS is 0.59, 0.48, 0.51
    • qp100, qp500, qp1000 - relative QPS is 0.84, 0.87, 0.89
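To make the size of each change easier to read, the relative QPS values listed above for fbmy8032_rel_221222 vs fbmy5635_rel_221222 can be converted to percent changes and sorted. This is a small sketch: the values are copied from the list above, while the helper names percent_change and worst_first are hypothetical.

```python
# Relative QPS for fbmy8032_rel_221222 vs fbmy5635_rel_221222, from the list above.
REPORTED = {
    "l.i0": 0.66, "l.x": 0.86, "l.i1": 0.80, "l.i2": 0.78,
    "qr100": 0.59, "qr500": 0.48, "qr1000": 0.51,
    "qp100": 0.84, "qp500": 0.87, "qp1000": 0.89,
}


def percent_change(rel: float) -> float:
    """Convert relative QPS to a percent change, rounded to 0.1%."""
    return round((rel - 1.0) * 100.0, 1)


def worst_first(rels: dict) -> list:
    """Sort (step, relative QPS) pairs from lowest to highest relative QPS."""
    return sorted(rels.items(), key=lambda kv: kv[1])


for step, rel in worst_first(REPORTED):
    print(f"{step:7s} {rel:.2f} {percent_change(rel):+.1f}%")
```

Sorting this way puts the range query steps (qr*) first. Whether those values are a real regression or noise is still an open question, as noted above.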
















