Tuesday, August 8, 2023

Checking MyRocks 5.6 for regressions with the Insert Benchmark and a small server

In a previous post I reported performance regressions on a large server with the Insert Benchmark when I compared builds from 2022 with a current build. Those builds were done using a complicated build script (special production compiler toolchains make builds complex). In that post I claimed there was a ~15% regression for write-heavy benchmark steps and ~5% for read-heavy steps.

Ugh, the results below are bogus because I made mistakes in building MyRocks. The non-bogus results are here.

The rest of this post is not truthy.

The tests for these builds have to be repeated -- fbmy5635_rel_202104072149, fbmy5635_rel_202203072101, fbmy5635_rel_202205192101. Those builds were bad in part because writing C++ that can be compiled across multiple versions of g++ is non-trivial.

The good news is that I can't reproduce this problem using a much simpler build script and the small Beelink servers I have at home. On the home servers the regression is between 0% and 2%.

Builds

I used MyRocks from FB MySQL 5.6.35 using the rel build (CMAKE_BUILD_TYPE=Release, see here) with source from 2021 through 2023. The versions are:
  • fbmy5635_rel_202104072149 - from 20210407 at git hash (f896415fa0 MySQL, 0f8c041ea RocksDB), RocksDB 6.19
  • fbmy5635_rel_202203072101 - from 20220307 at git hash (e7d976ee MySQL, df4d3cf6fd RocksDB), RocksDB 6.28.2
  • fbmy5635_rel_202205192101 - from 20220519 at git hash (d503bd77 MySQL, f2f26b15 RocksDB), RocksDB 7.2.2
  • fbmy5635_rel_202208092101 - from 20220809 at git hash (877a0e585 MySQL, 8e0f4952 RocksDB), RocksDB 7.3.1
  • fbmy5635_rel_202210112144 - from 20221011 at git hash (c691c7160 MySQL, 8e0f4952 RocksDB), RocksDB 7.3.1
  • fbmy5635_rel_202302162102 - from 20230216 at git hash (21a2b0aa MySQL, e5dcebf7 RocksDB), RocksDB 7.10.0
  • fbmy5635_rel_202304122154 - from 20230412 at git hash (205c31dd MySQL, 3258b5c3 RocksDB), RocksDB 7.10.2
  • fbmy5635_rel_202305292102 - from 20230529 at git hash (b739eac1 MySQL, 03057204 RocksDB), RocksDB 8.2.1
  • fbmy5635_rel_jun23_7e40af677 - from 20230608 at git hash (7e40af67 MySQL, 03057204 RocksDB), RocksDB 8.2.1

Benchmark

The Insert Benchmark was run in two setups:

  • cached by RocksDB - all tables fit in the RocksDB block cache
  • IO-bound - the database is larger than memory

This benchmark used the Beelink server described here. It has 8 cores, 16G of RAM and 1TB of NVMe SSD, and runs Ubuntu 22.04 with XFS.

The benchmark is run with 1 client. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows across all tables without secondary indexes where X is 20 for cached and 800 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 100 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail.
  • q100
    • do queries as fast as possible with 100 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
  • q500
    • do queries as fast as possible with 500 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
  • q1000
    • do queries as fast as possible with 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
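
As a minimal sketch (not the benchmark client used for this post), the step sequence above can be written down as data. The `Step` class, the `kind` labels and the `background_inserts` helper are names I made up for illustration. The rates, the 3600 second duration and the row counts come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    kind: str             # hypothetical label: "load", "index", "write" or "query"
    background_rate: int  # inserts/s/client done in the background, 0 if none


# The benchmark steps, in the order they run
STEPS = [
    Step("l.i0", "load", 0),
    Step("l.x", "index", 0),
    Step("l.i1", "write", 0),
    Step("q100", "query", 100),
    Step("q500", "query", 500),
    Step("q1000", "query", 1000),
]

# Each query step runs for 3600 seconds
QUERY_STEP_SECONDS = 3600

# Rows inserted by l.i0, in millions, for each setup
LOAD_ROWS_MILLIONS = {"cached": 20, "io-bound": 800}


def background_inserts(step: Step, clients: int = 1) -> int:
    """Total background inserts during a query step.

    Deletes run at the same rate, so the same count applies to deletes.
    """
    if step.kind != "query":
        return 0
    return step.background_rate * clients * QUERY_STEP_SECONDS


if __name__ == "__main__":
    for step in STEPS:
        print(step.name, step.kind, background_inserts(step))
```

With 1 client, this shows how few background writes the query steps do compared to the load: the row count for each query step is much smaller than the 20 million rows loaded by l.i0 in the cached setup.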

Configurations

The configuration (my.cnf) files are here and I use abbreviated names for them in this post. For each variant there are two files -- one with a 1G block cache, one with a larger block cache. The larger block cache size is 8G when LRU is used and 6G when hyper clock cache is used (see tl;dr).

  • a (see here) - base config
  • a5 (see here) - enables subcompactions via rocksdb_max_subcompactions=2

Results

Performance reports are here for cached by RocksDB and IO-bound:
  • From the summaries that list average throughput for cached and for IO-bound the regression is at most 2%
  • From the metrics that list HW overhead per operation for cached and for IO-bound (see the cpupq column) the CPU overhead per operation is similar across versions
  • From compaction stats at the end of the last benchmark step (q1000) the cumulative stats for compaction are similar (see here, results are in date order)
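
For readers who want to check a claim like "at most 2%" from the throughput summaries, here is a minimal sketch of the arithmetic. The function name, the build labels and the throughput numbers are placeholders for illustration; they are not measured results from this post.

```python
def regression_pct(base_qps: float, new_qps: float) -> float:
    """Percentage by which new_qps is lower than base_qps.

    A positive value is a regression, a negative value is an improvement.
    """
    if base_qps <= 0:
        raise ValueError("base throughput must be positive")
    return (1.0 - new_qps / base_qps) * 100.0


if __name__ == "__main__":
    # Placeholder average throughput per build for one benchmark step.
    # These values are NOT from the performance reports.
    throughput = {
        "old_build": 100_000.0,
        "new_build": 98_500.0,
    }
    base = throughput["old_build"]
    for build, qps in throughput.items():
        print(f"{build}: {regression_pct(base, qps):.1f}% regression vs old_build")
```

The same calculation applies to each benchmark step, using the oldest build as the base and each later build as the comparison.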
