Saturday, January 16, 2021

Insert benchmark: MyRocks

This has results for the insert benchmark on a small server with a low-concurrency workload using MyRocks in MySQL 5.6.35 and 8.0.17. I also have posts for InnoDB and Postgres.

Overview

The test is run with 1 client and 1 table. In this case, client means there is at most 1 connection for writes and 1 for reads. For some tests the write and read connections do things concurrently. My test server has 4 CPU cores and hyperthreads are disabled.

The test is run for 3 table sizes: 20M, 100M and 500M rows. The test table fits in memory for the 20M and 100M sizes and is larger than memory for the 500M size. For the read+write test steps the working set is cached for the 500M size with MyRocks (but not with all DBMS engines).

The insert benchmark has several steps (explained here and here): load data, create secondary indexes, load more data, and read+write. The read+write step is repeated with increasing rate limits for the writer (100/s, 200/s, 400/s, 600/s, 800/s and 1000/s).
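To make the step names used below concrete, here is a minimal sketch of the step sequence. The Python is illustrative, not the actual benchmark client; the step names match the summary files discussed later.

    # Schematic outline of the insert benchmark steps; illustrative only,
    # not the real benchmark client.
    INSERT_RATES = [100, 200, 400, 600, 800, 1000]  # inserts/s for read+write

    steps = ["l.i0",  # load data in PK order, no secondary indexes yet
             "l.x",   # create the secondary indexes
             "l.i1"]  # load more data with the indexes in place
    steps += ["q%d.2" % rate for rate in INSERT_RATES]  # rate-limited read+write
    print(steps)  # ['l.i0', 'l.x', 'l.i1', 'q100.2', ..., 'q1000.2']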

Results, my.cnf files and helper scripts are archived on GitHub. I tried to make the my.cnf options similar and the files are here for 5.6.35 and 8.0.17. Tests used FB MySQL 5.6.35 at git hash 4911e0 and 8.0.17 at git hash cf9dbc. These were the latest as of early December 2020. The NUC servers use Ubuntu 20.04 and XFS.

Summary for MyRocks in 8.0.17 vs 5.6.35

  • 8.0.17 gets ~15% to 20% less throughput for the insert-heavy steps
  • 8.0.17 gets ~5% less throughput for create index
  • 8.0.17 gets ~10% less throughput for range queries
  • CPU overhead is the problem for all of the regressions
  • Context switches might be a problem for create index

Results overview

For now I am sharing text summaries and in a few days I will share a longer report with graphs and tables. The text summaries are dense, which is good and bad. Dense means I can do comparisons for many DBMS configurations with all the data I need on one page. But dense also imposes a burden on the reader. The guide is here if you want to understand the output.

The report summaries are in github for the tests with 20M, 100M and 500M rows. The summary files per test step are listed below. The mrg.* files have absolute values for throughput and HW efficiency metrics. The rel.* files have the values relative to the base case and here the base case is MyRocks in 5.6.35. For relative throughput a value < 1 means that MyRocks in 8.0.17 is slower than in 5.6.35. For relative HW efficiency a value > 1 means that MyRocks in 8.0.17 uses more HW per operation than 5.6.35.
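As a worked example of how to read these files (the numbers below are made up for illustration, not taken from the results):

    # rel.* values are the 8.0.17 value divided by the 5.6.35 value.
    # Made-up numbers for illustration only.
    ips_5635, ips_8017 = 100000.0, 80000.0  # inserts/s (throughput)
    cpu_5635, cpu_8017 = 0.9, 1.0           # CPU per operation (HW efficiency)

    print(ips_8017 / ips_5635)  # 0.80 -> < 1, 8.0.17 has less throughput
    print(cpu_8017 / cpu_5635)  # ~1.11 -> > 1, 8.0.17 uses more CPU/operation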

The files are:

  • load data (l.i0) - mrg.l.i0, rel.l.i0
  • create secondary indexes (l.x) - mrg.l.x, rel.l.x
  • load more data (l.i1) - mrg.l.i1, rel.l.i1
  • read+write with 100 inserts/s - mrg.q100.2, rel.q100.2
  • read+write with 200 inserts/s - mrg.q200.2, rel.q200.2
  • read+write with 400 inserts/s - mrg.q400.2, rel.q400.2
  • read+write with 600 inserts/s - mrg.q600.2, rel.q600.2
  • read+write with 800 inserts/s - mrg.q800.2, rel.q800.2
  • read+write with 1000 inserts/s - mrg.q1000.2, rel.q1000.2

Finally, I pasted the output from mrg.l.i0, mrg.l.x, mrg.l.i1, mrg.q100.2, mrg.q200.2, mrg.q800.2 and mrg.q1000.2 into one file (see mrg.some for 20M, 100M and 500M) to make this easier to read. I also did the same for the relative files (see rel.some for 20M, 100M and 500M). Below I discuss the output using the mrg.some and rel.some files.

Results

As explained above I use ratios to compare throughput and HW efficiency. The numerator is the value for MyRocks in 8.0.17 and the denominator is the value in 5.6.35. For throughput a ratio < 1 means that 8.0.17 is slower. For HW efficiency a ratio > 1 means that 8.0.17 uses more HW per operation.

Load data (l.i0)
  • This loads in PK order to a table that has no secondary indexes
  • Insert/s (ips) ratios were 0.80, 0.78, 0.77 for 20M, 100M and 500M row tables
  • CPU/insert (cpupq) ratios were 1.11, 1.18, 1.18 for ...
Create secondary indexes (l.x)
  • This creates 3 secondary indexes
  • Indexed rows/s (ips) ratios were 0.94, 0.96, 0.94 for 20M, 100M and 500M row tables
  • CPU/indexed row (cpupq) ratios were 1.00, 1.00, 1.11 for ...
  • Context switches/indexed row ratios were 1.29, 1.14, 1.25 for ...
Load more data (l.i1)
  • This loads more data in PK order after the secondary indexes are created
  • Insert/s (ips) ratios were 0.84, 0.84, 0.84 for 20M, 100M and 500M row tables
  • CPU/insert (cpupq) ratios were 1.11, 1.16, 1.15 for ...
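To make the three load steps above concrete, here is a sketch of the SQL involved, assuming an iibench-style schema. The table and index definitions are illustrative and might not match the benchmark exactly.

    # Illustrative SQL for the load steps (l.i0, l.x, l.i1), assuming an
    # iibench-style schema. The real benchmark table may differ.
    create_table = """CREATE TABLE purchases (
      transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      dateandtime DATETIME,
      cashregisterid INT,
      customerid INT,
      productid INT,
      price FLOAT,
      data VARCHAR(4000)
    ) ENGINE=ROCKSDB"""

    # l.i0 and l.i1 insert rows with an increasing transactionid (PK order)
    insert_row = "INSERT INTO purchases VALUES (NULL, %s, %s, %s, %s, %s, %s)"

    # l.x adds the 3 secondary indexes; one ALTER builds all of them
    create_indexes = """ALTER TABLE purchases
      ADD INDEX marketsegment (price, customerid),
      ADD INDEX registersegment (cashregisterid, price, customerid),
      ADD INDEX pdc (price, dateandtime, customerid)"""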

Read+write with 100 inserts/s (q100.2)
  • This has one connection doing queries and another doing 100 inserts/s
  • Queries/s (qps) ratios were 0.92, 0.90, 0.91 for 20M, 100M and 500M row tables
  • CPU/query (cpupq) ratios were 1.12, 1.13, 1.13 for ...
Read+write with 200 inserts/s (q200.2)
  • This has one connection doing queries and another doing 200 inserts/s
  • Queries/s (qps) ratios were 0.92, 0.91, 0.91 for 20M, 100M and 500M row tables
  • CPU/query (cpupq) ratios were 1.12, 1.13, 1.12 for ...
Read+write with 800 inserts/s (q800.2)
  • This has one connection doing queries and another doing 800 inserts/s
  • Queries/s (qps) ratios were 0.92, 0.90, 0.90 for 20M, 100M and 500M row tables
  • CPU/query (cpupq) ratios were 1.12, 1.14, 1.14 for ...
Read+write with 1000 inserts/s (q1000.2)
  • This has one connection doing queries and another doing 1000 inserts/s
  • Queries/s (qps) ratios were 0.92, 0.90, 0.90 for 20M, 100M and 500M row tables
  • CPU/query (cpupq) ratios were 1.12, 1.14, 1.14 for ...
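The insert rates above are enforced by the writer pacing itself. A minimal sketch of one way to do that with sleep-based pacing follows; it is illustrative, not the actual benchmark code, and do_insert is a placeholder.

    import time

    def paced_inserts(do_insert, rate, duration):
        # Call do_insert() at roughly `rate` calls/s for `duration` seconds.
        # Sleep-based pacing; illustrative, not the actual benchmark client.
        interval = 1.0 / rate
        deadline = time.time() + duration
        next_at = time.time()
        while time.time() < deadline:
            do_insert()
            next_at += interval
            delay = next_at - time.time()
            if delay > 0:
                time.sleep(delay)

    # Example: ~100 inserts/s for 5 seconds against a no-op "insert"
    paced_inserts(lambda: None, rate=100, duration=5)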
