Tuesday, January 19, 2021

Sysbench: IO-bound and MyRocks

This post has results for IO-bound sysbench with MyRocks in FB MySQL 5.6.35 and 8.0.17. The test is similar to what I used for in-memory sysbench, except that the table has 400M rows instead of 10M and is much larger than memory. The goal is to understand how performance and efficiency change from MySQL 5.6 to 8.0. I also have posts for InnoDB and Postgres.

Summary:

  • While it varies, throughput for SELECT is frequently ~10% less in 8.0.17 vs 5.6.35
  • Throughput for insert and update is 20% to 30% less in 8.0.17 vs 5.6.35
  • CPU overhead is the reason for regressions, as it was for in-memory sysbench with MyRocks. It is more of an issue for insert and update because they are CPU-bound while the read-only tests do more read IO per query and are less CPU-bound.

Overview

I use my sysbench fork. I have yet to change the code but have added Lua scripts for new tests. Tests are run in a sequence (prepare, pause to let write-back & compaction catch up, read-only, write-heavy, pause again, read-only, delete, insert) via all_small.sh, which calls another helper script, run.sh, to run tests and collect HW metrics.
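As a rough sketch of the sequence above (this is not the actual all_small.sh or run.sh; the step names mirror the description, but the run.sh argument order is an assumption for illustration):

```python
import subprocess

# Hypothetical sketch of the benchmark sequence. The real driver is
# all_small.sh, which calls run.sh per test; arguments here are assumed.
STEPS = [
    "prepare",      # load the table
    "pause",        # let write-back & compaction catch up
    "read-only",
    "write-heavy",
    "pause",
    "read-only",
    "delete",
    "insert",
]

def step_command(step, seconds=300, threads=1):
    """Build a run.sh invocation for one step (argument order is assumed)."""
    return ["bash", "run.sh", step, str(seconds), str(threads)]

def run_sequence():
    for step in STEPS:
        # run.sh runs the test and collects HW metrics for each step
        subprocess.run(step_command(step), check=True)
```

The pauses matter for LSM engines like MyRocks: running read-only tests immediately after a write burst would measure a tree shape that compaction has not yet cleaned up.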

The tests use 1 table with 400M rows and each test is run for 300 seconds for 1, 2 and 3 threads. The test servers have 4 CPU cores with HT disabled, 16G of RAM and NVMe SSD. The test table is much larger than RAM and I call this an IO-bound setup.

Tests used FB MySQL 5.6.35 at git hash 4911e0 and 8.0.17 at git hash cf9dbc. These were the latest as of early December 2020. The servers use Ubuntu 20.04 and XFS.

Results

The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS or TPS for a test. Relative throughput is the QPS or TPS relative to the base case. The HW efficiency report has absolute and relative results for CPU and IO per operation. In this post the base case is the result for MyRocks in MySQL 5.6.35.

I use ratios (relative throughput & relative HW efficiency) to explain performance. For this post the denominator (the base case) is MyRocks from MySQL 5.6.35 and the numerator is MyRocks from MySQL 8.0.17. A throughput ratio < 1 means that 8.0.17 is slower. For HW efficiency, CPU and IO per operation, a ratio > 1 means that MyRocks in 8.0.17 uses more CPU or IO per operation.
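The ratio arithmetic works like this (the numbers below are made up to illustrate the convention, not taken from the results):

```python
# Relative throughput and relative CPU per operation for 8.0.17
# versus the 5.6.35 base case. All numbers here are invented.
qps_56, qps_80 = 10000.0, 9000.0   # absolute QPS (base case, new version)
cpu_56, cpu_80 = 10.0, 12.0        # CPU microseconds per operation

qps_ratio = qps_80 / qps_56   # < 1 means 8.0.17 is slower
cpu_ratio = cpu_80 / cpu_56   # > 1 means 8.0.17 uses more CPU/operation

print(f"QPS ratio: {qps_ratio:.2f}, CPU ratio: {cpu_ratio:.2f}")
# QPS ratio: 0.90, CPU ratio: 1.20
```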

Files are in github, including summaries for absolute throughput, relative throughput and HW efficiency. I annotate the results below.

Load:
  • Inserts/second ratio is 0.82 (here)
  • CPU/insert ratio is 1.23 (here)
Read-only before write-heavy:
  • For the first 4 tests that do point queries
    • QPS ratios are 0.97, 0.86, 0.92, 1.00 (here to here)
    • CPU/query ratios are 1.10, 1.40, 1.23, 1.14 (here to here)
  • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • QPS ratios are 0.95, 0.89, 1.07 (here to here)
    • CPU/query ratios are 1.10, 1.22, 0.95 (here to here)
  • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
    • QPS ratios are 0.90, 0.93 (here to here)
    • CPU/query ratios are 1.33, 1.21 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios are 0.74, 0.98 (here to here)
    • CPU/query ratios are 1.30, 1.11 (here to here)
    • IO read KB/query ratios are 1.72, 1.07
  • The next 2 tests do range queries that are covering and not covering for the PK index
    • QPS ratios are 0.93, 0.96 (here to here)
    • CPU/query ratios are 1.16, 1.09 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios are 0.94, 0.92 (here to here)
    • CPU/query ratios are 1.09, 1.22 (here to here)
Write-heavy:
  • For the next 5 tests that are update-only
    • QPS ratios are 0.92, 0.86, 0.85, 0.80, 0.75 (here to here)
    • CPU/query ratios are 1.21, 1.24, 1.22, 1.21, 1.36 (here to here)
  • The next test is write-only that has the writes from oltp_read_write.lua
    • QPS ratio is 1.01 (here)
    • CPU/transaction ratio is 1.10 (here)
  • The next 2 tests are the traditional sysbench tests with ranges of size 10 & 100
    • QPS ratios are 0.92, 0.95 (here to here)
    • CPU/transaction ratios are 1.16, 1.12 (here to here)
Read-only after write-heavy repeats the tests that were run before write-heavy:
  • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • QPS ratios are 0.88, 0.94, 1.05 (here to here)
    • CPU/transaction ratios are 1.20, 1.13, 0.98 (here to here)
  • The next 5 tests do point queries
    • QPS ratios are 0.85, 0.86, 0.90, 1.00, 0.84 (here to here)
    • CPU/query ratios are 1.26, 1.42, 1.28, 1.24, 1.22 (here to here)
  • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
    • QPS ratios are 0.91, 0.93 (here to here)
    • CPU/query ratios are 1.27, 1.24 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios are 0.96, 1.02 (here to here)
    • CPU/query ratios are 1.04, 1.08 (here to here)
  • The next 2 tests do range queries that are covering and not covering for the PK index
    • QPS ratios are 0.93, 0.92 (here to here)
    • CPU/query ratios are 1.19, 1.23 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios are 0.96, 0.94 (here to here)
    • CPU/query ratios are 1.08, 1.20 (here to here)
  • The next test does a single-threaded full scan of the test table with a filter so that the result set is empty.
    • Rows scanned/second ratio is 0.87 (here)
    • CPU/query ratio is 1.11 (here)
Insert/delete:
  • QPS ratios are 0.88 for delete and 0.72 for insert
  • CPU/statement ratios are 1.20 for delete and 1.38 for insert
  • IO read KB/statement ratios are 1.00 for delete and 1.57 for insert. Maybe that is a blip.
