Friday, January 8, 2021

Sysbench: in-memory MyRocks, MySQL 5.6 & 8.0, take 3

This is my third attempt to report in-memory sysbench results for MyRocks in MySQL 5.6.35 and 8.0.17. In my second attempt I changed my.cnf so that both versions used the same value for eq_range_index_dive_limit. In this attempt I make sure the LSM tree is in a better shape during the read-only tests. Both changes resolve some of the regressions between 5.6 and 8.0 that I reported in my first attempt.

The summary:

  • QPS for read-only tests in 8.0 is ~10% less than in 5.6
  • QPS for write-only tests in 8.0 is 20% to 30% less than in 5.6
  • QPS for the traditional sysbench test is similar between 5.6 and 8.0
  • Long range scans are more efficient in 8.0

There might be a take 4 if I have the time to debug the 20% to 30% regression for write-only QPS.

LSM tree shape

RocksDB has constraints on the LSM tree shape -- the memtable should use no more than X MB, L0 should have at most F files and then there is a target size (Gi) for each level (Li). These are more than just advisory -- writes can be stalled when these constraints are exceeded and the constraints also help RocksDB figure out where next to do compaction and whether to flush the memtable.

In the x6a my.cnf that I have been using, the constraints are: a 256M memtable, <= 4 files in the L0, 1G in the L1 and each level after L1 is 10X larger than the previous. I also disabled compression and each L0 file should be ~256M.
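As a sketch of how those constraints can be expressed (this is not the actual x6a file, and the exact option string is an assumption), MyRocks accepts standard RocksDB column family options via rocksdb_default_cf_options in my.cnf:

# 256M memtable, ~256M L0 files, compact when L0 reaches 4 files,
# 1G target for L1, 10X growth per level after L1, no compression
rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=256m;level0_file_num_compaction_trigger=4;max_bytes_for_level_base=1g;max_bytes_for_level_multiplier=10;compression=kNoCompression"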

In my tests either the database is fully cached or all but the max level of the LSM tree is cached. So extra data in the smaller levels leads to more CPU overhead on queries, but not more reads from storage. The read penalty is real for LSM trees, but mostly about CPU despite what you might read on the interwebs.

I run sysbench tests in a sequence and each test runs for a fixed amount of time. For tests that do writes, the write rate isn't fixed, so when there is variance in the write rate the amount of writes done by each test can vary. A side-effect of this is that the LSM tree shape can vary (number of KV pairs in the memtable, number of files in the L0, amount of data in Ln for n>1). For the read-only after write-heavy tests this can have an impact on performance (I might do a post on this soon). In some tests where I saw regressions with 8.0 the problem was that the L0 had several files for 8.0 while it was empty for 5.6.

I have seen this problem before and filed a feature request for it. I hope that a future RocksDB, or MyRocks, can be more clever and notice when workloads shift to a read-only or read-mostly phase and make the LSM tree more efficient for reads (shrink/flush the memtable, compact all data from L0 into L1, etc). For now I updated my helper scripts to do that via the following command, which is run after the load and then again after the write-heavy tests finish (to be fair, at that point my helper scripts also prod Postgres and InnoDB to write back dirty pages):
set global rocksdb_force_flush_memtable_and_lzero_now=1
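A minimal form of that prod from a driver script might look like the lines below. This is a sketch assuming the mysql and psql command-line clients, and the InnoDB and Postgres commands are my assumptions about one way to trigger writeback, not necessarily what my helper scripts do:

# run after the load and again after the write-heavy tests finish
mysql -e "set global rocksdb_force_flush_memtable_and_lzero_now=1"   # MyRocks: flush memtable, compact L0
mysql -e "set global innodb_max_dirty_pages_pct=0"                   # InnoDB: encourage dirty page writeback (assumption)
psql -c "checkpoint"                                                  # Postgres: write back dirty buffers (assumption)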

This hasn't been a problem for me with the insert benchmark for two reasons. First, tests that do writes are run for a fixed number of writes rather than a fixed duration. Second, the insert benchmark test steps are either insert-only or inserts+reads. The issue is more likely for read-only test steps. 

Results

The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. Each test is run for 180s at 1, 2 and 3 clients.
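For reference, one test step at 1 client looks roughly like this. It is a sketch: the connection options and table size shown are assumptions, and my scripts build the real command lines.

# 180-second read-write step at 1 client (connection options and table size are assumed)
sysbench oltp_read_write --mysql-user=root --mysql-db=test \
  --tables=1 --table-size=10000000 --range-size=100 \
  --threads=1 --time=180 run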

I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS/TPS that a test gets for 1, 2, and 3 clients. Relative throughput is the QPS/TPS relative to the base case (MySQL 5.6.35). The HW efficiency report has absolute and relative results (base case is MySQL 5.6.35) for CPU and IO per operation.

I use ratios to explain performance. In this case MySQL 5.6.35 is the denominator and the numerator is the value for 8.0.17. A QPS ratio < 1 means the new version is slower. For HW efficiency I use CPU/operation and IO/operation (read & write). For CPU and IO per operation a ratio > 1 means the new version uses more CPU or IO per query.
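For example, using the load results below: the insert rate drops from ~75k/s for 5.6.35 to ~62k/s for 8.0.17, so the QPS ratio is 62/75 ≈ 0.83, and the CPU/insert ratio of 1.21 means 8.0.17 uses about 21% more CPU per insert.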

The results are in github for absolute throughput, relative throughput and HW efficiency. I annotate the relative throughput and HW efficiency results below.

Load:
  • Inserts/second ratio is 0.83, ~75k/s -> ~62k/s (see here)
  • CPU/insert ratio is 1.21 and CPU overhead explains the slowdown (see here)
Read-only before write-heavy:
  • QPS ratios are 0.90, 0.90, 0.90, 0.89 for the first 4 tests (here to here)
    • These do point queries
    • CPU/query ratios are: 1.13, 1.13, 1.13, 1.12 (here to here)
  • QPS ratios are 0.96, 0.99, 1.13 for the next 3 tests (here to here)
    • These do the range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • CPU/query ratios are 1.08, 1.02, 0.92 (here to here). Long scans are better in 8.0 vs 5.6.
  • QPS ratios are 0.91, 0.89 for the next 2 tests (here to here)
    • These do point queries via in-lists that are covering and then not covering for the PK index
    • CPU/query ratios are 1.12, 1.14 (here to here).
  • QPS ratios are 0.92, 0.93 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index.
    • CPU/query ratios are 1.08, 1.09 (here to here)
  • QPS ratios are 0.98, 0.86 for the next 2 tests (here to here)
    • These do range queries that are covering and then not covering for the primary index
    • CPU/query ratios are 1.07, 1.18 (here to here)
  • QPS ratios are 0.95, 0.90 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. 
    • CPU/query ratios are 1.09, 1.13 (here to here)
Write-heavy:
  • QPS ratios are 0.82, 0.87, 0.83, 0.80, 0.71 for the next 5 tests (here to here)
    • These are update-only
    • CPU/statement ratios are 1.37, 1.19, 1.28, 1.27, 1.28 (here to here)
  • QPS ratio is 0.95 for the next test, write-only. See here.
    • This has the writes from oltp_read_write.lua. 
    • CPU/transaction ratio is 1.16. See here.
  • QPS ratios are 1.02, 0.99 for the next two tests, read-write (here to here)
    • These are the traditional sysbench tests (oltp_read_write.lua) with ranges of size 10 & 100
    • CPU/transaction ratios are 1.09, 1.01 (here to here)
Read-only after write-heavy includes tests that were run before write-heavy.
  • QPS ratios are 0.98, 0.99, 1.10 for the next 3 tests, read-only (here to here)
    • These have the queries (range scans) from oltp_read_write.lua with ranges of size 10, 100 and 10,000. Ratios here are similar to Read-only before write-heavy.
    • CPU/transaction ratios are 1.06, 1.04, 0.95 (here to here)
  • QPS ratios are 0.91, 1.01, 0.92, 0.92, 0.87 for the next 5 tests (here to here)
    • These do a variety of point queries. The first 4 were run in Read-only before write heavy, and ratios here are similar.
    • CPU/query ratios are 1.15, 1.03, 1.10, 1.09, 1.17 (here to here)
  • QPS ratios are 0.94, 0.92 for the next 2 tests (here to here)
    • These do point queries via in-lists that are covering and then not covering for the PK index
    • CPU/query ratios are 1.08, 1.11 (here to here)
  • QPS ratios are 0.95, 0.94 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index.
    • CPU/query ratios are 1.06, 1.07 (here to here)
  • QPS ratios are 0.94, 0.92 for the next 2 tests (here to here)
    • These do range queries that are covering and then not covering for the primary index
    • CPU/query ratios are 1.10, 1.05 (here to here)
  • QPS ratios are 0.86, 0.93 for the next 2 tests (here to here)
    • These are similar to the previous test, but use the secondary index. 
    • CPU/query ratios are 1.13, 1.09 (here to here)
Insert/delete:
  • QPS ratio is 0.82 for the delete test and 0.73 for the insert test
  • CPU/statement ratio is 1.22 for delete and 1.39 for insert
