Thursday, January 21, 2021

Summary of my search for regressions with MySQL & Postgres

This wraps up another round of benchmarks. My goal was to determine whether there were CPU regressions in MySQL and Postgres. I found more in MySQL than Postgres. 

I used the Insert Benchmark and sysbench. I also ran tests with TSBS but ran out of time to write up the results. Tests were run on small servers that I have at home. It would be nice to repeat these tests using larger servers in the cloud, perhaps someone else will do that.

The interesting result is that CPU regressions in MySQL reduce throughput by 10% to 20% between each major release. While my tests were limited to 5.6, 5.7 and 8.0 I saw similar problems years ago when I was able to compile older releases from source. Low-concurrency CPU regressions have been a thing with MySQL, and they still are a thing. Fortunately Postgres is the existence proof that the regressions can be avoided.

Sysbench results:

Insert benchmark results

Wednesday, January 20, 2021

Sysbench: MySQL vs Postgres for an IO-bound workload

This compares results for MySQL/InnoDB and Postgres via IO-bound sysbench using the data I shared for Postgres and MySQL. I previously shared a comparison for an in-memory workload with sysbench. My goal is to document how performance and efficiency have changed from MySQL 5.6 to 8.0 and from Postgres 11 to 13. I try to avoid benchmarketing but I am curious about how system software evolves over time. I share many numbers here (performance is a vector not a scalar) and am happy that one DBMS isn't strictly faster or more efficient than the other.

Summary:

  • On average, old Postgres was slower than old MySQL while new Postgres is faster than new MySQL courtesy of CPU regressions in new MySQL (for this workload, HW, etc)
  • The InnoDB clustered PK index hurts throughput for a few queries when the smaller Postgres PK index can be cached, but the larger InnoDB PK index cannot.
  • The InnoDB clustered PK index helps a few queries where Postgres must to extra random IO to fetch columns not in the PK index

Overview

Much more detail on the benchmarks is here for Postgres and here for MySQL/InnoDB. The summary is that I used 1 table with 400M rows and small servers. Tests were run for Postgres (11.10, 12.4 and 13.1) and MySQL (5.6.49, 5.7.31, 8.0.22) but my focus here is comparing Postgres 11.10 with MySQL 5.6.49 and then Postgres 13.1 with MySQL 8.0.22.

Results

The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS or TPS for a test. Relative throughput is the QPS or TPS relative to the base case. The HW efficiency report has absolute and relative results for CPU and IO per operation.

I use ratios (relative throughput & relative HW efficiency) to explain performance. For this post the denominator (the base case) is MySQL and the numerator is Postgres. A throughput ratio < 1 means that Postgres is slower. For HW efficiency, CPU and IO per operation, a ratio > 1 means that Postgres uses more CPU or IO per operation.

I did two comparisons: one for Postgres 11 vs MySQL 5.6, another for Postgres 13 vs MySQL 8.0 and the files are in github. Files that end in old are for Postgres 11 vs MySQL 5.6 while files that end in new are for Postgres 13 vs MySQL 8.0. There are files for absolute throughput (old and new), relative throughput (old and new) and HW efficiency (old and new).

I am still figuring out how to present this data and balance both the ease of writing reports while making it not too hard to read. In this case I put the results for throughput and HW efficiency into a spreadsheet. The spreadsheet has many rows and 10 columns. The rows are the order in which the tests were run. The columns are:
  • Test - the name of the test
  • Range - the value of the range argument for Lua scripts that determines the number of rows processed for some of the tests.
  • qps.new, qps.old - throughput ratios
  • cpu/o - new, cpu/o - old  - ratios for CPU/operation
  • r/o - new, r/o - old - ratios for storage reads/operation
  • rKB/o - new, rKB/o - old - ratios for storage read KB/operation
The ratios have (Postgres value / MySQL value), new means this is for Postgres 13.1 vs MySQL 8.0.22 and old means this is for Postgres 11.10 vs MySQL 5.6.49. For the HW efficiency values (/operation) the operation is one of queries, statements, transactions.

In the spreadsheet I highlighted entries in gold when the throughput for Postgres improved by more than ~5% relative to MySQL from qps.old to qps.new (done for 19 of 43 tests). I also highlighted entries in silver when CPU/operation for Postgres improved by more than ~5% relative to MySQL from cpu/o - old to cpu/o - new (done for 26 of 43 tests). This makes it easier to understand the impact from CPU regressions.

New Postgres has improved relative to new MySQL. The geometric mean for throughput ratios is 0.95 for qps.old and 1.04 for qps.new. CPU efficiency is the reason for the improvement as the geometric mean for cpu/o is 1.12 for cpu/o - old and 0.88 for cpu/o - new. Neither MySQL nor Postgres dominate on these tests, MySQL is faster for some and Postgres is faster for others. But the advantage has shifted towards Postgres thanks to CPU efficiency. On average, old Postgres was slower than old MySQL while new Postgres is faster than new MySQL courtesy of CPU regressions in new MySQL. 

There were a few outliers where MySQL is 2X faster than Postgres or vice versa and I will explain these using the names from the Test column in the spreadsheet.
  • points-covered-pk.pre (row 10) - Postgres is more than 17X faster. The PK index is covering for the query used by this test. However, InnoDB has a clustered PK index that includes all columns for the table and is much larger than the Postgres PK index. The r/o columns from the spreadsheet show that InnoDB does ~30X more storage reads/query than Postgres and that would explain the throughput difference. This also explains why Postgres is much faster for:
    • range-covered-pk.pre (row 14) - Postgres is more than 3X faster
    • points-covered-pk (row 34) - Postgres is more than 20X faster
    • range-covered-pk (row 38) - Postgres is more than 3X faster
  • update-inlist, update-nonindex (rows 18 & 20) - these do updates that don't require secondary index maintenance and MySQL gets ~2X more updates/second. MySQL modifies leaf pages of the clustered PK index while Postgres modifies both PK index leaf pages and heap-organized table pages. That might explain why Postgres uses more CPU and random IO.
  • insert (row 44) - InnoDB is ~3X faster and I suspect the change buffer is the reason
Finally, there are tests that benefit from the InnoDB clustered PK index, although the benefit there is not as extreme as the impact above where it hurts InnoDB. One example are the read-only tests (rows 8-10 and 26-28). Those queries are also included in the read-write tests (rows 24, 25) that represent the traditional sysbench test. The benefit is that InnoDB does a PK index range scan to evaluate the query predicate and get all columns, while Postgres must fetch rows via random page reads from the heap-organized table to get columns not in the PK index.

Sysbench: IO-bound and InnoDB

This has results for IO-bound sysbench with InnoDB in MySQL 5.6, 5.7 and 8.0. The test is similar to what I used for in-memory sysbench except the table has 400M rows instead of 10M and the test table is much larger than memory. The goal is to understand how performance and efficiency change from MySQL 5.6 to 8.0. I also have posts for Postgres and MyRocks.

Summary:

  • MySQL 8.0 on read-heavy tests is mixed. Sometimes is is better than 5.6 and when worse the loss is mostly at most 10%.
  • MySQL 8.0 on write-heavy tests does worse. Many of these are CPU bound and throughput in 8.0 is frequently 20% to 30% worse than 5.6. But for some 8.0 does much better than 5.6.
  • MySQL 8.0 for full scans is ~17% slower than 5.6
  • New CPU overhead explains most of the regressions
  • There are large regressions on a few tests that are logically index-only and extra read IO is the problem. See slide 7 in Explaining MVCC GC.

Overview

I use my sysbench fork. I have yet to change the code but added Lua scripts for new tests. Tests are run in a sequence (prepare, pause to let write back & compaction catch up, read-only, write-heavy, pause again, read-only, delete, insert) via all_small.sh which calls another helper script, run.sh, to run tests and collect HW metrics.

The tests use 1 table with 400M rows and each test is run for 300 seconds for 1, 2 and 3 threads. The test servers have 4 CPU cores with HT disabled, 16G of RAM and NVMe SSD. The test table is much larger than RAM and I call this an IO-bound setup.

Tests used upstream MySQL 5.6.49, 5.7.31 and 8.0.22. The servers use Ubuntu 20.04 and XFS. 

Results

The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS or TPS for a test. Relative throughput is the QPS or TPS relative to the base case. The HW efficiency report has absolute and relative results for CPU and IO per operation. In this post the base case is the result for InnoDB in MySQL 5.6.49.

I use ratios (relative throughput & relative HW efficiency) to explain performance. For this post the denominator (the base case) is InnoDB from MySQL 5.6.49 and the numerator is InnoDB from MySQL 5.7.31 or 8.0.22. A throughput ratio < 1 means that 5.7 or 8.0 are slower. For HW efficiency, CPU and IO per operation, a ratio > 1 means that 5.7 or 8.0 use more CPU or IO per operation.

Files are in github including summaries for absolute throughputrelative throughput and HW efficiency. I annotate the results below with a focus on the 5.6 vs 8.0 comparison.

Load:
  • Inserts/second ratio for 8.0 is 0.71 (here)
  • Inserts/second ratio for 5.7 is 1.08
  • CPU/insert ratio is 0.95 for 5.7 and 1.63 for 8.0 (here)
Read-only before write-heavy:
  • For the first 4 tests that do point queries
    • QPS ratios for 8.0 are 1.45, 1.09, 1.03, 0.88 (here to here)
    • QPS ratios for 5.7 are 1.43, 1.09, 1.02, 0.88
    • CPU/query ratios for 8.0 are 0.63, 0.90, 1.00, 1.33 (here to here)
  • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • QPS ratios for 8.0 are 0.88, 1.06, 1.12 (here to here)
    • QPS ratios for 5.7 are 0.93, 1.06, 1.25
    • CPU/query ratios for 8.0 are 1.28, 0.93, 0.90 (here to here)
  • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
    • QPS ratios for 8.0 are 1.03, 1.03 (here to here)
    • QPS ratios for 5.7 are 0.98, 1.02
    • CPU/query ratios for 8.0 are 0.99, 0.98 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios for 8.0 are 0.72, 0.93 (here to here)
    • QPS ratios for 5.7 are 0.93, 0.92
    • CPU/query ratios for 8.0 are 1.41, 1.23 (here to here)
    • IO read KB/query ratios for 8.0 are 1.39, 1.00
  • The next 2 tests do range queries that are covering and not covering for the PK index
    • QPS ratios for 8.0 are 1.02, 1.03 (here to here)
    • QPS ratios for 5.7 are 1.05, 1.04
    • CPU/query ratios for 8.0 are 0.99, 0.97 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios for 8.0 are 0.77, 0.95 (here to here)
    • QPS ratios for 5.7 are 0.84, 0.92
    • CPU/query ratios for 8.0 are 1.28, 1.11 (here to here)
    • IO read KB/query ratios for 8.0 are 1.29, 1.00
Write-heavy
  • For the next 5 tests that are update-only
    • QPS ratios for 8.0 are 1.07, 1.01, 1.51, 0.75, 0.63 (here to here)
    • QPS ratios for 5.7 are 1.11, 1.01, 1.22, 0.96, 1.23
    • CPU/statement ratios for 8.0 are 1.35, 1.86, 1.45, 2.23, 1.91 (here to here)
  • The next test is write-only that has the writes from oltp_read_write.lua
    • QPS ratio for 8.0 is 0.75 (here)
    • QPS ratio for 5.7 is 1.32
    • CPU/transaction ratio is 0.91 for 5.7 and 1.71 for 8.0 (here)
  • The next 2 tests are the traditional sysbench tests with ranges of size 10 & 100
    • QPS ratio for 8.0 is 0.62, 0.79 (here to here)
    • QPS ratio for 5.7 is 1.00, 1.06
    • CPU/transaction ratios for 8.0 are 2.04, 1.66 (here to here)
Read-only after write-heavy includes tests that were run before write-heavy.
  • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
    • QPS ratios for 8.0 are 0.88, 1.01, 1.13 (here to here). 
    • QPS ratios for 5.7 are 0.94, 1.03, 1.25
    • CPU/transaction ratios for 8.0 are 1.28, 0.99, 0.89 (here to here)
  • The next 5 tests do point queries
    • QPS ratios for 8.0 are 0.89, 1.09, 1.03, 1.00, 0.49 (here to here)
    • QPS ratios for 5.7 are 1.02, 1.09, 0.98, 1.00, 0.89
    • CPU/query ratios for 8.0 are 1.23, 0.89, 0.97, 1.32, 1.91 (here to here)
    • The regression for hot-points (QPS ratio = 0.49) is from bug 102037
  • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
    • QPS ratios for 8.0 are 1.00, 1.03 (here to here)
    • QPS ratios for 5.7 are 0.95, 0.98
    • CPU/query ratios for 8.0 are 1.04, 0.98 (here to here)
  • The next 2 tests are similar to the previous test but use the secondary index
    • QPS ratios for 8.0 are 0.72, 0.90 (here to here)
    • QPS ratios for 5.7 are 0.93, 0.89
    • CPU/query ratios for 8.0 are 1.40, 1.30 (here to here)
    • IO read KB/query ratios for 8.0 are 1.39, 0.99
  • The next 2 tests do range queries that are covering and not covering for the PK index
    • QPS ratios for 8.0 are 1.00, 1.03 (here to here)
    • QPS ratios for 5.7 are 1.02, 1.06
    • CPU/query ratios for 8.0 are 1.01, 0.96 (here to here)
  • The next 2 tests are similar to the previous but use the secondary index
    • QPS ratios for 8.0 are 0.77, 0.92 (here to here)
    • QPS ratios for 5.7 are 0.84, 0.94
    • CPU/query ratios for 8.0 are 1.29, 1.23 (here to here)
    • IO read KB/query ratios for 8.0 are 1.24, 1.00
  • The next test does a single-threaded full scan of the test table with a filter so that the result set is empty.
    • Rows scanned/second ratio is 0.83 for 8.0 (here)
    • Rows scanned/second ratio is 0.87 for 5.7
    • For 8.0 the CPU/query ratio is 0.96 and the IO read KB/query ratio is 0.83 (here)
    • It isn't in the tables I shared but the SSD read 624 MB/s for 5.6 vs 516 MB/s for 8.0.
    Insert/delete

    • QPS ratios for 8.0 are 1.33 for delete and 0.66 for insert
    • QPS ratios for 5.7 are 1.08 for delete and 0.85 for insert
    • CPU/statement ratios for 8.0 are 1.56 for delete and 2.94 for insert

    Sysbench: IO-bound and Postgres

    This has results for IO-bound sysbench with Postsgres versions 11.10, 12.4 and 13.1. The test is similar to what I used for in-memory sysbench except the table has 400M rows instead of 10M and the test table is much larger than memory. The goal is to understand how performance and efficiency change over time. I also have posts for InnoDB and MyRocks.

    Summary:

    • Many tests get between 5% and 10% more throughput in 13.1 relative to 11.10
    • Tests that do covering queries on secondary indexes do ~20% less read IO/query and I am not sure whether this is a benefit from index deduplication
    • There aren't significant regressions from 11.10 to 13.1

    Overview

    I use my sysbench fork. I have yet to change the code but added Lua scripts for new tests. Tests are run in a sequence (prepare, pause to let write back & compaction catch up, read-only, write-heavy, pause again, read-only, delete, insert) via all_small.sh which calls another helper script, run.sh, to run tests and collect HW metrics.

    The tests use 1 table with 400M rows and each test is run for 300 seconds for 1, 2 and 3 threads. The test servers have 4 CPU cores with HT disabled, 16G of RAM and NVMe SSD. The test table is much larger than RAM and I call this an IO-bound setup.

    Tests used Postgres versions 11.10, 12.4 and 13.1. The servers use Ubuntu 20.04 and XFS. 

    Results

    The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

    I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS or TPS for a test. Relative throughput is the QPS or TPS relative to the base case. The HW efficiency report has absolute and relative results for CPU and IO per operation. In this post the base case is the result for Postgres 11.10.

    I use ratios (relative throughput & relative HW efficiency) to explain performance. For this post the denominator (the base case) is Postgres 11.10 and the numerator is Postgres 12.4 or 13.1. A throughput ratio < 1 means that 12.4 or 13.1 are slower. For HW efficiency, CPU and IO per operation, a ratio > 1 means that 12.4 or 13.1 use more CPU or IO per operation.

    Files are in github including summaries for absolute throughputrelative throughput and HW efficiency. I annotate the results below with a focus on the 5.6 vs 8.0 comparison. The Postgres config files are also in github for 11.10, 12.4 and 13.1.

    Below I describe throughput and efficiency for 13.1 and 12.4 relative to 11.10. I use v13 for 13.1 and v12 for 12.4 to improve readability.

    Load:
    • Inserts/second ratio for v13 is 1.14 (here)
    • Inserts/second ratio for v12 is 1.01
    • CPU/insert ratio is 0.99 for v12 and 0.94 for v13 (here)
    • IO read KB/insert ratio is 1.00 for v12 and 1.34 for v13
    • IO write KB/insert ratio is 1.00 for v12 and 0.93 for v13
    Read-only before write-heavy:
    • For the first 4 tests that do point queries
      • QPS ratios for v13 are 1.07, 1.04, 0.96, 0.83 (here to here)
      • QPS ratios for v12 are 1.07, 1.02, 1.00, 1.00
      • CPU/query ratios for v13 are 0.85, 0.92, 1.05, 1.03 (here to here)
    • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
      • QPS ratios for v13 are 1.02, 0.92, 0.98 (here to here)
      • QPS ratios for v12 are 1.07, 0.99, 0.98
      • CPU/query ratios for v13 are 0.94, 1.10, 1.02 (here to here)
    • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
      • QPS ratios for v13 are 0.96, 1.00 (here to here)
      • QPS ratios for v12 are 0.98, 1.02
      • CPU/query ratios for v13 are 1.04, 0.98 (here to here)
    • The next 2 tests are similar to the previous but use the secondary index
      • QPS ratios for v13 are 1.07, 1.08 (here to here)
      • QPS ratios for v12 are 0.99, 1.00
      • CPU/query ratios for v13 are 0.99, 0.92 (here to here)
      • IO read KB/query ratios for v13 are 0.82, 0.91
    • The next 2 tests do range queries that are covering and not covering for the PK index
      • QPS ratios for v13 are 1.00, 0.99 (here to here)
      • QPS ratios for v12 are 1.03, 1.02
      • CPU/query ratios for v13 are 1.00, 1.03 (here to here)
    • The next 2 tests are similar to the previous but use the secondary index
      • QPS ratios for v13 are 1.14, 0.99 (here to here)
      • QPS ratios for v12 are 1.01, 1.01
      • CPU/query ratios for v13 are 0.91, 1.01 (here to here)
      • IO read KB/query ratios for v13 are 0.82, 1.01
    Write-heavy
    • For the next 5 tests that are update-only
      • QPS ratios for v13 are 1.02, 1.02, 1.01, 1.02, 0.97 (here to here)
      • QPS ratios for v12 are 1.01, 1.00, 1.01, 1.00, 0.99
      • CPU/statement ratios for v13 are 0.99, 0.99, 0.98, 0.98, 0.98 (here to here)
    • The next test is write-only that has the writes from oltp_read_write.lua
      • QPS ratio for v13 is 1.01 (here)
      • QPS ratio for v12 is 0.98
      • CPU/transaction ratio is 1.02 for v12 and 0.99 for v13 (here)
    • The next 2 tests are the traditional sysbench tests with ranges of size 10 & 100
      • QPS ratios for v13 are 0.98, 0.99 (here to here)
      • QPS ratios for v12 is 0.99, 1.01
      • CPU/transaction ratios for v13 are 1.04, 1.00 (here to here)
    Read-only after write-heavy includes tests that were run before write-heavy.
    • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
      • QPS ratios for v13 are 0.95, 1.00, 0.98 (here to here). 
      • QPS ratios for v12 are 0.93, 0.98, 0.98
      • CPU/transaction ratios for v13 are 1.10, 1.01, 1.01 (here to here)
    • The next 5 tests do point queries
      • QPS ratios for v13 are 1.08, 1.01, 1.00, 1.00, 0.96 (here to here)
      • QPS ratios for v12 are 1.08, 0.99, 1.00, 1.00, 0.99
      • CPU/query ratios for v13 are 0.89, 0.99, 0.99, 1.02, 1.03 (here to here)
    • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
      • QPS ratios for v13 are 0.97, 1.00 (here to here)
      • QPS ratios for v12 are 0.98, 0.98
      • CPU/query ratios for v13 are 1.03, 0.99 (here to here)
    • The next 2 tests are similar to the previous test but use the secondary index
      • QPS ratios for v13 are 1.08, 1.08 (here to here)
      • QPS ratios for v12 are 0.99, 1.00
      • CPU/query ratios for v13 are 0.97, 0.94 (here to here)
      • IO read KB/query ratios for v13 are 0.81, 0.91
    • The next 2 tests do range queries that are covering and not covering for the PK index
      • QPS ratios for v13 are 1.03, 0.99 (here to here)
      • QPS ratios for v12 are 1.05, 0.97
      • CPU/query ratios for v13 are 0.97, 1.02 (here to here)
    • The next 2 tests are similar to the previous but use the secondary index
      • QPS ratios for v13 are 1.13, 1.00 (here to here)
      • QPS ratios for v12 are 1.00, 1.02
      • CPU/query ratios for v13 are 0.92, 0.98 (here to here)
      • IO read KB/query ratios for v13 are 0.82, 1.00
    • The next test does a single-threaded full scan of the test table with a filter so that the result set is empty.
      • Rows scanned/second ratio for v13 is 1.14 (here)
      • Rows scanned/second ratio for v12 is 1.14
      • CPU/query ratio for v13 is 0.87 (here)
      • It isn't in the tables I shared but the SSD read 417 MB/s for v13
      Insert/delete

      • QPS ratios for v13 are 1.01 for delete and 1.01 for insert
      • QPS ratios for v12 are 1.02 for delete and 0.98 for insert
      • CPU/statement ratios for v13 are 0.99 for delete and 1.00 for insert

      Tuesday, January 19, 2021

      Sysbench: in-memory scans for MyRocks, InnoDB and Postgres

      I added a test that does a full scan of the test table(s) to the sequence of tests I use with sysbench and repeated the full sequence with an in-memory workload, but I only share the results for scan here. 

      Summary

      • For in-memory scans, InnoDB was fastest, then Postgres and MyRocks was the slowest. MyRocks was also reading from storage (SSD) on this test for a table that should fit in cache.
      • Throughput decreases by ~15% for InnoDB in MySQL 8.0 vs 5.6 (see CPU overhead)
      • Throughput increases by ~16% for Postgres 13.1 vs 11.10 (see CPU overhead)
      • I am wary of having strong conclusions for a test that takes < 10 seconds

      Overview

      As explained in a previous post I share numbers for absolute throughput, relative throughput and HW efficiency. The HW efficiency numbers are computed as metric / throughput and the rows scanned/second is the value for throughput in the denominator. The files that have all of the results are in github.

      Tests were run using 1 table with 10M rows and 1 client that scans the table. The SELECT statement has a filter predicate such that the result set is empty. The table is (should be) cached by the DBMS and the scan finishes in a few seconds and I am wary of putting too much faith into results from a short running test. A test like this is better done on a server with more memory than the 16G in my NUC servers. I shared the DBMS configs in previous posts and won't reshare here but I will mention that parallel query was disabled for Postgres.

      Finally, I suspect the clustered PK B-Tree index scanned for InnoDB has not suffered much fragmentation based on the sysbench workload and with more fragmentation the results for InnoDB will suffer.

      I use ratios to describe performance and efficiency and my usage is fully explained in a previous post. Below I will indicate which DBMS provides values for the numerator and denominator in the ratios. A throughput ratio > 1 means the DBMS in the numerator is faster than the one in the denominator. A HW efficiency ratio > 1 means the DBMS in the numerator uses more HW/operation than the one in the denominator.

      The results below are explained using several comparisons:

      1. MyRocks-only - MyRocks in MySQL 5.6.35 & 8.0.17
      2. InnoDB-only - InnoDB in MySQL 5.6.49, 5.7.31 & 8.0.22
      3. Postgres-only - Postgres in versions 11.10, 12.4 and 13.1
      4. InnoDB-vs-Postgres - MySQL 5.6.49 vs Postgres 11.10, MySQL 8.0.22 vs Postgres 13.1
      MyRocks-only
      • Scan rates for 5.6, 8.0 were 1.178, 1.034 in millions of rows/second
      • Scan rate for 8.0 relative to 5.6 was 0.88
      • CPU/row ratio for 8.0 relative to 5.6 was 1.18
      • Both 5.6 and 8.0 did ~100M/s of read IO for a table that should fit in cache
      InnoDB-only
      • Scan rates for 5.6, 5.7, 8.0 were 4.673, 3.939, 3.901 in millions of rows/second
      • Scan rates for 5.7, 8.0 relative to 5.6 were 0.84, 0.83
      • CPU/row ratio for 5.7, 8.0 relative to 5.6 were 1.17, 1.33
      Postgres-only
      • Scan rates for 11.10, 12.4, 13.1 were 1.917, 2.216, 2.216 in millions of rows/second
      • Scan rates for 12.4, 13.1 relative to 11.10 were 1.16, 1.16
      • CPU/row ratio for 12.4, 13.1 relative to 11.10 were 0.86, 0.86
      InnoDB-vs-Postgres
      • Scan rates are listed above
      • Scan rate for Postgres 11.10 relative to MySQL 5.6.49 was 0.41
      • Scan rate for Postgres 13.1 relative to MySQL 8.0.22 was 0.57

      Sysbench: IO-bound and MyRocks

      This has results for IO-bound sysbench with MyRocks in FB MySQL 5.6.35 and 8.0.17. The test is similar to what I used for in-memory sysbench except the table has 400M rows instead of 10M and the test table is much larger than memory. The goal is to understand how performance and efficiency change from MySQL 5.6 to 8.0. I also have posts for InnoDB and Postgres.

      Summary:

      • While it varies, throughput for SELECT is frequently ~10% less in 8.0.17 vs 5.6.35
      • Throughput for insert and update is 20% to 30% less in 8.0.17 vs 5.6.35
      • CPU overhead is the reason for regressions, as it was for in-memory sysbench with MyRocks. It is more of an issue for insert and update because they are CPU-bound while the read-only tests do more read IO per query and are less CPU-bound.

      Overview

      I use my sysbench fork. I have yet to change the code but added Lua scripts for new tests. Tests are run in a sequence (prepare, pause to let write back & compaction catch up, read-only, write-heavy, pause again, read-only, delete, insert) via all_small.sh which calls another helper script, run.sh, to run tests and collect HW metrics.

      The tests use 1 table with 400M rows and each test is run for 300 seconds for 1, 2 and 3 threads. The test servers have 4 CPU cores with HT disabled, 16G of RAM and NVMe SSD. The test table is much larger than RAM and I call this an IO-bound setup.

      Tests used FB MySQL 5.6.35 at git hash 4911e0 and 8.0.17 at git hash cf9dbc. These were latest as of early December 2020. The servers use Ubuntu 20.04 and XFS. 

      Results

      The tests are in 5 groups based on the sequence in which they are run: load, read-only run before write-heavy, write-heavy, read-only run after write-heavy and insert/delete. 

      I have scripts that generate 3 summaries -- absolute throughput, relative throughput and HW efficiency. Absolute throughput is the QPS or TPS for a test. Relative throughput is the QPS or TPS relative to the base case. The HW efficiency report has absolute and relative results for CPU and IO per operation. In this post the base case is the result for MyRocks in MySQL 5.6.35.

      I use ratios (relative throughput & relative HW efficiency) to explain performance. For this post the denominator (the base case) is MyRocks from MySQL 5.6.35 and the numerator is MyRocks from MySQL 8.0.17. A throughput ratio < 1 means that 8.0.17 is slower. For HW efficiency, CPU and IO per operation, a ratio > 1 means that MyRocks in 8.0.17 uses more CPU or IO per operation.

      Files are in github including summaries for absolute throughputrelative throughput and HW efficiency. I annotate the results below.

      Load:
      • Inserts/second ratio is 0.82 (here)
      • CPU/insert ratio is 1.23 (here)
      Read-only before write-heavy:
      • For the first 4 tests that do point queries
        • QPS ratios are 0.97, 0.86, 0.92, 1.00 (here to here)
        • CPU/query ratios are 1.10, 1.40, 1.23, 1.14 (here to here)
      • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
        • QPS ratios are 0.95, 0.89, 1.07 (here to here)
        • CPU/query ratios are 1.10, 1.22, 0.95 (here to here)
      • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
        • QPS ratios are 0.90, 0.93 (here to here)
        • CPU/query ratios are 1.33, 1.21 (here to here)
      • The next 2 tests are similar to the previous but use the secondary index
        • QPS ratios are 0.74, 0.98 (here to here)
        • CPU/query ratios are 1.30, 1.11 (here to here)
        • IO read KB/query ratios are 1.72, 1.07
      • The next 2 tests do range queries that are covering and not covering for the PK index
        • QPS ratios are 0.93, 0.96 (here to here)
        • CPU/query ratios are 1.16, 1.09 (here to here)
      • The next 2 tests are similar to the previous but use the secondary index
        • QPS ratios are 0.94, 0.92 (here to here)
        • CPU/query ratios are 1.09, 1.22 (here to here)
      Write-heavy
      • For the next 5 tests that are update-only
        • QPS ratios are 0.92, 0.86, 0.85, 0.80, 0.75 (here to here)
        • CPU/query ratios are 1.21, 1.24, 1.22, 1.21, 1.36 (here to here)
      • The next test is write-only that has the writes from oltp_read_write.lua
        • QPS ratio is 1.01 (here)
        • CPU/transaction ratio is 1.10 (here)
      • The next 2 tests are the traditional sysbench tests with ranges of size 10 & 100
        • QPS ratio is 0.92, 0.95 (here to here)
        • CPU/transaction ratios are 1.16, 1.12 (here to here)
      Read-only after write-heavy includes tests that were run before write-heavy.
      • The next 3 tests have range scans from oltp_read_write.lua with ranges of size 10, 100 & 10,000
        • QPS ratios are 0.88, 0.94, 1.05 (here to here)
        • CPU/transaction ratios are 1.20, 1.13, 0.98 (here to here)
      • The next 5 tests do point queries
        • QPS ratios are 0.85, 0.86, 0.90, 1.00, 0.84 (here to here)
        • CPU/query ratios are 1.26, 1.42, 1.28, 1.24, 1.22 (here to here)
      • The next 2 tests do point queries via in-lists that are covering and not covering for the PK index
        • QPS ratios are 0.91, 0.93 (here to here)
        • CPU/query ratios are 1.27, 1.24 (here to here)
      • The next 2 tests are similar to the previous test but use the secondary index
        • QPS ratios are 0.96, 1.02 (here to here)
        • CPU/query ratios are 1.04, 1.08 (here to here)
      • The next 2 tests do range queries that are covering and not covering for the PK index
        • QPS ratios are 0.93, 0.92 (here to here)
        • CPU/query ratios are 1.19, 1.23 (here to here)
      • The next 2 tests are similar to the previous but use the secondary index
        • QPS ratios are 0.96, 0.94 (here to here)
        • CPU/query ratios are 1.08, 1.20 (here to here)
      • The next test does a single-threaded full scan of the test table with a filter so that the result set is empty.
        • Rows scanned/second ratio is 0.87 (here)
        • CPU/query ratio is 1.11 (here)
        Insert/delete

        • QPS ratio is 0.88 for delete and 0.72 for insert
        • CPU/statement ratios are 1.20 for delete and 1.38 for insert
        • IO read KB/statement ratios are 1.00 for delete and 1.57 for insert. Maybe that is a blip.

        Sunday, January 17, 2021

        Insert Benchmark: InnoDB

        This has results for the insert benchmark on a small server with a low-concurrency workload using InnoDB in MySQL 5.6.49, 5.7.31 and 8.0.22. My previous posts are for the same workload with MyRocks and Postgres.

        Overview

        The test is run with 1 client and 1 table. In this case, client means there is at most 1 connection for writes and 1 for reads. For some tests the write and read connections do things concurrently. My test server has 4 CPU cores and hyperthreads are disabled.

        The test is run for 3 table sizes: 20M, 100M and 500M rows. The test table fits in memory for the 20M and 100M sizes and is larger than memory for the 500M size. For the read+write test steps the working set is not cached for the 500M size with InnoDB while it is cached for MyRocks and almost cached for Postgres.

        The insert benchmark has several steps (explained here and here): load data, create secondary indexes, load more data, read+write and the read+write step is repeated using increasing rate limits for the writer (100/s, 200/s, 400/s, 600/s, 800/s and 1000/s).

        Results, my.cnf and helper scripts are archived in github. I tried to make my.cnf options similar and the files are here for 5.6.49, 5.7.31 and 8.0.22. The NUC servers use Ubuntu 20.04 and XFS.

        Summary for InnoDB:

        • 8.0.22 gets ~35% less throughput on insert-heavy tests vs 5.6.49
          • 8.0.22 uses between 1.5X and 2X more CPU/insert vs 5.6.49
        • 8.0.22 and 5.7.31 are faster on create index vs 5.6.49
        • 8.0.22 gets ~10% less throughput on in-memory (20M, 100M rows) read-heavy tests vs 5.6.49
          • 8.0.22 uses ~20% more CPU/query vs 5.6.49
        • QPS on the read-heavy tests for the 500M row table in 5.6, 5.7 and 8.0 is much worse than for Postgres and MyRocks because InnoDB does more storage reads/query. I have yet to explain this. IO determines throughput in this case and the extra CPU overhead in 8.0 is less of an issue. I have yet to explain this but am not sure I want to spend more time on it.

        Results overview

        For now I am sharing text summaries and in a few days I will share a longer report with graphs and tables. The text summaries are dense, which is good and bad. Dense means I can do comparisons for many DBMS configurations with all the data I need on one page. But dense also imposes a burden on the reader. The guide is here if you want to understand the output.

        The report summaries are in github for the tests with 20M100M and 500M rows. The summary files per test step are listed below. The mrg.* files have absolute values for throughput and HW efficiency metrics. The rel.* files have the values relative to the base case and here the base case is InnoDB in 5.6.49, so the values from 5.6.49 are the denominator while values from 5.7.31 and 8.0.22 are the numerator. I share ratios for 5.7.31 and 8.0.22. For relative throughput a value < 1 means that InnoDB in 8.0 or 5.7 is slower than in 5.6. For relative HW efficiency a value > 1 means that InnoDB in 8.0 or 5.7 uses more HW per operation than 5.6.

        The files are:

        • load data (l.i0) - mrg.l.i0, rel.l.i0
        • create secondary indexes (l.x) - mrg.l.x, rel.l.x
        • load more data (l.i1) - mrg.l.i1, rel.l.i1
        • read+write with 100 inserts/s - mrg.q100.2, rel.q100.2
        • read+write with 200 inserts/s - mrg.q200.2, rel.q200.2
        • read+write with 400 inserts/s - mrg.q400.2, rel.q400.2
        • read+write with 600 inserts/s - mrg.q600.2, rel.q600.2
        • read+write with 800 inserts/s - mrg.q800.2, rel.q800.2
        • read+write with 1000 inserts/s - mrg.q1000.2, rel.q1000.2

        Finally, I pasted the output from mrg.l.i0, mrg.l.x, mrg.l.i1, mrg.q100.2, mrg.q200.q, mrg.q800.2 and mrg.q1000.2 into one file (see mrg.some for 20M100M and 500M) to make this easier to read. I also did the same for the relative files (see rel.some for 20M100M and 500M). Below I discuss the output using the mrg.some and rel.some files.

        Results

        As explained above I use ratios to compare throughput and HW efficiency. The numerator is the value from InnoDB in 5.7.31 and 8.0.22 while the denominator is the value in 5.6.49. For throughput a ratio < 1 means that 8.0 or 5.7 are slower. For HW efficiency a ratio > 1 means that 8.0 or 5.7 uses more HW per operation.

        The ratios are triples -- the results for the test with 20M, 100M and 500M rows. 

        Load data (l.i0)
        • This loads in PK order to a table that has no secondary indexes
        • Insert/s (ips) ratios were 0.91, 0.90, 0.96 for 5.7 and 0.61, 0.62, 0.65 for 8.0
        • CPU/insert (cpupq) ratios were 1.08, 1.15, 1.07 for 5.7 and 1.46, 1.46, 1.36 for 8.0
        • Context switch/insert ratios were 0.94, 0.95, 0.94 for 5.7 and 1.67, 1.56, 1.57 for 8.0
        Create secondary indexes (l.x)
        • This creates 3 secondary indexes
        • Indexed rows/s (ips) ratios were 1.50, 1.77, 1.64 for 5.7 and 1.42, 1.19, 1.54 for 8.0
        • CPU/indexed row (cpupq) ratios were 0.75, 0.67, 0.73 for 5.7 and 0.75, 1.00, 0.73 for 8.0
        • Write KB/indexed row (wkbpi) ratios were 0.88, 0.90, 0.90 for 5.7 and 0.78, 0.84, 0.90 for 8.0
        • Context switch/indexed row ratios were 1.79, 1.70, 1.61 for 5.7 and 1.55, 1.57, 1.68 for 8.0
        Load more data (l.i1)
        • This loads more data in PK order after the secondary indexes are created
        • Insert/s (ips) ratios were 0.91, 0.90, 1.12 for 5.7 and 0.68, 0.56, 0.69 for 8.0
        • CPU/insert (cpupq) ratios were 1.16, 1.21, 1.17 for 5.7 and 1.52, 1.89, 1.98 for 8.0
        • Context switch/indexed row ratios were 1.13, 1.07, 1.07 for 5.7 and 1.81, 2.64, 1.87 for 8.0

          Read+write with 100 inserts/s (q100.2)

          • This has one connection doing queries and another doing 100 inserts/s
          • Queries/s (qps) ratios were 1.05, 0.89, 0.31 for 5.7 and 0.90, 0.77, 0.63 for 8.0
          • CPU/insert (cpupq) ratios were 0.99, 1.16, 1.32 for 5.7 and 1.18, 1.38, 1.28 for 8.0

          Read+write with 200 inserts/s (q200.2)
          • This has one connection doing queries and another doing 200 inserts/s
          • Queries/s (qps) ratios were 1.04, 0.90, 1.05 for 5.7 and 0.89, 0.79, 1.04 for 8.0
          • CPU/insert (cpupq) ratios were 1.01, 1.15, 0.96 for 5.7 and 1.19, 1.34, 1.21 for 8.0
          Read+write with 800 inserts/s (q800.2)
          • This has one connection doing queries and another doing 800 inserts/s
          • Queries/s (qps) ratios were 1.01, 0.89, 1.03 for 5.7 and 0.88, 0.78, 1.00 for 8.0
          • CPU/insert (cpupq) ratios were 1.03, 1.12, 1.00 for 5.7 and 1.21, 1.33, 1.30 for 8.0
          Read+write with 1000 inserts/s (q1000.2)
          • This has one connection doing queries and another doing 1000 inserts/s
          • Queries/s (qps) ratios were 0.98, 0.86, 1.03 for 5.7 and 0.84, 0.76, 1.00 for 8.0
          • CPU/insert (cpupq) ratios were 1.06, 1.20, 1.02 for 5.7 and 1.26, 1.41, 1.28 for 8.0
          An odd result

          Postgres and InnoDB have similar QPS for the read+write tests for the 100M row table. But with the 500M row table QPS for InnoDB drops significantly while it doesn't for Postgres and the symptom is that the number of storage reads/query. I use the mrg.some summaries for tests with 100M (Postgres, InnoDB) and 500M rows (Postgres, InnoDB) along with other output that I have yet to put in github.

          The problem is that Postgres gets 7000+ QPS for both 100M and 500M row tables while InnoDB gets 6000+ QPS at 100M rows and less than 1000 QPS at 500M rows. The problem occurs for all versions of InnoDB -- from MySQL 5.6, 5.7 and 8.0.

          The facts and/or symptoms are:
          • Storage reads/query (rpq) increases from ~0.01 at 100M rows to ~4 at 500M rows for InnoDB
          • Table+index size at 500M rows
            • Is ~95G for Postgres and ~103G for InnoDB. See dbgb1 in mrg.some, measured via "ls"
            • Is ~32G for the PK index and ~35G for the secondary indexes for InnoDB in 5.7 per  SHOW TABLE STATUS. Alas the Index_length column is zero in 8.0.22, see bug 80453.
          • Insert buffer size (via Ibuf: size in SHOW ENGINE INNODB STATUS) is between 50k and 100k at 500M rows but less than 500 at 100M rows. The docs don't explain whether Ibuf: size counts entries or pages, but the comment column in IS.Innodb_Metrics states this is the number of pages, so between 10%. and 20% of buffer pool memory is used by the insert buffer.
          • History list length is < 100 for InnoDB at 500M rows
          Possible reasons for the change in IO with InnoDB at 500M rows:
          1. The read IO is from queries that are logically but not physically index-only for the secondary indexes. By logically index only I mean the queries only reference columns in the secondary index. By not physically index-only I mean that InnoDB has to read the base row in the clustered PK index as I explain on slide 7 in my talk on MVCC GC. I am not aware of counters to show when this happens. I added them to a patch long ago, but lost track of that change.
          2. The read IO is from pending work (purge, insert buffer merges). I write above that the history list length is small (< 100) so I doubt that purge is an issue. But the insert buffer is large (> 50,000) so I won't rule out insert buffer merges.
          3. InnoDB memory devoted to the insert buffer means that fewer secondary index pages stay in cache and the read IO comes from cache misses.
          4. The large redo logs (40G via 40 1G files, see my.cnf) don't fit in the OS page cache and too much IO is done to re-read them when redo is written. This was a problem years ago when InnoDB was doing 512 byte writes via buffered IO, as Linux would read 4096 byte filesystem pages before the first write to an uncached page, but I think that InnoDB has made changed to avoid that.
          I might repeat the test for 8.0.22 at least 2 more times 1) to get more data from Information and Performance Schema tables and 2) to add a pause before the read-write tests to reduce the pending work from the insert buffer. My scripts currently dump SHOW ENGINE INNODB STATUS at test end but have ignored the IS and PS tables. I need to fix that. However, even with the PS and IS tables I don't see a way to get IO per index, as I could get via SHOW INDEX STATS back in the day.

          For the cases where MySQL uses O_DIRECT (InnoDB database files) I will try the performance schema to see which objects get too much IO. But doing the same where buffered IO is used (binlog, InnoDB redo log) is harder. I would be happy to learn of an easy way to do this per-inode/file/file-descriptor for Linux, but expect that magic (BPF) is the only way to do that today. Perhaps I need to read the excellent books written by Brendan Gregg.