Sunday, January 17, 2021

Insert Benchmark: InnoDB

This has results for the insert benchmark on a small server with a low-concurrency workload using InnoDB in MySQL 5.6.49, 5.7.31 and 8.0.22. My previous posts are for the same workload with MyRocks and Postgres.

Overview

The test is run with 1 client and 1 table. In this case, client means there is at most 1 connection for writes and 1 for reads. For some tests the write and read connections do things concurrently. My test server has 4 CPU cores and hyperthreads are disabled.

The test is run for 3 table sizes: 20M, 100M and 500M rows. The test table fits in memory for the 20M and 100M sizes and is larger than memory for the 500M size. For the read+write test steps the working set is not cached for the 500M size with InnoDB while it is cached for MyRocks and almost cached for Postgres.

The insert benchmark has several steps (explained here and here): load data, create secondary indexes, load more data, read+write and the read+write step is repeated using increasing rate limits for the writer (100/s, 200/s, 400/s, 600/s, 800/s and 1000/s).

Results, my.cnf and helper scripts are archived in github. I tried to make my.cnf options similar and the files are here for 5.6.49, 5.7.31 and 8.0.22. The NUC servers use Ubuntu 20.04 and XFS.

Summary for InnoDB:

  • 8.0.22 gets ~35% less throughput on insert-heavy tests vs 5.6.49
    • 8.0.22 uses between 1.5X and 2X more CPU/insert vs 5.6.49
  • 8.0.22 and 5.7.31 are faster on create index vs 5.6.49
  • 8.0.22 gets ~10% less throughput on in-memory (20M, 100M rows) read-heavy tests vs 5.6.49
    • 8.0.22 uses ~20% more CPU/query vs 5.6.49
  • QPS on the read-heavy tests for the 500M row table in 5.6, 5.7 and 8.0 is much worse than for Postgres and MyRocks because InnoDB does more storage reads/query. I have yet to explain this. IO determines throughput in this case and the extra CPU overhead in 8.0 is less of an issue. I have yet to explain this but am not sure I want to spend more time on it.

Results overview

For now I am sharing text summaries and in a few days I will share a longer report with graphs and tables. The text summaries are dense, which is good and bad. Dense means I can do comparisons for many DBMS configurations with all the data I need on one page. But dense also imposes a burden on the reader. The guide is here if you want to understand the output.

The report summaries are in github for the tests with 20M100M and 500M rows. The summary files per test step are listed below. The mrg.* files have absolute values for throughput and HW efficiency metrics. The rel.* files have the values relative to the base case and here the base case is InnoDB in 5.6.49, so the values from 5.6.49 are the denominator while values from 5.7.31 and 8.0.22 are the numerator. I share ratios for 5.7.31 and 8.0.22. For relative throughput a value < 1 means that InnoDB in 8.0 or 5.7 is slower than in 5.6. For relative HW efficiency a value > 1 means that InnoDB in 8.0 or 5.7 uses more HW per operation than 5.6.

The files are:

  • load data (l.i0) - mrg.l.i0, rel.l.i0
  • create secondary indexes (l.x) - mrg.l.x, rel.l.x
  • load more data (l.i1) - mrg.l.i1, rel.l.i1
  • read+write with 100 inserts/s - mrg.q100.2, rel.q100.2
  • read+write with 200 inserts/s - mrg.q200.2, rel.q200.2
  • read+write with 400 inserts/s - mrg.q400.2, rel.q400.2
  • read+write with 600 inserts/s - mrg.q600.2, rel.q600.2
  • read+write with 800 inserts/s - mrg.q800.2, rel.q800.2
  • read+write with 1000 inserts/s - mrg.q1000.2, rel.q1000.2

Finally, I pasted the output from mrg.l.i0, mrg.l.x, mrg.l.i1, mrg.q100.2, mrg.q200.q, mrg.q800.2 and mrg.q1000.2 into one file (see mrg.some for 20M100M and 500M) to make this easier to read. I also did the same for the relative files (see rel.some for 20M100M and 500M). Below I discuss the output using the mrg.some and rel.some files.

Results

As explained above I use ratios to compare throughput and HW efficiency. The numerator is the value from InnoDB in 5.7.31 and 8.0.22 while the denominator is the value in 5.6.49. For throughput a ratio < 1 means that 8.0 or 5.7 are slower. For HW efficiency a ratio > 1 means that 8.0 or 5.7 uses more HW per operation.

The ratios are triples -- the results for the test with 20M, 100M and 500M rows. 

Load data (l.i0)
  • This loads in PK order to a table that has no secondary indexes
  • Insert/s (ips) ratios were 0.91, 0.90, 0.96 for 5.7 and 0.61, 0.62, 0.65 for 8.0
  • CPU/insert (cpupq) ratios were 1.08, 1.15, 1.07 for 5.7 and 1.46, 1.46, 1.36 for 8.0
  • Context switch/insert ratios were 0.94, 0.95, 0.94 for 5.7 and 1.67, 1.56, 1.57 for 8.0
Create secondary indexes (l.x)
  • This creates 3 secondary indexes
  • Indexed rows/s (ips) ratios were 1.50, 1.77, 1.64 for 5.7 and 1.42, 1.19, 1.54 for 8.0
  • CPU/indexed row (cpupq) ratios were 0.75, 0.67, 0.73 for 5.7 and 0.75, 1.00, 0.73 for 8.0
  • Write KB/indexed row (wkbpi) ratios were 0.88, 0.90, 0.90 for 5.7 and 0.78, 0.84, 0.90 for 8.0
  • Context switch/indexed row ratios were 1.79, 1.70, 1.61 for 5.7 and 1.55, 1.57, 1.68 for 8.0
Load more data (l.i1)
  • This loads more data in PK order after the secondary indexes are created
  • Insert/s (ips) ratios were 0.91, 0.90, 1.12 for 5.7 and 0.68, 0.56, 0.69 for 8.0
  • CPU/insert (cpupq) ratios were 1.16, 1.21, 1.17 for 5.7 and 1.52, 1.89, 1.98 for 8.0
  • Context switch/indexed row ratios were 1.13, 1.07, 1.07 for 5.7 and 1.81, 2.64, 1.87 for 8.0

    Read+write with 100 inserts/s (q100.2)

    • This has one connection doing queries and another doing 100 inserts/s
    • Queries/s (qps) ratios were 1.05, 0.89, 0.31 for 5.7 and 0.90, 0.77, 0.63 for 8.0
    • CPU/insert (cpupq) ratios were 0.99, 1.16, 1.32 for 5.7 and 1.18, 1.38, 1.28 for 8.0

    Read+write with 200 inserts/s (q200.2)
    • This has one connection doing queries and another doing 200 inserts/s
    • Queries/s (qps) ratios were 1.04, 0.90, 1.05 for 5.7 and 0.89, 0.79, 1.04 for 8.0
    • CPU/insert (cpupq) ratios were 1.01, 1.15, 0.96 for 5.7 and 1.19, 1.34, 1.21 for 8.0
    Read+write with 800 inserts/s (q800.2)
    • This has one connection doing queries and another doing 800 inserts/s
    • Queries/s (qps) ratios were 1.01, 0.89, 1.03 for 5.7 and 0.88, 0.78, 1.00 for 8.0
    • CPU/insert (cpupq) ratios were 1.03, 1.12, 1.00 for 5.7 and 1.21, 1.33, 1.30 for 8.0
    Read+write with 1000 inserts/s (q1000.2)
    • This has one connection doing queries and another doing 1000 inserts/s
    • Queries/s (qps) ratios were 0.98, 0.86, 1.03 for 5.7 and 0.84, 0.76, 1.00 for 8.0
    • CPU/insert (cpupq) ratios were 1.06, 1.20, 1.02 for 5.7 and 1.26, 1.41, 1.28 for 8.0
    An odd result

    Postgres and InnoDB have similar QPS for the read+write tests for the 100M row table. But with the 500M row table QPS for InnoDB drops significantly while it doesn't for Postgres and the symptom is that the number of storage reads/query. I use the mrg.some summaries for tests with 100M (Postgres, InnoDB) and 500M rows (Postgres, InnoDB) along with other output that I have yet to put in github.

    The problem is that Postgres gets 7000+ QPS for both 100M and 500M row tables while InnoDB gets 6000+ QPS at 100M rows and less than 1000 QPS at 500M rows. The problem occurs for all versions of InnoDB -- from MySQL 5.6, 5.7 and 8.0.

    The facts and/or symptoms are:
    • Storage reads/query (rpq) increases from ~0.01 at 100M rows to ~4 at 500M rows for InnoDB
    • Table+index size at 500M rows
      • Is ~95G for Postgres and ~103G for InnoDB. See dbgb1 in mrg.some, measured via "ls"
      • Is ~32G for the PK index and ~35G for the secondary indexes for InnoDB in 5.7 per  SHOW TABLE STATUS. Alas the Index_length column is zero in 8.0.22, see bug 80453.
    • Insert buffer size (via Ibuf: size in SHOW ENGINE INNODB STATUS) is between 50k and 100k at 500M rows but less than 500 at 100M rows. The docs don't explain whether Ibuf: size counts entries or pages, but the comment column in IS.Innodb_Metrics states this is the number of pages, so between 10%. and 20% of buffer pool memory is used by the insert buffer.
    • History list length is < 100 for InnoDB at 500M rows
    Possible reasons for the change in IO with InnoDB at 500M rows:
    1. The read IO is from queries that are logically but not physically index-only for the secondary indexes. By logically index only I mean the queries only reference columns in the secondary index. By not physically index-only I mean that InnoDB has to read the base row in the clustered PK index as I explain on slide 7 in my talk on MVCC GC. I am not aware of counters to show when this happens. I added them to a patch long ago, but lost track of that change.
    2. The read IO is from pending work (purge, insert buffer merges). I write above that the history list length is small (< 100) so I doubt that purge is an issue. But the insert buffer is large (> 50,000) so I won't rule out insert buffer merges.
    3. InnoDB memory devoted to the insert buffer means that fewer secondary index pages stay in cache and the read IO comes from cache misses.
    4. The large redo logs (40G via 40 1G files, see my.cnf) don't fit in the OS page cache and too much IO is done to re-read them when redo is written. This was a problem years ago when InnoDB was doing 512 byte writes via buffered IO, as Linux would read 4096 byte filesystem pages before the first write to an uncached page, but I think that InnoDB has made changed to avoid that.
    I might repeat the test for 8.0.22 at least 2 more times 1) to get more data from Information and Performance Schema tables and 2) to add a pause before the read-write tests to reduce the pending work from the insert buffer. My scripts currently dump SHOW ENGINE INNODB STATUS at test end but have ignored the IS and PS tables. I need to fix that. However, even with the PS and IS tables I don't see a way to get IO per index, as I could get via SHOW INDEX STATS back in the day.

    For the cases where MySQL uses O_DIRECT (InnoDB database files) I will try the performance schema to see which objects get too much IO. But doing the same where buffered IO is used (binlog, InnoDB redo log) is harder. I would be happy to learn of an easy way to do this per-inode/file/file-descriptor for Linux, but expect that magic (BPF) is the only way to do that today. Perhaps I need to read the excellent books written by Brendan Gregg.

    No comments:

    Post a Comment

    RocksDB on a big server: LRU vs hyperclock, v2

    This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...