Monday, July 17, 2023

Tuning InnoDB for the insert benchmark on a medium server, part 2

This continues work I shared in June for InnoDB with the Insert Benchmark on a medium server. In this post I have results for more my.cnf variations with MySQL 8.0.33.

tl;dr

  • I am certain I don't understand how innodb_io_capacity and innodb_io_capacity_max are supposed to work in MySQL 8. I suspect I am not alone.
  • MySQL 8.0.33 is almost always faster than 5.6.51 on a medium server with medium concurrency. This result is different from what I get on my small servers where the benchmarks are run with low concurrency.
  • Does GCP provide any way of detecting a slowly failing SSD? I'd rather not scrape dmesg output to figure this out. I have several benchmark runs that are in doubt because of a failing SSD (those that use the a17, a18 and a19 configs).
  • I am still chasing a workaround for stalls from the page cleaner falling behind, this also affects queries because they might get stalled by doing single-page flushes to make pages clean for reuse. One workaround is Percona Server.
  • Despite testing many variations of my.cnf (19 for MySQL 8.0.33) I have not found significant improvements to the config I have been using for years
Updates

How should I set innodb_io_capacity? It would be great if that variable really did specify how man IOPs InnoDB should try to use. But I doubt that is true given furious flushing and innodb_flush_sync.

Benchmark

The insert benchmark was run in three configurations.

  • cached by InnoDB - InnoDB buffer pool caches all tables
  • cached by OS - the 4G InnoDB buffer pool does not call all tables but the OS page cache does
  • IO-bound - the database is larger than memory

I used the rel build for MySQL 5.6.56 and the rel_lto builds for MySQL 5.7.40 and 8.0.33. The builds are explained here. In all cases I build from source using upstream MySQL.

The test HW is a c2-standard-30 server from GCP with 15 cores, hyperthreads disabled, 120G RAM and 1.5TB of XFS via SW RAID 0 striped over four local NVMe devices. The OS is Ubuntu 22.04.

The benchmark is run with 8 clients and a client per table. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows across all tables without secondary indexes where X is 100 for cached and 500 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another X million rows per table with secondary index maintenance where X is 100M for cached and 10M for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start. The inserts are done to the table head and the deletes are done from the tail.
  • q100
    • do queries as fast as possible with 100 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.
  • q500
    • do queries as fast as possible with 500 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.
  • q1000
    • do queries as fast as possible with 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 1800 seconds.

Configurations

The benchmark was run first using the base configurations. Then it was repeated for other configurations that changed or added a small number of options. I use a1 through a19 to name them here but the full name for a configuration file has the pattern my.cnf.cy10X_gcp_c2s30 or my.cnf.cy10X_4g_gcp_c2s30 where X has the value a1 ... a19. All of the config files are here.

For cached by InnoDB and IO-bound the configurations have an 80G InnoDB buffer pool and don't use buffered IO (innodb_flush_method = O_DIRECT_NO_FSYNC), with one exception -- the a4 config uses innodb_flush_method = fsync.

For cached by OS the configurations have a 4G InnoDB buffer pool and use buffered IO (innodb_flush_method = fsync). 

The base configurations are here:

The configurations tested are:

  • a1 - adds innodb_use_native_io=off
  • a2 - increases innodb_io_capacity[_max] to 20k/40k
  • a3 - adds innodb_flush_sync=OFF
  • a4 - adds innodb_flush_method=fsync
  • a5 - adds innodb_change_buffering=none
  • a6 - adds innodb_max_dirty_pages_pct_lwm=40, innodb_max_dirty_pages_pct=50
  • a7 - adds innodb_lru_scan_depth=2048
  • a8 - adds innodb_lru_scan_depth=4096
  • a9 - adds innodb_lru_scan_depth=8192
  • a10 - adds innodb_purge_threads=8, innodb_page_cleaners=8
  • a11 - reduces innodb_redo_log_capacity to 16G
  • a12 - reduces innodb_redo_log_capacity to 32G
  • a13 - adds innodb_max_purge_lag=50000, innodb_max_purge_lag_delay=1000000, innodb_purge_threads=8
  • a14 - adds innodb_change_buffering=inserts
  • a15 - changes innodb_idle_pct to default
  • a16 - reduces innodb_io_capacity to 2000 and innodb_io_capacity_max to 4000
  • a17 - adds innodb_doublewrite=OFF (not safe for production)
  • a18 - adds innodb_buffer_pool_instances=4, innodb_page_cleaners=4
  • a19 - adds innodb_max_dirty_pages_pct_lwm=20, innodb_max_dirty_pages_pct=50
All of the configurations were tested for MySQL 8.0.33. Only the first three or four (a1 to a4) were tested for 5.6.51 and 5.7.40.

Performance

For each of the workloads below I analyze the performance in four parts:
  • average throughput - I look at the average queries and inserts per second to see if these are much better or worse than the base config
  • response time - I look at max response time and the response time distributions to see if a config makes them much better or much worse than the base config
  • graphs - there are graphs for the per-second query, insert and delete rates. I look at the graphs to see if the rates degrade over time or have too many stalls.
  • SLA - there is a target rate for the background inserts and deletes which is one of 100, 500 or 1000 per second per client. There is an SLA failure when the actual rate is too much smaller than the target.
Reports are here:
Results: cached by InnoDB

Summary
  • average throughput
    • For 5.6.51 and 5.7.40 the base config is best
    • For 8.0.33 relative to the base config only the a8 config has better or similar results for all benchmark steps and the difference was small (<= 1% better on two, the same on three) 
    • From 5.6 vs 5.7 vs 8.0, MySQL 8.0 gets between 1.40X and 1.58X more throughput than MySQL on the write-heavy benchmark steps (l.i0, l.x, l.i1). But on the read-heavy benchmark steps (q100, q500, q1000) it is slower and gets between 0.73X and 0.81X the QPS vs MySQL 5.6.
  • response time
    • For 5.6.51 and 5.7.40 there is nothing odd in the response time histograms
    • For 8.0.33 there are bad write stalls with a17 which is reasonable because that config has a much larger average insert rate. On the l.i1 benchmark steps there are bad write stalls for a11 and a19. I assume the problem with a11 is more frequent checkpoint and with a19 it might have been a failing SSD.
  • graphs
    • For 5.6 vs 5.7 vs 8.0 the results for l.i0 and for l.i1 are good (not much variance). For q1000 there is an odd step in the QPS chart for 5.7.40
  • SLA
    • For 5.7.40 the a1 config fails the insert-rate SLA for q100 and q500 and almost for q1000.
    • For 8.0.33 the a1 config fails the insert-rate SLA for q100.
Results: cached by OS

Summary
  • average throughput
    • For 5.6.51 the base config is best
    • For 5.7.40 the a1 config is best. It uses synchronous IO rather than AIO. The read IO done here is from the OS page cache.
    • For 8.0.33 the a1 config is best. I ignore a17 because that isn't safe for prod. The impact from a17 is huge in part because it reduces stalls when the free list is empty (page flushing has less work to do).
    • From 5.6 vs 5.7 vs 8.0, MySQL 8.0 is almost strictly faster than 5.6. It is slightly slower on l.i1, similar on q1000 and gets up to 1.6X more throughput on the other benchmark steps
  • response time
    • For 5.6.51 the a2 config has the largest write stalls
    • For 5.7.40 the base, a1 and a2 configs have the largest write stalls and a3 is better
    • For 8.0.33 the a16, a17 and a18 configs have the largest write stalls but that might have been caused by a failing SSD
  • graphs
    • For 5.6 vs 5.7 vs 8.0 the results for l.i0 aren't that interesting. For l.i1 the IPS charts are similar but it is hard to spot differences on the max insert response time charts. For q1000 the IPS charts are visually appealing but have too much variance and the QPS graphs also have too much variance (from single-page flush stalls).
  • SLA
    • For 8.0.33 the a5, a13, a14, a17 and a18 configs have some insert-rate SLA failures. The failures for a17 and a18 might be caused by a failing SSD.
Results: IO-bound

Summary
  • average throughput
    • For 5.6.51 and 5.7.40 the base config is best
    • For 8.0.33 the base config is good but several configs have similar results including a2, a3, a7, a8, a15 and a16. I am certain I don't understand how innodb_io_capacity and innodb_io_capacity_max are supposed to work. I suspect I am not alone. It is odd that both increasing and decreasing innodb_io_capacity are good for perf -- a2 increases it, a16 decreases it.
    • From 5.6 vs 5.7 vs 8.0, MySQL 8.0 is strictly faster than 5.6 and gets between 1.03X and 1.67X more throughput
  • response time
    • For 5.6.51 all of the configs have multi-second write stalls on write-heavy benchmark steps
    • For 5.7.40 all configs were similar WRT write stalls
    • For 8.0.33 write-stalls were bad for a17, a18 and a19 but the problem might have been a failing SSD
  • graphs
    • For 5.6 vs 5.7 vs 8.0 the results for l.i0 have two almost horizontal lines for the IPS and max response time charts. Results for l.i1 have a curve that slowly increases and I will repeat the benchmark with a longer run time to see if it eventually becomes stable. One possible reason for this is that there are many page splits at the start of the benchmark step, which followed the index creation step (l.x). For q1000 the QPS charts show a slow improvement, similar to the IPS charts from l.i1. I will increase the runtime for this benchmark step as well to see if the curve eventually settles into a horizontal line.
  • SLA
    • For 5.6.51 the a4 config fails the insert-rate SLA for q100, q500 and q1000.
    • For 5.7.40 the a4 config fails the insert-rate SLA for q100 and almost for q500 and q1000.
    • For 8.0.33 the a5, a13, a17, a18 and a19 configs have some failures for the insert-rate SLA. The failures for a17, a18 and a19 might be from a failing SSD.



















2 comments:

  1. "(...) don't understand how innodb_io_capacity and innodb_io_capacity_max are supposed to work (...)" - I believe it's not a coincidence that the Percona Server changes we did made the server to consult these variables less and less as the changes were developed. The same applied, to a smaller extent, to innodb_lru_scan_depth too. The endgame of removing them (or keeping them as non-loaded server background work rate indicator) was not reached though.

    ReplyDelete
    Replies
    1. I want them to reflect their names -- as in they specify how many IOPs InnoDB can use. And the docs here even suggest that was the intent:
      https://dev.mysql.com/doc/refman/8.0/en/innodb-configuring-io-capacity.html

      But I am reasonably certain that is far from true. And clearly, with innodb_flush_sync it isn't true.
      https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_flush_sync

      Delete

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...