Thursday, August 3, 2023

Tuning InnoDB for the Insert Benchmark on a large server

I used the Insert Benchmark to find a good my.cnf for InnoDB on a large server with MySQL 8.0, 5.7 and 5.6. Results for InnoDB on a medium server are here.

tl;dr

  • Should you enable innodb_max_purge_lag? It is disabled by default. When disabled, write throughput is better but the database size can grow to be 2X or more larger than expected. If the insert rate will be high because the database or working set is cached, then it should be enabled to avoid too much purge lag.
  • The values of innodb_io_capacity and innodb_io_capacity_max have less of an impact than I expected and I remain confused about how they behave. Results are similar with them set to 10k/20k (base config), 20k/40k (a2 config) and 2k/4k (a16 config).
  • Set innodb_buffer_pool_instances and innodb_page_cleaners to the same value.

I suspect the InnoDB code for purge and the page cleaner needs to be revisited if it is to provide high QoS for write-intensive workloads. It would also be great if the feature request in bug 74919 were implemented. On the less than bright side, I doubt any of this will be addressed. On the bright side, MyRocks does much better for this workload.

This report includes results from testing many variations of my.cnf -- 3 for 5.6.51, 4 for 5.7.40 and 22 for 8.0.33. What doesn't surprise me is that most of the changes have no significant impact.
 
What changes will I make to the configuration I use in the future? I am wary of too much workload-specific tuning because that doesn't scale. I support tuning based on HW capacity with a few options set to indicate what to favor in the read-write-space amplification tradeoff (see RUM Conjecture). What I expect to change is:
  • Enable innodb_max_purge_lag - always for cached workloads, sometimes for others
  • Set innodb_page_cleaners and innodb_buffer_pool_instances to the same value. I am not sure whether I should set innodb_purge_threads to that value as well. A my.cnf sketch of these changes is below.
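
A minimal my.cnf sketch of those changes. The purge lag values are the ones from the a13 config described later in this post, and 4 is only an example value for the instance and cleaner counts, not a recommendation:

  # constrain purge lag (values from the a13 config below)
  # the delay is in microseconds
  innodb_max_purge_lag=50000
  innodb_max_purge_lag_delay=1000000
  innodb_purge_threads=8

  # keep these two at the same value (4 is only an example)
  innodb_buffer_pool_instances=4
  innodb_page_cleaners=4
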
Builds

I used upstream MySQL 8.0.33, 5.7.40 and 5.6.51. All used CMAKE_BUILD_TYPE=Release. While I didn't set WITH_LTO for the 8.0.33 build, from the compiler command lines I see -flto=thin, so it used link-time optimization anyway. All of the builds are named myX_rel_native_lto (for X in 5651, 5740, 8033) but that name is only accurate for my8033_rel_native_lto. I should have called the others my5740_rel and my5651_rel.

Benchmark

The Insert Benchmark was run in three configurations.

  • cached by InnoDB - all tables are cached by InnoDB. Uses O_DIRECT.
  • cached by OS - all tables are cached by the OS page cache but not by InnoDB
  • IO-bound - the database is larger than memory

For cached by InnoDB and IO-bound the configurations have a 200G InnoDB buffer pool and don't use buffered IO (innodb_flush_method = O_DIRECT_NO_FSYNC), with one exception -- the a4 config uses innodb_flush_method = fsync.

For cached by OS the configurations have a 4G InnoDB buffer pool and use buffered IO (innodb_flush_method = fsync). 
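The key differences between these setups reduce to a few my.cnf lines, roughly like the sketch below (the full config files are linked in the Configurations section):

  # cached by InnoDB and IO-bound (the a4 config is the
  # exception and uses innodb_flush_method=fsync)
  innodb_buffer_pool_size=200G
  innodb_flush_method=O_DIRECT_NO_FSYNC

  # cached by OS
  innodb_buffer_pool_size=4G
  innodb_flush_method=fsync
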

The test HW has 80 cores with hyperthreads enabled, 2 sockets, 256G of RAM and fast local-attached NVMe storage.

The benchmark is run with 24 clients and a client per table. It is a sequence of steps.

  • l.i0
    • insert X million rows per table where X is 20 for cached and 500 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another X million rows per table with secondary index maintenance where X is 40 for cached by InnoDB, 5 for cached by OS and 5 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start, with inserts done to the table head and the deletes done from the tail (a SQL sketch of this pattern is after this list).
  • q100
    • do queries as fast as possible with 100 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
  • q500
    • do queries as fast as possible with 500 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
  • q1000
    • do queries as fast as possible with 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
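
All of the steps after l.x use the queue pattern described above: inserts at the head of the table, deletes from the tail. A minimal SQL sketch, assuming a hypothetical 2-column stand-in table pt1 (the real benchmark table has more columns plus 3 secondary indexes):

  -- hypothetical stand-in for the benchmark table
  CREATE TABLE pt1 (
    transactionid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    price FLOAT NOT NULL
  );

  -- inserts go to the table head (new, larger PK values)
  INSERT INTO pt1 (price) VALUES (1.0);

  -- deletes come from the table tail (smallest PK values)
  -- so the number of rows stays roughly constant
  DELETE FROM pt1 ORDER BY transactionid ASC LIMIT 1;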

Configurations

All of the my.cnf files are here.

The base configurations are here:

  • for cached by InnoDB and IO-bound: 5.6.51, 5.7.40, 8.0.33. The naming pattern is my.cnf.cy10_u$x where x is "" for the base config and a1 ... a21 otherwise.
  • for cached by OS: 5.6.51, 5.7.40, 8.0.33. The naming pattern is my.cnf.cy10_4g_u$x where x is "" for the base config and a1 ... a21 otherwise.

The configurations tested are:

  • base - see links above
  • a1 - adds innodb_use_native_aio=off
  • a2 - increases innodb_io_capacity[_max] to 20k/40k
  • a3 - adds innodb_flush_sync=OFF and increases innodb_io_capacity[_max] to 20k/40k
  • a4 - adds innodb_flush_method=fsync
  • a5 - adds innodb_change_buffering=none
  • a6 - adds innodb_max_dirty_pages_pct_lwm=40, innodb_max_dirty_pages_pct=50
  • a7 - adds innodb_lru_scan_depth=2048
  • a8 - adds innodb_lru_scan_depth=4096
  • a9 - adds innodb_lru_scan_depth=8192
  • a10 - adds innodb_purge_threads=8, innodb_page_cleaners=8
  • a11 - reduces innodb_redo_log_capacity to 16G
  • a12 - reduces innodb_redo_log_capacity to 32G
  • a13 - adds innodb_max_purge_lag=50k, innodb_max_purge_lag_delay=1M, innodb_purge_threads=8
  • a14 - adds innodb_change_buffering=inserts
  • a15 - changes innodb_idle_pct to default
  • a16 - reduces innodb_io_capacity to 2000 and innodb_io_capacity_max to 4000
  • a17 - adds innodb_doublewrite=OFF (not safe for production)
  • a18 - adds innodb_buffer_pool_instances=4, innodb_page_cleaners=4
  • a19 - adds innodb_max_dirty_pages_pct_lwm=20, innodb_max_dirty_pages_pct=50
  • a20 - like a13 but with innodb_max_purge_lag=200k
  • a21 - like a13 but with innodb_max_purge_lag=500k

Most of the configurations were used for MySQL 8.0.33. Only the first three or four (a1 to a4) were tested for 5.6.51 and 5.7.40.
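
Each variant is the base config plus a small delta. As an example, the a6 delta adds these two lines to the base my.cnf (values from the list above):

  # a6: start flushing earlier
  innodb_max_dirty_pages_pct_lwm=40
  innodb_max_dirty_pages_pct=50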

Performance

For each of the workloads below I analyze the performance in four parts:
  • average throughput - I look at the average queries and inserts per second to see if these are much better or worse than the base config
  • response time - I look at max response time and the response time distributions to see if a config makes them much better or much worse than the base config
  • graphs - there are graphs for the per-second query, insert and delete rates. I look at the graphs to see if the rates degrade over time or have too many stalls.
  • SLA - there is a target rate for the background inserts and deletes which is one of 100, 500 or 1000 per second per client. There is an SLA failure when the actual rate is too much smaller than the target.
Reports are here:

Results: cached by InnoDB

Summary
  • average throughput
    • For 5.6 the base config is best. All of the configs suffer from unconstrained database size growth -- none of them enable innodb_max_purge_lag. See the dbgb1 column for l.x and l.i1 here, which is measured at the end of each benchmark step, as the database grows from ~65G to ~217G.
    • For 5.7 the base config is good but the create index (the l.x benchmark step) is ~2X faster with the configs that increase innodb_io_capacity (a2, a3). All of the configs suffer from unconstrained database size growth -- none of them enable innodb_max_purge_lag. See the dbgb1 column for l.x and l.i1 here, which is measured at the end of each benchmark step, as the database grows from ~65G to ~217G.
    • For 8.0 the a13, a20 and a21 configs improve query perf on the read+write benchmark steps (q100, q500 and q1000) at the cost of write throughput on the l.i1 benchmark step. The a13, a20 and a21 configs also keep the database from becoming 2X larger than it should be during the l.i1 benchmark step (see dbgb1 here). Note that 2X larger might have been 4X larger had I run the benchmark step for more time. 
    • For 5.6 vs 5.7 vs 8.0 - write throughput improves from 5.6 to 5.7 to 8.0 while query throughput gets worse. The results here are better than the results on a small server because while 8.0 has more CPU overhead it does better with concurrency.
  • response time
  • graphs
    • For 8.0 and l.i1 while the a13, a20 and a21 configs avoid too much size growth they also greatly increase the variance in the insert rate -- see the charts for base, a13, a20 and a21.
  • SLA
    • Everything sustained the target insert rate during q100, q500 and q1000. 
    • Except for a13, a20 and a21, which enable innodb_max_purge_lag, the configs do not prevent unconstrained database growth when purge falls behind (see the dbgb1 column here)
Results: cached by OS

Summary
  • average throughput
    • For 5.6 the base config is best
    • For 5.7 the base config is best
    • For 8.0 the a13, a20 and a21 configs improve query perf on the read+write benchmark steps (q100, q500 and q1000) at the cost of write throughput on the l.i1 benchmark step. 
    • For 5.6 vs 5.7 vs 8.0 the results here are more varied than above for Cached by InnoDB. For l.i0 (inserts without secondary indexes) and l.x (create index) the perf gets better from 5.6 to 8.0. For l.i1 (inserts+deletes with secondary indexes) the perf is best for 5.7 while 5.6 and 8.0 get similar results (maybe single-page flushing makes them both slower). For the read+write steps (q100, q500, q1000) perf also improves from 5.6 to 8.0.
  • response time
    • For 5.6 results are not interesting.
    • For 5.7 there are 5+ second insert stalls during l.i1 (see the max column here) and multi-second stalls for queries, inserts and deletes during q1000, courtesy of single-page flushing (yes, even queries can stall from doing that).
    • For 8.0 there are intermittent ~10+ second write stalls for most configs except a5, a13 and a14 (see here). Stalls during q1000 are less severe; while a13 and a20 have good results, so do several other configs that don't enable innodb_max_purge_lag.
  • graphs
    • For 8.0 and l.i1 compare the insert rate charts for the base, a13, a20 and a21 configs. There is more variance when innodb_max_purge_lag is disabled.
  • SLA
    • Everything failed to sustain the target insert rate during q1000, many failed during q500 but all sustained it during q100.
    • Unconstrained database growth with innodb_max_purge_lag disabled is less of an issue here than for Cached by InnoDB because write stalls from single-page flushing mean the insert rate here is less than 10% of what it was for Cached by InnoDB, so purge doesn't fall behind.
Results: IO-bound

Summary
  • average throughput
    • For 5.6 the base config is best
    • For 5.7 the base config is best
    • For 8.0 the base config is good. The a13, a20 and a21 configs reduce the amount of database size growth. They also get better query performance at the cost of worse insert throughput, which also makes them fail to sustain the target insert rates during q100, q500 and q1000. There are no winners here, and the benefit in space savings for a13, a20 and a21 is much less than it is for the Cached by InnoDB workload. The a10 and a18 configs are generally good for perf, however it isn't clear whether innodb_buffer_pool_instances=4 is better than =8, or whether the problem was using =8 when innodb_page_cleaners=4. I will just assume that innodb_buffer_pool_instances should use the same value as innodb_page_cleaners.
    • For 5.6 vs 5.7 vs 8.0, for the l.i0 benchmark step perf improves from 5.6 to 8.0. For the other benchmark steps the perf with 5.7 and 8.0 is similar and much better than 5.6.
  • response time
    • For 5.6 and 5.7 the results are not interesting
    • For 8.0 and l.i1 the response time histograms are better for a14 and a18 (ignoring the max column). For q1000 the results don't vary much by config.
  • graphs
    • For 8.0 and l.i1 compare the fascinating insert rate charts for the base, a13, a20 and a21 configs. Enabling innodb_max_purge_lag adds a lot of variance. Results during q1000 are similar.
  • SLA
    • For 5.6 the target insert rate is sustained for q100 but not for q500 or q1000. The database size grows by 1.5X from the end of l.i1 to the end of q100 because purge falls behind (see dbgb1 here) and from b-tree fragmentation.
    • For 5.7 the target insert rate is sustained in all cases. But the database size grows by 1.37X from the end of l.i1 to the end of q100 because purge falls behind (see dbgb1 here) and from b-tree fragmentation.
    • For 8.0 the target insert rate is not sustained by a4, a13, a20 and a21 configs during q1000 and not by a13 or a20 during q500. Database size grows by ~1.37X for most configs but that is reduced to ~1.25X for the configs (a13, a20, a21) that enable innodb_max_purge_lag (see dbgb1 here). Perhaps 2/3 of the growth when innodb_max_purge_lag is disabled is from b-tree fragmentation.
Tuning purge and the read vs write tradeoff

The summary for InnoDB purge (see the purge configuration guide) is that tuning it is complicated. When innodb_max_purge_lag is disabled, which is true by default, then:
  1. Write throughput is better (more than 2X better for the Cached by InnoDB workload)
  2. The database can grow to be 2X or more larger than it should be if purge falls behind and there are delete-marked rows not getting removed fast enough (which happens here).
  3. Queries can be much slower than expected (tens of seconds vs a few milliseconds) if they must encounter and skip too many delete-marked rows. This happens on tables that have a queue pattern (insert to one end, delete from the other end).
Which matters most? Well, that depends, but I prefer to enable it to avoid disk-full errors from unconstrained growth and intermittent queries that are 1000X slower than expected. Unfortunately, when innodb_max_purge_lag is enabled, long-open snapshots can be a problem. Figuring out how to implement the feature request in bug 74919 would help a lot.
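One way to see whether purge is falling behind (a sketch; there are other ways) is to watch the history list length, the count of undo log records waiting to be purged:

  -- history list length via the InnoDB metrics table
  SELECT name, count
  FROM information_schema.INNODB_METRICS
  WHERE name = 'trx_rseg_history_len';

The same counter is printed as "History list length" in the TRANSACTIONS section of SHOW ENGINE INNODB STATUS output.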

For an example of the tradeoffs I will use the cached by InnoDB workload. 
  • For unconstrained growth, at the end of the l.x benchmark step the database size is 70.6G (see dbgb1 here). But at the end of the l.i1 benchmark step that follows l.x and does inserts and deletes as fast as possible the database size exceeds 220G for all of the configurations that don't enable innodb_max_purge_lag (see dbgb1 here) while it is ~110G for the configurations (a13, a20, a21) that enable it.
  • The insert (and delete) rate drops from ~240k/s with innodb_max_purge_lag disabled to ~90k/s with it enabled in the l.i1 benchmark step (see the summaries)
  • See below for the slow queries that happen when innodb_max_purge_lag is disabled.

From the Cached by InnoDB workload this table lists the query response time in seconds for the select min(transactionid) query run at the start of the l.i1 and q100 benchmark steps. The config names use the u$x suffix from the my.cnf naming pattern, so ua13 is the a13 config. The slow response times (>= 5 seconds for all configs except a13, a20, a21) at the start of q100 happen because purge falls behind and the query spends much time skipping delete-marked rows (see here for more info on the issue).
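The query itself is trivial and the cost comes from skipping delete-marked rows at the low end of the index. Using the hypothetical pt1 table from the sketch above:

  -- fast when purge keeps up, multi-second when the low end of
  -- the PK index is full of delete-marked rows awaiting purge
  SELECT min(transactionid) FROM pt1;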

config  l.i1     q100
base    0.001    6.265
ua1     0.001   25.729
ua2     0.001    6.087
ua3     0.001    6.114
ua4     0.001    5.358
ua5     0.001    6.035
ua6     0.001    6.110
ua7     0.001    5.894
ua8     0.001    6.145
ua9     0.013    5.902
ua10    0.001    5.933
ua11    0.001    6.172
ua12    0.072    6.154
ua13    0.001    0.001
ua14    0.001    6.043
ua15    0.001    6.046
ua16    0.074    6.683
ua17    0.001    5.899
ua18    0.001    8.599
ua19    0.017    6.163
ua20    0.001    0.084
ua21    0.001    0.147



