I previously shared results for the Insert Benchmark, InnoDB and a medium server. Now I have updated results that include Percona Server and a few new configs (a20, a21, a22). The good news is that Percona Server does better than upstream on the write-intensive parts of the benchmark. The bad news is that it does not fix the problem of bad performance from single-page flushing.
Several of the benchmark steps here (l.i1, q100, q500, q1000) were run for more time than they were in the previous report.
tl;dr
- Setting innodb_empty_free_list_algorithm to backoff with Percona Server does not fix the bad performance from single-page flushing
- The slow query from too much MVCC GC debt can be avoided by making sure that innodb_page_cleaners, innodb_purge_threads and innodb_buffer_pool_instances have the same value. They don't by default (see the sketch after this list).
- Limiting max purge lag avoids some queries taking 1000X longer than needed, but at a big cost in write throughput. Query rates also suffer in many cases with the configs that limit max purge lag, which isn't expected. Regardless, enabling limits for max purge lag is an important topic to consider.
- InnoDB doesn't suffer much from the fairness issue
- For Percona Server vs upstream using a similar config ...
- With a cached workload Percona Server has similar write throughput and slightly worse read throughput
- With an IO-bound workload Percona Server has better write throughput and similar read throughput
- Must not forget about bug 109595 as that reduces query rates for q100, q500 and q1000
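To make the alignment concrete, here is a minimal sketch assuming MySQL 8.0, where SET PERSIST_ONLY stages values for variables that can't be changed at runtime (editing my.cnf works just as well):

  -- By default innodb_purge_threads and innodb_page_cleaners are 4 while
  -- innodb_buffer_pool_instances is 8 for buffer pools >= 1G, so the three
  -- don't match. Check the current values:
  SELECT @@innodb_buffer_pool_instances,
         @@innodb_purge_threads,
         @@innodb_page_cleaners;

  -- None of these are dynamic, so stage them to take effect at the next
  -- restart (SET PERSIST_ONLY writes to mysqld-auto.cnf):
  SET PERSIST_ONLY innodb_purge_threads = 8;
  SET PERSIST_ONLY innodb_page_cleaners = 8;
  SET PERSIST_ONLY innodb_buffer_pool_instances = 8;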
The insert benchmark was run in two setups.
- cached by InnoDB - all tables are cached by InnoDB
- IO-bound - the database is larger than memory
The benchmark used a c2-standard-30 server from GCP with Ubuntu 22.04, 15 cores, hyperthreads disabled, 120G of RAM and 1.5T of storage from RAID 0 over 4 local NVMe devices with XFS.
The benchmark is run with 8 clients and 8 tables (client per table). The benchmark is a sequence of steps.
- l.i0
- insert X million rows per table where X is 20 for cached and 500 for IO-bound
- l.x
- create 3 secondary indexes. I usually ignore performance from this step.
- l.i1
- insert and delete another X million rows per table with secondary index maintenance where X is 200 for cached by InnoDB and 50 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start, with inserts done to the table head and deletes done from the tail (a sketch of the pattern follows this list).
- q100, q500, q1000
- do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 7200 seconds.
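The l.i1 pattern, roughly, as a sketch with made-up names (the real benchmark clients do more, but the shape is inserts at the head and deletes at the tail so the row count stays flat):

  -- t1, id, k and v are hypothetical. Assume id is the PK and k has a
  -- secondary index, so both statements do secondary index maintenance.
  INSERT INTO t1 (id, k, v) VALUES (20000001, 987654, 'filler');  -- head: next largest id
  DELETE FROM t1 WHERE id = 1;                                    -- tail: oldest id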
Configurations
All of the my.cnf files for 8.0, 5.7 and 5.6 are here.
The base configurations are here:
- for upstream MySQL 8.0, 5.7 and 5.6. The naming pattern is my.cnf.cy10$x_gcp_c2s30 where x is a for the base config and a1 ... a22 otherwise.
- for Percona Server 8.0.33 (ps1 and ps2, see below)
The configurations tested are (a sketch of the a13-style purge lag settings follows the list):
- base - see links above
- a2 - increases innodb_io_capacity[_max] to 20k/40k
- a7 - adds innodb_lru_scan_depth=2048
- a10 - adds innodb_purge_threads=8, innodb_page_cleaners=8
- a12 - reduces innodb_redo_log_capacity to 32G
- a13 - adds innodb_max_purge_lag=50k, innodb_max_purge_lag_delay=1M, innodb_purge_threads=8
- a17 - adds innodb_doublewrite=OFF (not safe for production)
- a18 - adds innodb_buffer_pool_instances=4, innodb_page_cleaners=4
- a19 - adds innodb_max_dirty_pages_pct_lwm=20, innodb_max_dirty_pages_pct=50
- a20 - like a13 but with innodb_max_purge_lag=200k
- a21 - like a13 but with innodb_max_purge_lag=500k
- a22 - combines a10 and a20
- ps1 - for Percona Server, same as base but sets innodb_empty_free_list_algorithm to legacy
- ps2 - for Percona Server, same as base but sets innodb_empty_free_list_algorithm to backoff
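As a concrete example, a sketch of the a13-style purge lag settings assuming MySQL 8.0. The first two variables are dynamic; innodb_purge_threads is not, so it is staged for the next restart. Note that innodb_max_purge_lag_delay is in microseconds, so 1M caps the delay imposed on DML at 1 second:

  -- a13 uses a lag limit of 50k; a20 and a21 raise it to 200k and 500k.
  SET PERSIST innodb_max_purge_lag = 50000;
  -- Max delay imposed on DML when the lag limit is exceeded, in microseconds.
  SET PERSIST innodb_max_purge_lag_delay = 1000000;
  -- Read-only at runtime, so stage it to take effect at the next restart.
  SET PERSIST_ONLY innodb_purge_threads = 8;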
Performance reports are here:
- Cached by InnoDB: 5.6 vs 5.7 vs 8.0 and 8.0 only
- IO-bound: 5.6 vs 5.7 vs 8.0 and 8.0 only
The sections that follow cover:
- average throughput
- fairness between benchmark clients
- SLA for background inserts and deletes
- response time histograms
- performance at 1-second intervals
- purge lag
- The impact of MVCC GC on some queries
Average throughput
Cached by InnoDB
- For l.i1 the a10, a13, a20, a21 and a22 configs reduce the write rate by ~15%. All but the a10 config enable limits on max purge lag and the reduction is expected for them. The reduction for a10 is surprising. The a10 config sets innodb_purge_threads and innodb_page_cleaners to 8 to match the default for innodb_buffer_pool_instances. Note that the a18 config also sets purge_threads, page_cleaners and buffer_pool_instances to the same value, but the value is 4 for a18, and there isn't a reduction for a18.
- For q100 the query throughput is ~1.4X better for the configs (a13, a20, a21, a22) that limit max purge lag. For those configs there is less MVCC GC debt which means there is less overhead from the work done to reduce that debt. The throughput is also better for a10 (see the previous bullet point).
- For q500 and q1000 the throughput for the max purge lag limiting configs is similar to the others because the history list length has been reduced for all configs (the debt was repaid during q100).
- For l.i0 (initial load) throughput in 5.7 is 1.88X better than 5.6 while in 8.0 it is only ~1.6X better than 5.6. From the rps (storage reads/second), rpq (storage reads/insert) and cpupq (CPU/insert) columns, the difference is mostly because 5.6 does read IO (I don't know why) that 5.7 and 8.0 don't, while the regression from 5.7 to 8.0 might be from new CPU overhead -- see here.
- For l.x the create index step is fastest for 5.7 and then regresses from 5.7 to 8.0. There is more CPU and read IO overhead in 8.0 vs 5.7 -- see here for the rps, rpq and cpupq columns.
- For l.i1 (inserts & deletes with secondary indexes) the write rates for 5.7 and 8.0 are ~1.6X better than for 5.6
- For q100 the query rate drops from 5.6 to 5.7 to 8.0 as the CPU overhead (see cpupq here) increases
- For q500 and q1000 the query rates are closer and 5.6 suffers from more synchronization overhead (see cspq here) while 8.0 benefits from code improvements related to concurrency.
- Upstream InnoDB is similar to Percona Server for writes and slightly better for reads here. The difference for reads appears to be from CPU overhead (see cpupq here for q100, q500 and q1000).
IO-bound
- For l.i1 the configs that limit max purge lag (a13, a20, a21, a22) get ~25% of the write rate that other configs sustain. This is a big price to pay for better query performance.
- For q100 most of the configs that limit purge lag (a13, a20, a22) get better query rates but a21 does not. From metrics I see the CPU overhead (cpupq) and storage read overhead (rps, rpq) are too high for a21.
- For q500 and q1000 the query rates suffer for the configs that limit max purge lag. This is odd.
- Percona Server does better than upstream InnoDB (comparing to the base config) on the l.i1 benchmark step and similar on query throughput.
- Results for the a10 config are hard to explain -- throughput for l.i1, q100, q500 and q1000 is worse than the base config in every case. From the metrics here I see more mutex contention (cspq) and more CPU overhead (cpupq). This isn't an issue for a18, which is similar but sets the options to 4 rather than 8. Maybe the CPU is oversubscribed or innodb_purge_threads=8 is a problem.
- For l.i0 (initial load) throughput is 1.97X better in 5.7 vs 5.6 while 8.0 is only ~1.7X better than 5.6 because 8.0 has more CPU overhead than 5.7 (see cpupq here)
- For l.x (create index) perf is ~1.8X better in 5.7 vs 5.6 while 8.0 is only ~1.5X better in upstream 8.0 and ~1.4X better in Percona Server. The difference between 5.7 and 8.0 is from more CPU and read IO overhead (see cpupq, rps and rpq here). The difference between upstream and Percona is from CPU overhead (see cpupq here).
- For l.i1 (inserts+deletes with secondary indexes) 5.7 and Percona Server are ~1.1X faster than 5.6 while upstream 8.0 is slightly slower than 5.6. Percona Server uses less CPU and read IO than upstream (see cpupq, rps and rpq here).
- For q100, q500 and q1000 5.7 is ~1.2X faster than 5.6 while 8.0 (upstream and Percona Server) are ~1.3X faster than 5.6 because they use less CPU (see cpupq here).
SLA
- Cached by InnoDB
- 8.0 - target rates were sustained
- 5.6 vs 5.7 vs 8.0 - 5.6.51 failed for the q100 step
- IO-bound
- 8.0 - a13, a20 and a22 configs for upstream 8.0.33 failed for the q1000 step. These configs enable limits on the max purge lag to avoid too much MVCC GC debt and the limits were reached during q1000 for which the targets are 8000 inserts/s and 8000 delete/s (8x1000). The value from limits on purge lag can be seen in the Impact of MVCC GC section below where a query can take ~100 seconds without a purge lag limit vs ~30 milliseconds with the limit.
- 5.6 vs 5.7 vs 8.0 - target rates were sustained
Response time histograms
- Cached by InnoDB
- 8.0
- not that interesting
- 5.6 vs 5.7 vs 8.0
- results for 5.7 and 8.0 are mostly better than 5.6
- IO-bound
- 8.0
- on the l.i1 step the a21 config has a bad outlier, 18.493s for a write, while Percona Server has outliers that are worse than upstream.
- 5.6 vs 5.7 vs 8.0
- For l.i1 the insert rate and max response time charts are similar between upstream 8.0 (base config) and Percona Server (ps2 config).
- For q1000 Percona Server with the ps2 config has less variance for inserts (see here) vs upstream InnoDB (see here).
Performance at 1-second intervals
- For l.i0 upstream 8.0 is similar to Percona Server
- For l.i1 the insert rate charts are similar -- a slow rise over ~4000s and then stable -- while all have a bit of response time variance
- For q100 there is more query throughput variance with Percona Server and the ps2 config vs upstream 8.0
- For q1000 there is more query throughput variance with Percona Server and the ps2 config vs upstream 8.0
The impact of MVCC GC on some queries
The query measured here can be a problem for InnoDB (regardless of tuning) and for MyRocks (depending on tuning). It isn't a problem for Postgres.
A blog post with more details is here.
The tables below show the query response time in seconds for the query done at the start of each benchmark step. It gets slow starting with the steps that follow l.i1 because l.i1 creates too much MVCC GC debt. Some configs (a13, a20, a21) avoid the slow query by enabling a limit on the max purge lag. Other configs (a10, a22) avoid the slow query by making sure that innodb_page_cleaners, innodb_purge_threads and innodb_buffer_pool_instances have the same value. A sketch for watching the purge lag follows the links below.
Cached by InnoDB
IO-bound
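A minimal sketch for watching the MVCC GC debt, assuming the default innodb_metrics counters are enabled (trx_rseg_history_len is on by default). This is the same number that SHOW ENGINE INNODB STATUS reports as the history list length:

  -- The history list length measures unpurged MVCC history, aka the GC debt.
  SELECT name, count
  FROM information_schema.innodb_metrics
  WHERE name = 'trx_rseg_history_len';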
Comments
Q: Conclusion?
A: I don't think that Percona Server fixes the problem of InnoDB write back falling behind leading to single-page flush stalls.