Friday, December 4, 2020

Tuning for the Insert Benchmark: InnoDB

This post presents performance and efficiency results for InnoDB on the Insert Benchmark using MySQL 8.0.21 and a small server with 4 CPU cores, 16G of RAM and a fast SSD. The goal is to understand the impact of my.cnf options, and I have results for 18 different configurations.

Summary

I use scripts to generate the performance reports. The reports are here for the in-memory, less IO-bound and more IO-bound database sizes. Some of the conclusions are specific to this workload and HW, so be careful about applying this advice, but I suspect that innodb_dedicated_server should be used in most cases. A summary of the results is below.

  • It is hard to improve on innodb_dedicated_server. Much of the benefit for innodb_dedicated_server comes from using a large enough value for innodb_log_files_in_group. See the summaries for the in-memory, less IO-bound and more IO-bound database sizes.
  • The minimal configuration, my.cnf.cx0, is lousy. This isn't a big deal because it is trivial to add innodb_dedicated_server=ON to my.cnf (see the sketch below this summary).
  • The configs that use 8kb and 4kb for innodb_page_size, my.cnf.cx10a and my.cnf.cx10b, have poor create index performance (see l.x) for the in-memory and less IO-bound database sizes. I can't explain that, but the HW efficiency metrics show a worse context switch rate (cspq) and CPU overhead (cpupq).
  • Query throughput (qps) with the more IO-bound database was best for the configs that use 8kb and 4kb page sizes, cx10a and cx10b. The reason is less storage read IO per query (see rpq).
  • Setting innodb_buffer_pool_instances=1 also hurts create index performance, as it increases the CPU overhead (cpupq) for the in-memory and less IO-bound database sizes.
  • While my purpose isn't to compare InnoDB, MyRocks and Postgres, the QPS results for InnoDB on the more IO-bound (500M rows) database size are odd. InnoDB does ~4 disk reads per query (see rpq) and is therefore limited to ~650 QPS per client; the arithmetic below makes that concrete. For Postgres, QPS is much better while rpq is close to zero.
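The per-client limit follows from queries running serially on one connection, so per-query latency is roughly the sum of its synchronous reads. Assuming ~0.4 ms per read for this SSD (an assumed latency, not a number from the reports):

  ~4 reads/query × ~0.4 ms/read ≈ 1.6 ms/query → ~625 queries/s per client

which is close to the ~650 QPS observed.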
There is much detail in the reports that I mostly ignore, including response time histograms and the HW efficiency metrics. But they can be used to explain the results and answer questions.
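Since much of the benefit comes from one option, here is a minimal my.cnf sketch. innodb_dedicated_server is a real option; the commented manual alternative uses illustrative values, not values copied from my tested configs:

  [mysqld]
  # One line gets most of the tuning benefit. In MySQL 8.0 it auto-sizes
  # innodb_buffer_pool_size, the redo log (innodb_log_file_size and
  # innodb_log_files_in_group) and innodb_flush_method based on server RAM.
  innodb_dedicated_server=ON

  # Manual alternative for the redo log, where much of the benefit comes
  # from (values are illustrative assumptions):
  # innodb_log_file_size=2G
  # innodb_log_files_in_group=4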

Tests

The insert benchmark was run for three database sizes that I call in-memory, less IO-bound and more IO-bound. The database was cached in memory for the in-memory size but not for the others. The initial load size was 20M rows for in-memory, 100M rows for less IO-bound and 500M rows for more IO-bound. Additional rows were inserted after the initial load, but most of the inserts were done by the initial load.

Tests were run with at most 2 threads/connections -- one for inserts and another for queries -- and I consider that to be 1 insert benchmark client.

The benchmark has several steps as described here -- initial load, create 3 secondary indexes, load more data, read+write. Each read+write step is run for 30 minutes with a query thread that runs as fast as possible and a writer that is rate limited to 100, 100, 200, 200, 400, 400, 600, 600, 800, 800, 1000 and 1000 inserts/s. The load more data step inserts 20M, 10M and 10M more rows for the in-memory, less IO-bound and more IO-bound workloads.

In the performance reports, the steps are named:

  • l.i0 - initial load
  • l.x - create 3 secondary indexes
  • l.i1 - load more data 
  • q100.2, q200.2, ..., q1000.2 - read+write where the insert rate limit is 100, 200, ..., 1000 and the .2 means the second run at each rate. The reports focus on the second run.

Configurations

All tests were done with fsync-on-commit disabled so this doesn't become an fsync latency benchmark. The option names below leave off the innodb_ prefix to save space, and I leave off the my.cnf. prefix when naming config files.
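As a sketch of what these options can look like in a config file -- assuming fsync-on-commit is disabled via innodb_flush_log_at_trx_commit=2, with illustrative values rather than the exact tested cx files:

  [mysqld]
  # fsync-on-commit disabled: the redo log is written at commit but only
  # fsynced in the background, about once per second
  innodb_flush_log_at_trx_commit=2

  # Options varied across the configs discussed above (illustrative values):
  # innodb_page_size=8k              # 8kb pages as in cx10a; cx10b uses 4k.
  #                                  # Page size can only be set when the
  #                                  # data directory is initialized.
  # innodb_buffer_pool_instances=1   # the setting that hurt create index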
