Wednesday, January 3, 2024

innodb_log_writer_threads and the Insert Benchmark

I am wary of innodb_log_writer_threads=ON. It is on by default and has been a problem for me in this past. It would be great to learn from people for whom it is useful. This is a follow up to a previous post where I mentioned that things looked bad with innodb_log_writer_threads. I opened bug 113485 to suggest that one of the following should be done: make the default =OFF or at least let innodb_dedicated_server disable it on small servers.

The MySQL docs suggest only using =ON for high-concurrency workloads, alas it is =ON by default.

Dedicated log writer threads can improve performance on high-concurrency systems, but for low-concurrency systems, disabling dedicated log writer threads provides better performance.

tl;dr, v1

  • innodb_log_writer_threads seems to make things worse most of the time
  • the workaround is innodb_log_writer_threads=OFF
  • sadly, it is =ON by default

tl;dr, v2

  • Sometimes innodb_log_writer_threads helps, more often it doesn't in my tests
  • innodb_log_writer_threads increases the frequency of fsyncs per commit by a large amount -- between 3X and 200X depending on the setup. The impact from this is less obvious on the 40-core server that has a fast fsync. The impact is really bad on a server that doesn't have a fast fsync.
  • innodb_log_writer_threads shouldn't be used on small servers with <= X CPU cores. For the servers I tested X=8 but I suspect it is even larger.
  • It can be lousy for performance when fsync latency is high (several millsecs)
  • I filed a feature request to change the default for innodb_log_writer_threads to =OFF and/or detect when the number of CPU cores is not large and disable it by default.

The bugs

The redo log code was changed in a big way in MySQL 8.0 and my experience with that has not been great. It was nice to get the ability to disable the new features, but that (innodb_log_writer_threads) didn't arrive until 8.0.22.
  • I reported bug 90670 for MySQL 8.0.11. This is a crashing bug that was fixed in 8.0.13. It was found via sysbench. I assume I could have found it with the Insert Benchmark.
  • I reported bug 90890 for MySQL 8.0.11. This is a perf bug that was fixed in 8.0.14. It was found via the Insert Benchmark. The perf bug is that the CPU overhead/operation had doubled vs 5.7 releases.
  • I reported bug 90993 for MySQL 8.0.11. This is a crashing bug that was fixed in 8.0.13. It was found via the Insert Benchmark.
  • I reported bug 102238 for MySQL 8.0.22. This is a perf bug that is still open and the workaround is to use innodb_log_writer_threads=OFF. 
The tuning options

When innodb_log_writer_threads=ON there will be more spinning, which not only means more CPU can be burned, but also that there are 3 new config options for tuning how the spin wait loops happen. By my count that is 3 options too many. I did not try to tune these. From the docs on this feature the config options are:
Benchmark

I share results from 3 servers
  • 8-core
    • 8 CPU cores, 16G RAM, XFS, 1 m.2 device, Ubuntu 22.04
    • benchmark uses 1 client
  • 32-core
    • 32 cores, hyperthreads off, 128G RAM, XFS with SW RAID 0 over 2 m.2 devices, Ubuntu 22.04
    • fsync latency is not great on this host, maybe ~5 millisecs
    • benchmark uses 12 clients
  • 40-core
    • 40 cores, 80 HW threads, hyperthreads on, 256G RAM, XFS with SW RAID 0 over 4 SSDs
    • fsync latency is much better than on the 32-core host, maybe <= 200 microsecs
    • benchmark uses 16, 24, 32 and 40 clients
I used the Insert Benchmark with a cached database. With X clients there were X tables and a client per table. I focus on the first three benchmarks steps that are write-heavy. The spreadsheet with all results is here. The benchmark steps are:
  • l.i0
    • does the initial load in PK order without secondary indexes and 1 connection/client. This inserts 20M rows/table.
    • each commit inserts 100 rows for big transactions or 10 rows for small transactions. Inserts are in key order so this only makes a few pages dirty. And there are no secondary indexes.
  • l.x
    • creates 3 secondary indexes per table. There is 1 connection/client.
  • l.i1
    • does random inserts matched by random deletes. There are 2 connections/client -- one for inserts, one for deletes. This step is the most likely to make the CPU oversubscribed.
    • each commit inserts 50 rows for big transactions or 50 rows for small transactions. For each row there are also 3 secondary indexes to maintain which increases the amount of redo per commit. Inserts are in PK order but not in order for any of the secondary indexes so these make more pages dirty compared to l.i0.
The benchmark was repeated in 2 configurations -- for innodb_log_writer_threads =ON and =OFF. The my.cnf files are here. There are 2 files per server -- one with innodb_log_writer_threads =ON and one with it =OFF. Both have sync_binlog=1 and innodb_flush_logs_at_trx_commit=1. The my.cnf files are here. I did not tune the 3 innodb_log_writer_threads options.

The benchmark was repeated for two workload types -- big and small transactions. For big transactions I used the Insert Benchmark as-is so that the rows/commit is 100 for l.i0 and 50 for l.i1. For small transactions I reduced that to 10 for l.i0 and 5 for l.i1. 

Results

Throughput in the charts below measures the following
  • l.i0 - inserts/second
  • l.x - indexed rows/second
  • l.i1 - inserts/second
These charts show the throughput for MySQL with innodb_log_writer_threads =OFF relative =ON. A value greater than 1 means that MySQL is faster with =OFF. Below I use LWT in place of innodb_log_writer_threads.

For the 40-core server
  • l.i0 - throughput is always (slightly) better with LWT=OFF
  • l.x - throughput is always (slightly) better with LWT=ON
  • l.i1 - results are mixed.
    • The best case for LWT=ON is with 40 clients and big transactions. TODO was CPU saturated? Note that LWT=ON does better relative to LWT=OFF as the concurrency increases.
    • The best cases for LWT=OFF are with lower concurrency levels.
For the 32-core server
  • I had to use log scale because the differences were huge for l.i1
  • Fsync latency on this host might be ~5 milliseconds which is large
  • LWT=OFF is up to ~5X faster for l.i0 and up to ~100X for l.i1 relative to LWT=ON
  • I try to explain the performance differences in the sections that follow
For the 8-core server
  • LWT=OFF is always faster than =ON, up to 3X faster
The big problem

The big problem is that with innodb_log_writer_threads =ON the number of fsyncs per commit is between 3X and 200X larger vs =OFF. The extra details about iostat, vmstat and the fsync frequency (via the OS fsyncs counters) are here for l.i0 and for l.i1.

My helper scripts archive the output from SHOW ENGINE INNODB STATUS at the end of each benchmark step and from that I grep the line with OS fsyncs. The l.i0 and l.i1 benchmark steps do the same number of inserts for LWT =ON and =OFF so I just compute the ratio of (fsyncs with =ON) / (fsyncs with =OFF) and the results are much worse then I expected. I didn't try to change the 3 options related to the LWT feature, other than innodb_log_writer_threads=OFF.

The table below lists the fsync ratio which is:
(fsyncs with innodb_log_writer_threads =ON) / (fsyncs with it =OFF)

Server# clientstransaction sizestepfsync ratio
40-core24smalll.i0~3.5
40-core24bigl.i0~3.2
40-core40smalll.i0~4.2
40-core40bigl.i0~5
24-core12smalll.i1~18
24-core12bigl.i1~200
40-core24smalll.i1~4.6
40-core24bigl.i1~7.5
40-core40smalll.i1~5.8
40-core40bigl.i1~12.3


Explaining: 40-core server

Details from iostat and vmstat are here for l.i0 and for l.i1
  • context switches/operation (cs/q) are larger with LWT=ON
  • CPU/operation (cpu/q) is larger with LWT=ON
Explaining: 32-core server

Details from iostat and vmstat are here for l.i0 and for l.i1
  • context switches/operation (cs/q) are much larger with LWT=ON
  • CPU/operation (cpu/q) is much larger with LWT=ON
Explaining: 8-core server

Details from iostat and vmstat are here for l.i0 and for l.i1
  • context switches/operation (cs/q) are much larger with LWT=ON
  • CPU/operation (cpu/q) is much larger with LWT=ON


No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...