Are there well known names for the things that trigger dirty page writeback with an update-in-place B-Tree? Some of my performance discussions would be easier if I could point to those definitions.
I spent many years keeping InnoDB, an update-in-place B-Tree, happy in production and a shorter time with WiredTiger, a copy-on-write random (CoW-R) B-Tree. At work I recently revisited the topic of checkpoint and wondered if there were good names for the things that trigger page dirty page writeback. Alas, my InnoDB expertise isn't what it used to be.
InnoDB implements fuzzy checkpointing. While the reference manual page for this is too brief, the advice on that page is excellent -- a larger redo log is good because that reduces write-amplification. A short pitch for fuzzy checkpointing is that it avoids writeback storms, the kind that used to plague Postgres. When you ask a storage device to write GBs of data in a short amount of time there will be a queue and requests (reads, log fsync) that encounter that queue will wait. By redo log here I mean the sum of the sizes of the N redo logs you will use with InnoDB. The write-amp is reduced because LSN triggered writeback (using the terminology defined below) is reduced.
Back to my point about the things that trigger writeback. I will suggest names here. Hopefully I will learn that these have already been named and I can use the existing names. Otherwise I will try to use the names in the future.
The triggers are:
- shutdown - for InnoDB writeback is done on shutdown when innodb_fast_shutdown = 0 or 1. Writeback is not done when it is set to 2, but that means crash recovery is needed on startup. Back in the day when we did web-scale MySQL with spinning disks there were many slow shutdowns and at times the MySQL provided script that drove the shutdown would timeout (we modified that script to avoid timeouts).
- LRU - writeback a dirty page when it reaches the end of the LRU but the memory can't be reused to store another block until the dirty page has been written back. This can occur when a user query needs to read a block into the buffer pool but all pages are in use (single-page writeback with InnoDB, bad for performance). But InnoDB also has a background thread that schedules writeback for dirty pages that are close to the LRU tail to avoid single-page writeback and the innodb_lru_scan_depth option controls how far from the tail it searches.
- capacity - writeback dirty pages because the buffer pool has too many dirty pages. InnoDB has innodb_max_dirty_pages_pct to define too many. A limit is needed because it controls how long shutdown and crash recovery will take. With InnoDB these pages will be found from the end of the flush (dirty page) list but I want to distinguish this trigger from the previous one. Also with InnoDB this is triggered by one of the many tasks done by background threads.
- LSN - writeback dirty pages for which the oldest commit is too old. For InnoDB too old means that the redo log that has changes for that commit will soon be recycled. There are many great resources for this topic written by others. With InnoDB this is triggered by a background thread.
- Laurynas Biveinis reminded me about innodb_lru_scan_depth. It is funny I forgot about that because long ago I had a production ready rewrite of it because the older code suffered from mutex contention. But before I could deploy it upstream had their own rewrite. I wasn't aware of their work-in-progress. But the new tests I wrote for my version found at least one bug in their rewrite, so it wasn't a complete waste of time.
- While I write about update-in-place B-Tree above this is really about update-in-place and also includes heap-organized tables as used by Postgres, Oracle, etc
- From Inaam, correction about usage of the flush list and names that I like for the writeback triggers: shutdown flushing, checkpoint flushing, dirty page flushing and LRU flushing
- Much info from JFG about InnoDB internals related to writeback