Wednesday, July 22, 2020

Postgres, vacuum and the insert benchmark

I have been running benchmarks with Postgres for a few months and am slowly learning enough about Postgres to trust the results that I get. When I first read about Postgres vacuum I didn't think it would be important for the insert benchmark because the workload is insert-only. This post explains why I was wrong.

Vacuum arrived in the original POSTGRES storage system in the 1980s. Today it serves several purposes: 1) reclaim space for old versions of rows that are no longer visible, 2) update statistics for the query planner, 3) update bits in the visibility map to help index-only queries and 4) preventing transactionID wraparound.

I didn't expect vacuum to help much with the insert benchmark because it doesn't do updates or deletes and there is no space for vacuum to reclaim. Also, query plans were OK and there weren't enough transactions for wraparound to be an issue.

What I missed was the impact of the visibility map on the index-only range scans run during the insert benchmark. These bits are only set by vacuum and if the bit is not set for a heap page, then the base row must be read from the heap page during an index-only scan for any index entries that point to that heap page. The impact from reading the base row is significant when the working set is not cached but still visible when the working set is cached.

Fortunately, the insert benchmark pattern of inserts seems to be friendly to Postgres in that a heap page will become full after some inserts and then doesn't get more inserts. So once the visibility map bit gets set it remains set for that heap page.

Details

Autovacuum is triggered for a table based on the number of updates and deletes done to the table. So while an insert-only workload can benefit from vacuum when it also does range scans, the inserts won't trigger vacuum. I haven't read the code to confirm that inserts don't trigger autovacuum. The main docs aren't clear as the routine vacuuming section states that When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. But the docs for autovacuum_vacuum threshold state that it is only updates and deletes: Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table.

I now run the insert benchmark as a sequence of steps: load some data with only PK indexes, create secondary indexes, load more data, then read+write. The read+write step does short range scans as fast as possible and the writes are rate-limited inserts where there is a target on the insert rate per second. As part of this I optionally run vacuum after creating the secondary indexes. Each read+write test runs for 1 hour and vacuum is started at the end of each hour. While my scripts wait for vacuum to finish after creating the indexes, and don't wait for it to finish during the read+write tests that doesn't matter much because vacuum is fast during the insert benchmark.

Vacuum internals

This is a brief description. Hopefully it is truthy. The key point is that vacuum can require full index scans for every index of the vacuumed table. That can take a long time. Alas, this isn't an issue for the insert benchmark because it is insert-only and doesn't create old versions of rows so there are no full scans of all indexes during autovacuum.

The Internals of Postgres web-site is a great place to start after reading the online Postgres docs if you want to learn more -- see chapter 5 and chapter 6. Note that chapter 6 of the Internals site lists steps 3 before step 2 and I need to figure that out before I write too many more blog posts on this topic. There are many useful blog posts on this topic including this blog post.

I don't think vacuum in original POSTGRES did full index scans. I have read some of the discussion that explains why a full index scan is done today but I didn't save links to that. My preference is for vacuum to do index probes rather than full index scans to reduce the time between consecutive vacuums for the same table.

Regardless let me explain some of the work that vacuum can do to make it obvious that vacuum for a table can take a long time, and does when I run Linkbench.
  1. Scan heap pages for which visibility map bits are not set. From each heap page if a row is an old version (because delete or update) and not visible to any current or future transaction then copy the row's CTID (into an array?)
  2. If CTIDs were found in the previous step then scan all secondary indexes for this table and for each index entry determine whether its CTID is in the array from step 1. If yes then reclaim the space for that index entry.
  3. Repeat step 1 but this time reclaim the space for the dead rows.
Finally, the memory used to buffer CTIDs is determined by maintenance_work_mem. If that limit is reached then the steps (and full index scans) are repeated.


4 comments:

  1. When checksums are enabled (or wal_log_hints is ON), read queries might generate WAL (by writing FPI_FOR_HINT records), which might be unexpected. This is particularly problematic for insert-only workload, when autovacuum doesn't trigger automatically and manual vacuum is not run after a bulk load. PostgreSQL 13 should alleviate the need to run VACUUM manually, btw, a welcome improvement. See Improvement 2 here https://amitkapila16.blogspot.com/2020/05/improved-autovacuum-in-postgresql-13.html

    ReplyDelete
    Replies
    1. When inserts trigger vacuum is that a full vacuum? Because if my workload does 1 delete and then 1M inserts I want visibility map bits set but I don't want full scans of all indexes to reclaim space from the 1 delete.

      Delete
    2. (For others) Unless configured otherwise, yes. In this scenario it'd scan all indexes but doesn't really run a "VACUUM FULL" - https://twitter.com/petervgeoghegan/status/1286327819723628546?s=19

      Delete
    3. I don't know when you mean with that tweet.

      It does an scan of each index, then for each index entry probes what would an array of size 1 in this case. While the array probes will be fast the index scans can take a long time.

      So sure, there is other work that vacuum might do which won't be done here but those index scans can take a long time which means there is a lag between consecutive vacuums for the table.

      Delete

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...