In 2020 I published benchmark results to understand the impact of DBMS tuning options for the insert benchmark with MySQL and Postgres (here and here). One of the results was that wal_compression = pglz used more CPU than expected. I asked aloud whether support for lz4 would help.
Well, the Postgres community has delivered and either lz4 or zstd can be used in place of pglz in the upcoming Postgres version 15. Docs are here. Support for lz4 to compress TOAST was added in version 14.
tl;dr - wal_compression=lz4 or =zstd are a big improvement over =pglz
Postgres optionally writes page images (the pre-image) to the WAL the first time a page is to be written back after a checkpoint. This provides protection from torn-page writes. InnoDB solves this via the doublewrite buffer. An LSM like RocksDB doesn't modify previously written database pages so it doesn't have something similar.
Writing pages to the redo log increases the frequency at which log files must be rotated. A possible side effect of that is to trigger checkpoints more often which means more page writeback IO and this can hurt performance.
Compressing those page images can reduce the write rate which then reduces the rate at which checkpoint happens (and also reducing the rate at which checkout IO happens). Prior to Postgres version 15 the only option for wal_compression was pglz. With Postgres version 15 there are new options -- lz4 and zstd. My github pages repo has a benchmark report for the Insert Benchmark run on a small server with an IO-bound workload to understand the impact of this option.
Results for IO-bound
This section has results for a benchmark where the database is much larger than RAM. The report is here. For this benchmark I used four configurations:
- cx5 - wal_compression=off
- cx5a - wal_compression=pglz
- cx5b - wall_compression=lz4
- cx5c - wall_compression=zstd
Summarizing the summary from the (shell script generated) report:
- Loads in PK order without secondary indexes have similar performance regardless of the wal_compression setting. This isn't a surprise as PK order loads should have the least writeback per insert. The metrics are here.
- Create secondary index is ~1.7X faster with wal_compression=lz4 compared to =pglz. From the metrics with pglz the CPU overhead is almost 2X larger than with lz4 (see cpupq column). The performance with =lz4 is the same as with =off.
- Inserts that are in PK order but random order for the 3 secondary indexes are faster with =lz4 or =zstd than with =off or =pglz where =lz4 is ~1.4X faster than =pglz. From the metrics the use of =lz4 or =zstd reduces IO somewhat (see wkbpi column) and CPU a lot (see cpupq column).