Thursday, June 1, 2023

Tuning Postgres and the new insert benchmark, round 3

This post has results for the insert benchmark on a medium server when Postgres caches the database. The previous post used a similar setup, except the database was cached by the OS but not by Postgres. 

tl;dr

  • the base config I have been using is good once I tuned autovacuum to be more aggressive
  • the big improvement is from adding autovacuum_vacuum_scale_factor=0.05 and autovacuum_vacuum_insert_scale_factor=0.05 
  • adding wal_compression=lz4 has a small benefit
  • several of the changes made performance much worse
  • increasing autovacuum_work_mem didn't help here but will probably help on IO-bound tests

Benchmark

This continues to use Postgres 15.3 on a c2-standard-30 server with 15 cores, running the insert benchmark with delete_per_insert enabled. Deleting as many rows as are inserted keeps the table from growing, so the write-heavy benchmark steps can run for a long time while the working set stays in memory.
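The delete_per_insert idea can be sketched in SQL. This is an illustrative sketch, not the benchmark's actual schema or statements; the table name t, its columns and the batch size of 2 are hypothetical:

```sql
-- Illustrative only: each write transaction inserts N rows
-- and deletes the N oldest rows (smallest PK values), so the
-- table stays at a roughly constant size.
BEGIN;
INSERT INTO t (k, val) VALUES (1001, 'a'), (1002, 'b');
DELETE FROM t
WHERE k IN (SELECT k FROM t ORDER BY k LIMIT 2);
COMMIT;
```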

The configurations all use shared_buffers=70G and the database is cached by Postgres. All of the config files are here. The base config file is cx7a_gcp_c2s30 and then the other config files make a small change to the base. The biggest win is already in the base config, and is the tuning done to make autovacuum more aggressive (see here). Of the configs, and using abbreviated names:

  • one of them has a significant benefit: a16
  • a few of them have a small benefit: a1, a2, a6, a9
  • many of them hurt performance: a5, a7, a10, a11, a12, a13, a14, a15, a17
  • a few of them can be ignored: a3, a4, a8

The configurations are:

  • conf.diff.cx7a1_gcp_c2s30 (a1) - adds wal_compression=lz4
  • conf.diff.cx7a2_gcp_c2s30 (a2) - adds backend_flush_after=256kB
  • conf.diff.cx7a3_gcp_c2s30 (a3) - adds full_page_writes=off (not for production!)
  • conf.diff.cx7a4_gcp_c2s30 (a4) - adds checkpoint_timeout=1800s, which was already set to 1800s in the base config so this can be ignored
  • conf.diff.cx7a5_gcp_c2s30 (a5) - adds min_wal_size=16384
  • conf.diff.cx7a6_gcp_c2s30 (a6) - adds backend_flush_after=256kB, initdb --wal-segsize 256
  • conf.diff.cx7a7_gcp_c2s30 (a7) - adds max_wal_size=16GB
  • conf.diff.cx7a8_gcp_c2s30 (a8) - adds shared_buffers=8GB, ignore this, I was just curious about the benefit of doubling shared buffers from the base config
  • conf.diff.cx7a9_gcp_c2s30 (a9) - adds checkpoint_timeout=900s
  • conf.diff.cx7a10_gcp_c2s30 (a10) - adds autovacuum_work_mem=256MB
  • conf.diff.cx7a11_gcp_c2s30 (a11) - adds autovacuum_work_mem=1GB
  • conf.diff.cx7a12_gcp_c2s30 (a12) - adds wal_buffers=64MB
  • conf.diff.cx7a13_gcp_c2s30 (a13) - adds wal_buffers=64MB, initdb --wal-segsize 256
  • conf.diff.cx7a14_gcp_c2s30 (a14) - adds backend_flush_after=1MB
  • conf.diff.cx7a15_gcp_c2s30 (a15) - adds max_wal_size=32GB
  • conf.diff.cx7a16_gcp_c2s30 (a16) - adds autovacuum_vacuum_scale_factor=0.05, autovacuum_vacuum_insert_scale_factor=0.05
  • conf.diff.cx7a17_gcp_c2s30 (a17) - reduces bgwriter_lru_maxpages from 5000 to 1000
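
As a concrete example, the winning a16 change amounts to a two-line postgresql.conf fragment (this is a sketch using the parameter names from the Postgres docs; the exact file is in the linked config repository):

```
autovacuum_vacuum_scale_factor = 0.05
autovacuum_vacuum_insert_scale_factor = 0.05
```

Both parameters default to 0.2, so 0.05 makes autovacuum trigger after roughly one quarter as many updated/deleted (or inserted) rows per table.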

Results

I start by comparing the impact of the config changes with the results for the base config.

The summary is here. Relative to the base config, the a16 config provides the largest improvement to average throughput for l.i1 (from 154202 to 249299 inserts/s, a ~62% increase). Several configs made it much worse than the base case: a5, a7, a8, a10, a11, a12, a13, a14, a15, a17.

Next I consider the impact on worst-case response time for inserts (see here). Some configs made it worse than the base case: a7, a10, a12, a13.

Finally, I look at the charts for per-second insert rates vs time (see here). The a16 config has the best average throughput and the chart for a16 looks a bit different from the others.

The conclusion is that I should definitely use a16 and probably use a1, a2, a6 and a9 for this workload.
