Wednesday, August 16, 2023

Tuning Postgres 15.2 and 15.3 for the Insert Benchmark on a small server

I used the Insert Benchmark to find a good configuration for Postgres 15.2 and 15.3 on a small server. Results for Postgres 15 on a medium server are here.

tl;dr

  • I had to repeat many benchmarks a few times to make sure I ran things for a long enough time so that results would be stable. I am happy to have finished this so I can move on to testing Postgres 16 beta releases.
Which configurations will I use in the future for the small server?

Builds

I compiled Postgres 15.2 and 15.3 from source. The builds are named o3_native_lto and used -O3 -march=native -mtune=native -flto.

Benchmark

The insert benchmark was run in two setups.

  • cached by Postgres - all tables are cached by Postgres
  • IO-bound - the database is larger than memory

This benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. 

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows per table where X is 20 for cached and 800 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another X million rows per table with secondary index maintenance where X is 200 for cached and 20 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. This step took ~15,000 seconds for cached ~9,000 seconds for IO-bound.
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 7200 seconds.
Configurations

All of the configurations are here and the base configuration, from which the following configs are derived, is here. The additional configs are:
  • a1 - adds wal_compression=lz4
  • a10 - adds autovacuum_work_mem=256MB
  • a12 - adds wal_buffers=64MB
  • a15 - adds max_wal_size=10GB
  • a16 - adds autovacuum_vacuum_scale_factor=0.05, autovacuum_vacuum_insert_scale_factor=0.05
  • a24 - adds autovacuum_vacuum_cost_limit=4000
  • a25 - reduces shared_buffers to 2560MB
Results

Reports are here for Cached by Postgres, IO-bound.

I analyze these in terms of:
  • average throughput
  • fairness between benchmark clients
  • SLA for background inserts and deletes
  • response time histograms
Results: average throughput

Cached by Postgres (see here)
  • For l.i0 and l.x results are similar for all configs
  • The database size grows by ~1.68X from the end of l.x to the end of l.i1. This is reasonable.
  • For l.i1
    • PG 15.3 gets ~10% more throughput than 15.2 with some configs (base, a1, a10, a12, a15, a24). From the HW perf metrics 15.3 uses less CPU/operation than 15.2 (see the cpupq column)
    • PG 15.2 and 15.3 get ~2X more throughput with the a16 config vs the base config. From the HW perf metrics the a16 config uses less CPU/operation and writes less to storage per operation (see cpupq and wkbpi columns)
    • PG 15.2 gets ~15% less throughput with the a25 config vs the base config
  • For q100, q500 and q1000 results are similar for all configs
  • For l.i0 and l.x results are similar for all configs
  • For l.i1
    • Bad configs that get at least 10% less throughput than the base config with 15.2
      • With PG 15.2 - a12, a15, a25
      • With PG 15.3 - a1, a12, a15 (results for a15 and a25 are odd)
    • Good configs that get at least 10% more throughput than the base config with 15.2
      • With PG 15.2 - a10, a16, a24
      • With PG 15.3 - base, a10, a24, a25
    • For q100, q500 and q1000 results are similar for all configs
What is a good config? 
  • For Cached by Postgres it might combine a1, a10, a16 and a24.
  • For IO-bound it might combine a10, a16 and a24.

Results: fairness between benchmark clients

Fairness is a big problem for Postgres on larger servers but not here on a small server because there is only one client. By fairness I mean do the benchmark clients finish at the same time during the l.i1 benchmark step.

Results: SLA for background inserts and deletes

From the last table in the Summary sections
  • Cached by Postgres
    • All versions sustain the target insert rates
  • IO-bound
    • All versions sustain the target insert rates for q100 and q500
    • Many versions fail to sustain the target insert rate (1000/s) for q1000
      • For PG 15.2 the failures are with: base, a1, a10, a16, a24
      • For PG 15.3 the failures are with: base, a24, a25
Results: response time histograms

Histograms are here for Cached by Postgres. Histograms are similar for all configs.

Histograms are here for IO-bound
  • For l.i0 the a15 config max response time is 1.157s while others are <= 100ms
  • For l.i1 the histograms are similar
  • For q100, q500 and q1000 the histograms are similar















No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...