Small Datum: Tuning Postgres 15.2 and 15.3 for the Insert Benchmark on a small server

Wednesday, August 16, 2023

Tuning Postgres 15.2 and 15.3 for the Insert Benchmark on a small server

I used the Insert Benchmark to find a good configuration for Postgres 15.2 and 15.3 on a small server. Results for Postgres 15 on a medium server are here.

tl;dr

I had to repeat many benchmarks a few times to make sure I ran things for a long enough time so that results would be stable. I am happy to have finished this so I can move on to testing Postgres 16 beta releases.

Which configurations will I use in the future for the small server?

Builds

I compiled Postgres 15.2 and 15.3 from source. The builds are named o3_native_lto and used -O3 -march=native -mtune=native -flto.

Benchmark

The insert benchmark was run in two setups.

cached by Postgres - all tables are cached by Postgres
IO-bound - the database is larger than memory

This benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04.

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

l.i0

insert X million rows per table where X is 20 for cached and 800 for IO-bound

create 3 secondary indexes. I usually ignore performance from this step.

l.i1

insert and delete another X million rows per table with secondary index maintenance where X is 200 for cached and 20 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. This step took ~15,000 seconds for cached ~9,000 seconds for IO-bound.

q100, q500, q1000

do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 7200 seconds.

Configurations

All of the configurations are here and the base configuration, from which the following configs are derived, is here. The additional configs are:

a1 - adds wal_compression=lz4
a10 - adds autovacuum_work_mem=256MB
a12 - adds wal_buffers=64MB
a15 - adds max_wal_size=10GB
a16 - adds autovacuum_vacuum_scale_factor=0.05, autovacuum_vacuum_insert_scale_factor=0.05
a24 - adds autovacuum_vacuum_cost_limit=4000
a25 - reduces shared_buffers to 2560MB

Results

Reports are here for Cached by Postgres, IO-bound.

I analyze these in terms of:

average throughput
fairness between benchmark clients
SLA for background inserts and deletes
response time histograms

Results: average throughput

Cached by Postgres (see here)

For l.i0 and l.x results are similar for all configs
The database size grows by ~1.68X from the end of l.x to the end of l.i1. This is reasonable.
For l.i1

PG 15.3 gets ~10% more throughput than 15.2 with some configs (base, a1, a10, a12, a15, a24). From the HW perf metrics 15.3 uses less CPU/operation than 15.2 (see the cpupq column)
PG 15.2 and 15.3 get ~2X more throughput with the a16 config vs the base config. From the HW perf metrics the a16 config uses less CPU/operation and writes less to storage per operation (see cpupq and wkbpi columns)
PG 15.2 gets ~15% less throughput with the a25 config vs the base config

For q100, q500 and q1000 results are similar for all configs

IO-bound

For l.i0 and l.x results are similar for all configs
For l.i1

Bad configs that get at least 10% less throughput than the base config with 15.2

With PG 15.2 - a12, a15, a25
With PG 15.3 - a1, a12, a15 (results for a15 and a25 are odd)

Good configs that get at least 10% more throughput than the base config with 15.2

With PG 15.2 - a10, a16, a24
With PG 15.3 - base, a10, a24, a25

For q100, q500 and q1000 results are similar for all configs

What is a good config?

For Cached by Postgres it might combine a1, a10, a16 and a24.
For IO-bound it might combine a10, a16 and a24.

Results: fairness between benchmark clients

Fairness is a big problem for Postgres on larger servers but not here on a small server because there is only one client. By fairness I mean do the benchmark clients finish at the same time during the l.i1 benchmark step.

Results: SLA for background inserts and deletes

From the last table in the Summary sections

Cached by Postgres

All versions sustain the target insert rates

IO-bound

All versions sustain the target insert rates for q100 and q500
Many versions fail to sustain the target insert rate (1000/s) for q1000

For PG 15.2 the failures are with: base, a1, a10, a16, a24
For PG 15.3 the failures are with: base, a24, a25

Results: response time histograms

Histograms are here for Cached by Postgres. Histograms are similar for all configs.

Histograms are here for IO-bound.

For l.i0 the a15 config max response time is 1.157s while others are <= 100ms
For l.i1 the histograms are similar
For q100, q500 and q1000 the histograms are similar

Small Datum

Wednesday, August 16, 2023

Tuning Postgres 15.2 and 15.3 for the Insert Benchmark on a small server

No comments:

Post a Comment

Postgres 18 beta1: small server, IO-bound Insert Benchmark (v2)