This post has results for Postgres 18rc1 vs sysbench on small and large servers. Results for Postgres 18beta3 are here for a small and large server.
tl;dr
- Postgres 18 looks great
- I continue to see small CPU regressions in Postgres 18 for range queries that don't do aggregation on low-concurrency workloads. I have yet to explain that.
- The throughput for the scan microbenchmark has more variance with Postgres 18. I assume this is related to more or less work getting done by vacuum but I have yet to debug the root cause.
The servers are:
- small
- an ASUS ExpertCenter PN53 with AMD Ryzen 7735HS CPU, 32G of RAM, 8 cores with AMD SMT disabled, Ubuntu 24.04 and an NVMe device with ext4 and discard enabled.
- large32
- a Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM, AMD Ryzen Threadripper PRO 5975WX with 32 cores and AMD SMT disabled, Ubuntu 24.04 and an NVMe device with ext4 and discard enabled.
- large48
- an ax162s from Hetzner with an AMD EPYC 9454P 48-Core Processor with SMT disabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
- 128G RAM
- Ubuntu 22.04 running the non-HWE kernel (5.5.0-118-generic)
The configuration files are:
- small server
- for versions 17.6 (x10a_c8r32) and 18 (x10b_c8r32)
- large servers
- for versions 17.6 (x10a_c32r128) and 18 (x10b_c32r128)
Benchmark
For all servers the read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
The table counts and sizes are:
- small server - 1 table, 50M rows
- large servers - 8 tables, 10M rows per table
The number of clients is:
- small server - 1
- large32 server - 24
- large48 server - 40
I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for Postgres 17.6)
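The computation above can be sketched in a few lines of Python. The microbenchmark names and QPS values below are made-up examples, not measured results:

```python
# Hypothetical QPS per microbenchmark for the base and new versions.
qps_pg176 = {"scan": 1000.0, "insert": 500.0}
qps_pg18 = {"scan": 1050.0, "insert": 490.0}

# Relative QPS = (QPS for some version) / (QPS for Postgres 17.6).
# Values > 1.0 mean the new version is faster; < 1.0 means slower.
relative = {name: qps_pg18[name] / qps_pg176[name] for name in qps_pg176}

for name, r in sorted(relative.items()):
    print(f"{name}: {r:.2f}")
```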
The numbers highlighted in yellow below might be from a small regression for range queries that don't do aggregation. But note that this does not reproduce for the full table scan microbenchmark (scan). I am not certain it is a regression, as it might come from non-deterministic CPU overheads for read-heavy workloads that run after vacuum. I hope to look at CPU flamegraphs soon.
The improvements on the scan microbenchmark come from using less CPU. But I am skeptical about the improvements. I might have more work to do to make the benchmark less subject to variance from MVCC GC (vacuum here). I also struggle with that on RocksDB (compaction), but not on InnoDB (purge).
I have not seen the large improvements for the insert and delete microbenchmarks in previous tests on that large server. I assume this is another case where I need to figure out how to reduce variance when I run the benchmark.
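One way to quantify the run-to-run variance discussed above is the coefficient of variation (stddev / mean) across repeated runs of a microbenchmark. A minimal sketch, with made-up QPS samples for the scan microbenchmark:

```python
import statistics

# Hypothetical per-run QPS samples for the scan microbenchmark.
scan_qps = [980.0, 1040.0, 1010.0, 950.0, 1070.0]

mean = statistics.mean(scan_qps)
stdev = statistics.stdev(scan_qps)  # sample standard deviation
cv = stdev / mean                   # coefficient of variation

print(f"mean={mean:.1f} stdev={stdev:.1f} cv={cv:.3f}")
```

A larger CV for one version than another on the same microbenchmark suggests the difference in average QPS may be noise (e.g. from vacuum) rather than a real regression or improvement.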