Earlier this year I spent time comparing Postgres 16 beta1 with Postgres 15 to check for regressions. Then I realized I needed to confirm that the configurations I use were good, and that took longer than expected. Now I return to testing the beta releases. A post about Postgres 15 on the same hardware is here.
In this post I compare Postgres 15.3, 15.4, 16 beta1, 16 beta2 and 16 beta3 using the Insert Benchmark on a medium server (15 cores, 120G RAM).
tl;dr
- I am repeating the tests to get more results
- Create index is ~8% faster in 16 beta vs 15.x
- For the write-heavy benchmark steps, the lack of fairness between clients during the l.i1 (inserts+deletes) step is a big problem for all versions
- For the read+write benchmark steps pg16 beta3 struggles or fails to sustain the target insert rates in the IO-bound setups. With each release from 15.4 to 16 beta3 the results get worse.
I compiled Postgres 15.3, 15.4, 16 beta1, 16 beta2 and 16 beta3 from source. The builds are named o3_native_lto which is shorthand for using: -O3 -march=native -mtune=native -flto.
The insert benchmark was run in two setups.
- cached by Postgres - all tables are cached by Postgres
- IO-bound - the database is larger than memory
The benchmark used a c2-standard-30 server from GCP with Ubuntu 22.04, 15 cores, hyperthreads disabled, 120G of RAM and 1.5T of storage from RAID 0 over 4 local NVMe devices with XFS.
The benchmark is run with 8 clients and 8 tables (one client per table). The benchmark is a sequence of steps.
- l.i0
- insert X million rows per table where X is 20 for cached and 500 for IO-bound
- l.x
- create 3 secondary indexes. I usually ignore performance from this step.
- l.i1
- insert and delete another X million rows per table with secondary index maintenance where X is 200 for cached by Postgres and 40 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start, with inserts done to the table head and deletes done from the tail. This step took 6000+ seconds for cached by Postgres and 25,000+ seconds for IO-bound.
- q100, q500, q1000
- do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
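As an aid for reading the results, the step sequence and the aggregate background write rates can be written down as data. This is only a sketch and not code from the Insert Benchmark: the names `STEPS`, `QUERY_STEP_RATE`, `aggregate_rate` and `rows_inserted` are made up here, and only the numbers come from the description above.

```python
# Sketch of the benchmark step sequence described above.
# Names are illustrative; this is not code from the Insert Benchmark.

CLIENTS = 8  # one client per table

# Steps in the order they run.
STEPS = ["l.i0", "l.x", "l.i1", "q100", "q500", "q1000"]

# Rows per table, in millions, inserted by the l.i0 load step.
LOAD_ROWS_M = {"cached": 20, "io_bound": 500}

# Background inserts/s per client for the read+write steps.
# Deletes run at the same per-client rate.
QUERY_STEP_RATE = {"q100": 100, "q500": 500, "q1000": 1000}

QUERY_STEP_SECONDS = 3600


def aggregate_rate(step: str) -> int:
    """Target inserts/s summed over all clients for a read+write step."""
    return QUERY_STEP_RATE[step] * CLIENTS


def rows_inserted(step: str) -> int:
    """Rows inserted by all clients if the target rate is sustained for the whole step."""
    return aggregate_rate(step) * QUERY_STEP_SECONDS


for step in QUERY_STEP_RATE:
    print(step, aggregate_rate(step), rows_inserted(step))
```

With 8 clients, the per-client rate of 1000 inserts/s for q1000 becomes the ~8000/s aggregate target that the results refer to.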
- wal_compression is lz4 for a27 and off for a28
- autovacuum_vacuum_cost_limit is 2000 for a27 and 4000 for a28
- max_wal_size is 70GB for a27 and 32GB for a28
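The differences between the two configurations can be laid out side by side. The sketch below covers only the three settings listed above; the dict layout and the `conf_lines` helper are illustrative, and the rest of each configuration file is not shown. Sizes are written with the GB unit because that is the spelling postgresql.conf accepts.

```python
# The three settings that differ between the a27 and a28 configurations.
# Values are from the list above; the layout and helper are illustrative.

CONFIGS = {
    "a27": {
        "wal_compression": "lz4",
        "autovacuum_vacuum_cost_limit": 2000,
        "max_wal_size": "70GB",
    },
    "a28": {
        "wal_compression": "off",
        "autovacuum_vacuum_cost_limit": 4000,
        "max_wal_size": "32GB",
    },
}


def conf_lines(name: str) -> list[str]:
    """Render one configuration as postgresql.conf style 'key = value' lines."""
    return [f"{key} = {value}" for key, value in CONFIGS[name].items()]


for name in CONFIGS:
    print(f"# {name}")
    print("\n".join(conf_lines(name)))
```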
- average throughput
- fairness between benchmark clients
- SLA for background inserts and deletes
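To make the first two of these concrete, here is one possible way to compute them. The formulas are assumptions for illustration, in particular the fairness measure, which this post does not define. The sample numbers are hypothetical and are not results from this benchmark.

```python
# One possible way to compute average throughput and fairness.
# The formulas and sample numbers are assumptions for illustration,
# not the definitions or results used for this post.

def average_throughput(total_ops: int, wall_seconds: float) -> float:
    """Operations per second over the whole benchmark step."""
    return total_ops / wall_seconds


def fairness(client_seconds: list[float]) -> float:
    """Fastest client time divided by slowest client time.

    1.0 means all clients finished together; values near 0 mean
    some clients took much longer than others.
    """
    return min(client_seconds) / max(client_seconds)


# Hypothetical per-client times (seconds) to finish a write-heavy step.
sample_times = [6000.0, 6600.0, 7200.0, 12000.0]
print(fairness(sample_times))
print(average_throughput(total_ops=1000, wall_seconds=10))
```

A ratio like this hides which clients were slow, so per-client times are still worth looking at when fairness is poor.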
- For l.i1 there is a ~5% regression from 15.3 to 16 beta3. From the HW perf metrics I see ...
- CPU/operation increases slightly (48 -> 49, see the cpupq column) while average CPU utilization (vmstat us + sy, see the cpups column) drops from 78.3 for 15.3 to 76.5 for 16 beta3, which might be from doing more IO or from an increase in IO latency.
- But from the wkbpi (KB written to storage/operation) I see that 16 beta3 does slightly less write IO than 15.3.
- The wmbps (MB/sec written to storage) numbers are correlated with throughput, which implies that either storage write latency varies or some versions are able to do writeback faster.
- Results for the other benchmark steps are similar across versions
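The link between the wkbpi and wmbps columns is simple arithmetic. Assuming wkbpi is KB written per operation and wmbps is MB/s written over the same interval, wmbps is roughly wkbpi times the operation rate divided by 1024, so when wkbpi is nearly flat across versions wmbps has to track throughput. The function name and the inputs below are hypothetical and are not measurements from this benchmark.

```python
def expected_wmbps(wkbpi: float, ops_per_second: float) -> float:
    """MB/s written to storage implied by KB written per operation and the operation rate."""
    return wkbpi * ops_per_second / 1024


# Hypothetical inputs, not measurements from this benchmark.
# Same write IO per operation, but one version runs twice as fast.
slower = expected_wmbps(wkbpi=2.0, ops_per_second=10_240)
faster = expected_wmbps(wkbpi=2.0, ops_per_second=20_480)
print(slower, faster)
```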
- For l.x (create index) 16 beta releases are ~8% faster than 15.x perhaps because writeback is faster. From the HW perf metrics the write rate to storage (wmbps is MB/s written to storage) is higher for 16 beta
- For l.i1 (inserts+deletes) 16 beta1 and beta2 are 11% to 15% slower than 15.3, but 16 beta3 is only 2% slower. From HW perf metrics the problem is more CPU overhead (cpupq is CPU/operation) but it is hard to tell whether that is CPU consumed by user threads or by background (vacuum). Given the lack of fairness (see section below) it is harder to reason about the root causes.
- For read+write the results for q100 and q500 are similar across versions. For q1000 versions 15.4 and 16 beta3 do much better perhaps because they failed to sustain the target insert+delete rates -- they get 7209/s and 7486/s when the target is 8000/s.
- For l.x (create index) 16 beta releases are ~8% faster than 15.x
- For l.i1 (inserts+deletes) results for all versions are similar excluding 16 beta2
- For read+write the results for q100 and q500 are similar for all versions excluding 16 beta2 (again, what happened there?). But for q1000 only 15.3 was able to sustain the target insert rate of ~8000/s, and the rate that can be sustained drops in each release following 15.3. The HW perf metrics for q1000 can be confusing. I see a large reduction in rmbps (MB/s read from storage) and wmbps (MB/s written to storage). Did storage or Postgres get slower?
- Cached by Postgres - all versions sustain the target insert rates
- IO-bound with a27 - 15.4 and 16 beta3 don't sustain the target insert rates for q1000 and 16 beta2 almost fails for q1000
- IO-bound with a28 - all versions after 15.3 fail to sustain the target insert rates for q1000 and the failure gets worse from 15.4 to 16 beta3
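The check for whether a version sustains the target rate can be sketched as a comparison of the achieved rate with the target. The two achieved rates below are the q1000 numbers quoted earlier (7209/s and 7486/s against a target of 8000/s). The function names and the 1% tolerance are assumptions for illustration, not the rule used for this post.

```python
TARGET_Q1000 = 8000  # 8 clients x 1000 inserts/s/client


def sustained_fraction(achieved: float, target: float) -> float:
    """Fraction of the target insert rate that was actually achieved."""
    return achieved / target


def sustains_target(achieved: float, target: float, tolerance: float = 0.01) -> bool:
    """True when the achieved rate is within `tolerance` of the target.

    The 1% default is an assumption made for this sketch.
    """
    return achieved >= target * (1 - tolerance)


# q1000 rates quoted earlier in this post.
achieved_rates = {"15.4": 7209, "16 beta3": 7486}

for version, rate in achieved_rates.items():
    fraction = sustained_fraction(rate, TARGET_Q1000)
    print(version, round(fraction, 3), sustains_target(rate, TARGET_Q1000))
```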