I used the insert benchmark to search for CPU regressions in Postgres. The workload is CPU bound, fits in the Postgres buffer pool and has low concurrency. I used Postgres versions 11.19, 12.14, 13.10, 14.7, 15.1 and 15.2.
Once again, Postgres has done a great job at avoiding performance regressions.
- From version 11.19 to 12.14
  - Write throughput is similar or up to ~4% better in 12.14 (see l.i0, l.i1 below)
  - Read throughput is ~10% better in 12.14 (see q100.1, q500.1, q1000.1 below)
- From version 12.14 to 15.2, performance is similar
- The o3_native_lto build has the best performance and is 5% to 20% faster than the def build. It benefits a little from -O3 and CPU-specific optimizations, but mostly from link-time optimization (LTO)
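The post does not list the exact flags behind the build names, but an o3_native_lto-style build is typically produced with something like the following configure invocation (this is an assumption, not the invocation used here):

```
# assumed flags for an o3_native_lto-style build; the def build would omit them
./configure CFLAGS="-O3 -march=native -flto"
make -j8
```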
The benchmark server is a Beelink SER 4700u, described here, with 8 AMD cores, 16G of RAM and a 1T NVMe SSD. The OS is Ubuntu 22.04 and the filesystem is XFS.
Benchmarks were repeated for two configurations:
- cached by Postgres - all data fits in the Postgres buffer pool
- cached by OS - all data fits in the OS page cache but not the Postgres buffer pool. The buffer pool size is 1G and the database was ~10G at test end.
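In postgresql.conf terms the two configurations differ mainly in the buffer pool size. The 1G value is stated above; the cached-by-Postgres value below is an assumption, sized so the ~10G database fits on a 16G server:

```
# cached by OS: buffer pool much smaller than the ~10G database
shared_buffers = 1GB

# cached by Postgres: buffer pool large enough for all data (assumed value)
# shared_buffers = 12GB
```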
The benchmark steps are:
- l.i0 - insert 20 million rows without secondary indexes
- l.x - create 3 secondary indexes. I usually ignore results from this step.
- l.i1 - insert another 20 million rows with the overhead of secondary index maintenance
- q100.1 - do queries as fast as possible with 100 inserts/s/thread done in the background
- q500.1 - do queries as fast as possible with 500 inserts/s/thread done in the background
- q1000.1 - do queries as fast as possible with 1000 inserts/s/thread done in the background
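The qN.1 steps pace background inserts at a fixed per-thread rate while query threads run as fast as possible. A minimal sketch of such a pacing loop (my own sketch, not the insert benchmark's actual code; `op` would wrap an INSERT batch against Postgres):

```python
import time

def run_rate_limited(op, rate_per_s, duration_s,
                     now=time.monotonic, sleep=time.sleep):
    # Call op() at rate_per_s calls per second for duration_s seconds,
    # sleeping until each call's deadline. Returns the number of calls made.
    start = now()
    n = 0
    while True:
        deadline = start + n / rate_per_s  # when the n-th call is due
        if deadline - start >= duration_s:
            return n
        t = now()
        if deadline > t:
            sleep(deadline - t)
        op()
        n += 1
```

For q100.1, each writer thread would run this with rate_per_s=100; injectable `now`/`sleep` keep the sketch testable without a database.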
Benchmarks were also repeated for three client configurations:
- 1 client, 1 table - used 1 client & 1 table. This is best for spotting CPU regressions.
- 4 clients, 4 tables - used 4 clients & 4 tables, each client has its own table
- 4 clients, 1 table - used 4 clients sharing 1 table
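The three configurations reduce to how a client maps to a table. A tiny sketch (the `pt` table-name prefix is my assumption, not the benchmark's naming):

```python
def table_for_client(client_id, num_tables):
    # With num_tables == num_clients each client gets its own table;
    # with num_tables == 1 every client shares table 0.
    return f"pt{client_id % num_tables}"
```

So "4 clients, 4 tables" uses table_for_client(c, 4) and "4 clients, 1 table" uses table_for_client(c, 1).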