Thursday, May 4, 2023

The insert benchmark, Postgres versions 11 to 15, a small server

I used the insert benchmark to search for CPU regressions in Postgres. The workload is CPU bound, fits in the Postgres buffer pool and has low concurrency. I used Postgres versions 11.19, 12.14, 13.10, 14.7, 15.1 and 15.2.

Once again, Postgres has done a great job at avoiding performance regressions.

tl;dr

  • From version 11.19 to 12.14
    • Write throughput is similar to 4% better in 12.14 (see l.i0, l.i1 below)
    • Read throughput is ~10% better in 12.14 (see q100.1, q500.1, q1000.1 below)
  • From version 12.14 to 15.2 performance is similar
  • The o3_native_lto build has the best performance and is from 5% to 20% faster than the def build. The o3_native_lto build benefits a little bit from -O3 and CPU specific optimizations, but mostly from link time optimization

Benchmarks

An overview of the insert benchmark is here and here. The insert benchmark was run for a cached database with both 1 and 4 clients. For 1 client the benchmark used 1 table. For 4 clients the benchmark was first run with 4 tables (1 client/table) and then again with 1 table (all clients shared the table). The read+write steps (q100.1, q500.1, q1000.1) were run for 1800 seconds each.

The benchmark server is a Beelink SER 4700u described here with 8 AMD cores, 16G of RAM and 1T of NVMe SSD. The OS is Ubuntu 22.04 and the filesystem is XFS.

Benchmarks were repeated for two configurations:
  • cached by Postgres - all data fits in the Postgres buffer pool
  • cached by OS - all data fits in the OS page cache but not the Postgres buffer pool. The buffer pool size is 1G and the database was ~10G at test end. 
The configuration files are here for:
The benchmark is a sequence of steps:

  • l.i0 - insert 20 million rows without secondary indexes
  • l.x - create 3 secondary indexes. I usually ignore results from this step.
  • l.i1 - insert another 20 million rows with the overhead of secondary index maintenance
  • q100.1 - do queries as fast as possible with 100 inserts/s/thread done in the background
  • q500.1 - do queries as fast as possible with 500 inserts/s/thread done in the background
  • q1000.1 - do queries as fast as possible with 1000 inserts/s/thread done in the background
Builds

For Postgres 15.1 I have results for several builds. A description of the builds is here. For Postgres versions 11.19, 12.14, 13.10, 14.7 and 15.2 I only have results for the o3_native_lto build as that tends to have the best performance.

Reports

Performance summaries generated by shell scripts are below. A short guide to these results is here. There are two types of reports. The version 15.1 builds report compares performance for Postgres 15.1 using different compiler optimizations. The versions 11 to 15 report compares performance for Postgres versions 11.19, 12.14, 13.10, 14.7, 15.1 and 15.2. For each type of report there are three variations:
  • 1 client, 1 table - used 1 client & 1 table. This is best for spotting CPU regressions.
  • 4 clients, 4 tables - used 4 clients & 4 tables, each client has their own table
  • 4 clients, 1 table - used 4 clients sharing 1 table
Reports for cached by Postgres:
Reports for cached by OS:



No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...