Thursday, February 1, 2024

Updated Insert benchmark: Postgres 9.x to 16.x, large server, cached database

This has results for Postgres vs the updated Insert Benchmark with a cached workload and 24-core server with results from Postgres versions 9.0 through 16.

tl;dr

  • Postgres does a great job at avoiding regressions over time
  • Postgres 16.1 is a lot faster than 9.0.23, between ~1.2X and ~10X depending on the workload

Build + Configuration

See the previous report for more details. I used these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. 

The configuration files are in subdirectories from here. Search for files named conf.diff.cx9a2_c24r64 which exist for each major version of Postgres.

The Benchmark

The benchmark is explained here and is run with 16 clients.

The test server is described here. It is a SuperMicro SuperWorkstation 7049A-T with 2 sockets, 24 cores/socket, hyperthreading disabled, 64G RAM and an NVMe SSD. It runs Ubuntu 22.04 and the database filesystem uses XFS with discard enabled.

The benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 16M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions) and 4M rows total
    • Waiting, vacuum and checkpoint are done at the end of this test step to reduce variance in the steps that follow.
  • qr100
    • use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
Results

The performance report is here.

The summary has 3 tables. The first shows absolute throughput by DBMS tested per benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the insert rate for the read-write benchmark steps that have background inserts and deletes and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
  • insert/s for l.i0, l.i1, l.i2
  • indexed rows/s for l.x
  • range queries/s for qr100, qr500, qr1000
  • point queries/s for qp100, qp500, qp1000
Below I use colors to highlight the relative QPS values with red for <= 0.95, green for >= 1.05 and grey for values between 0.95 and 1.05.

  • The base case is pg9023_def which means Postgres 9.0.23
  • For most of the read-write benchmark steps throughput improves a lot from 9.1.24 to 9.2.24 and has been stable since then. The exception is the last step (qp1000) for which throughput is flat. It might be that writeback and/or vacuum hurts query throughput by that point.
  • For the write-heavy steps (l.i0, l.x, l.i1, l.i2) throughput improves a lot
    • l.i0 - things get a lot better in Postgres 9.4.26
    • l.x - things get worse from 9.3.25 through 10.23 and then improve with 11.22
    • l.i1 - things get a lot better in Postgres 9.5.25 and then again in 12.17
    • l.i2 - things get better in 9.5, worse in 9.6 through 11, better in 12 and then are stable. I assume most of the changes are from problems and improvements related to query planner CPU overhead during DELETE statements (see the comments about get_actual_variable_range)
  • Comparing throughput in Postgres 16.1 to 9.0.23
    • Write-heavy
      • l.i0, l.x, l.i1, l.i2 - relative QPS is 3.121.1710.422.14
    • Range queries
      • qr100, qr500, qr1000 - relative QPS is 1.231.341.49
    • Point queries
      • qp100, qp500, qp1000 - relative QPS is 1.251.291.46

No comments:

Post a Comment

How efficient is RocksDB for IO-bound, point-query workloads?

How efficient is RocksDB for workloads that are IO-bound and read-only? One way to answer this is to measure the CPU overhead from RocksDB a...