Saturday, June 8, 2024

Postgres 17beta1 vs sysbench on a large server

This has results from the sysbench benchmark for many versions of Postgres on a large server with a cached and high-concurrency workload. Results from a small server are here. This work was done by Small Datum LLC.

tl;dr

  • There are no regressions and many improvements. This is an impressive release.
  • Postgres 17beta1 does better for writes relative to 10.23 here with high concurrency than it does on the small server with less concurrency. This commit might explain the improvements.
  • The hot-points query became ~2X faster in 17beta1 both on the large and small server

Builds and configuration

I tested Postgres versions 10.23, 11.22, 12.19, 13.15, 14.12, 15.7, 16.3 and 17beta1. Everything was compiled from source. The config files are here and I used the conf.diff.cx9a2_c32r128 variants.

Benchmarks

I used sysbench and my usage is explained here. There are 42 microbenchmarks and most test only 1 type of SQL statement.

Tests were run on a Dell 7865 Tower with 32 cores (1 socket, SMT disabled, AMD Ryzen Threadripper Pro 5975WX), 128G RAM, 2 m.2 SSD (2 TB each, RAID SW 0, XFS) and Ubuntu 22.04. The server is described here as v7.

The benchmark is run with 24 threads (concurrent connections) sharing 8 tables. There were 10M rows per table, 80M total. The command line was:
bash r.sh 8 10000000 300 600 md0 1 1 24
In all cases
  • each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise
  • prepared statements were enabled
Results

For the results below I split the 42 microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. The spreadsheet with all data and charts is here. For each group I present a chart and a table with summary statistics.

All of the charts have relative throughput on the y-axis where that is (QPS for $me) / (QPS for $base), $me is some DBMS version (for example Postgres 16.3) and $base is Postgres 10.23. The y-axis doesn't start at 0 to improve readability. When the relative throughput is > 1 then that version is faster than the base case.

The legend under the x-axis truncates the names I use for the microbenchmarks and I don't know how to fix that other than sharing links (see above) to the Google Sheets I used.

Summary statistics

This section presents summary statistics of the throughput from Postgres 17beta1 relative to 10.23 for each microbenchmark group. A value greater than one means that 17beta1 gets more throughput than 10.23 which is good news. And there is much good news in the results below. The most extreme good news is the result from the hot-points microbenchmark where 17beta1 gets ~2X more QPS than previous versions. The results here are similar to the results from the small server with one exception. The improvement for writes (median and average) with Postgres 17beta1 is much larger here than on the small server.

minmaxavgmedianstdev
point-11.002.051.131.050.31
point-21.001.051.031.030.02
range-11.021.141.071.070.03
range-21.031.071.041.040.02
writes1.061.671.311.340.19

Graphs

There is one graph per microbenchmark group. The y-axis doesn't begin at zero to improve readability.

Graphs for point queries, part 1. The line for hot points (the left-most group) is truncated for 17beta1 because the value is ~2 and I want to focus on readability for results close to 1.

Full table scan became faster starting in Postgres 14 and the improvement remains.

Several of the microbenchmarks became much faster in 17beta1. Hopefully we can eventually assign credit to some of the changes that explain the improvements.


Why are writes faster in 17beta1?

I have a file that includes results from vmstat and iostat per microbenchmark and it includes both absolute and normalized values where normalized means it is metric / QPS which lets me see how much hardware is used per operation. The file is here and the legend for it is:
  • cpu/o - vmstat us+sy / QPS, CPU per operation
  • cs/o - vmstat cs / QPS, context switches per operation
  • r/o - iostat r/s / QPS, storage reads per operation
  • rKB/o - iostat read KB / QPS, storage KB read per operation
  • wKB/o - iostat write KB / QPS, storage KB written per operation
  • o/s - operations/s (throughput, QPS, etc)
  • dbms - specifies the database version and config file
There are large speedups in Postgres 17beta1 for update-index and update-nonindex. From the metrics for update-index the context switch rate is half that compared to the rate in 16.3 and the CPU overhead (cpu/o) also dropped. And the changes in metrics for update-nonindex are similar. Something was done to reduce mutex contention in 17beta1 which is good news.



    No comments:

    Post a Comment

    RocksDB on a big server: LRU vs hyperclock, v2

    This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...