Monday, September 9, 2024

Postgres 17rc1 vs sysbench on small & large servers: looking great

This post has benchmark results for Postgres 15.8, 16.4 and 17 (beta3, rc1) using sysbench on small and large servers. A recent result for Postgres 17 beta3 from a large server is here. The large server in this case is an ax162-s from Hetzner.

This work was done by Small Datum LLC.

    tl;dr

    • 17rc1 looks great - there are no big regressions and several big improvements
    • There might be small regressions (~2%) from Postgres 15 and 16 to 17 but this benchmark was not set up to diagnose that.
    Builds, configuration and hardware

    I compiled Postgres versions 15.8, 16.4, 17beta3 and 17rc1 from source using -O2 -fno-omit-frame-pointer.

    The servers are:
    • small
      • The server is named v5 or Beelink SER7 here and has 8 AMD cores with SMT disabled, 16G of RAM and uses Ubuntu 22.04 and ext4 with 1 NVMe device.
    • large
      • ax162-s from Hetzner with 48 AMD cores, 128G of RAM and SMT disabled. It uses Ubuntu 22.04 and storage is ext4 using SW RAID 1 over 2 locally attached NVMe devices. More details on it are here. At list prices a similar server from Google Cloud costs 10X more than from Hetzner.

    The configuration files for the large server are in the pg* subdirectories here with the name conf.diff.cx10a_c32r128.

    The configuration files for the small server are in the pg* subdirectories here with the name conf.diff.cx10a_c8r32.

    Benchmark

    I used sysbench and my usage is explained here. There are 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres.

    For the large server the tests run with 8 tables and 10M rows/table. There are 40 client threads, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds. The command line to run all tests was: bash r.sh 8 10000000 180 300 md2 1 1 40

    For the small server the tests run with 1 table and 50M rows. There is 1 client thread, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds. The command line to run all tests was: bash r.sh 1 50000000 180 300 nvme0n1 1 1 1
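
    To make the two command lines easier to compare, here is a minimal Python sketch that assembles them from named parameters. The parameter names are my guesses from the prose (tables, rows per table, read seconds, write seconds, storage device, client threads); the helper build_r_sh_command is hypothetical, and the meaning of the sixth and seventh arguments is not stated in this post, so they are passed through unchanged.

```python
import shlex

def build_r_sh_command(tables, rows_per_table, read_secs, write_secs,
                       device, opaque_a, opaque_b, threads):
    """Assemble an r.sh command line.

    The positional meanings are inferred from the text, not from r.sh itself.
    opaque_a and opaque_b are arguments whose purpose is not described here.
    """
    args = [tables, rows_per_table, read_secs, write_secs,
            device, opaque_a, opaque_b, threads]
    return "bash r.sh " + " ".join(shlex.quote(str(a)) for a in args)

# Large server: 8 tables, 10M rows/table, 40 client threads
large = build_r_sh_command(8, 10_000_000, 180, 300, "md2", 1, 1, 40)
# Small server: 1 table, 50M rows, 1 client thread
small = build_r_sh_command(1, 50_000_000, 180, 300, "nvme0n1", 1, 1, 1)

print(large)
print(small)
```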

    Results

    For the results below I split the 42 microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. The spreadsheet with all data is here.

    Values from iostat and vmstat divided by QPS are here for the small server and the large server. This can help to explain why something is faster or slower because it shows how much HW is used per request.
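
    The normalization described above can be sketched in a few lines of Python. This is an illustration only: the function name per_query and the numbers are made up, and the exact definition of each column in the linked files is not given in this post. The idea is that a per-second rate from vmstat or iostat divided by QPS gives the amount of hardware used per query.

```python
def per_query(rate_per_second, qps):
    """Return hardware usage per query: a per-second rate divided by QPS."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return rate_per_second / qps

# Hypothetical measurements, not taken from the benchmark.
# cs_per_sec stands in for the context switch rate that vmstat reports.
runs = {
    "15.8": {"qps": 1000.0, "cs_per_sec": 20000.0},
    "17rc1": {"qps": 2000.0, "cs_per_sec": 20000.0},
}

for version, m in runs.items():
    print(version, per_query(m["cs_per_sec"], m["qps"]))
```

    In this made-up example both versions have the same context switch rate, but the version with 2X the QPS does half as many context switches per query.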

    The numbers in the spreadsheets are the relative QPS. When the relative QPS is > 1 then $version is faster than Postgres 15.8. When it is 3.0 then $version gets 3X the QPS of the base case.

    The relative QPS is the following where $version is one of 16.4, 17beta3, 17rc1:
    (QPS for $version) / (QPS for Postgres 15.8)
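
    As a worked example of the formula above, here is a minimal Python sketch. The QPS values are hypothetical and the function names relative_qps and percent_change are mine; only the formula comes from this post. The second helper converts a relative QPS into the "N% more" or "N% less" form used in the notes below.

```python
def relative_qps(qps_by_version, base="15.8"):
    """Return (QPS for version) / (QPS for base) for every non-base version."""
    base_qps = qps_by_version[base]
    return {v: q / base_qps for v, q in qps_by_version.items() if v != base}

def percent_change(rel):
    """Convert a relative QPS to a percent change vs the base case."""
    return (rel - 1.0) * 100.0

# Hypothetical QPS for one microbenchmark, not taken from the spreadsheet.
qps = {"15.8": 1000.0, "16.4": 1020.0, "17beta3": 2950.0, "17rc1": 3000.0}

rel = relative_qps(qps)
for version, r in rel.items():
    print(f"{version}: relative QPS {r:.2f} ({percent_change(r):+.0f}%)")
```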

    Results: charts 

    Notes on the charts

    • the y-axis shows the relative QPS
    • the y-axis starts at 0.80 to make it easier to see differences
    • in some cases the y-axis truncates the good outliers, cases where the relative QPS is greater than 1.5. I do this to improve readability for values near 1.0. Regardless, the improvements are nice.
    Point queries, part 1
    • Small server
      • The relative QPS in hot-points exceeds 2.0 for Postgres 17 (beta3 & rc1). The y-axis truncates that excellent result and the result is explained by a reduction in the CPU overhead per-query (see cpu/o here).
      • Otherwise, 17rc1 gets between 5% less and 2% more QPS vs Postgres 15.8
    • Large server
      • The relative QPS in hot-points is almost 3.0 for Postgres 17 (beta3 and rc1). The y-axis truncates that excellent result. The CPU overhead per-query is greatly reduced (see cpu/o here).
      • Otherwise, 17rc1 gets between 3% less and 6% more QPS vs Postgres 15.8
    Point queries, part 2
    • Small server
      • Postgres 17rc1 has similar QPS to 15.8 and 16.4
    • Large server
      • Postgres 17rc1 has similar QPS to 15.8 and might be ~2% slower than 16.4
    Range queries, part 1
    • Small server
      • While it looks like there is a regression for scan performance, the scan microbenchmark seems to have more variance than I can explain so I ignore that for now.
      • Postgres 17rc1 has similar QPS to 15.8 and 16.4
    • Large server
      • The result for scan is excellent, but see my comment above for the small server
      • Postgres 17rc1 has similar QPS to 15.8 and 16.4
    Range queries, part 2
    • Small server
      • Postgres 17rc1 has similar QPS to 15.8 and 16.4
    • Large server
      • Postgres 17rc1 has similar QPS to 15.8 and 16.4
    Writes
    • Small server
      • Postgres 17rc1 is between 2% and 20% faster than 15.8
    • Large server
      • Postgres 17beta3 and 17rc1 are much faster than 15.8 and 16.4
      • Great outliers for update-nonindex and update-one are truncated for Postgres 17rc1
