Wednesday, September 4, 2024

Postgres 17beta3 vs sysbench on a large server: looking great

This post has benchmark results for Postgres 15.8, 16.4 and 17beta3 using sysbench on a large server. A recent result for Postgres 17beta3 from a medium server is here. The server in this case is an ax162-s from Hetzner and this is my first result with it.

This work was done by Small Datum LLC.

    tl;dr

    • 17beta3 looks great
    • 17beta3 and 16.4 have similar performance in most cases
    • 17beta3 does much better than 16.4 on the hot-points test and half of the write tests

    Builds, configuration and hardware

    I compiled Postgres versions 15.8, 16.4 and 17beta3 from source using -O2 -fno-omit-frame-pointer.
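
    For reference, a minimal sketch of the build steps, assuming a source tarball and an install prefix of $HOME/pg17beta3 (the prefix and parallelism here are illustrative, not taken from my build scripts):

    # sketch: configure with the flags above, then build and install
    ./configure --prefix=$HOME/pg17beta3 CFLAGS="-O2 -fno-omit-frame-pointer"
    make -j $(nproc)
    make install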

    The server is an ax162-s from Hetzner with an AMD CPU (48 cores, SMT disabled) and 128G of RAM. It uses Ubuntu 22.04 and storage is ext4 using SW RAID 1 over 2 locally attached NVMe devices. More details on it are here. At list prices, a similar server from Google Cloud costs 10X more than from Hetzner.
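
    A rough sketch of how SW RAID 1 and ext4 over two NVMe devices is typically set up -- these are not the exact commands, device names or mount point used on this server:

    # illustrative only: build a 2-device RAID 1 array, format it as ext4, mount it
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
    mkfs.ext4 /dev/md2
    mount /dev/md2 /data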

    The configuration files are in the pg* subdirectories here with the name conf.diff.cx9a2a_c32r128.

    Benchmark

    I used sysbench and my usage is explained here. There are 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres, using 8 tables with 10M rows per table. There are 40 client threads; read-heavy microbenchmarks run for 300 seconds and write-heavy microbenchmarks run for 600 seconds.

    The command line for my helper script was:
    bash r.sh 8 10000000 300 600 md2 1 1 40
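
    The helper script wraps sysbench. A single read-heavy microbenchmark step looks roughly like the following -- the Lua script name is one of the upstream sysbench scripts and the connection settings are placeholders, not values from r.sh:

    # illustrative only: one microbenchmark with 8 tables, 10M rows/table, 40 threads, 300s
    sysbench oltp_point_select \
      --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-user=postgres --pgsql-db=sbtest \
      --tables=8 --table-size=10000000 --threads=40 --time=300 \
      run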

    Results

    For the results below I split the 42 microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. The spreadsheet with all data is here.

    Values from iostat and vmstat divided by QPS are here. This can help to explain why something is faster or slower because it shows how much HW is used per request.

    The numbers in the spreadsheets are the relative QPS. When the relative QPS is > 1 then $version is faster than Postgres 15.8. The relative QPS is:
    (QPS for $version) / (QPS for Postgres 15.8)
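
    As a worked example with made-up absolute numbers (the real values are in the spreadsheet): if Postgres 15.8 sustains 100,000 QPS on a microbenchmark and $version sustains 122,000 QPS, then:

    # hypothetical QPS values, for illustration only
    echo "scale=2; 122000 / 100000" | bc    # relative QPS = 1.22, so $version is 1.22X faster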

    Results: summary statistics

    These tables show summary statistics per group of microbenchmarks. The first table is for Postgres 16.4 and the second for 17beta3. The numbers are the relative QPS where the base case is Postgres 15.8.

    • Results for 16.4 and 17beta3 are similar based on median values except for the write microbenchmarks where 17beta3 does much better
    • Results for 17beta3 are even better based on average values
    • In addition to significant improvements for writes, 17beta3 has a huge positive outlier on the hot-points microbenchmark where it is 2.67X faster than 15.8.

    Postgres 16.4 (relative QPS vs 15.8)
                min    max    avg    median
    point-1     1.00   1.02   1.01   1.01
    point-2     1.00   1.03   1.02   1.02
    range-1     0.87   1.02   0.99   1.01
    range-2     0.99   1.04   1.01   1.01
    writes      0.81   1.03   0.96   0.99

    Postgres 17beta3 (relative QPS vs 15.8)
                min    max    avg    median
    point-1     0.97   2.67   1.15   0.99
    point-2     0.97   1.01   0.99   0.99
    range-1     0.96   1.01   0.99   1.00
    range-2     1.00   1.04   1.02   1.02
    writes      1.01   1.85   1.33   1.22

    Results: charts 

    Notes on the charts

    • the y-axis shows the relative QPS
    • the y-axis starts at 0.85 to make it easier to see differences
    • for the first chart, the y-axis truncates one outlier (hot-points) from Postgres 17beta3 where the relative QPS is 2.67. That is a great improvement.

    Point queries, part 1
    • the value for hot-points with 17beta3 is an outlier at 2.67 and is truncated by the y-axis. From vmstat metrics there is much less CPU per query in 17beta3 for hot-points (see here).

    Point queries, part 2

    Range queries, part 1

    Range queries, part 2

    Writes
    • 17beta3 does much better than previous releases on 6 of the 10 write microbenchmarks. From vmstat metrics, the CPU overhead and context switch rate are much lower for 17beta3 (see here).



     
