Monday, April 24, 2023

Revisiting perf regressions in Postgres, a larger server and sysbench, part 2

My previous post has results for sysbench from Postgres when the database fits in the buffer pool. In this post I have results for a configuration where the workload fits in the OS page cache but not the database buffer pool.

The benchmarks tries to answer two questions. The context is a server with fast IO (reading from the OS page cache).

  • How does perf change from Postgres version 11 to version 15?
  • What is the impact from compiler optimizations?

tl;dr

  • The o3_native_lto build has the best performance as it did in the previous post. It provides ~4% more throughput.
  • For perf changes from 11.19 to 15.2
    • Point queries get about 5% more QPS in 15.2
    • Range queries get about 3% more QPS in 15.2
    • Writes get about 9% more QPS in 15.2
  • There are outliers in all of the microbenchmark groups (point query, range query & writes)

Benchmark

See my previous post for more details.

The workload was configured to fit in the OS page cache but not in the 4G Postgres buffer pool. The config file, conf.diff.cx7_gcp_c2s60_4g, is here for Postgres 11121314 and 15.

I use sysbench to run 42 microbenchmarks and each microbenchmark is put in one of three groups based on the dominant operation: point query, range query, writes.

Results for all versions

The result spreadsheet is here. See the pgall.4g tab. The chart doesn't show the full name for each benchmark. Consult the spreadsheet.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case is Postgres 11.19.

There are outliers for all of the microbenchmark groups. The bullet points list the relative throughput for version 15.2 versus 11.19. The microbenchmark names on the graphs below are cut off so refer to the spreadsheet:
  • point
    • points-covered-pk_range=100 - relative throughput is 1.56 and the big change is from version 13.7 to version 14.7. This does point queries on a covering PK index.
  • range
    • range-notcovered-pk_range=100 - relative throughput is 1.31 and the big change is from version 11.19 to version 12.14. This does range scans on a non-covering PK index.
    • read-only_range=100 - relative throughput is 1.21 and the big change is from version 11.19 to version 12.14. This does range queries that scan 100 rows with aggregation.
    • read-only_range=10 - relative throughput is 1.38 and the big change is from verson 11.19 to 12.14. Queries are like read-only_range=10 except they only scan 10 rows.
  • writes
    • update-index_range=100 - relative throughput is 1.20 and the improvements arrive in several versions (12, 14, 15). This does updates that require index maintenance.
    • update-inlist_range=100 - relative throughput is 1.33 and the big improvement arrives is from version 11.19 to version 12.14.
    • update-nonindex_range=100 - relative throughput is 1.17 and the improvements arrive in versions 12.14 and 14.7.
Summary statistics:

o3_native_lto12.1413.1014.715.115.2
Point: avg1.011.031.081.081.09
Point: median1.011.031.051.051.05
Point: min0.960.970.900.920.96
Point: max1.031.091.531.521.56
Point: stddev0.0160.0250.1280.1230.129
Range: avg1.061.071.081.091.08
Range: median1.011.021.031.031.03
Range: min0.970.930.940.980.94
Range: max1.351.381.371.401.38
Range: stddev0.1270.1400.1330.1280.126
Write: avg1.061.051.081.111.11
Write: median1.021.031.101.101.09
Write: min0.960.960.801.001.00
Write: max1.431.281.291.331.33
Write: stddev0.1410.0990.1320.1040.103

Results for Postgres 15.1

The result spreadsheet is here. See the pg151.r4g tab. The chart doesn't show the full name for each benchmark. Consult the spreadsheet.

The graphs use relative throughput which is throughput for me / throughput for base case. When the relative throughput is > 1 then my results are better than the base case. When it is 1.10 then my results are ~10% better than the base case. The base case is Postgres 11.19.
Summary statistics:

pg151_defo2_nofpo3o3_nativeo3_native_lto
Point: avg0.991.001.021.04
Point: median0.991.001.021.04
Point: min0.970.990.981.02
Point: max1.001.011.031.06
Point: stddev0.0070.0050.0130.009
Range: avg0.981.001.031.06
Range: median0.981.001.031.05
Range: min0.961.001.011.03
Range: max1.011.021.051.10
Range: stddev0.0140.0060.0120.026
Write: avg1.001.011.011.03
Write: median1.001.001.011.03
Write: min0.980.991.001.01
Write: max1.041.051.071.08
Write: stddev0.0180.0190.0200.020

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...