Sunday, January 17, 2021

Insert Benchmark: Postgres is still boring

This has results for the insert benchmark on a small server with a low-concurrency workload using Postgres in versions 11.10, 12.4 and 13.1 My previous posts are for the same workload with MyRocks and InnoDB. Boring is good news. It means I haven't found serious regressions.

Overview

The test is run with 1 client and 1 table. In this case, client means there is at most 1 connection for writes and 1 for reads. For some tests the write and read connections do things concurrently. My test server has 4 CPU cores and hyperthreads are disabled.

The test is run for 3 table sizes: 20M, 100M and 500M rows. The test table fits in memory for the 20M and 100M sizes and is larger than memory for the 500M size. For the read+write test steps the working set for queries is mostly cached by Postgres, cached by MyRocks and not cached by InnoDB.

The insert benchmark has several steps (explained here and here): load data, create secondary indexes, load more data, read+write and the read+write step is repeated using increasing rate limits for the writer (100/s, 200/s, 400/s, 600/s, 800/s and 1000/s).

Results, configuration files and helper scripts are archived in github. I tried to make the configuration options similar and the files are here for 11.10, 12.4 and 13.1. The NUC servers use Ubuntu 20.04 and XFS.

Summary for 13.1 relative to 11.10:

  • 13.1 gets ~6% more QPS during read+write
  • 13.1 does the load (without secondary indexes) and create index slightly faster (<= 10%)
  • 13.1 has a small regression for loading after secondary indexes were created. The insert rate decreases by 4%, 16% and 4% for tables with 20M, 100M and 500M rows, CPU/insert increases and KB written to storage/insert also increases. I know there are changes in 13.x related to vacuum and index dedup. This workload is insert-only, no updates or deletes, and needs vacuum to set visibility map bits. Perhaps vacuum is triggered in 13.1 when it wasn't triggered in earlier versions with the introduction of autovacuum_vacuum_insert_threshold. I might repeat a test where I try to prevent that from happening. I spent time earlier this year updating my scripts to do manual vacuums during the insert benchmark and it isn't clear that the benchmark needs more vacuum.
Updates - To understand the small regression mentioned above for the l.i1 test (more CPU & write IO) I repeated the test with 100M rows using 2 configurations: one disabled index deduplication and the other disabled insert-triggered autovacuum. Disabling index deduplication had no effect and disabling insert-triggered autovacuum resolves the regression. I am not suggesting that people disable insert-triggered autovac as that is a useful feature, but it is nice to be able to explain a regression.


Results overview

For now I am sharing text summaries and in a few days I will share a longer report with graphs and tables. The text summaries are dense, which is good and bad. Dense means I can do comparisons for many DBMS configurations with all the data I need on one page. But dense also imposes a burden on the reader. The guide is here if you want to understand the output.

The report summaries are in github for the tests with 20M100M and 500M rows. The summary files per test step are listed below. The mrg.* files have absolute values for throughput and HW efficiency metrics. The rel.* files have the values relative to the base case and here the base case is Postgres version 11.10, so the values from 11.10 are the denominator while values from 12.4 or 13.1 are the numerator. Here I explain the ratios for 13.1 relative to 11.10. For relative throughput a value < 1 means that 13.1 is slower than in 11.10. For relative HW efficiency a value > 1 means that 13.1 uses more HW per operation than 11.10.

The files are:

  • load data (l.i0) - mrg.l.i0, rel.l.i0
  • create secondary indexes (l.x) - mrg.l.x, rel.l.x
  • load more data (l.i1) - mrg.l.i1, rel.l.i1
  • read+write with 100 inserts/s - mrg.q100.2, rel.q100.2
  • read+write with 200 inserts/s - mrg.q200.2, rel.q200.2
  • read+write with 400 inserts/s - mrg.q400.2, rel.q400.2
  • read+write with 600 inserts/s - mrg.q600.2, rel.q600.2
  • read+write with 800 inserts/s - mrg.q800.2, rel.q800.2
  • read+write with 1000 inserts/s - mrg.q1000.2, rel.q1000.2

Finally, I pasted the output from mrg.l.i0, mrg.l.x, mrg.l.i1, mrg.q100.2, mrg.q200.q, mrg.q800.2 and mrg.q1000.2 into one file (see mrg.some for 20M100M and 500M) to make this easier to read. I also did the same for the relative files (see rel.some for 20M100M and 500M). Below I discuss the output using the mrg.some and rel.some files.

Results

As explained above I use ratios to compare throughput and HW efficiency. The numerator is the value from Postgres in 13.1 or 12.4 while the denominator is the value in 11.10. For throughput a ratio < 1 means that 13.1 or 12.4 are slower. For HW efficiency a ratio > 1 means that 13.1 or 12.4 uses more HW per operation. Below, unless I call it out, I only explain the 13.1 vs 11.10 results as 12.4 was similar to 13.1.

The ratios are triples -- the results for the test with 20M, 100M and 500M rows. 

Load data (l.i0)
  • This loads in PK order to a table that has no secondary indexes
  • Insert/s (ips) ratios were 1.03, 1.04, 1.03
  • CPU/insert (cpupq) ratios were 1.00, 1.00, 1.00
Create secondary indexes (l.x)
  • This creates 3 secondary indexes
  • Indexed rows/s (ips) ratios were 1.00, 1.03, 1.09
  • CPU/indexed row (cpupq) ratios were 1.00, 1.00, 1.00
Load more data (l.i1)
  • This loads more data in PK order after the secondary indexes are created
  • Insert/s (ips) ratios were 0.96, 0.84, 0.96
  • CPU/insert (cpupq) ratios were 1.06, 1.12, 1.11
  • KB written to storage/insert (wkbpi) ratios were 0.99, 1.33, 1.07
  • The regressions here for ips, cpupq & wkbpi were not in 12.4. So they are new in 13.0 or 13.1.

    Read+write with 100 inserts/s (q100.2)

    • This has one connection doing queries and another doing 100 inserts/s
    • Queries/s (qps) ratios were 1.06, 1.05, 1.06
    • CPU/insert (cpupq) ratios were 0.95, 0.95, 0.95

    Read+write with 200 inserts/s (q200.2)
    • This has one connection doing queries and another doing 200 inserts/s
    • Queries/s (qps) ratios were 1.06, 1.05, 1.06
    • CPU/insert (cpupq) ratios were 0.94, 0.95, 0.95
    Read+write with 800 inserts/s (q800.2)
    • This has one connection doing queries and another doing 800 inserts/s
    • Queries/s (qps) ratios were 1.06, 1.06, 1.07
    • CPU/insert (cpupq) ratios were 0.95, 0.94, 0.94
    Read+write with 1000 inserts/s (q1000.2)
    • This has one connection doing queries and another doing 1000 inserts/s
    • Queries/s (qps) ratios were 1.07, 1.05, 1.07
    • CPU/insert (cpupq) ratios were 0.94, 0.96, 0.95

    No comments:

    Post a Comment

    RocksDB on a big server: LRU vs hyperclock, v2

    This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...