Friday, June 12, 2026

HammerDB tproc-c on a large server, Postgres 14 to 19 beta1

This has results for HammerDB tproc-c on a large server using Postgres. I am new to HammerDB and still figuring out how to explain and present results, so I will keep this simple and just share graphs without explaining the results.

tl;dr

    • There are small regressions in versions 16, 17 and 18
    • NOPM usually improves a small amount in 19 beta1 relative to 18
    Builds, configuration and hardware

    I compiled Postgres versions from source: 14.22, 14.23, 15.17, 15.18, 16.13, 16.14, 17.9, 17.10, 18.0, 18.1, 18.2, 18.3, 18.4 and 19 beta1.

    I used a 48-core server from Hetzner
    • an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
    • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
    • 128G RAM
    • Ubuntu 24.04
    Postgres configuration files:
    • prior to version 18 the config file is named conf.diff.cx10a50g_c32r128 (x10a_c32r128) and is here for versions 14, 15, 16 and 17.
    • for Postgres 18 and 19 I used conf.diff.cx10b_c32r128 (x10b_c32r128) with io_method=sync to be similar to the config used for versions 14 through 17.
    Benchmark

    The benchmark is tproc-c from HammerDB. The tproc-c benchmark is derived from TPC-C.

    The benchmark was run for several workloads:
    • vu=10, wh=1000 - 10 virtual users, 1000 warehouses
    • vu=20, wh=1000 - 20 virtual users, 1000 warehouses
    • vu=40, wh=1000 - 40 virtual users, 1000 warehouses
    • vu=10, wh=2000 - 10 virtual users, 2000 warehouses
    • vu=20, wh=2000 - 20 virtual users, 2000 warehouses
    • vu=40, wh=2000 - 40 virtual users, 2000 warehouses
    • vu=10, wh=4000 - 10 virtual users, 4000 warehouses
    • vu=20, wh=4000 - 20 virtual users, 4000 warehouses
    • vu=40, wh=4000 - 40 virtual users, 4000 warehouses
    The wh=1000 workloads are less heavy on IO. The wh=4000 workloads are more heavy on IO.
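The nine workloads above are the cross product of three virtual-user counts and three warehouse counts. A minimal Python sketch that enumerates them in the same order is below. The function name `workloads` is hypothetical and is not part of HammerDB or of my test scripts.

```python
from itertools import product

# Values taken from the workload list above.
VIRTUAL_USERS = (10, 20, 40)
WAREHOUSES = (1000, 2000, 4000)

def workloads():
    """Return (vu, wh) pairs grouped by warehouse count, as listed above."""
    return [(vu, wh) for wh, vu in product(WAREHOUSES, VIRTUAL_USERS)]

for vu, wh in workloads():
    print(f"vu={vu}, wh={wh}")
```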

    The benchmark for Postgres is run by a variant of this script which depends on scripts here.
    • stored procedures are enabled
    • partitioning is used because the warehouse count is >= 1000
    • a 5 minute rampup is used
    • then performance is measured for 60 minutes
    Results

    My analysis at this point is simple -- I only consider average throughput. Eventually I will examine throughput over time and efficiency (CPU and IO).

    On the charts that follow, the y-axis does not start at 0. That improves readability at the risk of overstating the differences. The y-axis shows relative throughput. There might be a regression when the relative throughput is less than 1.0. There might be an improvement when it is > 1.0. The relative throughput is:
    (NOPM for some-version / NOPM for base-version)

    The base version is Postgres 14.22.
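The relative throughput calculation can be sketched in a few lines of Python. The NOPM values below are made up for illustration and are not measured results. The function name `relative_throughput` is also hypothetical.

```python
def relative_throughput(nopm_by_version, base_version):
    """Divide each version's NOPM by the NOPM of the base version."""
    base = nopm_by_version[base_version]
    return {version: nopm / base for version, nopm in nopm_by_version.items()}

# Made-up NOPM values, for illustration only.
example_nopm = {"14.22": 200_000, "18.4": 190_000, "19 beta1": 204_000}

rel = relative_throughput(example_nopm, "14.22")
for version, value in rel.items():
    # Values below 1.0 might be a regression, values above 1.0 might be an improvement.
    print(f"{version:10s} {value:.2f}")
```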

    A spreadsheet with absolute and relative values for NOPM is here.

    Results: vu=10, wh=1000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1

    Results: vu=20, wh=1000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1

    Results: vu=40, wh=1000

    Summary:

    • There are small regressions in versions 17 and 18 while NOPM improves in 19 beta1

    Results: vu=10, wh=2000

    Summary:

    • There are small regressions in version 18 while NOPM improves in 19 beta1

    Results: vu=20, wh=2000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1

    Results: vu=40, wh=2000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1
    • There is no result for 18.1 because of a bug in my test scripts

    Results: vu=10, wh=4000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1

    Results: vu=20, wh=4000

    Summary:

    • There are small regressions in versions 16, 17 and 18

    Results: vu=40, wh=4000

    Summary:

    • There are small regressions in versions 16, 17 and 18 while NOPM improves in 19 beta1


    Thursday, June 11, 2026

    Write-heavy sysbench tests, a large server, modern Postgres and MySQL

    This has results for modern Postgres and MySQL using write-heavy tests from sysbench and a large server. I think there are regressions in Postgres that arrive in some of versions 16, 17, 18 and 19 beta1 but I am far from certain and this blog post is just another step in my journey to figure that out.

    tl;dr

    • Postgres suffers a lot from throughput variation while MySQL+InnoDB does not
    • InnoDB gets much better average throughput on 6 of 10 tests, similar throughput on one, and Postgres does better on 3 of 10 tests
    • For tests from which I provided vmstat and iostat results, Postgres does more write IO per operation. In some cases InnoDB uses more CPU, in other cases it does not.

    Builds, configuration and hardware

    I compiled:
    • Postgres from source for versions 15.17, 16.13, 17.9 and 18.3.
    • MySQL from source for version 8.4.7
    I used a 48-core server from Hetzner
    • an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
    • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
    • 128G RAM
    • Ubuntu 24.04
    Configuration files for Postgres:
    • the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 15, 16 and 17.
    • for Postgres 18 I used conf.diff.cx10b_c32r128 (x10b_c32r128) which is as close as possible to the Postgres 17 config and uses io_method=sync
    Benchmark

    I used sysbench and my usage is explained here. Normally I run 32 of the 42 microbenchmarks listed in that blog post using tables small enough to be cached by the DBMS. Most test only one type of SQL statement.

    The tests can be called microbenchmarks. They are very synthetic. But microbenchmarks also make it easy to understand which types of SQL statements have great or lousy performance. Performance testing benefits from a variety of workloads -- both more and less synthetic.

    But I did things differently here:
    • I only run the write-heavy tests (to save time)
    • The tables are larger than memory and cannot be cached
    • Each test (microbenchmark) is run for 2 hours when I normally run each for 15 minutes
    • After each test a vacuum is done
    The purpose is to search for regressions from new CPU overhead and mutex contention related to MVCC GC (vacuum for Postgres, purge for InnoDB).

    Results

    I provide charts below with relative QPS. The relative QPS is the following:
    (QPS for some version) / (QPS for Postgres 15.17)
    When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. When the relative QPS is 1.2 then some version is about 20% faster than base version.

    The per-test results from vmstat and iostat can help to explain why something is faster or slower because they show how much HW is used per request, including CPU overhead per operation (cpu/o) and context switches per operation (cs/o). Context switches are often a proxy for mutex contention.

    Results: writes

    The table below has relative QPS for Postgres 16 to 19 and then InnoDB all relative to the throughput for Postgres 15.17. Columns 1 to 4 have results for Postgres and the numbers in yellow highlight the tests where there is a regression in Postgres. For column 5 (MySQL with InnoDB) the numbers in yellow and red indicate tests where InnoDB's throughput is less than Postgres. And then the numbers in green indicate tests where InnoDB's throughput is much larger than Postgres.

    Note that when relative QPS (rQPS) is 0.90 then throughput dropped by ~10%.

    Summary:
    • throughput for Postgres drops after version 15.17. I don't know yet whether this is a regression.
    • throughput for InnoDB is much better than Postgres in 6 of 10 tests, similar in one test, and much worse in 3 of 10 tests.
    The sections that follow this one have more detail on results from the update-index, update-zipf and insert tests.

    Relative to: Postgres 15.17
    col-1 : Postgres 16.13
    col-2 : Postgres 17.9
    col-3 : Postgres 18.3
    col-4 : Postgres 19 beta1
    col-5 : MySQL 8.4.7

    col-1   col-2   col-3   col-4   col-5
    0.94    0.97    0.98    1.02    1.88    update-inlist
    0.94    0.90    0.88    0.92    1.43    update-index
    0.91    0.86    0.87    0.92    1.19    update-nonindex
    0.96    0.99    0.98    0.98    0.71    update-one
    0.92    0.83    0.81    0.85    0.93    update-zipf
    0.95    0.93    0.84    0.81    1.71    write-only
    0.94    0.94    0.90    0.92    1.14    read-write_range=10
    0.95    0.96    0.95    0.95    1.93    read-write_range=100
    0.89    0.82    0.80    0.84    1.01    delete
    1.05    1.05    1.01    1.10    0.53    insert
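The counts for InnoDB in the summary can be reproduced from col-5 with a small Python sketch. The tolerance of 0.02 that decides when a result counts as similar is my assumption for this sketch. The summary above does not state a threshold.

```python
from collections import Counter

# col-5 (MySQL 8.4.7 with InnoDB) from the table above, relative to Postgres 15.17.
INNODB_RQPS = {
    "update-inlist": 1.88,
    "update-index": 1.43,
    "update-nonindex": 1.19,
    "update-one": 0.71,
    "update-zipf": 0.93,
    "write-only": 1.71,
    "read-write_range=10": 1.14,
    "read-write_range=100": 1.93,
    "delete": 1.01,
    "insert": 0.53,
}

def classify(rqps, tolerance=0.02):
    """Label a relative QPS value. The tolerance is an assumed threshold."""
    if abs(rqps - 1.0) <= tolerance:
        return "similar"
    return "better" if rqps > 1.0 else "worse"

counts = Counter(classify(value) for value in INNODB_RQPS.values())
print(dict(counts))
```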

    Results: update-index

    Summary:
    • Postgres suffers from too much variance
    • Average throughput is ~1.55X larger for InnoDB than for Postgres
    • Per operation, Postgres does ~1.20X more write IO (KB written) to storage than InnoDB
    • Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
    iostat, vmstat normalized by operation rate
    r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
    35503.0 373.7   58795.7 1345.1  1.375   14.824  53.351  25817   PG 19b1
    33140.6 517.8   53449.6 1735.3  0.827   13.226  44.326  40090   MySQL 8.4.7

    cs/s    cpu/s   cs/o    cpu/o   dbms
    176167  14.4     6.824  .000557 PG 19b1
    661395  41.9    16.498  .001046 MySQL 8.4.7
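The per-operation columns in the tables above appear to be the per-second rates divided by the operation rate (o/s), with MB converted to KB. The Python sketch below assumes that formula. It is inferred from the numbers in the tables rather than taken from my scripts, and the helper names are hypothetical.

```python
def per_operation(rate_per_sec, ops_per_sec):
    """Normalize a per-second rate (cs/s, cpu/s, r/s) by the operation rate."""
    return rate_per_sec / ops_per_sec

def kb_per_operation(mb_per_sec, ops_per_sec):
    """Convert MB/s to KB per operation, assuming 1 MB = 1024 KB."""
    return mb_per_sec * 1024 / ops_per_sec

# Per-second values copied from the update-index tables above.
pg = {"w_mb_s": 1345.1, "cs_s": 176167, "cpu_s": 14.4, "ops_s": 25817}
my = {"w_mb_s": 1735.3, "cs_s": 661395, "cpu_s": 41.9, "ops_s": 40090}

for name, row in (("PG 19b1", pg), ("MySQL 8.4.7", my)):
    wkb_o = kb_per_operation(row["w_mb_s"], row["ops_s"])
    cs_o = per_operation(row["cs_s"], row["ops_s"])
    cpu_o = per_operation(row["cpu_s"], row["ops_s"])
    print(f"{name:12s} wKB/o={wkb_o:.3f} cs/o={cs_o:.3f} cpu/o={cpu_o:.6f}")

# Write IO per operation for Postgres relative to InnoDB.
write_ratio = kb_per_operation(pg["w_mb_s"], pg["ops_s"]) / kb_per_operation(my["w_mb_s"], my["ops_s"])
print(f"write IO ratio: {write_ratio:.2f}")
```

The computed values differ from the tables in the last digit in a few places, probably because the per-second inputs shown here are already rounded.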

    Results: update-zipf

    Summary:
    • Postgres suffers from too much variance
    • Average throughput is ~1.09X larger for InnoDB than for Postgres
    • Per operation, Postgres does ~1.30X more write IO (KB written) to storage than InnoDB
    • Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
    iostat, vmstat normalized by operation rate
    r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
    55595.5 620.7   64264.4 1352.3  0.622   7.110   15.490  89396   PG 19b1
    27405.9 428.2   37465.1 1133.6  0.282   4.508   11.933  97270   MySQL 8.4.7

    cs/s    cpu/s   cs/o    cpu/o   dbms
    424392  27.2     4.747  .000304 PG 19b1
    1213054 44.5    12.471  .000458 MySQL 8.4.7

    Results: insert

    Summary:
    • Postgres suffers from too much variance
    • Average throughput is ~2.06X larger for Postgres than for InnoDB
    • Per operation, Postgres does ~1.67X more write IO (KB written) to storage than InnoDB
    • Per operation, Postgres uses more CPU and does more context switches. This is the opposite of what happens above for update-index and update-zipf.

    iostat, vmstat normalized by operation rate
    r/s     rMB/s   w/s     wMB/s   r/o     rKB/o   wKB/o   o/s     dbms
    1615.5  56.0    15321.7 1170.9  0.007   0.242   5.059   237009  PG 19b1
    3.6     0.1     8275.4  340.7   0.000   0.000   3.029   115155  MySQL 8.4.7

    cs/s    cpu/s   cs/o    cpu/o   dbms
    1214563 46.0    10.547  .000399 PG 19b1
    800827  50.5     3.379  .000213 MySQL 8.4.7













    The insert benchmark on a small server, cached workload : Postgres 19 beta1

    This has results for Postgres versions 19 beta1, 18.4 and 17.10 with the Insert Benchmark on a small server using a cached and CPU-bound workload.

    Postgres continues to be boring in a good way. It is hard to find performance regressions.

     tl;dr

    • I don't see regressions here in 19 beta1
    • I see some improvements here in 19 beta1
      • index create (l.x) is faster but the step is short-running so I don't assume much from this
      • the write-heavy steps (l.i1, l.i2) are faster and CPU overhead is lower in 19 beta1. I hope to explain why the CPU overhead is lower, but that waits for another day.

    Builds, configuration and hardware

    I compiled Postgres from source using -O2 -fno-omit-frame-pointer for versions 19 beta1, 18.4 and 17.10.

    The server is a Beelink SER7 with a Ryzen 7 7840HS CPU with 8 cores and AMD SMT disabled, 32G of RAM. Storage is one SSD for the OS and an NVMe SSD for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04.

    For 17.10 the config file is named conf.diff.cx10a_c8r32 (cx10a) and is here.

    For Postgres 18 and 19 the config file is conf.diff.cx10b_c8r32 (cx10b) which is as similar as possible to the config for version 17.

    The Benchmark

    The benchmark is explained here and is run with 1 client.

    The point query (qp100, qp500, qp1000) and range query (qr100, qr500, qr1000) steps are run for 3600 seconds each.

    The benchmark steps are:

    • l.i0
      • insert 30M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
    • l.x
      • create 3 secondary indexes per table. There is one connection per client.
    • l.i1
      • use 2 connections/client. One inserts 40M rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
    • l.i2
      • like l.i1 but each transaction modifies 5 rows (small transactions) and 10M rows are inserted and deleted per table.
      • Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
    • qr100
      • use 3 connections/client. One does range queries and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
    • qp100
      • like qr100 except uses point queries on the PK index
    • qr500
      • like qr100 but the insert and delete rates are increased from 100/s to 500/s
    • qp500
      • like qp100 but the insert and delete rates are increased from 100/s to 500/s
    • qr1000
      • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
    • qp1000
      • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
    Results

    The performance summary with charts is here.

    This table lists relative QPS per benchmark step and relative QPS is:
        (QPS for my version / QPS for Postgres 17.10)

    The background in the table cells is blue for big improvements and yellow for regressions. There are no regressions here. 

    The index create (l.x) step is much faster in 19 beta1. I usually ignore results on this step but I am curious if something was done in 19 beta1 to improve index create. But this step takes between 1 and 2 minutes and I am reluctant to assume too much from a short-running step.

    For the write-heavy steps (l.i1, l.i2)
    • there are small improvements in 18.4
    • there are large improvements in 19 beta1. The CPU overhead is lower in 19 beta1 compared to 17.10, ~15% lower for l.i1 and ~10% lower for l.i2. Hopefully I can explain why. But the lower CPU overhead might explain the improved performance in 19 beta1. Some of the metrics from iostat and vmstat are here.
    dbms      l.i0    l.x     l.i1    l.i2    qr100   qp100   qr500   qp500   qr1000  qp1000
    17.10     1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
    18.4      1.00    1.03    1.02    1.07    0.99    1.00    1.00    1.00    1.01    1.00
    19 beta1  1.01    1.16    1.23    1.22    0.99    1.00    0.99    0.99    1.00    1.00

    Tuesday, June 9, 2026

    Postgres 19 beta1 vs sysbench on a small server

    This has results from sysbench on a small server with Postgres 19 beta1, 18.4 and 17.10. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.

    tl;dr

    • 19beta1, 18.4 and 17.10 have mostly similar performance
    • There might be small regressions (about 2%) from 17.10 to 19beta1 but my tests are not good at spotting that.
    • 19beta1 is much faster on one test (read-only-count) thanks to a new query plan

    Builds, configuration and hardware

    I compiled Postgres from source. 

    The server is a Beelink SER7 7840HS with an AMD Ryzen 7 7840HS CPU and 32G RAM. Storage uses an NVMe device with ext-4 and discard enabled. The OS is Ubuntu 24.04. 

    The config files are here for 17.10, 18.4 and 19 beta1.

    Benchmark

    I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres.

    The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.

    The benchmark is run with 1 client, 1 table and 50M rows. The purpose is to search for CPU regressions.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

    I provide charts below with relative QPS (rQPS). The relative QPS is the following:
    (QPS for some version) / (QPS for base version)
    When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression. Values from iostat and vmstat divided by QPS are also provided here. These can help to explain why something is faster or slower because they show how much HW is used per request.

    Here, base version is Postgres 17.10 and some version is either 18.4 or 19 beta1.

    I describe performance changes (changes to relative QPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.
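As a small worked example of that definition, the Python sketch below converts a change in rQPS into basis points as I use the term here (one basis point is a difference of 0.01 in rQPS). The function name `basis_points` is hypothetical.

```python
def basis_points(rqps_before, rqps_after):
    """Change in rQPS in basis points, where 1 basis point = 0.01 of rQPS."""
    # round() hides floating-point noise such as -9.999999999999998.
    return round((rqps_after - rqps_before) * 100)

print(basis_points(0.95, 0.85))  # the example from the text: a drop of 10
print(basis_points(1.00, 1.03))  # a gain of 3
```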

    Results: point queries

    Summary:
    • 19beta1 is better than 17.10 by ~3 basis points for most tests
    • 19beta1 is slightly better than 18.4
    Relative to Postgres 17.10
    col-1 : Postgres 18.4
    col-2 : Postgres 19 beta1

    col-1   col-2
    1.01    1.00    hot-points
    0.98    0.97    point-query
    1.01    1.03    points-covered-pk
    1.00    1.04    points-covered-si
    1.00    1.02    points-notcovered-pk
    1.00    1.03    points-notcovered-si
    0.99    0.99    random-points_range=10
    1.00    1.03    random-points_range=100
    1.01    1.03    random-points_range=1000

    Results: range queries without aggregation

    Summary:
    • 19beta1 is worse than 17.10 by ~3 basis points in 4 of 5 tests
    • 19beta1 is better than 17.10 by 5 basis points in the scan test
    • 19beta1 and 18.4 are similar except for the scan test where 19beta1 did better
    Relative to Postgres 17.10
    col-1 : Postgres 18.4
    col-2 : Postgres 19 beta1

    col-1   col-2
    0.98    0.97    range-covered-pk
    0.96    0.96    range-covered-si
    0.98    0.98    range-notcovered-pk
    0.99    0.99    range-notcovered-si
    0.95    1.05    scan

    Results: range queries with aggregation

    Summary:
    • 19beta1 is worse than 17.10 on two tests
    • 19beta1 is better than 17.10 on five tests
    • 19beta1 and 17.10 are the same on one test
    • 19beta1 is ~2.5X better than 17.10 on the read-only-count test
    • 19beta1 and 18.4 have similar results except for the read-only-count test
    The query for the read-only-count test appears to have a different plan in 19beta1 and that might explain the ~2.5X speedup. In 17.10 and 18.4 it gets Index Scan while in 19beta1 it gets Index Only Scan.

    Query plans for the read-only-count test ...

    For 17.10
    explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 17704460 AND 17705459
            Aggregate  (cost=1424.42..1424.43 rows=1 width=8)
              ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.56..1421.93 rows=996 width=121)
                    Index Cond: ((id >= 17704460) AND (id <= 17705459))

    For 18.4
    explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 11575278 AND 11576277
            Aggregate  (cost=1310.09..1310.10 rows=1 width=8)
              ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.56..1307.89 rows=882 width=121)
                    Index Cond: ((id >= 11575278) AND (id <= 11576277))

    For 19beta1
    explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 11686801 AND 11687800
            Aggregate  (cost=32.32..32.33 rows=1 width=8)
              ->  Index Only Scan using sbtest1_pkey on sbtest1  (cost=0.56..30.13 rows=878 width=0)
                    Index Cond: ((id >= 11686801) AND (id <= 11687800))

    Relative to Postgres 17.10
    col-1 : Postgres 18.4
    col-2 : Postgres 19 beta1

    col-1   col-2
    1.04    2.47    read-only-count
    1.00    0.99    read-only-distinct
    1.02    1.01    read-only-order
    0.98    0.97    read-only_range=10
    1.00    1.00    read-only_range=100
    1.02    1.03    read-only_range=10000
    1.09    1.09    read-only-simple
    1.01    1.01    read-only-sum

    Results: writes

    Summary:
    • 19beta1 is worse than 17.10 by 2 to 5 basis points
    • 18.4 is worse than 17.10 by 1 to 4 basis points
    Relative to Postgres 17.10
    col-1 : Postgres 18.4
    col-2 : Postgres 19 beta1

    col-1   col-2
    0.97    0.97    delete
    0.99    0.96    insert
    0.98    0.97    read-write_range=10
    0.98    0.98    read-write_range=100
    0.96    0.95    update-index
    0.99    0.97    update-inlist
    0.97    0.96    update-nonindex
    0.97    0.95    update-one
    0.97    0.95    update-zipf
    0.98    0.97    write-only

    Friday, April 10, 2026

    MySQL 9.7.0 vs sysbench on a small server

    This has results from sysbench on a small server with MySQL 9.7.0 and 8.4.8. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.

    I tested MySQL 9.7.0 with and without the hypergraph optimizer enabled. I don't expect it to help much because the queries run here are simple. I hope to learn it doesn't hurt performance in that case.

    tl;dr

    • Throughput improves on two tests with the Hypergraph optimizer in 9.7.0 because they get better query plans.
    • One read-only test and several write-heavy tests have small regressions from 8.4.8 to 9.7.0. This might be from new CPU overheads but I don't see obvious problems in the flamegraphs. 

    Builds, configuration and hardware

    I compiled MySQL from source for versions 8.4.8 and 9.7.0.

    The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

    The my.cnf file is here for 8.4. I call this the z12a config and variants of it are used for MySQL 5.6 through 8.4.

    For 9.7 I use two configs, named z13a and z13b in the results below.

    All DBMS versions use the latin1 character set as explained here.

    Benchmark

    I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

    The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

    I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
    (QPS for some version) / (QPS for MySQL 8.4.8) 

    Results: point queries

    I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

    This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
    • Throughput with MySQL 9.7.0 is similar to 8.4.8 except for point-query where there are regressions as rQPS drops by 5 and 7 basis points. The point-query test uses simple queries that fetch one column from one row by PK. From vmstat metrics the CPU overhead per query for 9.7.0 is ~8% larger than for 8.4.8, with and without the hypergraph optimizer. I don't see anything obvious in the flamegraphs.
    z13a    z13b
    0.99    1.01    hot-points
    0.95    0.93    point-query
    0.99    1.01    points-covered-pk
    1.00    1.01    points-covered-si
    0.98    1.00    points-notcovered-pk
    0.99    1.01    points-notcovered-si
    1.00    1.02    random-points_range=1000
    0.99    1.01    random-points_range=100
    0.96    1.00    random-points_range=10

    Results: range queries without aggregation

    I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

    This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
    • Throughput with MySQL 9.7.0 is similar to 8.4.8. I am skeptical there is a regression for the scan test with the z13b config. I suspect that is noise.
    z13a    z13b
    0.99    0.99    range-covered-pk
    0.99    0.99    range-covered-si
    0.99    0.99    range-notcovered-pk
    0.98    0.98    range-notcovered-si
    1.00    0.96    scan

    Results: range queries with aggregation

    I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

    This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
    • There might be small regressions in several tests with rQPS dropping by a few points but I will ignore that for now.
    • There is a large improvement for the read-only-distinct test with the z13b config. The query for this test is select distinct c from sbtest where id between ? and ? order by c. The reason for the performance improvement is that the hypergraph optimizer chooses a better plan, see here.
    • There is a large improvement for the read-only test with range=10000. This test uses the read-only version of the classic sysbench transaction (see here). One of the queries it runs is the query used by read-only-distinct. So it benefits from the better plan for that query. 
    z13a    z13b
    0.97    0.97    read-only-count
    0.98    1.26    read-only-distinct
    0.96    0.95    read-only-order
    0.99    1.15    read-only_range=10000
    0.97    1.00    read-only_range=100
    0.96    0.97    read-only_range=10
    0.99    0.99    read-only-simple
    0.97    0.96    read-only-sum

    Results: writes

    I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

    This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
    • There might be several small regressions here. I don't see obvious problems in the flamegraphs.
    z13a    z13b
    0.95    0.92    delete
    1.00    1.01    insert
    0.97    0.98    read-write_range=100
    0.96    0.95    read-write_range=10
    0.97    0.96    update-index
    0.97    0.92    update-inlist
    0.95    0.93    update-nonindex
    0.95    0.92    update-one
    0.95    0.93    update-zipf
    0.97    0.95    write-only

    Thursday, April 9, 2026

    Sysbench vs MySQL on a small server: another way to view the regressions

    This post provides another way to see the performance regressions in MySQL from versions 5.6 to 9.7. It complements what I shared in a recent post. The workload here is cached by InnoDB and my focus is on regressions from new CPU overheads. 

    The good news is that there are few regressions after 8.0. The bad news is that there were many prior to that and these are unlikely to be undone.

      tl;dr

      • for point queries
        • there are large regressions from 5.6.51 to 5.7.44, 5.7.44 to 8.0.28 and 8.0.28 to 8.0.45
        • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
      • for range queries without aggregation
        • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
        • there are mostly small regressions from 8.0.28 to 8.0.45, but scan has a large regression
        • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
      • for range queries with aggregation
        • there are large regressions from 5.6.51 to 5.7.44 with two improvements
        • there are large regressions from 5.7.44 to 8.0.28
        • there are small regressions from 8.0.28 to 8.0.45
        • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
      • for writes
        • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
        • there are small regressions from 8.0.28 to 8.0.45
        • there are few regressions from 8.0.45 to 8.4.8
        • there are a few small regressions from 8.4.8 to 9.7.0

      Builds, configuration and hardware

      I compiled MySQL from source for versions 5.6.51, 5.7.44, 8.0.28, 8.0.45, 8.4.8 and 9.7.0.

      The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

      The my.cnf files are here for 5.6, 5.7 and 8.4. I call these the z12a configs.

      For 9.7 I use the z13a config. It is as close as possible to z12a and adds two options for gtid-related features to undo a default config change that arrived in 9.6. 

      All DBMS versions use the latin1 character set as explained here.

      Benchmark

      I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

      The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

      Results

      The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

      I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
      (QPS for some version) / (QPS for base version) 
      Results: point queries

      MySQL 5.6.51 gets from 1.18X to 1.61X more QPS than 9.7.0 on point queries. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.62 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

      Summary:
      • from 5.6.51 to 9.7.0
        • the median regression is a drop in rQPS of 27 basis points
      • from 5.6.51 to 5.7.44
        • the median regression is a drop in rQPS of 11 basis points
      • from 5.7.44 to 8.0.28
        • the median regression is a drop in rQPS of 25 basis points
      • from 8.0.28 to 8.0.45
        • 7 of 9 tests get more QPS with 8.0.45
        • 2 tests have regressions where rQPS drops by ~6 basis points
      • from 8.0.45 to 8.4.8
        • there are few regressions
      • from 8.4.8 to 9.7.0
        • there are few regressions
      This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
      • the largest regression is an rQPS drop of 38 basis points for point-query. Compared to most of the other tests in this section, this query does less work in the storage engine which implies the regression is from code above the storage engine.
      • the smallest regression is an rQPS drop of 15 basis points for random-points_range=1000. The regression for the same query with a shorter range (=10, =100) is larger. That implies, at least for this query, that the regression is for something above the storage engine (optimizer, parser, etc).
      • the median regression is an rQPS drop of 27 basis points
      0.65    hot-points
      0.62    point-query
      0.72    points-covered-pk
      0.78    points-covered-si
      0.73    points-notcovered-pk
      0.76    points-notcovered-si
      0.85    random-points_range=1000
      0.73    random-points_range=100
      0.66    random-points_range=10
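
The largest, smallest and median regressions quoted above can be recomputed from the table. This is a sketch of the arithmetic only, with the rQPS values copied from the table for 9.7.0 relative to 5.6.51:

```python
from statistics import median

# rQPS for 9.7.0 relative to 5.6.51, copied from the table above.
rqps = {
    "hot-points": 0.65,
    "point-query": 0.62,
    "points-covered-pk": 0.72,
    "points-covered-si": 0.78,
    "points-notcovered-pk": 0.73,
    "points-notcovered-si": 0.76,
    "random-points_range=1000": 0.85,
    "random-points_range=100": 0.73,
    "random-points_range=10": 0.66,
}

# One basis point here is a change of 0.01 in rQPS.
drops = {name: round((1.0 - r) * 100) for name, r in rqps.items()}

largest = max(drops, key=drops.get)
smallest = min(drops, key=drops.get)
median_drop = round((1.0 - median(rqps.values())) * 100)

print(f"largest:  {drops[largest]} basis points ({largest})")
print(f"smallest: {drops[smallest]} basis points ({smallest})")
print(f"median:   {median_drop} basis points")
```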

      This has: (QPS for 5.7.44) / (QPS for 5.6.51)
      • the largest regression is an rQPS drop of 14 basis points for hot-points.
      • the next largest regression is an rQPS drop of 13 basis points for random-points with range=10. The regressions for that query are smaller when a larger range is used (=100, =1000), which implies the problem is above the storage engine.
      • the median regression is an rQPS drop of 11 basis points
      0.86    hot-points
      0.90    point-query
      0.89    points-covered-pk
      0.90    points-covered-si
      0.89    points-notcovered-pk
      0.88    points-notcovered-si
      1.00    random-points_range=1000
      0.89    random-points_range=100
      0.87    random-points_range=10
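
One way to see why a regression that shrinks as the range grows points to code above the storage engine is a toy cost model. This is an assumption for illustration, not something measured here: each query pays a fixed cost (parse, optimize, etc.) plus a per-row cost in the storage engine, and only the fixed cost gets worse in the newer version. All numbers below are hypothetical.

```python
def model_rqps(rows: int, fixed_old: float, fixed_new: float, per_row: float) -> float:
    """rQPS when only the fixed (per-query) cost changes.

    QPS is proportional to 1 / (time per query), so rQPS = time_old / time_new.
    """
    time_old = fixed_old + per_row * rows
    time_new = fixed_new + per_row * rows
    return time_old / time_new


# Hypothetical costs in arbitrary time units, chosen only for illustration.
FIXED_OLD, FIXED_NEW, PER_ROW = 10.0, 13.0, 1.0

for rows in (10, 100, 1000):
    r = model_rqps(rows, FIXED_OLD, FIXED_NEW, PER_ROW)
    # As rows grows, the per-row cost dominates and rQPS moves toward 1.
    print(f"range={rows:<5} model rQPS={r:.3f}")
```

In this model the regression is largest for range=10 and nearly vanishes for range=1000, which is the same shape as the random-points rows in the table above. A regression in the per-row cost would show the opposite pattern.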

      This has: (QPS for 8.0.28) / (QPS for 5.7.44)
      • the largest regression is an rQPS drop of 66 basis points for random-points with range=1000. The regression for that same query with smaller ranges (=10, =100) is smaller. This implies the problem is in the storage engine.
      • the second largest regression is an rQPS drop of 35 basis points for hot-points
      • the median regression is an rQPS drop of 25 basis points
      0.65    hot-points
      0.82    point-query
      0.74    points-covered-pk
      0.75    points-covered-si
      0.76    points-notcovered-pk
      0.84    points-notcovered-si
      0.34    random-points_range=1000
      0.75    random-points_range=100
      0.86    random-points_range=10

      This has: (QPS for 8.0.45) / (QPS for 8.0.28)
      • at last, there are many improvements. Some are from a fix for bug 102037 which I found with help from sysbench
      • the regressions, with rQPS drops of ~6 basis points, are for queries that do less work in the storage engine relative to the other tests in this section
      1.20    hot-points
      0.93    point-query
      1.13    points-covered-pk
      1.19    points-covered-si
      1.09    points-notcovered-pk
      1.04    points-notcovered-si
      2.48    random-points_range=1000
      1.12    random-points_range=100
      0.94    random-points_range=10

      This has: (QPS for 8.4.8) / (QPS for 8.0.45)
      • there are few regressions from 8.0.45 to 8.4.8
      0.99    hot-points
      0.96    point-query
      0.99    points-covered-pk
      0.98    points-covered-si
      1.00    points-notcovered-pk
      0.99    points-notcovered-si
      1.00    random-points_range=1000
      1.00    random-points_range=100
      0.98    random-points_range=10

      This has: (QPS for 9.7.0) / (QPS for 8.4.8)
      • there are few regressions from 8.4.8 to 9.7.0
      0.99    hot-points
      0.95    point-query
      0.99    points-covered-pk
      1.00    points-covered-si
      0.98    points-notcovered-pk
      0.99    points-notcovered-si
      1.00    random-points_range=1000
      0.99    random-points_range=100
      0.96    random-points_range=10
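
As a sanity check, the per-step rQPS values multiply to roughly the end-to-end rQPS for 9.7.0 relative to 5.6.51. The sketch below uses three rows copied from the point-query tables above. The match is only approximate because every table entry is rounded to two digits.

```python
from math import prod

# Per-step rQPS from the tables above, in this order:
# 5.7.44/5.6.51, 8.0.28/5.7.44, 8.0.45/8.0.28, 8.4.8/8.0.45, 9.7.0/8.4.8
steps = {
    "hot-points": [0.86, 0.65, 1.20, 0.99, 0.99],
    "point-query": [0.90, 0.82, 0.93, 0.96, 0.95],
    "random-points_range=1000": [1.00, 0.34, 2.48, 1.00, 1.00],
}

# Reported rQPS for 9.7.0 relative to 5.6.51.
reported = {
    "hot-points": 0.65,
    "point-query": 0.62,
    "random-points_range=1000": 0.85,
}

for name, factors in steps.items():
    chained = prod(factors)
    print(f"{name}: chained={chained:.3f} reported={reported[name]:.2f}")
```

The random-points_range=1000 row shows why the chain is useful: the large drop in 8.0.28 (0.34) is mostly undone by the improvement in 8.0.45 (2.48).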

      Results: range queries without aggregation

      MySQL 5.6.51 gets from 1.35X to 1.52X more QPS than 9.7.0 on range queries without aggregation. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.66 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

      Summary:
      • from 5.6.51 to 9.7.0
        • the median regression is a drop in rQPS of 33 basis points
      • from 5.6.51 to 5.7.44
        • the median regression is a drop in rQPS of 16 basis points
      • from 5.7.44 to 8.0.28
        • the median regression is a drop in rQPS of ~10 basis points
      • from 8.0.28 to 8.0.45
        • the median regression is a drop in rQPS of 5 basis points
      • from 8.0.45 to 8.4.8
        • there are few regressions from 8.0.45 to 8.4.8
      • from 8.4.8 to 9.7.0
        • there are few regressions from 8.4.8 to 9.7.0
      This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
      • all tests have large regressions with an rQPS drop that ranges from 26 to 34 basis points
      • the median regression is an rQPS drop of 33 basis points
      0.66    range-covered-pk
      0.67    range-covered-si
      0.66    range-notcovered-pk
      0.74    range-notcovered-si
      0.67    scan

      This has: (QPS for 5.7.44) / (QPS for 5.6.51)
      • all tests have large regressions with an rQPS drop that ranges from 12 to 17 basis points
      • the median regression is an rQPS drop of 16 basis points
      0.85    range-covered-pk
      0.84    range-covered-si
      0.84    range-notcovered-pk
      0.88    range-notcovered-si
      0.83    scan

      This has: (QPS for 8.0.28) / (QPS for 5.7.44)
      • 4 of 5 tests have regressions with an rQPS drop that ranges from 10 to 14 basis points
      • the median regression is ~10 basis points
      • rQPS improves for the scan test
      0.86    range-covered-pk
      0.89    range-covered-si
      0.90    range-notcovered-pk
      0.90    range-notcovered-si
      1.04    scan

      This has: (QPS for 8.0.45) / (QPS for 8.0.28)
      • all tests are slower in 8.0.45 than 8.0.28, but the regression for 3 of 5 is <= 5 basis points
      • rQPS in the scan test drops by 21 basis points
      • the median regression is an rQPS drop of 5 basis points
      0.96    range-covered-pk
      0.95    range-covered-si
      0.91    range-notcovered-pk
      0.96    range-notcovered-si
      0.79    scan

      This has: (QPS for 8.4.8) / (QPS for 8.0.45)
      • there are few regressions from 8.0.45 to 8.4.8
      0.95    range-covered-pk
      0.95    range-covered-si
      0.98    range-notcovered-pk
      0.99    range-notcovered-si
      0.98    scan

      This has: (QPS for 9.7.0) / (QPS for 8.4.8)
      • there are few regressions from 8.4.8 to 9.7.0
      0.99    range-covered-pk
      0.99    range-covered-si
      0.99    range-notcovered-pk
      0.98    range-notcovered-si
      1.00    scan

      Results: range queries with aggregation

      Summary:
      • from 5.6.51 to 9.7.0
        • the median result is a drop in rQPS of ~30 basis points
      • from 5.6.51 to 5.7.44
        • the median result is a drop in rQPS of ~10 basis points
      • from 5.7.44 to 8.0.28
        • the median result is a drop in rQPS of ~12 basis points
      • from 8.0.28 to 8.0.45
        • the median result is an rQPS drop of 5 basis points
      • from 8.0.45 to 8.4.8
        • there are few regressions from 8.0.45 to 8.4.8
      • from 8.4.8 to 9.7.0
        • there are few regressions from 8.4.8 to 9.7.0
      This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
      • the median result is a drop in rQPS of ~30 basis points
      • rQPS for the read-only-distinct test improves by 25 basis points
      0.67    read-only-count
      1.25    read-only-distinct
      0.75    read-only-order
      1.02    read-only_range=10000
      0.74    read-only_range=100
      0.66    read-only_range=10
      0.69    read-only-simple
      0.66    read-only-sum

      This has: (QPS for 5.7.44) / (QPS for 5.6.51)
      • the median result is an rQPS drop of ~10 basis points
      • rQPS improves by 45 basis points for read-only-distinct and by 23 basis points for read-only with the largest range (=10000)
      0.86    read-only-count
      1.45    read-only-distinct
      0.93    read-only-order
      1.23    read-only_range=10000
      0.96    read-only_range=100
      0.88    read-only_range=10
      0.85    read-only-simple
      0.86    read-only-sum

      This has: (QPS for 8.0.28) / (QPS for 5.7.44)
      • the median result is an rQPS drop of ~12 basis points
      0.91    read-only-count
      0.94    read-only-distinct
      0.89    read-only-order
      0.86    read-only_range=10000
      0.87    read-only_range=100
      0.85    read-only_range=10
      0.90    read-only-simple
      0.87    read-only-sum

      This has: (QPS for 8.0.45) / (QPS for 8.0.28)
      • the median result is an rQPS drop of 5 basis points
      0.89    read-only-count
      0.95    read-only-distinct
      0.95    read-only-order
      0.97    read-only_range=10000
      0.94    read-only_range=100
      0.95    read-only_range=10
      0.93    read-only-simple
      0.93    read-only-sum

      This has: (QPS for 8.4.8) / (QPS for 8.0.45)
      • there are few regressions from 8.0.45 to 8.4.8
      0.99    read-only-count
      0.98    read-only-distinct
      0.99    read-only-order
      1.00    read-only_range=10000
      0.98    read-only_range=100
      0.97    read-only_range=10
      0.97    read-only-simple
      0.98    read-only-sum

      This has: (QPS for 9.7.0) / (QPS for 8.4.8)
      • there are few regressions from 8.4.8 to 9.7.0
      0.97    read-only-count
      0.98    read-only-distinct
      0.96    read-only-order
      0.99    read-only_range=10000
      0.97    read-only_range=100
      0.96    read-only_range=10
      0.99    read-only-simple
      0.97    read-only-sum

      Results: writes

      Summary:
      • from 5.6.51 to 9.7.0
        • the median result is a drop in rQPS of ~33 basis points
      • from 5.6.51 to 5.7.44
        • the median result is an rQPS drop of ~13 basis points
      • from 5.7.44 to 8.0.28
        • the median result is an rQPS drop of ~18 basis points
      • from 8.0.28 to 8.0.45
        • the median result is an rQPS drop of 9 basis points
      • from 8.0.45 to 8.4.8
        • there are few regressions from 8.0.45 to 8.4.8
      • from 8.4.8 to 9.7.0
        • the median result is an rQPS drop of 4 basis points
      This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
      • the median result is an rQPS drop of ~33 basis points
      0.56    delete
      0.54    insert
      0.72    read-write_range=100
      0.66    read-write_range=10
      0.88    update-index
      0.74    update-inlist
      0.60    update-nonindex
      0.58    update-one
      0.60    update-zipf
      0.67    write-only

      This has: (QPS for 5.7.44) / (QPS for 5.6.51)
      • the median result is an rQPS drop of ~13 basis points
      • rQPS improves by 21 basis points for update-index and by 5 basis points for update-inlist
      0.82    delete
      0.80    insert
      0.94    read-write_range=100
      0.88    read-write_range=10
      1.21    update-index
      1.05    update-inlist
      0.86    update-nonindex
      0.85    update-one
      0.86    update-zipf
      0.94    write-only

      This has: (QPS for 8.0.28) / (QPS for 5.7.44)
      • the median result is an rQPS drop of ~18 basis points
      0.80    delete
      0.77    insert
      0.87    read-write_range=100
      0.85    read-write_range=10
      0.94    update-index
      0.79    update-inlist
      0.81    update-nonindex
      0.80    update-one
      0.81    update-zipf
      0.83    write-only

      This has: (QPS for 8.0.45) / (QPS for 8.0.28)
      • the median result is an rQPS drop of 9 basis points
      0.91    delete
      0.90    insert
      0.94    read-write_range=100
      0.94    read-write_range=10
      0.80    update-index
      0.92    update-inlist
      0.91    update-nonindex
      0.92    update-one
      0.91    update-zipf
      0.89    write-only

      This has: (QPS for 8.4.8) / (QPS for 8.0.45)
      • there are few regressions from 8.0.45 to 8.4.8
      0.98    delete
      0.98    insert
      0.98    read-write_range=100
      0.98    read-write_range=10
      0.99    update-index
      0.99    update-inlist
      0.99    update-nonindex
      0.99    update-one
      0.99    update-zipf
      0.99    write-only

      This has: (QPS for 9.7.0) / (QPS for 8.4.8)
      • the median result is an rQPS drop of 4 basis points
      0.95    delete
      1.00    insert
      0.97    read-write_range=100
      0.96    read-write_range=10
      0.97    update-index
      0.97    update-inlist
      0.95    update-nonindex
      0.95    update-one
      0.95    update-zipf
      0.97    write-only
