Tuesday, June 9, 2026

Postgres 19 beta1 vs sysbench on a small server

This has results from sysbench on a small server with Postgres 19 beta1, 18.4 and 17.10. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.

tl;dr

  • 19beta1, 18.4 and 17.10 have mostly similar performance
  • There might be small regressions (about 2%) from 17.10 to 19beta1 but my tests are not good at spotting that.
  • 19beta1 is much faster on one test (read-only-count) thanks to a new query plan

Builds, configuration and hardware

I compiled Postgres from source. 

The server is a Beelink SER7 7840HS with an AMD Ryzen 7 7840HS CPU and 32G RAM. Storage uses an NVMe device with ext-4 and discard enabled. The OS is Ubuntu 24.04. 

The config files are here for 17.10, 18.4 and 19 beta1.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks 
and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres.

The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.

The benchmark is run with 1 client, 1 table and 50M rows. The purpose is to search for CPU regressions.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide charts below with relative QPS (rQPS). The relative QPS is the following:
(QPS for some version) / (QPS for base version)
When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. Values from iostat and vmstat divided by QPS are also provided here. These can help to explain why something is faster or slower because it shows how much HW is used per request.

Here, base version is Postgres 17.10 and some version is either 18.4 or 19 beta1.

I describe performance changes (changes to relative QPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

Results: point queries

Summary:
  • 19beta1 is better than 17.10 by ~3 basis points for most tests
  • 19beta1 is slightly better than 18.4
Relative to Postgres 17.10
col-1 : Postgres 18.4
col-2 : Postgres 19 beta1

col-1   col-2
1.01    1.00    hot-points
0.98    0.97    point-query
1.01    1.03    points-covered-pk
1.00    1.04    points-covered-si
1.00    1.02    points-notcovered-pk
1.00    1.03    points-notcovered-si
0.99    0.99    random-points_range=10
1.00    1.03    random-points_range=100
1.01    1.03    random-points_range=1000

Results: range queries without aggregation

Summary:
  • 19beta1 is worse than 17.10 by ~3 basis points in 4 of 5 tests
  • 19beta1 is better than 17.10 by 5 basis points in the scan test
  • 19beta1 and 18.4 are similar except for the scan test where 19beta1 did better
Relative to Postgres 17.10
col-1 : Postgres 18.4
col-2 : Postgres 19 beta1

col-1   col-2
0.98    0.97    range-covered-pk
0.96    0.96    range-covered-si
0.98    0.98    range-notcovered-pk
0.99    0.99    range-notcovered-si
0.95    1.05    scan

Results: range queries with aggregation

Summary:
  • 19beta1 is worse than than 17.10 on two tests
  • 19beta1 is better than 17.10 on five tests
  • 19beta1 and 17.10 are the same on one test
  • 19beta1 is ~2.5X better than 17.10 on the read-only-count test
  • 19beta1 and 18.4 have similar results except for the read-only-count test
The query for the read-only-count test appears to have a different plan in 19beta1 and that might explain the ~2.5X speedup. In 17.10 and 18.4 it gets Index Scan while in 19beta1 it gets Index Only Scan.

Query plans for the read-only-count test ...

For 17.10
explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 17704460 AND 17705459
        Aggregate  (cost=1424.42..1424.43 rows=1 width=8)
          ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.56..1421.93 rows=996 width=121)
                Index Cond: ((id >= 17704460) AND (id <= 17705459))

For 18.4
explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 11575278 AND 11576277
        Aggregate  (cost=1310.09..1310.10 rows=1 width=8)
          ->  Index Scan using sbtest1_pkey on sbtest1  (cost=0.56..1307.89 rows=882 width=121)
                Index Cond: ((id >= 11575278) AND (id <= 11576277))

For 19beta1
explain SELECT count(c) FROM sbtest1 WHERE id BETWEEN 11686801 AND 11687800
        Aggregate  (cost=32.32..32.33 rows=1 width=8)
          ->  Index Only Scan using sbtest1_pkey on sbtest1  (cost=0.56..30.13 rows=878 width=0)
                Index Cond: ((id >= 11686801) AND (id <= 11687800))

Relative to Postgres 17.10
col-1 : Postgres 18.4
col-2 : Postgres 19 beta1

col-1   col-2
1.04    2.47    read-only-count
1.00    0.99    read-only-distinct
1.02    1.01    read-only-order
0.98    0.97    read-only_range=10
1.00    1.00    read-only_range=100
1.02    1.03    read-only_range=10000
1.09    1.09    read-only-simple
1.01    1.01    read-only-sum

Results: writes

Summary:
  • 19beta1 is worse than 17.10 by 2 to 5 basis points
  • 18.4 is worse than 17.10 by 2 to 3 basis points
Relative to Postgres 17.10
col-1 : Postgres 18.4
col-2 : Postgres 19 beta1

col-1   col-2
0.97    0.97    delete
0.99    0.96    insert
0.98    0.97    read-write_range=10
0.98    0.98    read-write_range=100
0.96    0.95    update-index
0.99    0.97    update-inlist
0.97    0.96    update-nonindex
0.97    0.95    update-one
0.97    0.95    update-zipf
0.98    0.97    write-only

Friday, April 10, 2026

MySQL 9.7.0 vs sysbench on a small server

This has results from sysbench on a small server with MySQL 9.7.0 and 8.4.8. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.

I tested MySQL 9.7.0 with and without the hypergraph optimizer enabled. I don't expect it to help much because the queries run here are simple. I hope to learn it doesn't hurt performance in that case.

tl;dr

  • Throughput improves on two tests with the Hypergraph optimizer in 9.7.0 because they get better query plans.
  • One read-only test and several write-heavy tests have small regressions from 8.4.8 to 9.7.0. This might be from new CPU overheads but I don't see obvious problems in the flamegraphs. 

Builds, configuration and hardware

I compiled MySQL from source for versions \8.4.8 and 9.7.0.

The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

The my.cnf files os here for 8.4. I call this the z12a configs and variants of it are used for MySQL 5.6 through 8.4.

For 9.7 I use two configs:

All DBMS versions use the latin1 character set as explained here.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
(QPS for some version) / (QPS for MySQL 8.4.8) 

Results: point queries

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • Throughput with MySQL 9.7.0 is similar to 8.4.8 except for point-query where there are regressions as rQPS drops by 5 and 7 basis points. The point-query test uses simple queries that fetch one column from one row by PK. From vmstat metrics the CPU overhead per query for 9.7.0 is ~8% larger than for 8.4.8, with and without the hypergraph optimizer. I don't see anything obvious in the flamegraphs.
z13a    z13b
0.99    1.01    hot-points
0.95    0.93    point-query
0.99    1.01    points-covered-pk
1.00    1.01    points-covered-si
0.98    1.00    points-notcovered-pk
0.99    1.01    points-notcovered-si
1.00    1.02    random-points_range=1000
0.99    1.01    random-points_range=100
0.96    1.00    random-points_range=10

Results: range queries without aggregation

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • Throughput with MySQL 9.7.0 is similar to 8.4.8. I am skeptical there is a regression for the scan test with the z13b config. I suspect that is noise.
z13a    z13b
0.99    0.99    range-covered-pk
0.99    0.99    range-covered-si
0.99    0.99    range-notcovered-pk
0.98    0.98    range-notcovered-si
1.00    0.96    scan

Results: range queries with aggregation

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • There might be small regressions in several tests with rQPS dropping by a few points but I will ignore that for now.
  • There is a large improvement for the read-only-distinct test with the z13b config. The query for this test is select distinct c from sbtest where id between ? and ? order by c. The reason for the performance improvment is that the hypergraph optimizer chooses a better plan, see here.
  • There is a large improvement for the read-only test with range=10000. This test uses the read-only version of the classic sysbench transaction (see here). One of the queries it runs is the query used by read-only-distinct. So it benefits from the better plan for that query. 
z13a    z13b
0.97    0.97    read-only-count
0.98    1.26    read-only-distinct
0.96    0.95    read-only-order
0.99    1.15    read-only_range=10000
0.97    1.00    read-only_range=100
0.96    0.97    read-only_range=10
0.99    0.99    read-only-simple
0.97    0.96    read-only-sum

Results: writes

I describe performance changes (changes to relative QPS, rQPS) in terms of basis points. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

This shows the rQPS for MySQL 9.7.0 using both the z13a and z13b configs. It is relative to the throughput from MySQL 8.4.8.
  • There might be several small regressions here. I don't see obvious problems in the flamegraphs.
z13a    z13b
0.95    0.92    delete
1.00    1.01    insert
0.97    0.98    read-write_range=100
0.96    0.95    read-write_range=10
0.97    0.96    update-index
0.97    0.92    update-inlist
0.95    0.93    update-nonindex
0.95    0.92    update-one
0.95    0.93    update-zipf
0.97    0.95    write-only

Thursday, April 9, 2026

Sysbench vs MySQL on a small server: another way to view the regressions

This post provides another way to see the performance regressions in MySQL from versions 5.6 to 9.7. It complements what I shared in a recent post. The workload here is cached by InnoDB and my focus is on regressions from new CPU overheads. 

The good news is that there are few regressions after 8.0. The bad news is that there were many prior to that and these are unlikely to be undone.

    tl;dr

    • for point queries
      • there are large regressions from 5.6.51 to 5.7.44, 5.7.44 to 8.0.28 and 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for range queries without aggregation
      • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
      • there are mostly small regressions from 8.0.28 to 8.0.45, but scan has a large regression
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for range queries with aggregation
      • there are large regressions from 5.6.51 to 5.7.44 with two improvements
      • there are large regressions from 5.7.44 to 8.0.28
      • there are small regressions from 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8 to 9.7.0
    • for writes
      • there are large regressions from 5.6.51 to 5.7.44 and 5.7.44 to 8.0.28
      • there are small regressions from 8.0.28 to 8.0.45
      • there are few regressions from 8.0.45 to 8.4.8
      • there are a few small regressions from 8.4.8 to 9.7.0

    Builds, configuration and hardware

    I compiled MySQL from source for versions 5.6.51, 5.7.44, 8.0.28, 8.0.45, 8.4.8 and 9.7.0.

    The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

    The my.cnf files are here for 5.65.7 and 8.4. I call these the z12a configs.

    For 9.7 I use the z13a config. It is as close as possible to z12a and adds two options for gtid-related features to undo a default config change that arrived in 9.6. 

    All DBMS versions use the latin1 character set as explained here.

    Benchmark

    I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

    The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

    I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS (rQPS) is:
    (QPS for some version) / (QPS for base version) 
    Results: point queries

    MySQL 5.6.51 gets from 1.18X to 1.61X more QPS than 9.7.0 on point queries. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.62 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

    Summary:
    • from 5.6.51 to 9.7.0
      • the median regression is a drop in rQPS of 27 basis points
    • from 5.6.51 to 5.7.44
      • the median regression is a drop in rQPS of 11 basis points
    • from 5.7.44 to 8.0.28
      • the median regression is a drop in rQPS of 25 basis points
    • from 8.0.28 to 8.0.45
      • 7 of 9 tests get more QPS with 8.0.45
      • 2 tests have regressions where rQPS drops by ~6 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions
    • from 8.4.8 to 9.7.0
      • there are few regressions
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the largest regression is an rQPS drop of 38 basis points for point-query. Compared to most of the other tests in this section, this query does less work in the storage engine which implies the regression is from code above the storage engine.
    • the smallest regression is an rQPS drop of 15 basis points for random-points_range=1000. The regression for the same query with a shorter range (=10, =100) is larger. That implies, at least for this query, that the regression is for something above the storage engine (optimizer, parser, etc).
    • the median regression is an rQPS drop of 27 basis points
    0.65    hot-points
    0.62    point-query
    0.72    points-covered-pk
    0.78    points-covered-si
    0.73    points-notcovered-pk
    0.76    points-notcovered-si
    0.85    random-points_range=1000
    0.73    random-points_range=100
    0.66    random-points_range=10

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the largest regression is an rQPS drop of 14 basis points for hot-points.
    • the next largest regression is an rQPS drop of 13 basis points for random-points with range=10. The regressions for that query are smaller when a larger range is used =100, =1000 and this implies the problem is above the storage engine. 
    • the median regression is an rQPS drop of 11 basis points
    0.86    hot-points
    0.90    point-query
    0.89    points-covered-pk
    0.90    points-covered-si
    0.89    points-notcovered-pk
    0.88    points-notcovered-si
    1.00    random-points_range=1000
    0.89    random-points_range=100
    0.87    random-points_range=10

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the largest regression is an rQPS drop of 66 basis points for random-points with range=1000. The regression for that same query with smaller ranges (=10, =100) is smaller. This implies the problem is in the storage engine.
    • the second largest regression is an rQPS drop of 35 basis points for hot-points
    • the median regression is an rQPS drop of 25 basis points
    0.65    hot-points
    0.82    point-query
    0.74    points-covered-pk
    0.75    points-covered-si
    0.76    points-notcovered-pk
    0.84    points-notcovered-si
    0.34    random-points_range=1000
    0.75    random-points_range=100
    0.86    random-points_range=10

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • at last, there are many improvements. Some are from a fix for bug 102037 which I found with help from sysbench
    • the regressions, with rQPS drops by ~6 basis points, are for queries that do less work in the storage engine relative to the other tests in this section
    1.20    hot-points
    0.93    point-query
    1.13    points-covered-pk
    1.19    points-covered-si
    1.09    points-notcovered-pk
    1.04    points-notcovered-si
    2.48    random-points_range=1000
    1.12    random-points_range=100
    0.94    random-points_range=10

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.99    hot-points
    0.96    point-query
    0.99    points-covered-pk
    0.98    points-covered-si
    1.00    points-notcovered-pk
    0.99    points-notcovered-si
    1.00    random-points_range=1000
    1.00    random-points_range=100
    0.98    random-points_range=10

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.99    hot-points
    0.95    point-query
    0.99    points-covered-pk
    1.00    points-covered-si
    0.98    points-notcovered-pk
    0.99    points-notcovered-si
    1.00    random-points_range=1000
    0.99    random-points_range=100
    0.96    random-points_range=10

    Results: range queries without aggregation

    MySQL 5.6.51 gets from 1.35X to 1.52X more QPS than 9.7.0 on range queries without aggregation. It is easier for me to write about this in terms of relative QPS (rQPS) which is as low as 0.66 for MySQL 9.7.0 vs 5.6.51. I define a basis point to mean a change of 0.01 in rQPS.

    Summary:
    • from 5.6.51 to 9.7.0
      • the median regression is drop in rQPS of 33 basis points
    • from 5.6.51 to 5.7.44
      • the median regression is a drop in rQPS of 16 basis points
    • from 5.7.44 to 8.0.28
      • the median regression is a drop in rQPS ~10 basis points
    • from 8.0.28 to 8.0.45
      • the median regression is a drop in rQPS of 5 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • there are few regressions from 8.4.8 to 9.7.0
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • all tests have large regressions with an rQPS drop that ranges from 26 to 34 basis points
    • the median regression is an rQPS drop of 33 basis points
    0.66    range-covered-pk
    0.67    range-covered-si
    0.66    range-notcovered-pk
    0.74    range-notcovered-si
    0.67    scan

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • all tests have large regressions with an rQPS drop that ranges from 12 to 17 basis points
    • the median regression is an rQPS drop of 16 basis points
    0.85    range-covered-pk
    0.84    range-covered-si
    0.84    range-notcovered-pk
    0.88    range-notcovered-si
    0.83    scan

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • 4 of 5 tests have regressions with an rQPS drop that ranges from 10 to 14 basis points
    • the median regression is ~10 basis points
    • rQPS improves for the scan test
    0.86    range-covered-pk
    0.89    range-covered-si
    0.90    range-notcovered-pk
    0.90    range-notcovered-si
    1.04    scan

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • all tests are slower in 8.0.45 than 8.0.28, but the regression for 3 of 5 is <= 5 basis points
    • rQPS in the scan test drops by 21 basis points
    • the median regression is an rQPS drop of 5 basis points
    0.96    range-covered-pk
    0.95    range-covered-si
    0.91    range-notcovered-pk
    0.96    range-notcovered-si
    0.79    scan

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.95    range-covered-pk
    0.95    range-covered-si
    0.98    range-notcovered-pk
    0.99    range-notcovered-si
    0.98    scan

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.99    range-covered-pk
    0.99    range-covered-si
    0.99    range-notcovered-pk
    0.98    range-notcovered-si
    1.00    scan

    Results: range queries with aggregation

    Summary:
    • from 5.6.51 to 9.7.0 rQPS
      • the median result is a drop in rQPS of ~30 basis points
    • from 5.6.51 to 5.7.44
      • the median result is a drop in rQPS of ~10 basis points
    • from 5.7.44 to 8.0.28
      • the median result is a drop in rQPS of ~12 basis points
    • from 8.0.28 to 8.0.45
      • the median result is an rQPS drop of 5 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • there are few regressions from 8.4.8 to 9.7.0
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the median result is a drop in rQPS of ~30 basis points
    • rQPS for the read-only-distinct test improves by 25 basis point
    0.67    read-only-count
    1.25    read-only-distinct
    0.75    read-only-order
    1.02    read-only_range=10000
    0.74    read-only_range=100
    0.66    read-only_range=10
    0.69    read-only-simple
    0.66    read-only-sum

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the median result is an rQPS drop of ~10 basis points
    • rQPS improves by 45 basis points for read-only-distinct and by 23 basis points for read-only with the largest range (=10000)
    0.86    read-only-count
    1.45    read-only-distinct
    0.93    read-only-order
    1.23    read-only_range=10000
    0.96    read-only_range=100
    0.88    read-only_range=10
    0.85    read-only-simple
    0.86    read-only-sum

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the median result is an rQPS drop of ~12 basis points
    0.91    read-only-count
    0.94    read-only-distinct
    0.89    read-only-order
    0.86    read-only_range=10000
    0.87    read-only_range=100
    0.85    read-only_range=10
    0.90    read-only-simple
    0.87    read-only-sum

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • the median result is an rQPS drop of 5 basis points
    0.89    read-only-count
    0.95    read-only-distinct
    0.95    read-only-order
    0.97    read-only_range=10000
    0.94    read-only_range=100
    0.95    read-only_range=10
    0.93    read-only-simple
    0.93    read-only-sum

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.99    read-only-count
    0.98    read-only-distinct
    0.99    read-only-order
    1.00    read-only_range=10000
    0.98    read-only_range=100
    0.97    read-only_range=10
    0.97    read-only-simple
    0.98    read-only-sum

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • there are few regressions from 8.4.8 to 9.7.0
    0.97    read-only-count
    0.98    read-only-distinct
    0.96    read-only-order
    0.99    read-only_range=10000
    0.97    read-only_range=100
    0.96    read-only_range=10
    0.99    read-only-simple
    0.97    read-only-sum

    Results: writes

    Summary:
    • from 5.6.51 to 9.7.0 rQPS 
      • the median result is a drop in rQPS of ~33 basis points
    • from 5.6.51 to 5.7.44
      • the median result is an rQPS drop of ~13 basis points
    • from 5.7.44 to 8.0.28
      • the median result is an rQPS drop of ~18 basis points
    • from 8.0.28 to 8.0.45
      • the median result is an rQPS drop of 9 basis points
    • from 8.0.45 to 8.4.8
      • there are few regressions from 8.0.45 to 8.4.8
    • from 8.4.8 to 9.7.0
      • the median result is an rQPS drop of 4 basis points
    This has (QPS for 9.7.0) / (QPS for 5.6.51) and is followed by tables that show the difference between the latest point release in adjacent versions.
    • the median result is an rQPS drop of ~33 basis points
    0.56    delete
    0.54    insert
    0.72    read-write_range=100
    0.66    read-write_range=10
    0.88    update-index
    0.74    update-inlist
    0.60    update-nonindex
    0.58    update-one
    0.60    update-zipf
    0.67    write-only

    This has: (QPS for 5.7.44) / (QPS for 5.6.51)
    • the median result is an rQPS drop of ~13 basis points
    • rQPS improves by 21 basis points for update-index and by 5 basis points for update-inlist
    0.82    delete
    0.80    insert
    0.94    read-write_range=100
    0.88    read-write_range=10
    1.21    update-index
    1.05    update-inlist
    0.86    update-nonindex
    0.85    update-one
    0.86    update-zipf
    0.94    write-only

    This has: (QPS for 8.0.28) / (QPS for 5.7.44)
    • the median result is an rQPS drop of ~18 basis points
    0.80    delete
    0.77    insert
    0.87    read-write_range=100
    0.85    read-write_range=10
    0.94    update-index
    0.79    update-inlist
    0.81    update-nonindex
    0.80    update-one
    0.81    update-zipf
    0.83    write-only

    This has: (QPS for 8.0.45) / (QPS for 8.0.28)
    • the median result is an rQPS drop of 9 basis points
    0.91    delete
    0.90    insert
    0.94    read-write_range=100
    0.94    read-write_range=10
    0.80    update-index
    0.92    update-inlist
    0.91    update-nonindex
    0.92    update-one
    0.91    update-zipf
    0.89    write-only

    This has: (QPS for 8.4.8) / (QPS for 8.0.45)
    • there are few regressions from 8.0.45 to 8.4.8
    0.98    delete
    0.98    insert
    0.98    read-write_range=100
    0.98    read-write_range=10
    0.99    update-index
    0.99    update-inlist
    0.99    update-nonindex
    0.99    update-one
    0.99    update-zipf
    0.99    write-only

    This has: (QPS for 9.7.0) / (QPS for 8.4.8)
    • the median result is an rQPS drop of 4 basis points
    0.95    delete
    1.00    insert
    0.97    read-write_range=100
    0.96    read-write_range=10
    0.97    update-index
    0.97    update-inlist
    0.95    update-nonindex
    0.95    update-one
    0.95    update-zipf
    0.97    write-only

    Wednesday, April 8, 2026

    The Insert Benchmark vs MariaDB 10.2 to 13.0 on a 32-core server

    This has results for MariaDB versions 10.2 through 13.0 vs the Insert Benchmark on a 32-core server. The goal is to see how performance changes over time to find regressions or highlight improvements. My previous post has results from a 24-core server.  Differences between these servers include:

    • RAM - 32-core server has 128G, 24-core server has 64G
    • fsync latency - 32-core has an SSD with high fsync latency, while it is fast on the 24-core server
    • sockets - 32-core server has 1 CPU socket, 24-core server has two
    • CPU maker  - 32-core server uses an AMD Threadripper, 24-core server has an Intel Xeon
    • cores - obviously it is 32 vs 24, Intel HT and AMD SMT are disabled

    The results here for modern MariaDB are great for the CPU-bound workload but not for the IO-bound workload.. They were great for both on the 24-core server. The regressions are likely caused by the extra fsync calls that are done because the equivalent of equivalent of innodb_flush_method =O_DIRECT_NO_FSYNC was lost with the new options that replace innodb_flush_method starting in MariaDB 11.4. I created MDEV-33545 to request support for it. The workaround is to use an SSD that doesn't have high fsync latency, which is always a good idea, but not always possible.

    tl;dr

    • for a CPU-bound workload
      • the write-heavy steps are much faster in 13.0.0 than 10.2.30
      • the read-heavy steps get similar QPS in 13.0.0 and 10.2.30
      • this is similar to the results on the 24-core server
    • for an IO-bound workload
      • the initial load (l.i0) is much faster in 13.0.0 than 10.2.30
      • the random write step (l.i1) is slower in 13.0.0 than 10.2.30 because fsync latency
      • the range query step (qr100) gets similar QPS in 13.0.0 and 10.2.30
      • the point query step (qp100) is much slower in 13.0.0 than 10.2.30 because fsync latency

    Builds, configuration and hardware

    I compiled MariaDB from source for versions 10.2.30, 10.2.44, 10.3.39, 10.4.34, 10.5.29, 10.6.25, 10.11.16, 11.4.10, 11.8.6, 12.3.1 and 13.0.0.

    The server has 24-cores, 2-sockets and 64G of RAM. Storage is 1 NVMe device with ext-4 and discard enabled. The OS is Ubuntu 24.04. Intel HT is disabled.

    The my.cnf files are here for: 10.210.310.410.510.610.1111.411.812.3 and 13.0

    For MariaDB 10.11.16 I used both the z12a config, as I did for all 10.x releases, and also used the z12b config. The difference is that the z12a config uses innodb_flush_method =O_DIRECT_NO_FSYNC while the z12b config uses =O_DIRECT. And the z12b config is closer to the configs used for MariaDB because with the new variables that replaced innodb_flush_method, we lose support for the equivalent of =O_DIRECT_NO_FSYNC.

    And I write about this because the extra fsync calls that are done when the z12b config is used have a large impact on throughput on a server that uses an SSD with high fsync latency, which causes perf regressions for all DBMS versions that used the z12b config -- 10.11.16, 11.4, 11.8, 12.3 and 13.0.

    The Benchmark

    The benchmark is explained here and is run with 12 clients with a table per client. I repeated it with two workloads:
    • CPU-bound
      • the values for X, Y, Z are 10M, 16M, 4M
    • IO-bound
      • the values for X, Y, Z are 300M, 4M, 1M
    The point query (qp100, qp500, qp1000) and range query (qr100, qr500, qr1000) steps are run for 1800 seconds each.

    The benchmark steps are:

    • l.i0
      • insert X rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
    • l.x
      • create 3 secondary indexes per table. There is one connection per client.
    • l.i1
      • use 2 connections/client. One inserts Y rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
    • l.i2
      • like l.i1 but each transaction modifies 5 rows (small transactions) and Z rows are inserted and deleted per table.
      • Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
    • qr100
      • use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
    • qp100
      • like qr100 except uses point queries on the PK index
    • qr500
      • like qr100 but the insert and delete rates are increased from 100/s to 500/s
    • qp500
      • like qp100 but the insert and delete rates are increased from 100/s to 500/s
    • qr1000
      • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
    • qp1000
      • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
    Results: overview

    The performance reports are here for the CPU-bound and IO-bound workloads.

    The summary sections from the performances report have 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version from the first row of the table. The third shows the background insert rate for benchmark steps with background inserts. The second table makes it easy to see how performance changes over time. The third table makes it easy to see which DBMS+configs failed to meet the SLA.

    Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is the result for some version. The base version is MariaDB 10.2.30.

    When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
    • insert/s for l.i0, l.i1, l.i2
    • indexed rows/s for l.x
    • range queries/s for qr100, qr500, qr1000
    • point queries/s for qp100, qp500, qp1000
    This statement doesn't apply to this blog post, but I keep it here for copy/paste into future posts. Below I use colors to highlight the relative QPS values with red for <= 0.95, green for >= 1.05 and grey for values between 0.95 and 1.05.

    Results: CPU-bound

    The performance summary is here.

    The summary per benchmark step, where rQPS means relative QPS.
    • l.i0
      • MariaDB 13.0.0 is faster than 10.2.30, rQPS is 1.47
      • CPU per insert (cpupq) and KB written to storage per insert (wKBpi) are much smaller in 13.0.0 than 10.2.30 (see here)
    • l.x
      • I will ignore this
    • l.i1, l.i2
      • MariaDB 13.0.0 is faster than 10.2.30, rQPS is 1.50 and 1.37
      • CPU per write (cpupq) is much smaller in 13.0.0 than 10.2.30 (see here)
    • qr100, qr500, qr1000
      • MariaDB 13.0.0 and 10.2.30 have similar QPS (rQPS is close to 1.0)
      • CPU per query (cqpq) is similar in 13.0.0 and 10.2.30 (see here)
    • qp100, qp500, qp1000
      • MariaDB 13.0.0 and 10.2.30 have similar QPS (rQPS is close to 1.0)
      • CPU per query (cqpq) is similar in 13.0.0 and 10.2.30 (see here)

    Results: IO-bound

    The performance summary is here.

    The summary per benchmark step, where rQPS means relative QPS.
    • l.i0
      • MariaDB 13.0.0 is faster than 10.2.30, rQPS is 1.25
      • CPU per insert (cpupq) and KB written to storage per insert (wKBpi) are much smaller in 13.0.0 than 10.2.30 (see here)
    • l.x
      • I will ignore this
    • l.i1, l.i2
      • MariaDB 13.0.0 is slower than 10.2.30 for l.i1, rQPS is 0.68
      • MariaDB 13.0.0 is faster than 10.2.30 for l.i2, rQPS is 1.31. I suspect it is faster on l.i2 because it inherits less MVCC GC debt from l.i1 because it was slower on l.i1. So I won't celebrate this result and will focus on l.i1.
      • From the normalized vmstat and iostat metrics I don't see anything obvious. But I do see a reduction in storage reads/s (rps) and storage read MB/s (rMBps). And this reduction starts in 10.11.16 with the z12b config and continues to 13.0.0. This does not occur on the earlier releases that are eable to use the z12a config. So I am curious if the extra fsyncs are the root cause.
      • From the iostat summary for l.i1 that includes average values for all iostat columns, and these are not divided by QPS, what I see a much higher rate for fsyncs (f/s) as well as an increase in read latency. For MariaDB 10.11.16 the value for r_await is 0.640 with the z12a config vs 0.888 with the z12b config. I assume that more frequent fsync calls hurt read latency. The iostat results don't look great for either the z12a or z12b config and the real solution is to avoid using an SSD with high fsync latency, but that isn't always possible.
    • qr100, qr500, qr1000
      • no DBMS versions were able to sustain the target write rate for qr500 or qr1000 so I ignore them. This server needs more IOPs capacity -- a second SSD, and both SSDs needs power loss protection to reduce fsync latency.
      • MariaDB 13.0.0 and 10.2.30 have similar performance, rQPS is 0.96The qr100 step for MariaDB 13.0.0 might not suffer from fsync latency like the qp100 step because it does less read IO per query than qp100 (see rpq here).
    • qp100, qp500, qp1000
      • no DBMS versions were able to sustain the target write rate for qp500 or qp1000 so I ignore them. This server needs more IOPs capacity -- a second SSD, and both SSDs needs power loss protection to reduce fsync latency.
      • MariaDB 13.0.0 is slower than 10.2.30, rQPS is 0.62
      • From the normalized vmstat and iostat metrics there are increases in CPU per query (cpupq) and storage reads per query (rpq) for all DBMS versions that use the z12b config (see here).
      • From the iostat summary for qp100 that includes average values for all iostat columns the read latency increases for all DBMS versions that use the z12b config. I blame interference from the extra fsync calls.
























    Tuesday, April 7, 2026

    The Insert Benchmark vs MariaDB 10.2 to 13.0 on a 24-core server

    This has results for MariaDB versions 10.2 through 13.0 vs the Insert Benchmark on a 24-core server. The goal is to see how performance changes over time to find regressions or highlight improvements.

    MariaDB 13.0.0 is faster than 10.2.30 on most benchmark steps and otherwise as fast as 10.2.30. This is a great result.

    tl;dr

    • for a CPU-bound workload
      • the write-heavy steps are much faster in 13.0.0 than 10.2.30
      • the read-heavy steps get similar QPS in 13.0.0 and 10.2.30
    • for an IO-bound workload
      • most of the write-heavy steps are much faster in 13.0.0 than 10.2.30
      • the point-query heavy steps get similar QPS in 13.0.0 and 10.2.30
      • the range-query heavy steps get more QPS in 13.0.0 than 10.2.30

    Builds, configuration and hardware

    I compiled MariaDB from source for versions 10.2.30, 10.2.44, 10.3.39, 10.4.34, 10.5.29, 10.6.25, 10.11.16, 11.4.10, 11.8.6, 12.3.1 and 13.0.0.

    The server has 24-cores, 2-sockets and 64G of RAM. Storage is 1 NVMe device with ext-4 and discard enabled. The OS is Ubuntu 24.04. Intel HT is disabled.

    The my.cnf files are here for: 10.2, 10.3, 10.4, 10.5, 10.6, 10.11, 11.4, 11.8, 12.3 and 13.0.

    The Benchmark

    The benchmark is explained here and is run with 8 clients with a table per client. I repeated it with two workloads:
    • CPU-bound
      • the values for X, Y, Z are 10M, 16M, 4M
    • IO-bound
      • the values for X, Y, Z are 250M, 4M, 1M
    The point query (qp100, qp500, qp1000) and range query (qr100, qr500, qr1000) steps are run for 1800 seconds each.

    The benchmark steps are:

    • l.i0
      • insert X rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
    • l.x
      • create 3 secondary indexes per table. There is one connection per client.
    • l.i1
      • use 2 connections/client. One inserts Y rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
    • l.i2
      • like l.i1 but each transaction modifies 5 rows (small transactions) and Z rows are inserted and deleted per table.
      • Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
    • qr100
      • use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
    • qp100
      • like qr100 except uses point queries on the PK index
    • qr500
      • like qr100 but the insert and delete rates are increased from 100/s to 500/s
    • qp500
      • like qp100 but the insert and delete rates are increased from 100/s to 500/s
    • qr1000
      • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
    • qp1000
      • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
    Results: overview

    The performance reports are here for the CPU-bound and IO-bound workloads.

    The summary sections from the performances report have 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version from the first row of the table. The third shows the background insert rate for benchmark steps with background inserts. The second table makes it easy to see how performance changes over time. The third table makes it easy to see which DBMS+configs failed to meet the SLA.

    Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is the result for some version. The base version is MariaDB 10.2.30.

    When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
    • insert/s for l.i0, l.i1, l.i2
    • indexed rows/s for l.x
    • range queries/s for qr100, qr500, qr1000
    • point queries/s for qp100, qp500, qp1000
    This statement doesn't apply to this blog post, but I keep it here for copy/paste into future posts. Below I use colors to highlight the relative QPS values with red for <= 0.95, green for >= 1.05 and grey for values between 0.95 and 1.05.

    Results: CPU-bound

    The performance summary is here.

    The summary per benchmark step, where rQPS means relative QPS.
    • l.i0
      • MariaDB 13.0.0 is faster than 10.2.30 (rQPS is 1.22)
      • KB written to storage per insert (wKBpi) and CPU per insert (cpupq) are smaller in 13.0.0 than 10.2.30, see here
    • l.x
      • I will ignore this
    • l.i1, l.i2
      • MariaDB 13.0.0 is faster than 10.2.30 (rQPS is 1.21 and 1.45)
      • for l.i1, CPU per insert (cpupq) is smaller in 13.0.0 than 10.2.30 but KB written to storage per insert (wKBpi) and the context switch rate (cspq) are larger in 13.0.0 than 10.2.30, see here
      • for l.i2, CPU per insert (cpupq) and KB written to storage per insert (wKBpi) are smaller in 13.0.0 than 10.2.30 but the context switch rate (cspq) is larger in 13.0.0 than 10.2.30, see here
    • qr100, qr500, qr1000
      • MariaDB 13.0.0 and 10.2.30 have similar QPS (rQPS is close to 1.0)
      • the results from vmstat and iostat are less useful here because the write rate in 10.2 to 10.4 was much larger than 10.5+. While the my.cnf settings are as close as possible across all versions, it looks like furious flushing was enabled in 10.2 to 10.4 and I need to figure out whether it is possible to disable that.
    • qp100, qp500, qp1000
      • MariaDB 13.0.0 and 10.2.30 have similar QPS (rQPS is close to 1.0)
      • what I wrote above for vmstat and iostat with the qr* test also applies here
    Results: IO-bound

    The performance summary is here.

    The summary per benchmark step, where rQPS means relative QPS.
    • l.i0
      • MariaDB 13.0.0 is faster than 10.2.30 (rQPS is 1.16)
      • KB written to storage per insert (wKBpi) and CPU per insert (cpupq) are smaller in 13.0.0 than 10.2.30, see here
    • l.x
      • I will ignore this
    • l.i1, l.i2
      • MariaDB 13.0.0 and 10.2.30 have the same QPS for l.i1 while 13.0.0 is faster for l.i2 (rQPS is 1.03 and 3.70). It is odd that QPS drops from 12.3.1 to 13.0.0 on the l.i1 step.
      • for l.i1, CPU per insert (cpupq) and the context switch rate (cspq) are larger in 13.0.0 than 12.3.1, see here. The flamegraphs, that I have not shared, look similar. From iostat results there is much more discard (TRIM, SSD GC) in progress with 13.0.0 than 12.3.1 and the overhead from that might explain the difference.
      • for l.i2, almost everything looks better in 13.0.0 than 10.2.30. Unlike what occurs for the l.i1 step, the results for 13.0.0 are similar to 12.3.1, see here.
    • qr100, qr500, qr1000
      • no DBMS versions were able to sustain the target write rate for qr1000 so I ignore that step
      • MariaDB 13.0.0 and 10.2.30 have similar QPS (rQPS is close to 1.0)
      • the results from vmstat and iostat are less useful here because the write rate in 10.2 to 10.4 was much larger than 10.5+. While the my.cnf settings are as close as possible across all versions, it looks like furious flushing was enabled in 10.2 to 10.4 and I need to figure out whether it is possible to disable that.
    • qp100, qp500, qp1000
      • no DBMS versions were able to sustain the target write rate for qr1000 so I ignore that step
      • MariaDB 13.0.0 is faster than 10.2.30 (rQPS is 1.17 and 1.56)
      • what I wrote above for vmstat and iostat with the qr* test also applies here






    Sunday, April 5, 2026

    Sysbench vs MariaDB on a small server: using the same charset for all versions

    This has results for sysbench vs MariaDB on a small server. I repeated tests using the same charset (latin1) for all versions as explained here. In previous results I used a multi-byte charset for modern MariaDB (probably 11.4+) by mistake and that adds a 5% CPU overhead for many tests.

    tl;dr

    • MariaDB has done much better than MySQL at avoid regressions from code bloat.
    • There are several performance improvements in MariaDB 12.3 and 13.0
    • For reads there are small regressions and frequent improvements.
    • For writes there are  regressions up to 10%, and the biggest contributor is MariaDB 11.4

    Builds, configuration and hardware

    I compiled MariaDB from source for versions 10.2.30, 10.2.44, 10.3.39, 10.4.34, 10.5.29, 10.6.25, 10.11.16, 11.4.10, 11.8.6, 12.3.1 and 13.0.0.

    The server is an ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, 32G RAM and an m.2 device for the database. More details on it are here. The OS is Ubuntu 24.04 and the database filesystem is ext4 with discard enabled.

    The my.cnf files are here for 10.2, 10.3, 10.4, 10.5, 10.6, 10.11, 11.4, 11.8, 12.3 and 13.0.

    Benchmark

    I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

    The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 1800 seconds.

    Results

    The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

    I provide tables below with relative QPS. When the relative QPS is > 1 then some version is faster than the base version. When it is < 1 then there might be a regression.  The relative QPS is:
    (QPS for some version) / (QPS for MariaDB 10.2.30) 
    Values from iostat and vmstat divided by QPS are hereThese can help to explain why something is faster or slower because it shows how much HW is used per request.

    The spreadsheet with results and charts is here. Files with performance summaries are here.

    Results: point queries

    Summary
    • The y-axis starts at 0.8 to improve readability.
    • Modern MariaDB (13.0) is faster than old MariaDB (10.2) in 7 of 9 tests
      • There were regressions from 10.2 through 10.5
      • Performance has been improving from 10.6 through 13.0

    Results: range queries without aggregation

    Summary
    • The y-axis starts at 0.8 to improve readability.
    • Modern MariaDB (13.0) is faster than old MariaDB (10.2) in 2 of 5 tests
      • There were regressions from 10.2 through 10.5, then performance was stable from 10.6 though 11.8, and now performance has improved in 12.3 and 13.0.
    Results: range queries with aggregation

    Summary
    • The y-axis starts at 0.8 to improve readability.
    • Modern MariaDB (13.0) is faster than old MariaDB (10.2) in 1 of 8 tests and within 2% in 6 tests
    Results: writes

    Summary
    • The y-axis starts at 0.8 to improve readability.
    • Modern MariaDB (13.0) is about 10% slower than old MariaDB (10.2) in 5 of 10 tests and the largest regressions arrive in 11.4.

    Postgres 19 beta1 vs sysbench on a small server

    This has results from sysbench on a small server with Postgres 19 beta1, 18.4 and 17.10. Sysbench is run with low concurrency (1 thread) and...