Wednesday, October 1, 2025

Measuring scaleup for MariaDB with sysbench

This post has results to measure scaleup for MariaDB 11.8.3 on a 48-core server.

tl;dr

  • Scaleup is better for range queries than for point queries
  • For tests where results were less than great, the problem appears to be mutex contention within InnoDB

Builds, Configuration & Hardware

The server has an AMD EPYC 9454P 48-Core Processor with AMD SMT disabled, 128G of RAM and SW RAID 0 with 2 NVMe devices. The OS is Ubuntu 22.04.

I compiled MariaDB 11.8.3 from source and the my.cnf file is here.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks, and most of them test only 1 type of SQL statement. Benchmarks are run with the database cached by MariaDB. Each microbenchmark runs for 300 seconds.

The benchmark is run with 1, 2, 4, 8, 12, 16, 20, 24, 32, 40 and 48 clients. The purpose is to determine how well MariaDB scales up.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, and 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I still use relative QPS here, but in a different way. The relative QPS here is:
(QPS at X clients) / (QPS at 1 client)

The goal is to determine scaleup efficiency for MariaDB. When the relative QPS at X clients is a value near X, then things are great. But sometimes things aren't great and the relative QPS is much less than X. One issue is data contention for some of the write-heavy microbenchmarks. Another issue is mutex and rw-lock contention.
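
A minimal sketch of that computation, using made-up QPS numbers (not results from this post), where efficiency near 1.0 means near-linear scaleup:

    # Sketch: compute relative QPS and scaleup efficiency from QPS per client count.
    # The QPS values below are hypothetical.
    qps_by_clients = {1: 10_000, 8: 74_000, 24: 180_000, 48: 260_000}

    base = qps_by_clients[1]
    for clients in sorted(qps_by_clients):
        relative = qps_by_clients[clients] / base   # (QPS at X clients) / (QPS at 1 client)
        efficiency = relative / clients             # 1.0 would be perfect linear scaleup
        print(f"{clients:2d} clients: relative QPS = {relative:6.2f}, efficiency = {efficiency:.2f}")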

Perf debugging via vmstat and iostat

I use normalized results from vmstat and iostat to help explain why things aren't as fast as expected. By normalized I mean I divide the average values from vmstat and iostat by QPS to see things like how much CPU is used per query or how many context switches occur per write. And note that a high context switch rate is often a sign of mutex contention.
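
As a sketch, the normalization is just a per-query division; the field names and values below are hypothetical:

    # Sketch of the normalization: divide average vmstat counters by QPS to get
    # per-query costs.
    def normalize(vmstat_avg, qps):
        return {name: value / qps for name, value in vmstat_avg.items()}

    # cs = context switches/second; us, sy = user and system CPU
    per_query = normalize({"cs": 250_000.0, "us": 55.0, "sy": 10.0}, qps=40_000.0)
    print(per_query["cs"])   # context switches per query (the cs/o column)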

Charts: point queries

The spreadsheet with all of the results is here.

For point queries:

  • tests for which the relative QPS at 48 clients is greater than 40
    • point-query
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • none
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • hot-points, points-covered-si, random-points_range=10
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • points-covered-pk, points-notcovered-pk, points-notcovered-si, random-points_range=100
  • tests for which the relative QPS at 48 clients is less than 10
    • random-points_range=1000
For 5 of the 9 point query tests, QPS stops improving beyond 16 clients. And I assume that mutex contention is the problem.

Results for the random-points_range=Z tests are interesting. They use oltp_inlist_select.lua which does a SELECT with a large IN-list where the IN-list entries can find rows by exact match on the PK. The value of Z is the number of entries in the IN-list. And here MariaDB scales worse with a larger Z (1000) than with a smaller Z (10 or 100), which means that the thing that limits scaleup is more likely in InnoDB than the parser or optimizer.
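
A rough sketch of the statement shape, in case that helps -- the table and column names below are the standard sysbench ones as an assumption, and the real oltp_inlist_select.lua may differ in the details:

    # Sketch of the query shape for random-points_range=Z: one SELECT with a
    # Z-entry IN-list of primary key values. Table/column names are assumptions.
    import random

    def inlist_select(z, table="sbtest1", max_id=1_000_000):
        ids = ", ".join(str(random.randint(1, max_id)) for _ in range(z))
        return f"SELECT id, k, c, pad FROM {table} WHERE id IN ({ids})"

    print(inlist_select(10))    # random-points_range=10 -> 10 entries in the IN-list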

From the normalized vmstat metrics (see here), the number of context switches per query (the cs/o column) grows a lot more from 1 to 48 clients for random-points_range=1000 than for random-points_range=10. The ratio (cs/o at 48 clients / cs/o at 1 client) is 1.46 for random-points_range=10 but 19.96 for random-points_range=1000. The problem appears to be mutex contention.

Charts: range queries without aggregation

The spreadsheet with all of the results is here.

For range queries without aggregation:

  • tests for which the relative QPS at 48 clients is greater than 40
    • range-covered-pk, range-covered-si, range-notcovered-pk
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • scan
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • none
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • none
  • tests for which the relative QPS at 48 clients is less than 10
    • range-notcovered-si
Only one test has less than great results for scaleup -- range-notcovered-si. QPS for it stops growing beyond 12 clients. The root cause appears to be mutex contention, based on the large value for cs/o in the normalized vmstat metrics (see here). Of all the range-*covered-* tests, it has the most InnoDB activity per query -- the query isn't covering, so it must do a PK index access for each entry it finds in the secondary index.

Charts: range queries with aggregation

The spreadsheet with all of the results is here.

For range queries with aggregation:

  • tests for which the relative QPS at 48 clients is greater than 40
    • read-only-distinct, read-only-order, read-only-range=Y, read-only-sum
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • read-only-count, read-only-simple
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • none
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • none
  • tests for which the relative QPS at 48 clients is less than 10
    • none
Results here are excellent, and better than the results above for range queries without aggregation. The difference might mean that there is less concurrent activity within InnoDB because aggregation code is run after each row is fetched from InnoDB.

Charts: writes

The spreadsheet with all of the results is here.

For writes:

  • tests for which the relative QPS at 48 clients is greater than 40
    • none
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • read-write_range=Y
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • update-index, write-only
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • delete, insert, update-inlist, update-nonindex, update-zipf
  • tests for which the relative QPS at 48 clients is less than 10
    • update-one
The best result is for the read-write_range=Y tests, which use the classic sysbench transaction that does a mix of writes, point queries and range queries. 

The worst result is from update-one, which suffers from data contention because all updates are to the same row. A poor result is expected here.


