Wednesday, October 1, 2025

Measuring scaleup for MariaDB with sysbench

This post has results to measure scaleup for MariaDB 11.8.3 on a 48-core server.

tl;dr

  • Scaleup is better for range queries than for point queries
  • For tests where results were less than great, the problem appears to be mutex contention within InnoDB

Builds, Configuration & Hardware

The server has an AMD EPYC 9454P 48-Core Processor with AMD SMT disabled, 128G of RAM and SW RAID 0 with 2 NVMe devices. The OS is Ubuntu 22.04.

I compiled MariaDB 11.8.3 from source and the my.cnf file is here.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks, and most of them test only 1 type of SQL statement. Benchmarks are run with the database cached by MariaDB. Each microbenchmark runs for 300 seconds.

The benchmark is run with 1, 2, 4, 8, 12, 16, 20, 24, 32, 40 and 48 clients. The purpose is to determine how well MariaDB scales up.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, and 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I still use relative QPS here, but in a different way. The relative QPS here is:
(QPS at X clients) / (QPS at 1 client)

The goal is to determine scaleup efficiency for MariaDB. When the relative QPS at X clients is a value near X, then things are great. But sometimes things aren't great and the relative QPS is much less than X. One issue is data contention for some of the write-heavy microbenchmarks. Another issue is mutex and rw-lock contention.
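
A minimal sketch of that computation, using made-up QPS numbers (not results from this post), where efficiency near 1.0 means near-linear scaleup:

    # Sketch: compute relative QPS and scaleup efficiency from QPS per client count.
    # The QPS values below are hypothetical.
    qps_by_clients = {1: 10_000, 8: 74_000, 24: 180_000, 48: 260_000}

    base = qps_by_clients[1]
    for clients in sorted(qps_by_clients):
        relative = qps_by_clients[clients] / base   # (QPS at X clients) / (QPS at 1 client)
        efficiency = relative / clients             # 1.0 would be perfect linear scaleup
        print(f"{clients:2d} clients: relative QPS = {relative:6.2f}, efficiency = {efficiency:.2f}")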

Perf debugging via vmstat and iostat

I use normalized results from vmstat and iostat to help explain why things aren't as fast as expected. By normalized I mean I divide the average values from vmstat and iostat by QPS to see things like how much CPU is used per query or how many context switches occur per write. And note that a high context switch rate is often a sign of mutex contention.
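
As a sketch, the normalization is just a per-query division; the field names and values below are hypothetical:

    # Sketch of the normalization: divide average vmstat counters by QPS to get
    # per-query costs.
    def normalize(vmstat_avg, qps):
        return {name: value / qps for name, value in vmstat_avg.items()}

    # cs = context switches/second; us, sy = user and system CPU
    per_query = normalize({"cs": 250_000.0, "us": 55.0, "sy": 10.0}, qps=40_000.0)
    print(per_query["cs"])   # context switches per query (the cs/o column)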

Charts: point queries

The spreadsheet with all of the results is here.

For point queries:

  • tests for which the relative QPS at 48 clients is greater than 40
    • point-query
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • none
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • hot-points, points-covered-si, random-points_range=10
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • points-covered-pk, points-notcovered-pk, points-notcovered-si, random-points_range=100
  • tests for which the relative QPS at 48 clients is less than 10
    • random-points_range=1000
For 5 of the 9 point query tests, QPS stops improving beyond 16 clients. And I assume that mutex contention is the problem.

Results for the random-points_range=Z tests are interesting. They use oltp_inlist_select.lua which does a SELECT with a large IN-list where the IN-list entries can find rows by exact match on the PK. The value of Z is the number of entries in the IN-list. And here MariaDB scales worse with a larger Z (1000) than with a smaller Z (10 or 100), which means that the thing that limits scaleup is more likely in InnoDB than the parser or optimizer.
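
A rough sketch of the statement shape, in case that helps -- the table and column names below are the standard sysbench ones as an assumption, and the real oltp_inlist_select.lua may differ in the details:

    # Sketch of the query shape for random-points_range=Z: one SELECT with a
    # Z-entry IN-list of primary key values. Table/column names are assumptions.
    import random

    def inlist_select(z, table="sbtest1", max_id=1_000_000):
        ids = ", ".join(str(random.randint(1, max_id)) for _ in range(z))
        return f"SELECT id, k, c, pad FROM {table} WHERE id IN ({ids})"

    print(inlist_select(10))    # random-points_range=10 -> 10 entries in the IN-list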

From the normalized vmstat metrics (see here), the number of context switches per query (the cs/o column) grows a lot more from 1 to 48 clients for random-points_range=1000 than for random-points_range=10. The ratio (cs/o at 48 clients / cs/o at 1 client) is 1.46 for random-points_range=10 but 19.96 for random-points_range=1000. The problem appears to be mutex contention.

Charts: range queries without aggregation

The spreadsheet with all of the results is here.

For range queries without aggregation:

  • tests for which the relative QPS at 48 clients is greater than 40
    • range-covered-pk, range-covered-si, range-notcovered-pk
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • scan
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • none
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • none
  • tests for which the relative QPS at 48 clients is less than 10
    • range-notcovered-si
Only one test has less than great results for scaleup -- range-notcovered-si. QPS for it stops growing beyond 12 clients. The root cause appears to be mutex contention, based on the large value for cs/o in the normalized vmstat metrics (see here). Of all the range-*covered-* tests, it has the most InnoDB activity per query -- the query isn't covering, so it must do a PK index access for each entry it finds in the secondary index.

Charts: range queries with aggregation

The spreadsheet with all of the results is here.

For range queries with aggregation:

  • tests for which the relative QPS at 48 clients is greater than 40
    • read-only-distinct, read-only-order, read-only-range=Y, read-only-sum
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • read-only-count, read-only-simple
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • none
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • none
  • tests for which the relative QPS at 48 clients is less than 10
    • none
Results here are excellent, and better than the results above for range queries without aggregation. The difference might mean that there is less concurrent activity within InnoDB because aggregation code is run after each row is fetched from InnoDB.

Charts: writes

The spreadsheet with all of the results is here.

For writes:

  • tests for which the relative QPS at 48 clients is greater than 40
    • none
  • tests for which the relative QPS at 48 clients is between 30 and 40
    • read-write_range=Y
  • tests for which the relative QPS at 48 clients is between 20 and 30
    • update-index, write-only
  • tests for which the relative QPS at 48 clients is between 10 and 20
    • delete, insert, update-inlist, update-nonindex, update-zipf
  • tests for which the relative QPS at 48 clients is less than 10
    • update-one
The best result is for the read-write_range=Y tests, which use the classic sysbench transaction that does a mix of writes, point queries and range queries. 

The worst result is from update-one, which suffers from data contention because all updates are to the same row. A poor result is expected here.


