Monday, October 20, 2025

Determine how much concurrency to use on a benchmark for small, medium and large servers

What I describe here works for me given my goal, which is to find performance regressions. A benchmark run at low concurrency is used to find regressions from CPU overhead. A benchmark run at high concurrency is used to find regressions from mutex contention. A benchmark run at medium concurrency might help find both.

My informal way of classifying servers by size is:

  • small - has fewer than 10 cores
  • medium - has between 10 and 20 cores
  • large - has more than 20 cores
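
The size classes above can be written as a small helper. This is only a sketch: the function name is made up for illustration, and treating exactly 10 and exactly 20 cores as medium is my reading of "between 10 and 20".

```python
def classify_server(cores: int) -> str:
    """Map a physical core count to the informal size classes above.

    Assumption: "between 10 and 20" is inclusive, so 10 and 20 are medium.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    if cores < 10:
        return "small"
    if cores <= 20:
        return "medium"
    return "large"


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, classify_server(n))
```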

How much concurrency?

I almost always co-locate benchmark clients and the DBMS on the same server. This comes at a cost (less CPU and RAM is available for the DBMS) and might have odd artifacts because clients in the real world are usually not co-located. But it has benefits that matter to me. First, I don't worry about variance from changes in network latency. Second, this is much easier to set up.

I try not to oversubscribe the CPU when I run a benchmark. For benchmarks with few waits for reads from or writes to storage, I limit the number of benchmark users so that the concurrent connection count is less than the number of CPU cores (cores, not vCPUs). I almost always use servers with Intel Hyperthreads and AMD SMT disabled. I do this because DBMS performance suffers when the CPU is oversubscribed, and back when I was closer to production we did our best to avoid that state.

Even for benchmarks where some steps have IO waits, I still limit the amount of concurrency unless all of the benchmark steps that I measure have IO waits.

Assuming a benchmark is composed of a sequence of steps (at minimum: load, query), I consider the number of concurrent connections per benchmark user. For sysbench, the number of concurrent connections is the same as the number of users, although sysbench uses the --threads argument to set the number of users. I am just getting started with TPROC-C via HammerDB and that appears to be like sysbench with one concurrent connection per virtual user (VU).
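
For the one-connection-per-user case, the rule of keeping connections below the core count can be sketched as follows. The helper name and the reserved_cores parameter are hypothetical, and the default of 1 is just a placeholder, not a number I recommend. Only the --threads flag comes from sysbench itself.

```python
def users_for_cpu_bound(cores: int, reserved_cores: int = 1) -> int:
    """Pick a user count for a benchmark with one connection per user.

    Keeps the connection count strictly below the number of physical cores.
    reserved_cores is an illustrative knob (hypothetical default of 1).
    """
    if cores < 2:
        raise ValueError("need at least 2 cores to stay below the core count")
    if reserved_cores < 1:
        raise ValueError("reserve at least one core")
    return max(1, cores - reserved_cores)


def sysbench_threads_arg(cores: int, reserved_cores: int = 1) -> str:
    """Format the sysbench --threads argument for the chosen user count."""
    return f"--threads={users_for_cpu_bound(cores, reserved_cores)}"


if __name__ == "__main__":
    print(sysbench_threads_arg(8))       # small server
    print(sysbench_threads_arg(32, 4))   # large server, more cores held back
```

The same user count would apply to HammerDB virtual users if it does use one connection per VU, as it appears to.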

For the Insert Benchmark the number of concurrent connections is 2X the number of users on the l.i1 and l.i2 steps and then 3X the number of users on the range-query read-write steps (qr*) and the point-query read-write steps (qp*). Whether or not there are IO waits for these users is complicated, so I tend to configure the benchmark so that the number of users is no more than half the number of CPU cores.
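
The arithmetic for the Insert Benchmark can be sketched like this. The function names are made up for illustration, and the step name qr100 in the usage is just an example that matches the qr* pattern. Steps other than the ones named above are rejected because I have not described their multipliers here.

```python
def connections_per_user(step: str) -> int:
    """Connections opened per benchmark user on a given Insert Benchmark step."""
    if step in ("l.i1", "l.i2"):
        return 2
    if step.startswith(("qr", "qp")):
        return 3
    raise ValueError(f"multiplier not described for step {step!r}")


def max_users(cores: int) -> int:
    """No more than half the number of physical CPU cores, and at least 1."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return max(1, cores // 2)


def connections(users: int, step: str) -> int:
    """Total concurrent connections for a user count on a step."""
    return users * connections_per_user(step)


if __name__ == "__main__":
    cores = 16
    users = max_users(cores)
    for step in ("l.i1", "l.i2", "qr100", "qp100"):
        print(step, users, connections(users, step))
```

With 16 cores this gives 8 users, which means 16 connections on l.i1 and l.i2 and 24 connections on the qr* and qp* steps.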

Finally, I usually set the benchmark concurrency level to be less than the number of CPU cores because I want to leave some cores for the DBMS to do the important background work, which is mostly MVCC garbage collection -- MyRocks compaction, InnoDB purge and dirty page writeback, Postgres vacuum.
