Monday, July 13, 2020

Updates for the insert benchmark

I continue to run and improve the insert benchmark. This is an update to the overview. Details on how I share performance results are here.
Per interval results

The first change is that the benchmark client displays per-second performance results, including IPS (inserts/s), QPS (queries/s) and the max response time for inserts and queries. This makes it easier to spot stalls.

This is output from a load and the legend for the columns is:
  • i_sec - number of seconds for this interval
  • t_sec - cumulative number of seconds
  • i_ips, i_qps - average insert and query rate for that interval
  • t_ips, t_qps - average insert and query rate from test start until now
  • max_i, max_q - max insert and query response time for that interval, in microseconds
  • t_ins, t_query - total inserts and queries from test start until now

i_sec   t_sec   i_ips   t_ips   i_qps   t_qps   max_i   max_q   t_ins   t_query
1.0     1.0     61833   61833   0       0       2919    0       61900   0
1.0     2.0     63625   62729   0       0       1983    0       125600  0
1.0     3.0     63829   63095   0       0       2080    0       189500  0
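The interval and cumulative rates above can be derived from two running counters, total inserts and total queries, sampled once per interval. A minimal sketch (the function name and its arguments are illustrative, not the benchmark client's actual code):

```python
def interval_stats(now, start_time, last_time, last_inserts, last_queries,
                   total_inserts, total_queries):
    """Compute per-interval and cumulative rates from running counters.

    now, start_time, last_time are timestamps in seconds;
    last_* are the counter values at the previous sample.
    """
    i_sec = now - last_time    # length of this interval
    t_sec = now - start_time   # time since test start
    return {
        "i_sec": i_sec,
        "t_sec": t_sec,
        # rates over just this interval
        "i_ips": (total_inserts - last_inserts) / i_sec,
        "i_qps": (total_queries - last_queries) / i_sec,
        # rates averaged from test start until now
        "t_ips": total_inserts / t_sec,
        "t_qps": total_queries / t_sec,
    }
```

For example, with 61900 inserts after the first second and 125600 after the second, the second interval's i_ips is (125600 - 61900) / 1.0 and t_ips is 125600 / 2.0.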

Test steps

The second change is in how I run the benchmark. As described in the overview, it used to be run as: create tables/collections with secondary indexes, load X million rows/documents, do a full scan of each secondary index, do read+write tests with rate-limited writers.

I have changed it since then. The new pattern is:
  • create tables/collections without secondary indexes
  • load data (l.i0)
  • create secondary indexes (l.x)
  • load more data (l.i1)
  • read+write tests (q100.1, q100.2, q200.1, q200.2, ...)
The read+write tests use rate-limited writers and the tests are run for varying limits. The test starts with 100 insert/s per client (q100) and then 200, 400, 600, 800 and 1000. In some cases the DBMS is unable to sustain the target insert rates. Also, for each limit the test is run twice so the results might use the names q100.1, q100.2, q200.1, q200.2, ..., q1000.1, q1000.2.
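A rate-limited writer like the ones in the q100..q1000 steps can be paced with a simple sleep-based loop: do the target number of inserts for a one-second slice, then sleep off whatever is left of that second. This is a sketch under assumed names (`do_insert` stands in for the real insert call), not the benchmark client's actual implementation:

```python
import time

def rate_limited_writer(target_per_sec, do_insert, duration_sec):
    """Call do_insert() roughly target_per_sec times per second.

    If a one-second slice takes longer than a second, no sleep happens,
    which is the case where the DBMS cannot sustain the target rate.
    """
    end = time.time() + duration_sec
    while time.time() < end:
        slice_start = time.time()
        for _ in range(target_per_sec):
            do_insert()
        elapsed = time.time() - slice_start
        if elapsed < 1.0:
            time.sleep(1.0 - elapsed)
```

Real clients often pace in finer slices (e.g. 100ms) to smooth bursts, but the idea is the same: the sleep disappears when the server falls behind.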


  1. Hi Mark, recently we tried the benchmark to test TiDB and found that it works for TiDB, thank you for your effort.
    There are also some things I want to know more about, such as: is there a best practice for this benchmark? How do you use the results? Is it used to find bottlenecks?

    1. My focus is on relative results more than absolute. So I run the test:
      1) to compare impact of my.cnf settings for one version of a DBMS
      2) to look for regressions between different versions of a DBMS
      3) to compare different DBMS

      For #3 that can mean benchmarketing (for marketing, X is faster than Y means that X is better than Y), but I don't have a product to promote. I am just looking for things that can be improved and trying to learn more about how a DBMS behaves.

      The insert-heavy phases of the benchmark are a great way to stress parts of a DBMS, and that has helped me find many perf problems.

    2. I usually compute per-operation HW efficiency metrics like CPU/operation and storage reads, writes, read KB and write KB per operation. Those make it easier to see why one test result is better or worse than another. The metrics are measured via iostat and vmstat, so I take the rates from those tools and divide them by the rate of transactions, operations or queries.
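The normalization is just a division: take each rate that iostat or vmstat reports and divide it by the operation rate over the same window. A minimal sketch, with hypothetical metric names standing in for whatever columns you collect:

```python
def hw_efficiency(tool_rates, ops_per_sec):
    """Convert tool-level rates into per-operation efficiency metrics.

    tool_rates: dict of rates sampled from iostat/vmstat over a window,
    e.g. {"r/s": storage reads per second, "rkB/s": read KB per second}.
    ops_per_sec: queries/s or inserts/s over the same window.
    """
    return {name: rate / ops_per_sec for name, rate in tool_rates.items()}

# Example: 2000 storage reads/s and 16000 read KB/s while serving
# 500 queries/s -> 4 reads and 32 KB read per query.
per_query = hw_efficiency({"r/s": 2000.0, "rkB/s": 16000.0}, 500.0)
```

Comparing these per-operation numbers across configs or versions shows, for instance, whether a slower result is slower because each query does more storage reads or burns more CPU.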

    3. I hope to replace my usage of the insert benchmark with something less synthetic. TSBS is interesting, except it doesn't have queries concurrent with writes.

      For now I will try to make sure the code works with pypy to reduce the client CPU overhead.