Per interval results
The first change is that the benchmark client now displays per-second performance results, including the insert rate (IPS), query rate (QPS) and max response time for inserts and queries. This makes stalls easier to spot.
This is output from a load and the legend for the columns is:
- i_sec - number of seconds for this interval
- t_sec - cumulative number of seconds
- i_ips, i_qps - average insert and query rate for that interval
- t_ips, t_qps - average insert and query rate from test start until now
- max_i, max_q - max insert and query response time for that interval, in microseconds
- t_ins, t_query - total inserts and queries from test start until now
i_sec  t_sec  i_ips  t_ips  i_qps  t_qps  max_i  max_q  t_ins   t_query
1.0    1.0    61833  61833  0      0      2919   0      61900   0
1.0    2.0    63625  62729  0      0      1983   0      125600  0
1.0    3.0    63829  63095  0      0      2080   0      189500  0
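To make the legend concrete, here is a minimal Python sketch of how a client could compute these per-interval and cumulative values. It is illustrative only; the class and method names are my invention, not the benchmark client's actual code.

import time

class IntervalStats:
    """Tracks per-interval and cumulative insert/query rates plus max response times."""

    def __init__(self):
        self.start = time.time()
        self.interval_start = self.start
        self.t_ins = self.t_query = 0      # totals since test start
        self.i_ins = self.i_query = 0      # counts for the current interval
        self.max_i = self.max_q = 0.0      # max response times (seconds) for the interval

    def record_insert(self, response_sec):
        self.t_ins += 1
        self.i_ins += 1
        self.max_i = max(self.max_i, response_sec)

    def record_query(self, response_sec):
        self.t_query += 1
        self.i_query += 1
        self.max_q = max(self.max_q, response_sec)

    def report(self):
        """Print one line per interval, then reset the per-interval counters."""
        now = time.time()
        i_sec = now - self.interval_start
        t_sec = now - self.start
        print("%.1f %.1f %d %d %d %d %d %d %d %d" % (
            i_sec, t_sec,
            self.i_ins / i_sec, self.t_ins / t_sec,      # i_ips, t_ips
            self.i_query / i_sec, self.t_query / t_sec,  # i_qps, t_qps
            self.max_i * 1e6, self.max_q * 1e6,          # max_i, max_q in microseconds
            self.t_ins, self.t_query))
        self.i_ins = self.i_query = 0
        self.max_i = self.max_q = 0.0
        self.interval_start = now

A driver would call record_insert/record_query after each operation and report once per second.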
Test steps
The second change is in how I run the benchmark. As described in the overview, it used to be run as: create tables/collections with secondary indexes, load X million rows/documents, do a full scan of each secondary index, do read+write tests with rate-limited writers.
I have changed it since then. The new pattern is:
- create tables/collections without secondary indexes
- load data (l.i0)
- create secondary indexes (l.x)
- load more data (l.i1)
- read+write tests (q100.1, q100.2, q200.1, q200.2, ...)
The read+write tests use rate-limited writers and are repeated at several write-rate limits. The test starts at 100 insert/s per client (q100) and then moves to 200, 400, 600, 800 and 1000. In some cases the DBMS is unable to sustain the target insert rates. Also, for each limit the test is run twice, so the results use the names q100.1, q100.2, q200.1, q200.2, ..., q1000.1, q1000.2.
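To make the order of the steps concrete, here is a small Python sketch that just enumerates the step names in the order they run. The function is hypothetical; the real benchmark is driven differently, but the step names follow the pattern above.

# Enumerate the benchmark steps in order: load without indexes (l.i0),
# create secondary indexes (l.x), load more data (l.i1), then the
# rate-limited read+write steps, two runs per insert-rate limit.
def benchmark_steps():
    steps = ["create tables (no secondary indexes)", "l.i0", "l.x", "l.i1"]
    for inserts_per_sec in (100, 200, 400, 600, 800, 1000):
        for run in (1, 2):
            steps.append("q%d.%d (limit %d insert/s per client)"
                         % (inserts_per_sec, run, inserts_per_sec))
    return steps

if __name__ == "__main__":
    for step in benchmark_steps():
        print(step)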
Hi Mark, recently we have been trying to use the benchmark to test TiDB, and I found it works for TiDB. Thank you for your effort.
There are also some things I would like to know more about, such as: are there best practices for this benchmark? How do you use the results? Is it used to find bottlenecks?
My focus is on relative results more than absolute. So I run the test:
1) to compare the impact of my.cnf settings for one version of a DBMS
2) to look for regressions between different versions of a DBMS
3) to compare different DBMS
For #3 that can turn into benchmarketing (in marketing, X is faster than Y implies X is better than Y), but I don't have a product to promote; I am just looking for things that can be improved and want to learn more about how a DBMS behaves.
The insert-heavy phases of the benchmark are a great way to stress parts of a DBMS, and that has helped me find many perf problems.
I usually collect per-operation HW efficiency metrics like CPU/operation and storage reads, writes, read KB and write KB per operation. Those make it easier to see why one test result is better or worse than another. The metrics are measured with iostat and vmstat, so I take the rates from those tools and divide them by the rate of transactions, operations or queries.
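The arithmetic is just dividing each system-wide rate by the operation rate measured over the same window. Here is a minimal sketch, assuming the vmstat/iostat averages are already available; the function and parameter names are mine, not part of the benchmark, and the CPU metric is computed as CPU seconds per operation.

def per_operation_metrics(operation_rate, cpu_util_pct, num_cpus,
                          disk_reads_per_sec, disk_read_kb_per_sec,
                          disk_writes_per_sec, disk_write_kb_per_sec):
    """Divide system-wide rates (from vmstat/iostat) by the operation rate to get
    per-operation HW efficiency metrics. All inputs are averages over the same window."""
    return {
        # CPU seconds consumed per operation: busy CPUs / operations per second
        "cpu_per_op": (num_cpus * cpu_util_pct / 100.0) / operation_rate,
        "reads_per_op": disk_reads_per_sec / operation_rate,
        "read_kb_per_op": disk_read_kb_per_sec / operation_rate,
        "writes_per_op": disk_writes_per_sec / operation_rate,
        "write_kb_per_op": disk_write_kb_per_sec / operation_rate,
    }

# Example: 50,000 inserts/s, 8 CPUs at 60% busy, 1,200 disk writes/s at 48,000 KB/s
print(per_operation_metrics(50000, 60, 8, 0, 0, 1200, 48000))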
I hope to replace my usage of the insert benchmark with something less synthetic. TSBS is interesting, except it doesn't have queries concurrent with writes.
For now I will try to make sure the code works with PyPy to reduce the client CPU overhead.