I am good at benchmarks and not good at statistics. I have no doubt that my work lacks statistical rigor -- I am also wary of the "X has a flaw, therefore X is useless" argument that is overdone on the web, but I don't know whether it applies here.
Update - if you have a suggestion for a paper or book I should read then please share.
The challenge is that I have a finite budget (of my time and HW time) in which to learn something, and what I want to learn are the performance and efficiency characteristics of a DBMS for a given workload. Things I want to learn include:
- HW consumed (CPU and IO per request, database size on disk, etc)
- Response time at 50th, 99th and other percentiles. Histograms are also great but take more space. Think of response time percentiles as a lossy-compressed histogram (flawed but useful).
- Results at different concurrency levels
- Results for IO-bound and CPU-bound workloads
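The percentiles mentioned above can be read straight off the sorted samples. A minimal sketch of the nearest-rank method, assuming response times are collected as a flat list (the function name and sample data are hypothetical):

```python
import math

def percentiles(samples, fractions=(0.50, 0.99)):
    """Return the requested percentiles of a list of response times.

    Uses the nearest-rank method: the value at index ceil(f * n) - 1
    of the sorted samples. Like any summary, this is a lossy view of
    the full histogram.
    """
    ordered = sorted(samples)
    n = len(ordered)
    return {f: ordered[min(n - 1, math.ceil(f * n) - 1)] for f in fractions}

# Hypothetical response times in milliseconds
times = [1.2, 0.9, 1.1, 1.0, 47.0, 1.3, 1.1, 0.8, 1.2, 1.0]
print(percentiles(times))  # p50 near 1.1, p99 catches the 47.0 outlier
```

Keeping the full histogram (or an HdrHistogram-style sketch) preserves more information at the cost of space, which is the trade-off noted above.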
- Run the benchmark once and collect results from the remaining (B - S - W) units of time
- Run the benchmark N times and collect results from the remaining (B - S - NW) units of time, assuming that I can archive/restore the database quickly after the load and index step. Otherwise the remaining time is (B - NS - NW).
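The budget arithmetic above can be sketched as a function. This assumes my reading of the symbols: B is the total time budget, S the time for the load and index step, W the warmup time per run, and N the number of runs (these definitions come from outside this excerpt, so treat them as assumptions):

```python
def measurement_time(budget, setup, warmup, runs=1, fast_restore=True):
    """Time left for collecting results after setup and warmup.

    With a fast archive/restore, the load and index step (setup) is
    paid once across all runs; otherwise it is paid once per run.
    Warmup is always paid once per run.
    """
    setup_cost = setup if fast_restore else setup * runs
    return budget - setup_cost - warmup * runs

# One run: B - S - W
print(measurement_time(budget=100, setup=20, warmup=5))          # 75
# N runs with fast archive/restore: B - S - NW
print(measurement_time(budget=100, setup=20, warmup=5, runs=4))  # 60
# N runs without it: B - NS - NW
print(measurement_time(budget=100, setup=20, warmup=5, runs=4,
                       fast_restore=False))                      # 0
```

The last case shows why the archive/restore step matters: repeating the load and index for every run can consume the entire budget before any results are collected.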