Tuesday, April 7, 2015

Compared to what?

It is common to share big numbers for web-scale database deployments. I do it frequently in presentations. I am not alone in this practice. It is easy to get large values for QPS with web-scale OLTP (small data). The same is true for database size and rows read rates with web-scale data warehousing (big data).

I hope that "So what?" is the first reaction when these big numbers are shared. Big numbers only mean that a lot of hardware has been used. Context is what makes these big numbers more or less interesting. Note that I am not saying this to take away from the work done by my peers. I have been fortunate to work with extremely talented teams.

"Compared to what?" is another useful response. When considering stock mutual funds we look at the performance of the fund relative to a benchmark such as the S&P 500. When considering database performance it helps to understand whether an alternative product would have done better. We usually don't have an answer for this because it can be too expensive to do the comparison, but it is still something to keep in mind.

We aren't in the business of growing QPS, database size and rows read rates. We are in the business of answering questions with efficiency and quality of service. The goals include increasing availability, reducing response time, reducing response time variation and doing more work with less HW. Details about these goals are less likely to be shared -- for business reasons and sometimes because the data isn't collected -- so the context required to appreciate the big numbers might always be missing.

1 comment:

  1. This is try #3 at typing this comment... (sigh)

    Big numbers don't always mean more hardware. Shard-Query can run a query on a partitioned table on a single machine many times faster than a single-threaded query. Of course, you can add more hardware, but the efficiency is not reduced when you do. So you have to look at the numbers in context to see what they mean. Big numbers absent context are useless.

