Wednesday, January 13, 2021

T is for Tuning amplification

First there was the RUM Conjecture - an index structure can't be optimal for all of read, write & space efficiency. It became the CRUM Conjecture when I added a C for cache amplification because some algorithms require more data in cache per indexed row if they are to be performant and DRAM is costly (see key-value separation).

Can I add another letter to it and rename it the CRUMT conjecture where T is for Tuning amplification? By this I mean the cost (in HW and my time) from repeating tests after changing config options. I spend much time repeating tests and the common reasons are 1) I made a mistake and 2) I want to change a config option. While my mistakes are hard to avoid, it would be nice to spend less time tweaking options and self-tuning index structures will be great for production.

There are efforts in place (thanks OtterTune) to apply ML to the problem. But I hope for an additional effort to move the cleverness into the DBMS. Don't treat this as a black box. Figure out cost models, add objective functions and SLAs and then build things that are better at self-tuning.

No comments:

Post a Comment

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...