Small Datum: Default configuration benchmarks

Tuesday, August 7, 2018

Default configuration benchmarks

Default configuration benchmarks are an interesting problem. Most storage engines require some configuration tuning to get good performance and efficiency. We configure an engine to do the right thing for the expected workload and hardware. Unfortunately the configuration is done in the language of the engine (innodb_write_io_threads, rocksdb_default_cf_options) which requires a significant amount of time to understand.

Hardware comes in many sizes and engines frequently don't have code to figure out the size -- how many CPUs, how much RAM, how many GB of storage, how many IOPs from storage. Even when that code exists the engine might not be able to use everything it finds:

HW can be shared and the engine is only allowed a fraction of it.
It might be running on a VM that gets more CPU when other VMs on the host are idle.
SSDs get slower when more full. It can take a long time to reach that state.

Minimal configuration

I assume there is a market for storage engines that have better performance with the default configuration, but it will take time to get there. A step in the right direction is to enhance engines to get great performance and efficiency with minimal configuration (minimal != default). I am still figuring out what minimal means. I prefer to use the language of the engine user (HW capacity and performance/efficiency goals) rather than the language of the engine. I'd rather not set engine-specific options, even easy to understand ones like innodb_buffer_pool_size. I want the engine to figure out its configuration given the minimal tuning. For now I have two levels for minimal:

HW-only - tell the engine how much HW it can use -- number of CPU cores, GB of RAM, storage capacity and IOPs. Optionally you can ask it to use all that it finds.
HW + goals - in addition to HW-only this supports goals for read, write, space and cache amplification. For now I will be vague about the goals.

Things change

Another part of the configuration challenge is that database workloads change while configurations tend to be static. I prefer that the engine does the right thing, while respecting the advice provided via minimal configuration. I want the engine to adapt to the current workload without ruining performance for the future workload. Adapting by deferring index maintenance can make loads faster, but might hurt the queries that follow.

Types of change include:

The working set no longer fits in memory and the workload shifts from CPU to IO bound.
Daily maintenance (vacuum, reorg, defrag, DDL, reporting) runs during off-peak hours.
Web-scale workloads have daily peak cycles as people wake and sleep.
New features get popular, old features get deprecated. Their tables and indexes arrive, grow large, become read-only, get dropped and more. Some deprecated features get un-deprecated.
Access patterns to data changes. Rows might be write once, N times or forever and write once/N rows eventually become read-only. Rows might be read never, once, a few-times or forever.
Different types of data (see previous point) can live within the same index. Even if you were willing to tune per-index (some of us are) this isn't sufficient when there is workload diversity within an index.

Real workloads include the types of change listed above but benchmarks rarely include them. Any benchmark that includes such change is likely to need more than 24-hours to run which will limit its popularity -- but maybe that isn't a bad thing. I hope we see a few new benchmarks that include such types of change. I might even try to write one.