Small Datum: How many storage devices does a workload require?

Monday, October 21, 2019

How many storage devices does a workload require?

I read an interesting paper that was motivated by the inability of existing storage engines to utilize all of the read IO provided by new and fast storage devices. That is excellent motivation but the story has more nuance.

A storage device provides capacity, endurance and reads. For now read means read operations and I ignore read throughput and latency. For a given device and workload none of the dimensions (capacity, endurance, reads) might be saturated. When saturation occurs it is usually limited to one of the dimensions. In that case if you want to reduce the cost of storage then change something to be more efficient in the saturated dimension. When capacity (endurance, read IO) is the bottleneck then reducing space (write, read) amplification is an answer.

It is hard to saturate a storage device in all dimensions so I am cautious when insufficient utilization in any dimension is cited as a problem. Too much saturation leads to lousy quality of service. Besides, workloads rarely have constant workloads -- web-scale varies by time of day.

When endurance or capacity are the bottleneck then it can be OK to not use all of the read IO provided by a new storage device. A simple model for this is:

Storage device supplies r units of read IO, w units of write endurance and s units of capacity.
The workload demands R units of read IO, W units of write endurance and S units of capacity.
max(R/r, W/w, S/s) storage devices are required when amplification is ignored
max(R*a_r/r, W*a_w/w, S*a_s/s) storage devices are required when amplification is not ignored where a_r, a_w and a_s are read, write and space amplification
To reduce the number of storage devices focus on the saturated dimension and tune or change the index structure

An example

For (R=10000, W=1000, S=100, r=10, w=10, s=10, a_r=4, a_w=10, a_s=2) then max(R*a_r/r, W*a_w/w, S*a_s/s) = max(10000*4/10, 1000*10/10, 100*2/10) = max(4000, 1000, 20) = 4000 and read IO is the bottleneck. But change S from 100 to 100000 and this becomes max(4000, 1000, 20000) = 20000 and capacity is the bottleneck.

2 comments:

AnonymousDecember 4, 2019 at 8:39 AM
This is nice, and I guess when acquiring separate storage devices you will get additional R, W and S. However, in a cloud-based system, like dynamoDB, the resources can be added independently in terms or read/write capacity units. I imagine in that scenario there could be more like a budget and SLAs to complete a workload. So do you have any thoughts on how to incorporate SLAs to capacity planning? or should it always be about over-provisioning resources?
ReplyDelete
Replies

Add comment

Monday, October 21, 2019

How many storage devices does a workload require?

2 comments:

Postgres 18 beta2: large server, Insert Benchmark, part 2