Thursday, October 24, 2019

Tuning space and write amplification to minimize cost

In a previous post I described how to tune for space and write amplification with an LSM or index+log index structure. In this post I explain how to use that to minimize the storage cost for a workload.

This is an optimization problem and the objective function is to minimize the number of storage devices. The model I use was described in a previous post but I summarize it here:
  • Storage device supplies r units of read IO, w units of write endurance and s units of capacity.
  • Workload demands R units of read IO, W units of write endurance and S units of capacity.
  • max(R/r, W/w, S/s) storage devices are required when amplification is ignored
  • max(R*ar/r, W*aw/w, S*as/s) storage devices are required when amplification is not ignored where ar, aw and as are read, write and space amplification
Here I assume that read IO is never the bottleneck and the goal is to minimize max(W*aw/w, S*as/s). That occurs when W*aw/w = S*as/s. I assume that W, w, S and s are constants so the goal is to tune aw and as to solve the equation. Note that W/w and S/s are the number of devices needed for the workload based on endurance and capacity.

I thought I previously published a blog on this topic but I could not find it.

Index+log

For index+log I assume that aw = 100/(100-pfull) and a= 100/pfull where pfull is the percentage of device capacity available to the user. With that I can determine the value of pfull that solves the equation.

W*aw/w = S*as/s
# reorder LHS and RHS
W/w*aw = S/s*as
# replace aw and as
W/w * 100/(100-pfull) = S/s * 100/pfull
# multiply LHS and RHS by pfull and then (100-pfull)
pfull * W/w * 100 = (100-pfull) * S/s * 100
# divide LHS and RHS by 100
pfull * W/w = (100-pfull) * S/s
# expand RHS
pfull * W/w = 100 * S/s - pfull * S/s
# add pfull * S/s to LHS and RHS
pfull * W/w + pfull * S/s = 100 * S/s
# factor LHS
pfull * (W/w + S/s) = 100 * S/s
# done
pfull = 100 * S/s / (W/w + S/s)

From the solution if I need 10 devices based on endurance (W/w = 10) and 30 devices based on capacity (S/s = 30) then using pfull = 100 * 30 / (10+30) = 75% minimizes the number of devices required. With pfull=75 then aw=4 (100/25), as=1.33 (100/75), W/w*aw=40, S/s*as=40 and the workload needs 40 storage devices. Were I to use pfull=50 then aw=2, as=2, W/w*aw=20, S/s*as=60 and the workload needs 60 devices. Were I to use pfull=80 then aw=5, as=1.25, W/w*aw=50, S/s*as=38 (rounded up) and the workload needs 50 devices. So pfull=75 looks like a good choice.

LSM

Space and write amplification are inversely related for both LSM and index+log but the math is easier for index+log. We can start with the equation to solve, but I won't solve it.

W*aw/w = S*as/s
# reorder LHS and RHS
W/w*aw = S/s*as
# replace aw with 0.8 * fo * L and awith 1 + 1/fo
# where fo is per-level fanout and L is number of levels 
W/w * 0.8 * fo * L = S/s * (1 + 1/fo)
# alas fo (per-level fanout) is a function of L (number of levels) and total fanout
# assume total fanout is t then fo = t ^ 1/L where t is a constant
W/w * 0.8 * t^1/L * L = S/s * (1 + 1/t^1/L)

I stopped at this point because the math isn't easy and L must be an integer >= 1 so an easy way to solve this is to compute the LHS and RHS this for L=1, 2, ..., 10 and choose L that minimizes the difference. For the example above with W/w=10 and S/s=30 then LSM write-amp is always sufficiently larger than space-amp that write-amp determines the number of devices and L=6 or 7 minimizes write-amp and the number of storage devices. Were S/s increased to 200 then L=3 or 4 minimizes the number of storage devices.

No comments:

Post a Comment

Speedb vs RocksDB on a large server

I am happy to read about storage engines that claim to be faster than RocksDB. Sometimes the claims are true and might lead to ideas for mak...