Tuesday, October 29, 2019

Review of "5 Minute rule - thirty years later"

The 5 Minute rule paper has been updated and it is worth reading. The paper explains how to make cost-based decisions for tiered storage, applies that using price and performance of devices over the past 30 years and then makes predictions based on current trends.

There is a prediction in the paper that will provoke discussion --  the price vs performance advantage for disk vs tape is getting smaller and soon we might need Hadoop for tape. That will be an interesting project.

5 minute rule math

The 5 minute rule math explains what data should be in a cache by considering the price of the cache and the price/performance of the storage device. The math has been used where cache+storage are DRAM+disk, DRAM+SSD and SSD+disk. Now it has be updated for NVM.

The equation is presented as two ratios: technology and economic. Below I explain how to derive it.

Technology ratio            Economic ratio

Pages/MBofCache             Price/StorageDevice
-----------------------  *  -------------------
OpsPerSec/StorageDevice     Price/MBofCache

If you reduce the equation above you end up with Pages/OpsPerSec. The unit for this is seconds and this is inverse of the access rate. When Pages/OpsPerSec = X then the cache should be large enough to fit pages accessed more frequently than every X seconds (1/X accesses per second). The formula above doesn't tell you to buy X MB of cache. It does tell you to buy enough cache to fit all pages accessed more frequently then once per X seconds.

Now I derive the formula as the presentation above can confuse me. The cost of a cache page is:
Price/MBofCache * MBofCache/Pages

The cost of 1 OpsPerSec from a storage device is:
Price/StorageDevice * StorageDevice/OpsPerSec

Each cache page has a cost and a benefit. The benefit is the reduction in OpsPerSec demand for cached data. Assume there are K pages of cache and the access rate for each page is A. Then this cache reduces the OpsPerSec demand by K*A. From this we can determine the value of A for which the cache cost equals the benefit:

K * Price/MBofCache * MBofCache/Pages =
    K * A * Price/StorageDevice * StorageDevice/OpsPerSec

# And then solve for A, K cancels, move ratios to LHS


Price/MBofCache       MBofCache/Pages
------------------- * ----------------------- = A   
Price/StorageDevice   StorageDevice/OpsPerSec

# LHS can be reduced to 1/Pages / 1/OpsPerSec
# which is OpsPerSec / Pages while '5 minute rule'
# above solves for Pages / OpsPerSec, so invert LHS
# to get 1/A

Price/StorageDevice   StorageDevice/OpsPerSec
------------------- * ----------------------- = 1/A
Price/MBofCache       MBofCache/Pages

# Now reorder LHS

StorageDevice/OpsPerSec   Price/StorageDevice   
----------------------- * ------------------- = 1/A
MBofCache/Pages           Price/MBofCache

# Given that X/Y = 1/Y / 1/X when X, Y != 0 then
# first term in LHS can be rewritten to get

Pages/MBofCache           Price/StorageDevice
----------------------- * ------------------- = 1/A
OpsPerSec/StorageDevice   Price/MBofCache

# A is an access rate, OpsPerSec / Pages, and 1/A is
# Pages/OpsPerSec. Thus we have derived the formula.

Technology

The numerator for the technology ratio changes slowly. Assuming file system pages are cached, the page size is now a multiple of 4kb (ignoring compression). There is one case for smaller pages -- NVM on the memory bus might allow for the return of 512b pages. The denominator for the technology ratio is interesting. It has barely changed for disk but changed a lot SSD.

For the economic ratio the cost of DRAM, disk, SSD and NVM is not changing at the same rate. Thus advice changes depending on the technology. In 2007 the advice for DRAM+SSD was to cache objects accessed every 15 minutes and in 2017 it is every 7 minutes so less DRAM is needed. The advice for DRAM+disk is the opposite. DRAM cost dropped more than disk OpsPerSec improved so more DRAM cache is needed.

Cost

The paper has some discussion on cost. The cost can be the retail price from a device vendor or the all-in cost from a public cloud vendor. The retail price can be misleading as that isn't TCO. You need to consider power and more. In either case (retail, public cloud) capacities come in discrete sizes and you might not be able to deploy the optimal values.

Finally, the 5 minute rule doesn't solve for all constraints, nor should it. There  are limits to power, space and throughput that must be considered. Motherboards are limited by memory slots, PCIe slots and throughput to storage. Power might not be unlimited. Constrained optimization is an interesting topic, but not for this post.

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...