My old NUC cluster found a new home and I downsized to 2 new NUC servers. The new server is NUC8i7beh with 16g RAM, 500g Samsung 860 EVO for the OS and 500g Samsung 970 EVO for performance. The Samsung 860 is SATA and the Samsung 970 is an m.2 device. I expect to wear out the performance devices as I have done that in the past. With the OS on a separate device I avoid the need to reinstall the OS when that happens.
The new NUC has a post-Skylake CPU (i7-8559u), provides 4 cores (8 HW threads) compared to 2 cores (4 HW threads) in the old NUCs. I disabled turbo boost again to avoid performance variance as mentioned in the old post. I am not sure these have sufficient cooling for sustained boost and when boost isn't sustained there are frequent changed in CPU performance. I also disabled hyperthreads out of concern for both the impact from Spectre fixes and to avoid a different syscall overhead each time I update the kernel.
I might use these servers to examine the impact of the ~10x increase in PAUSE times on InnoDB with and without HT enabled. I might also use them for another round of MySQL performance testing when 8.0.14 is release.
I am a big fan of Intel NUC servers. But maybe I am not a fan of the SATA cables they use. I already had one of my old NUCs replaced under warranty after one of the SATA wires was bare. In the new NUCs I just setup a few of the SATA cables appear to be cut and I wonder if that eventually becomes bare.
Monday, December 17, 2018
Friday, December 14, 2018
LSM math - size of search space for LSM tree configuration
I have written before and will write again about using 3-tuples to explain the shape of an LSM tree. This makes it easier to explain the configurations supported today and configurations we might want to support tomorrow in addition to traditional tiered and leveled compaction. The summary is that n LSM tree has N levels labeled from L1 to Ln and Lmax is another name for Ln. There is one 3-tuple per level and the components of the 3-tuple are (type, fanout, runs) for Lk (level k) where:
- type is Tiered or Leveled and explains compaction into that level
- fanout is the size of a sorted run in Lk relative to a sorted run from Lk-1, a real and >= 1
- runs is the number of sorted runs in that level, an integer and >= 1
Given the above how many valid configurations exist for an LSM tree? There are additional constraints that can be imposed on the 3-tuple but I will ignore most of them except for limiting fanout and runs to be <= 20. The answer is easy - there are an infinite number of configurations because fanout is a real.
The question is more interesting when fanout is limited to an integer and the number of levels is limited to between 1 and 10. I am doing this to explain the size of the search space but I don't think that fanout should be limited to an integer.
There are approximately 2^11 configurations only considering compaction type, which has 2 values, and 1 to 10 levels because there are 2^N configurations of compaction types for a tree with N levels and the sum of 2^1 + 2^2 + ... + 2^9 + 2^10 = 2^11 - 1
But when type, fanout and runs are considered then there are 2 x 20 x 20 = 800 choices per level and 800^N combinations for an LSM tree with N levels. Considering LSM trees with 1 to 10 levels then the number of valid configurations is the sum 800^1 + 800^2 + ... + 800^9 + 800^10. That is a large number of configurations if exhaustive search were to be used to find the best configuration. Note that I don't think exhaustive search should be used.
Thursday, December 13, 2018
LSM math - how many levels minimizes write amplification?
How do you configure an LSM tree with leveled compaction to minimize write amplification? For a given number of levels write-amp is minimal when the same fanout (growth factor) is used between all levels, but that does not explain the number of levels to use. In this post I answer that question.
I don't recall reading this result elsewhere, but I am happy to update this post with a link to such a result. I was encouraged to answer this after a discussion with the RocksDB team and thank Siying Dong for stating #2 above while leaving the math to me. I assume the original LSM paper didn't address this problem because that system used a fixed number of levels.
One result from the original LSM paper and updated by me is that write-amp is minimized when the per-level growth factor is constant. Sometimes I use fanout or per-level fanout rather than per-level growth factor. In RocksDB the option name is max_bytes_for_level_multiplier. Yes, this can be confusing. The default fanout in RocksDB is 10.
Math
I solve this for pure-leveled compaction which differs from what RocksDB calls leveled. In pure-leveled all levels used leveled compaction. In RocksDB leveled the first level, L0, uses tiered and the other levels used leveled. I started to explain this here where I claim that RocksDB leveled is really tiered+leveled. But I am not asking for them to change the name.
Assumptions:
Specify function for write-amp and determine critical points
# wa is the total write-amp
# n is the number of levels
# per-level fanout is the nth root of the total fanout
# per-level fanout = per-level write-amp
# therefore wa = number of levels * per-level fanout
wa = n * t^(1/n)
# given the function for write-amp as wa = a * b
# ... then below is a' * b + a * b'
a = n, b = t^(1/n)
wa' = t^(1/n) + n * ln(t) * t^(1/n) * (-1) * (1/n^2)
# which simplifies to
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)
# critical point for this occurs when wa' = 0
t^(1/n) - (1/n) * ln(t) * t^(1/n) = 0
t^(1/n) = (1/n) * ln(t) * t^(1/n)
1 = (1/n) * ln(t)
n = ln(t)
When t = 1024 then n = ln(1024) ~= 6.93. In this case write-amp is minimized when 7 levels are used although 6 isn't a bad choice.
Assuming the cost function is convex (see below) the critical point is the minimum for write-amp. However, n must be an integer so the number of levels that minimizes write-amp is one of: ceil(ln(t)) or floor(ln(t)).
The graph for wa when t=1024 can be viewed thanks to Desmos. The function looks convex and I show below that it is.
Determine whether critical point is a min or max
The critical point found above is a minimum for wa if wa is convex so we must show that the second derivative is positive.
wa = n * t ^ (1/n)
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)
wa' = t^(1/n) * (1 - (1/n) * ln(t))
# assuming wa' is a * b then wa'' is a' * b + a * b'
a = t^(1/n)
a' = ln(t) * t^(1/n) * -1 * (1/n^2)
a' = - ln(t) * t^(1/n) * (1/n^2)
b = 1 - (1/n) * ln(t)
b' = (1/n^2) * ln(t)
# a' * b
- ln(t) * t^(1/n) * (1/n^2) --> called x below
+ ln(t) * ln(t) * (1/n^3) * t^(1/n) --> called y below
# b' * a
t^(1/n) * (1/n^2) * ln(t) --> called z below
# therefore wa'' = x + y + z
# note that x, y and z all contain: t^(1/n), 1/n and ln(t)
wa'' = t^(1/n) * (1/n) * ln(t) * (-(1/n) + (ln(t) * 1/n^2) + (1/n))
wa'' = t^(1/n) * (1/n) * ln(t) * ( ln(t) * 1/n^2 )'
wa'' = t^(1/n) * 1/n^3 * ln(t)^2
Therefore wa'' is positive, wa is convex and the critical point is a minimum value for wa
Solve for per-level fanout
The next step is to determine the value of the per-level fanout when write-amp is minimized. If the number of levels doesn't have to be an integer then this occurs when ln(t) levels are used and below I show that the per-level fanout is e in that case. When the number of levels is limited to an integer then the per-level fanout that minimizes write-amp is a value that is close to e.
# total write-amp is number of levels * per-level fanout
wa = n * t^(1/n)
# The per-level fanout is t^(1/n) and wa is minimized when n = ln(t)
# Therefore we show that t^(1/n) = e when n = ln(t)
Assume t^(1 / ln(t)) = e
ln (t^(1 / ln(t))) = ln e
(1 / ln(t)) * ln(t) = 1
1=1
When the t=1024 then ln(t) ~= 6.93. With 7 levels the per-level fanout is t^(1/7) ~= 2.69 while e ~= 2.72.
- The number of levels that minimizes write-amp is one of ceil(ln(T)) or floor(ln(T)) where T is the total fanout -- sizeof(database) / sizeof(memtable)
- When #1 is done then the per-level fanout is e when the number of levels is ln(t) and a value close to e when the number of levels is an integer.
Introduction
I don't recall reading this result elsewhere, but I am happy to update this post with a link to such a result. I was encouraged to answer this after a discussion with the RocksDB team and thank Siying Dong for stating #2 above while leaving the math to me. I assume the original LSM paper didn't address this problem because that system used a fixed number of levels.
One result from the original LSM paper and updated by me is that write-amp is minimized when the per-level growth factor is constant. Sometimes I use fanout or per-level fanout rather than per-level growth factor. In RocksDB the option name is max_bytes_for_level_multiplier. Yes, this can be confusing. The default fanout in RocksDB is 10.
Math
I solve this for pure-leveled compaction which differs from what RocksDB calls leveled. In pure-leveled all levels used leveled compaction. In RocksDB leveled the first level, L0, uses tiered and the other levels used leveled. I started to explain this here where I claim that RocksDB leveled is really tiered+leveled. But I am not asking for them to change the name.
Assumptions:
- LSM tree uses pure-leveled compaction and compaction from memtable flushes into the first level of the LSM tree uses leveled compaction
- total fanout is T and is size(Lmax) / size(memtable) where Lmax is the max level of the LSM tree
- workload is update-only so the number of keys in the database is fixed
- workload has no write skew and all keys are equally likely to be updated
- per-level write-amp == per-level growth factor. In practice and in theory the per-level write-amp tends to be less than the per-level growth factor.
- total write-amp is the sum of per-level write-amp. I ignore write-amp from the WAL.
Specify function for write-amp and determine critical points
# wa is the total write-amp
# n is the number of levels
# per-level fanout is the nth root of the total fanout
# per-level fanout = per-level write-amp
# therefore wa = number of levels * per-level fanout
wa = n * t^(1/n)
# given the function for write-amp as wa = a * b
# ... then below is a' * b + a * b'
a = n, b = t^(1/n)
wa' = t^(1/n) + n * ln(t) * t^(1/n) * (-1) * (1/n^2)
# which simplifies to
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)
# critical point for this occurs when wa' = 0
t^(1/n) - (1/n) * ln(t) * t^(1/n) = 0
t^(1/n) = (1/n) * ln(t) * t^(1/n)
1 = (1/n) * ln(t)
n = ln(t)
When t = 1024 then n = ln(1024) ~= 6.93. In this case write-amp is minimized when 7 levels are used although 6 isn't a bad choice.
Assuming the cost function is convex (see below) the critical point is the minimum for write-amp. However, n must be an integer so the number of levels that minimizes write-amp is one of: ceil(ln(t)) or floor(ln(t)).
The graph for wa when t=1024 can be viewed thanks to Desmos. The function looks convex and I show below that it is.
Determine whether critical point is a min or max
The critical point found above is a minimum for wa if wa is convex so we must show that the second derivative is positive.
wa = n * t ^ (1/n)
wa' = t^(1/n) - (1/n) * ln(t) * t^(1/n)
wa' = t^(1/n) * (1 - (1/n) * ln(t))
# assuming wa' is a * b then wa'' is a' * b + a * b'
a = t^(1/n)
a' = ln(t) * t^(1/n) * -1 * (1/n^2)
a' = - ln(t) * t^(1/n) * (1/n^2)
b = 1 - (1/n) * ln(t)
b' = (1/n^2) * ln(t)
# a' * b
- ln(t) * t^(1/n) * (1/n^2) --> called x below
+ ln(t) * ln(t) * (1/n^3) * t^(1/n) --> called y below
# b' * a
t^(1/n) * (1/n^2) * ln(t) --> called z below
# therefore wa'' = x + y + z
# note that x, y and z all contain: t^(1/n), 1/n and ln(t)
wa'' = t^(1/n) * (1/n) * ln(t) * (-(1/n) + (ln(t) * 1/n^2) + (1/n))
wa'' = t^(1/n) * (1/n) * ln(t) * ( ln(t) * 1/n^2 )'
wa'' = t^(1/n) * 1/n^3 * ln(t)^2
Therefore wa'' is positive, wa is convex and the critical point is a minimum value for wa
Solve for per-level fanout
The next step is to determine the value of the per-level fanout when write-amp is minimized. If the number of levels doesn't have to be an integer then this occurs when ln(t) levels are used and below I show that the per-level fanout is e in that case. When the number of levels is limited to an integer then the per-level fanout that minimizes write-amp is a value that is close to e.
# total write-amp is number of levels * per-level fanout
wa = n * t^(1/n)
# The per-level fanout is t^(1/n) and wa is minimized when n = ln(t)
# Therefore we show that t^(1/n) = e when n = ln(t)
Assume t^(1 / ln(t)) = e
ln (t^(1 / ln(t))) = ln e
(1 / ln(t)) * ln(t) = 1
1=1
When the t=1024 then ln(t) ~= 6.93. With 7 levels the per-level fanout is t^(1/7) ~= 2.69 while e ~= 2.72.
Saturday, December 1, 2018
Pixelbook review
This has nothing to do with databases. This is a review of a Pixelbook (Chromebook laptop) that I got on sale last month. This one has a core i5, 8gb RAM and 128gb storage. It runs Linux too but I haven't done much with that. I expected a lot from this given that my 2013 Nexus 7 tablet is still awesome. I have been mostly happy with the laptop but if you care about keyboards and don't like the new Macs thanks to the butterfly keyboard then this might not be the laptop for you. My 3 complaints:
- keyboard is hard to read. It is grey on grey and too hard to read when there is light on my back even with the backlight (backlit?) turned all the way up. I don't get it -- grey on grey. So this is a great device for using in a dark room or for improving your touch typing skills.
- touchpad control is too coarse grained so it is either too fast or too slow. The settings has 5 values via a slider (1=slowest, 5=fastest). I have been using it at 3 which is a bit too fast for me while 2 is a bit too slow. I might go back to 2 but that means picking up my finger more frequently when moving a pointer across the screen.
- no iMessage - my family uses Apple devices and I can't run that here as I can on a Mac laptop
- the "" ey is flay --> the "k" key is flaky -> spacebar is flaky. Keys go bad for a few days, then get better, repeat. Ugh, his is one-off Google hardware. Maybe they don't want Apple and the butterfly keyboard to have all the fun. Fortunately I bought from an authorized reseller (Best Buy) so the 1 year warranty should apply.
- Charger failed, fortunately that is easy to replace.
Subscribe to:
Posts (Atom)
RocksDB on a big server: LRU vs hyperclock, v2
This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...
-
This provides additional results for Postgres versions 11 through 16 vs Sysbench on a medium server. My previous post is here . The goal is ...
-
I often use HWE kernels with Ubuntu and currently use Ubuntu 22.04. Until recently that meant I ran Linux 6.2 but after a recent update I am...
-
I am trying out a dedicated server from Hetzner for my performance work. I am trying the ax162-s that has 48 cores (96 vCPU), 128G of RAM a...