tag:blogger.com,1999:blog-9149523927864751087.post7422752188597307977..comments2022-10-03T14:43:38.708-07:00Comments on Small Datum: LSM math - how many levels minimizes write amplification?Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-9149523927864751087.post-84060163866360048572019-01-11T11:54:02.958-08:002019-01-11T11:54:02.958-08:00I appreciate the feedback.
The goal is to explain...I appreciate the feedback.<br /><br />The goal is to explain the number of levels that minimizes write-amp for an LSM that implements pure-leveled. I am not suggesting that doing this is the best configuration -- that depends on many things.<br /><br />I distinguish between pure-leveled and RocksDB-leveled. RocksDB-leveled has an L0 with many sorted runs, while in pure-leveled the L0 is not allowed as each level only has one sorted run. So while I agree that writing new sorted runs to the L0 and never compacting minimizes write-amp, that can only happen with RocksDB-leveled but not with pure-leveled.<br />-- see http://smalldatum.blogspot.com/2018/12/lsm-math-how-many-levels-minimizes.html<br /><br />I explained that using the same per-level growth factor for all levels is required. Explaining that isn't a new result, but I extended it.<br />-- see http://smalldatum.blogspot.com/2018/12/lsm-math-how-many-levels-minimizes.htmlMark Callaghanhttps://www.blogger.com/profile/09590445221922043181noreply@blogger.comtag:blogger.com,1999:blog-9149523927864751087.post-4871008914799566802019-01-11T11:33:47.308-08:002019-01-11T11:33:47.308-08:00This seems to be a very interesting problem, and t...This seems to be a very interesting problem, and there were several posts to address it. However, the problem definition seems to be not clear to me and I would appreciate further clarifications.<br /><br />If the goal is to compute the number of levels to minimize the total write amp (without any other constraints), then the number of levels should be set to infinite. That is, there should not be any merge at all. In this case, write amp is definitely minimized (actually 0 ignoring flush cost). However, the number of levels is directly related to space amplification/query performance, and in my mind it should be treated as a constraint instead of an optimization goal.<br /><br />The original LSM-tree solves the problem that how the size ratio should be set given a fixed number of components (i.e., levels, which are treated as constraint) to minimize the write amplification. When designing an LSM-tree, the number of levels (in my mind) should be set based on the requirement of query performance/space amplification, and based on that we can optimize the size ratio to minimize the total write cost by letting the largest level slightly larger than the database size.Anonymoushttps://www.blogger.com/profile/16044559053868911156noreply@blogger.com