I have been trying to solve the problem of finding an optimal LSM configuration for a given workload. The real problem is larger than that, which is to find the right index structure and the right configuration for a given workload. But my focus is RocksDB so I will start by solving for an LSM.
This link is to slides that summarizes my effort. I have expressed the problem to be solved using differentiable functions to express the cost that is to be minimized. The cost functions have a mix of real and integer valued parameters for which values must be determine to minimize the cost. I have yet to solve the functions, but I am making progress and learning more math. This might be a constrained optimization problem and Lagrange Multipliers might be useful. The slides are from a talk I am about to present at the MongoDB office in Sydney where several WiredTiger developers are based. I appreciate that Henrik Ingo set this up.
My work has things in common with the excellent work by Harvard DASlab lead by Stratos Idreos. I have years of production experience on my side, they have many smart and ambitious people on their side. There will be progress. I look forward to more results from their Data Calculator effort. And I have learned a lot from the Monkey and Dostoevsky papers by Niv Dayan et al.