Monday, January 21, 2019

Optimal configurations for an LSM and more

I have been trying to solve the problem of finding an optimal LSM configuration for a given workload. The real problem is larger than that, which is to find the right index structure and the right configuration for a given workload. But my focus is RocksDB so I will start by solving for an LSM.

This link is to slides that summarizes my effort. I have expressed the problem to be solved using differentiable functions to express the cost that is to be minimized. The cost functions have a mix of real and integer valued parameters for which values must be determine to minimize the cost. I have yet to solve the functions, but I am making progress and learning more math. This might be a constrained optimization problem and Lagrange Multipliers might be useful. The slides are from a talk I am about to present at the MongoDB office in Sydney where several WiredTiger developers are based. I appreciate that Henrik Ingo set this up.

My work has things in common with the excellent work by Harvard DASlab lead by Stratos Idreos. I have years of production experience on my side, they have many smart and ambitious people on their side. There will be progress. I look forward to more results from their Data Calculator effort. And I have learned a lot from the Monkey and Dostoevsky papers by Niv Dayan et al.

2 comments:

  1. Hello sir ,
    Question related to RocksDB
    Can i know what kind of datasture of data structure RocksDB is using in its L0 level(that is in memory)
    And what kind of data structure it is using in disk area ??
    It will be very helpful for me
    Thanks in advance ��

    ReplyDelete
    Replies
    1. The memtable is guaranteed to be in memory and uses a skip list. The L0 is not guaranteed to be in memory, but should be there for performance reasons. The L0, L1, L2, etc all use SSTables.

      Delete

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...