This is a short writeup of two ideas I might never have the time to implement -- self-aware DBMS and RID-lists for LSM tree.
Tuning a B-Tree index structure is hard and tuning an LSM tree is usually harder. Fortunately it will get easier over time. We tune a DBMS today via knobs and there is clever work in progress to automate changing of those knobs. But we can do even better. I hope one day for self-aware DBMS algorithms that understand the impact of their knobs, accept declarative goals & priorities from a user and are able to improve performance with respect to an objective function. Lets move the cleverness into the DBMS.
RID-lists for LSM trees
tl;dr - can I use the merge operator to implement RID-lists for an LSM?
I worked on bitmap indexes at Oracle. It would be fair to call them compressed RID-lists. Many smart people worked on that code in Oracle prior to me and the inventor, Gennady Antoshenkov, has a strong record of innovation. IBM DB2 has RID-lists, but I don't know whether they can be compressed. All of these are useful for analytics whether this is a bitmap index, compressed RID-list or uncompressed RID-list.
Postgres 13 has index deduplication which looks like an uncompressed RID-list and I look forward to new query execution features to do the equivalent of bitmap-AND and bitmap-OR.
But this is about a RID-list for an LSM tree. A RID-list is useful for an index that can have many duplicates for a key -- meaning some secondary indexes and not for a primary index. The LSMs I care about don't know much about schema and secondary index maintenance requires knowledge of schema but I will ignore that for now. MyRocks uses RocksDB for primary and secondary indexes, so this is something that can be solved by an application. WiredTiger supports schemas and secondary indexes without also providing a query language and someone could add the same to RocksDB.
I am curious whether the merge operator can be used to implement RID-lists? Disclaimer - I am far from an expert on the merge operator.
- the rowid is the value of the primary key. Secondary index entries for an LSM and for Oracle IOT use the primary key value as the rowid because the base row can be relocated (during compaction with an LSM, during B-Tree leaf node splits and merges with IOT). Bitmap indexes for Oracle IOT solves this via a mapping table. If that is not done then the problem is that the primary key value might be large and a list of them might not be compression friendly.
- the delta that is written by the merge operator specifies whether a rowid is added or removed for a given secondary key value. I wonder if something like SingleDelete is needed.
- the callback function that combines merge operator entries during compaction will merge entries to create larger RID-lists