While the Learned Indexes paper has ideas for hash maps and bloom filters most research has focused on a B-Tree. I expect learned indexes to work better with an LSM because its search structures (block indexes) are created off line with respect to user operations. There is more time for analysis and the per-SST search structures are write once so there is no need to deal with updates.
Other search optimizations
One of the goals for learned tree indexes is to avoid log2(N) comparisons on a search so I will mention a few things in InnoDB and RocksDB that help with that:
- The adaptive hash index (AHI) in InnoDB is a non-persistent hash table built on demand that maps a primary key value to a block in the buffer pool. With a clustered index like InnoDB (and MyRocks) non-covering secondary index queries must probe the PK index per row to get the missing columns. The AHI can avoid that CPU overhead.
- RocksDB uses bloom filters to avoid some of the search overhead per level of the LSM tree. They don't avoid all of the per-level overhead as a search must be done to find the SST before the bloom filter can be checked.
- RocksDB optionally uses a hash index per data block to avoid binary search within the data block. I assume that learned index approaches can augment or replace some of the data block index, the per block hash index and the bloom filter.