Tuesday, October 14, 2025

Is it time for TPC-BLOB?

If you want to store vectors in your database, then what you store as a row, KV pair or document is likely to be larger than the page size (when your DBMS uses fixed-size pages), and you will soon care about efficient support for large objects. I assume this support hasn't been a top priority for many DBMS implementations, so there will be some performance bugs.
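To make the size claim concrete, here is a rough sketch of the arithmetic. The dimensions, the float32 encoding and the page sizes are illustrative assumptions, not measurements from any particular DBMS, and the helper names are hypothetical.

```python
# Assumption: dense vectors with float32 components (4 bytes each).
BYTES_PER_FLOAT32 = 4


def vector_bytes(dimensions: int, bytes_per_component: int = BYTES_PER_FLOAT32) -> int:
    """Raw payload size of one dense vector, ignoring row and page headers."""
    return dimensions * bytes_per_component


def exceeds_page(dimensions: int, page_size: int) -> bool:
    """True when the vector payload alone is larger than one page."""
    return vector_bytes(dimensions) > page_size


if __name__ == "__main__":
    # Example dimensions and page sizes, chosen only for illustration.
    for dims in (384, 768, 1536, 3072):
        for page_size in (4096, 8192, 16384):
            verdict = "exceeds" if exceeds_page(dims, page_size) else "fits in"
            print(f"{dims} dims = {vector_bytes(dims)} bytes, {verdict} a {page_size}-byte page")
```

Real rows also carry headers and other columns, so the payload crosses the page boundary somewhat earlier than this estimate suggests.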

In a SQL DBMS, support for large objects will use the plumbing created to handle LOB (Large OBject) datatypes. We should define what the L in LOB means here. I will wave my hands and claim it means larger than a fixed-size page in your favorite DBMS but smaller than 512KB, because I limit my focus to online workloads.

Perhaps now is the time for industry-standard benchmarks for workloads with large objects. Should it be TPC-LOB or TPC-BLOB?

Most popular DBMS use fixed-size pages, whether storage is index-organized via an update-in-place b-tree (InnoDB) or heap-organized (Postgres, Oracle). For rows that are larger than the page size, which is usually between 4KB and 16KB, the entire row or the largest columns will be stored out of line and likely split across several pages in the out-of-line storage. When the row is read, additional reads are needed to gather all of the too-large parts from the out-of-line locations.
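The read amplification from out-of-line storage can be sketched with a toy model. This is a minimal illustration under stated assumptions, not how InnoDB, Postgres or Oracle implement it: the class name, the 8KB page size and the one-read-per-page accounting are all hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192  # assumption: one fixed page size for everything


@dataclass
class ToyPagedStore:
    """Toy model: small values stay inline, large ones are chunked out of line."""
    page_size: int = PAGE_SIZE
    overflow_pages: list = field(default_factory=list)  # out-of-line chunks
    rows: dict = field(default_factory=dict)            # key -> (kind, payload)
    page_reads: int = 0                                 # pages touched by get()

    def put(self, key: str, value: bytes) -> None:
        if len(value) <= self.page_size:
            self.rows[key] = ("inline", value)
            return
        # Too large for one page: split into page-sized chunks stored elsewhere.
        page_ids = []
        for offset in range(0, len(value), self.page_size):
            self.overflow_pages.append(value[offset:offset + self.page_size])
            page_ids.append(len(self.overflow_pages) - 1)
        self.rows[key] = ("overflow", page_ids)

    def get(self, key: str) -> bytes:
        self.page_reads += 1  # the page that holds the row itself
        kind, payload = self.rows[key]
        if kind == "inline":
            return payload
        chunks = []
        for page_id in payload:
            self.page_reads += 1  # one extra read per out-of-line page
            chunks.append(self.overflow_pages[page_id])
        return b"".join(chunks)


if __name__ == "__main__":
    store = ToyPagedStore()
    store.put("small", b"x" * 100)
    store.put("big", b"y" * 20000)
    store.get("small")
    print("reads after small row:", store.page_reads)
    store.get("big")
    print("reads after big row:", store.page_reads)
```

In this model a 20,000-byte value needs three out-of-line pages, so reading it touches four pages where a small row touches one.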

This approach is far from optimal: there will be more CPU overhead, more random IO and possibly more wasted space. But it was good enough because support for LOBs wasn't a priority for these DBMS, as their focus was on OLTP where rows were likely to be smaller than a fixed-size page.

Perhaps by luck, perhaps by fate, WiredTiger is a great fit for MongoDB because it is more flexible about page sizes. It is more flexible because it isn't an update-in-place b-tree. Instead, it is a copy-on-write random (CoW-R) b-tree that doesn't need or use out-of-line storage, although extra-large documents might benefit from out-of-line storage.

MyRocks, and other LSM-based DBMS, also don't require out-of-line storage but they can benefit from it as shown by WiscKey and other engines that do key-value separation. Even the mighty RocksDB has an implementation of key-value separation via BlobDB.
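The idea behind key-value separation can be sketched in a few lines. This is a toy under loud assumptions and not the WiscKey or BlobDB implementation: a dict stands in for the LSM tree, a bytearray stands in for the value log, and the names `ToyKVSeparatedStore`, `separation_threshold` and `POINTER_BYTES` are hypothetical.

```python
POINTER_BYTES = 16  # assumption: size of an (offset, length) pointer in the index


class ToyKVSeparatedStore:
    """Toy key-value separation: large values go to an append-only log."""

    def __init__(self, separation_threshold: int = 4096):
        self.separation_threshold = separation_threshold
        self.index = {}                # stands in for the LSM tree
        self.value_log = bytearray()   # stands in for the value log / blob file

    def put(self, key: str, value: bytes) -> None:
        if len(value) < self.separation_threshold:
            # Small values stay next to the key.
            self.index[key] = ("inline", value)
        else:
            # Large values are appended to the log; the index keeps a pointer.
            offset = len(self.value_log)
            self.value_log += value
            self.index[key] = ("pointer", (offset, len(value)))

    def get(self, key: str) -> bytes:
        kind, payload = self.index[key]
        if kind == "inline":
            return payload
        offset, length = payload
        return bytes(self.value_log[offset:offset + length])

    def index_value_bytes(self) -> int:
        """Bytes of value data the index itself carries (inline values plus pointers)."""
        total = 0
        for kind, payload in self.index.values():
            total += len(payload) if kind == "inline" else POINTER_BYTES
        return total


if __name__ == "__main__":
    store = ToyKVSeparatedStore()
    store.put("meta", b"m" * 64)
    store.put("vector", b"v" * 6144)
    print("index carries", store.index_value_bytes(), "bytes of value data")
    print("value log holds", len(store.value_log), "bytes")
```

The point of the sketch is that the structure being compacted carries only small values and pointers, so a large value is written to the log once rather than being rewritten with each compaction of its key. It omits garbage collection of the log, which a real engine needs.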
