Sunday, July 16, 2023

Keeping up with the SQL DBMS market

Sometimes it is easier to talk about tech when you can name things. There is much innovation in progress in the SQL DBMS market. The same is true for NoSQL, but I focus on SQL.

My current attempt to classify SQL DBMS products is:

  • TradSQL
    • traditional SQL DBMS solutions that arrived long before cloud native was a thing. They can be used as the nodes in ShardSQL. They don't provide ACID across shards, although brave people use XA to get atomic writes across shards. Examples are Oracle, Postgres and MySQL.
  • ShardSQL
    • Run many TradSQL DBMS in one cluster, provide some way to figure out where data exists and you have ShardSQL. This might involve, but doesn't require, a proxy or middleware. Examples include roll your own, Vitess and CitusDB. These have been popular with web-scale companies and I supported roll-your-own deployments at Google and Facebook. These provide limited support for cross-shard semantics: perhaps XA for atomic writes, but otherwise there isn't support for consistent cross-shard reads. It will be interesting to see what happens with HLC (hybrid logical clocks) in MySQL. Even Oracle has a sharding product, but I don't know much about it.
  • NewSQL (DisaggSQL)
    • The NewSQL name might have been claimed by others and systems with that name didn't end well. I hope to reclaim that name. If NewSQL doesn't work out then the other name is DisaggSQL. By NewSQL I mean a SQL DBMS that is unsharded and cloud-native. The goal is to provide better characteristics, such as throughput, performance and HA, while also supporting much larger databases than are typically supported by TradSQL courtesy of cloud-native storage. Examples include Aurora from AWS, AlloyDB from Google and Neon. A NewSQL DBMS offloads many things that are usually not offloaded by TradSQL. One benefit from offloading is to make more compute and memory available for query processing. Update - I think that DisaggSQL is a better name.
  • DistSQL
    • These provide ACID across shards. If you want to do ACID with a PB-scale database then DistSQL is the answer. The cost of DistSQL is more latency, and over time we will get a better understanding of that cost. Regardless, this is a big step forward for academia and industry. If you like fancy algorithms, then you will love DistSQL. Clustrix is an early example, for me, but Spanner made the world aware. And now we have TiDB, Yugabyte, CockroachDB, YDB and more. While MongoDB isn't a SQL DBMS (yes, it has some support for SQL) it is definitely a great example of ACID across shards.
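
To make the "some way to figure out where data exists" part of ShardSQL concrete, here is a minimal sketch of a roll-your-own shard lookup in Python. It is only an illustration: the shard names, the hash-based placement and the helper names (shard_for, group_by_shard) are assumptions made for this example, not how Vitess, CitusDB or any deployment mentioned above does it. Real systems often use a directory or range-based mapping so that shards can be split and moved.

```python
import hashlib

# Hypothetical shard names, made up for this sketch. In practice each entry
# would map to connection details for one TradSQL instance.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a sharding key to one shard using a stable hash.

    A stable hash (not Python's built-in hash(), which is randomized per
    process for strings) is used so that every client routes the same key
    to the same shard.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by the shard that owns them.

    A multi-key read or write becomes one request per shard. Nothing here
    makes those per-shard requests atomic or mutually consistent, which is
    the cross-shard limitation described for ShardSQL.
    """
    plan = {}
    for key in keys:
        plan.setdefault(shard_for(key, shards), []).append(key)
    return plan


if __name__ == "__main__":
    for shard, keys in sorted(group_by_shard(["user:1", "user:2", "user:3"]).items()):
        print(shard, keys)
```

The routing is the easy part. A request that touches more than one entry in the plan returned by group_by_shard is where ShardSQL needs XA or application logic, and where DistSQL does the work for you.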
