Small Datum: Transaction Processing in NewSQL

Monday, October 1, 2018

Transaction Processing in NewSQL

This is a list of references for transaction processing in NewSQL systems. The work is exciting. I don't have much to add and wrote this to avoid losing interesting links. My focus is on OLTP, but some of these systems support more than that.

By NewSQL I mean the following. I am not trying to define "NewSQL" for the world:

Support for multiple nodes because the storage/compute on one node isn't sufficient.
Support for SQL with ACID transactions. If there are shards then cross-shard operations can be consistent and isolated.
Replication does not prevent the properties listed above when you are willing to pay the price in commit overhead. Alas, synchronous geo-replication is slow, and a commit that is too slow is another form of downtime. I hope NewSQL systems make this less of a problem (async geo-replication for some or all commits, commutative operations). Contention and conflict are common in OLTP, and it is important to understand the minimal time between commits to a single row or the max number of commits/second to a single row.
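
The point about commits/second to a single row comes down to simple arithmetic. Here is a minimal sketch, assuming commits to one row are fully serialized (each must finish before the next starts) and that there is no group commit or batching. The function name and the latency figures are hypothetical, chosen only for illustration.

```python
def max_commits_per_second(commit_latency_ms: float) -> float:
    """Upper bound on commits/second to one row when commits are serialized.

    Assumption: the next commit to the row cannot start until the previous
    one has finished, so the rate is the inverse of the commit latency.
    """
    if commit_latency_ms <= 0:
        raise ValueError("commit latency must be positive")
    return 1000.0 / commit_latency_ms


# Hypothetical latencies, not measurements from any system listed here.
for label, latency_ms in [("local commit", 1.0), ("sync geo-replicated commit", 100.0)]:
    rate = max_commits_per_second(latency_ms)
    print(f"{label}: {latency_ms} ms per commit, at most {rate:.0f} commits/second to one row")
```

Under these assumptions a 100 ms commit caps a hot row at 10 commits/second, which is why slow synchronous commit hurts contended workloads more than uncontended ones.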

NewSQL Systems

MySQL Cluster - this was NewSQL before NewSQL was a thing. There is a nice book that explains the internals. There is a company that uses it to make HDFS better. Cluster seems to be more popular for uses other than web-scale workloads.
VoltDB - another early NewSQL system that is still getting better. It came after MySQL Cluster but years before Spanner, and grew out of the H-Store research effort.
Spanner - XA across shards, Paxos across replicas, special hardware to reduce clock drift between nodes. Sounds amazing, but this is Google so it just works. See the papers that explain the system and support for SQL. This got the NewSQL movement going.
CockroachDB - the answer to implementing Spanner without GPS and atomic clocks. They explain the difference as "while Spanner always waits after writes, CockroachDB sometimes waits before reads". It uses RocksDB and they help make it better.
FaunaDB - FaunaDB is inspired by Calvin and Daniel Abadi explains the difference between it and Spanner -- here and here. Abadi is great at explaining distributed systems, see his work on PACELC (and the pdf). A key part of Calvin is that "Calvin uses preprocessing to order transactions. All transactions are inserted into a distributed, replicated log before being processed." This approach might limit the peak TPS on a large cluster, but I assume that doesn't matter for a large fraction of the market.
YugaByte - another user of RocksDB. There is much discussion about it in the recent Abadi post. Their docs are amazing -- slides, transaction IO path, single-shard write IO path, distributed ACID and single-row ACID.
TiDB - I don't know much about it but they are growing fast and are part of the MySQL community. It uses RocksDB (I shouldn't have forgotten that).
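
The Calvin quote in the FaunaDB entry can be made concrete with a toy example. This is only a sketch of the idea of ordering transactions in a log before running them. It is not how Calvin or FaunaDB is implemented, and `SequencedLog`, `apply_log` and `transfer` are hypothetical names. The log here is a plain in-memory list standing in for a distributed, replicated log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


@dataclass
class SequencedLog:
    """Toy stand-in for a replicated log: appending fixes the global order."""
    entries: List[Txn] = field(default_factory=list)

    def append(self, txn: Txn) -> int:
        self.entries.append(txn)
        return len(self.entries) - 1  # position in the agreed order


def apply_log(log: SequencedLog, initial: State) -> State:
    """Run every logged transaction in log order against a copy of the state."""
    state = dict(initial)
    for txn in log.entries:
        txn(state)
    return state


def transfer(src: str, dst: str, amount: int) -> Txn:
    """Deterministic transaction: moves funds only if the source can cover it."""
    def run(state: State) -> None:
        if state[src] >= amount:
            state[src] -= amount
            state[dst] += amount
    return run


log = SequencedLog()
log.append(transfer("a", "b", 70))
log.append(transfer("a", "c", 70))  # becomes a no-op: "a" no longer has 70

start = {"a": 100, "b": 0, "c": 0}
replica_1 = apply_log(log, start)
replica_2 = apply_log(log, start)
print(replica_1 == replica_2, replica_1)
```

Because the order is fixed before execution and each transaction is deterministic, every replica that replays the log reaches the same state without coordinating at commit time. The single append point is also where the concern about peak TPS comes from.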

Other relevant systems

FoundationDB - I am curious where this goes given the competition explained above.
Aurora - not NewSQL yet because this doesn't scale across nodes. It does support large nodes and that might be sufficient for a large part of the market. But Amazon moves fast (see the new parallel query feature) so I wouldn't be surprised if this became NewSQL one day. I appreciate that they have begun to explain the internals -- here and here.
MongoDB - not SQL, but starting to get interesting with the new features for read and write concerns. There is also new support for causal consistency and retryable writes.
Clustrix - a NewSQL system that is now part of MariaDB. Maybe this becomes open source.
Kudu - awesome paper, interesting research on HybridTime, useful docs on the internals.
Vitess - was created to scale MySQL for YouTube. It is now part of CNCF, backed by a startup and used by many companies. Cross-shard writes are atomic, but isolation is weaker.
Splice Machine - SQL on HBase. Summary is "100% ACID via snapshot isolation with optimistic concurrency via write-write conflicts" and details are here. Has integration to use Spark for OLAP, so this is HTAP.
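
The Splice Machine summary mentions snapshot isolation with optimistic concurrency via write-write conflicts. Below is a minimal single-process sketch of that general technique, using a first-committer-wins rule. It is not Splice Machine's implementation, and `Store`, `Txn` and `WriteConflict` are hypothetical names. I assume a single logical clock and no concurrency inside the store itself.

```python
from typing import Dict, List, Optional, Tuple


class WriteConflict(Exception):
    """Raised when another transaction committed a write to the same key first."""


class Store:
    def __init__(self) -> None:
        # key -> list of (commit_ts, value), in commit order
        self.versions: Dict[str, List[Tuple[int, int]]] = {}
        self.clock = 0  # timestamp of the latest commit

    def begin(self) -> "Txn":
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store: Store, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts  # the snapshot this transaction reads from
        self.writes: Dict[str, int] = {}  # buffered until commit

    def read(self, key: str) -> Optional[int]:
        if key in self.writes:
            return self.writes[key]
        # Newest version that was committed at or before our snapshot.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.start_ts:
                return value
        return None

    def write(self, key: str, value: int) -> None:
        self.writes[key] = value

    def commit(self) -> int:
        # Optimistic check: fail if any key we wrote has a newer committed version.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise WriteConflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return self.store.clock


store = Store()
setup = store.begin()
setup.write("x", 1)
setup.commit()

t1 = store.begin()
t2 = store.begin()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 1)
t1.commit()
try:
    t2.commit()
    conflicted = False
except WriteConflict:
    conflicted = True
print("second committer aborted:", conflicted)
```

Both transactions read the same snapshot, so without the check at commit one increment would be lost. With it, the second committer aborts and can retry against a newer snapshot.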
