Open and closed source scale-out DBMS that are compatible with MySQL and Postgres have emerged on the market. This is great for the community but there will be much confusion about the meaning of compatible.
This post has yet to have anything on the cloud vendors in China. They are doing impressive work but I don't know enough about it. I am happy to update this post when I learn more.
But first, the MySQL and Postgres teams.
- Team MySQL includes PlanetScale (Vitess), PingCAP (TiDB), Amazon (Aurora), MariaDB (Xpand), SingleStore, ClickHouse and Dolt.
- Team Postgres includes CockroachDB, Yugabyte, Google (Spanner), Amazon (Aurora & Redshift), Microsoft (CitusDB), TimescaleDB, CrateDB, YellowBrick, Greenplum and Materialize.
- It wouldn't be fair to exclude MongoDB, so Team MongoDB is MongoDB, Amazon (DocumentDB) and Microsoft (Cosmos DB API for MongoDB). But the rest of the post is about MySQL and PG.
One way to describe compatibility is via three levels: protocol, syntax and semantics. By upstream I mean MySQL and PostgreSQL.
- protocol - an app can use existing client libraries to authenticate and connect
- syntax - the DBMS will parse SQL that upstream parses. It is not guaranteed to provide semantics that matches upstream and some syntax can be (or might be) parsed but ignored. Syntax compatible implies a best effort to match upstream semantics but that isn't guaranteed nor must it be guaranteed to be useful.
- semantics - the DBMS will match upstream semantics. This implies syntax compatible.
- QA compatible - the DBMS will match upstream WRT passing and failing the QA tests provided with upstream. This is slightly stronger than semantics compatible.
- version - to which version is this (somewhat) compatible? This is easier to explain when the compatible thing is forked from the real thing. When it is just list the release from which it has merged changes.
Note that syntax and semantics compatibility aren't all or nothing. A syntax compatible DBMS can be useful without supporting (parsing) 100% of the upstream syntax. A semantics compatible DBMS can be useful without supporting or matching behavior for 100% of upstream syntax.
Also note that semantics compatible implies syntax compatible. But protocol compatible implies neither.
Elsewhere in DBMS land
While this post is about MySQL and PostgreSQL, compatibility is growing in popularity elsewhere:
- MariaDB provides an Oracle compatible mode that provides syntax but not protocol or semantics compatibility.
- EnterpriseDB provides an Oracle compatibility product that I don't know much about.
- Amazon will soon open source Babelfish that is protocol compatible with SQL Server.
- Amazon DocumentDB is protocol, syntax and semantics compatible with (an older version of) MongoDB. It supports some (much) of the MongoDB 4.0 API as of October, 2021 per Wikipedia. Public statements suggest this was built on top of, or reusing, the Aurora PostgreSQL code.
Updates
- added TimescaleDB to Team Postgres
- added Redshift to Team Postgres
- added SingleStore to Team MySQL
- added ClickHouse to Team MySQL
- added CrateDB to Team Postgres
- added Dolt to Team MySQL
- added Team MongoDB
- added YellowBrick and Greenplum to Team Postgres
- added Materialize to Team Postgres
- Building a DBMS to be compatible with MySQL costs more than for Postgres. The Team Postgres projects can reuse BSD licensed PG code while the Team MySQL projects would have to respect the GPL.
- I have a vague memory of this but to be JDBC compliant the MySQL JDBC driver does a few queries at connect time (either via tables or session/global variables). My JDBC-related bugs are here.
I have been describing it as 3 levels as well! The only differences is naming, where I have been using s/semantics/quirks/
ReplyDeleteMeasuring compatibility with protocol and syntax is relatively straight forward, but semantics is difficult. Some behaviors are undefined, others are documented.
One of my favorite examples of a MySQL semantic is that ORDER BY LIMIT 1 and MIN() produce different results. This is documented: https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_min
It a project cares about protocol+syntax only, they might not consider this a bug. But if they care about semantics, it is reasonable to expect an application might depend on this and it should be fixed. For TiDB it was https://github.com/pingcap/tidb/issues/21451
Semantic compatibility will be a big deal for TiDB
Delete"Team MySQL" also has SingleStore (used to be called MemSQL)
ReplyDeleteThank you, added
DeleteThere's another database emerging as a compatibility standard: ClickHouse. Like PostgreSQL we see people beginning to fork it to create their own analytic solutions. Hydrolix and ByteDance have both done this. There will be more!
ReplyDeleteemerging -> has emerged. ClickHouse is already a big deal.
Delete"As the owner of application X, I want to run my application, which was written for MySQL, against your product, which is not MySQL."
ReplyDeleteThe outcome of that varies for different values of X. But in the end, this is what counts.
It will be interesting to watch this develop.
DeleteThanks for this blog, Mark. I too have been watching this phenomenon with great interest. There is a 4th type of compatibility that often gets short shrift (most likely because it's not an issue for DBaaS) -- "DBA compatibility". That is, whether and to what extent an -compatible database has the same tunable parameters and DBA-level configuration commands as compared to . I am not aware of any -compatible DBs that claim to be "DBA-compatible".
ReplyDeleteExcellent point -- command line tools, config options, monitoring API, etc
DeleteA good follow-up to this would be a survey of *how* these various DBs have achieved some level of compatibility, and the level of compatibility they support. I'm not volunteering, but would love to someone write an article about this.
ReplyDelete