Wednesday, October 13, 2021

Compatible with MySQL or Postgres?

Open and closed source scale-out DBMS that are compatible with MySQL and Postgres have emerged on the market. This is great for the community but there will be much confusion about the meaning of compatible

This post has yet to have anything on the cloud vendors in China. They are doing impressive work but I don't know enough about it. I am happy to update this post when I learn more.

But first, the MySQL and Postgres teams.

One way to describe compatibility is via three levels: protocol, syntax and semantics. By upstream I mean MySQL and PostgreSQL.

  • protocol - an app can use existing client libraries to authenticate and connect 
  • syntax - the DBMS will parse SQL that upstream parses. It is not guaranteed to provide semantics that matches upstream and some syntax can be (or might be) parsed but ignored. Syntax compatible implies a best effort to match upstream semantics but that isn't guaranteed nor must it be guaranteed to be useful. 
  • semantics - the DBMS will match upstream semantics. This implies syntax compatible.
  • QA compatible - the DBMS will match upstream WRT passing and failing the QA tests provided with upstream. This is slightly stronger than semantics compatible.
  • version - to which version is this (somewhat) compatible? This is easier to explain when the compatible thing is forked from the real thing. When it is just list the release from which it has merged changes.
Note that syntax and semantics compatibility aren't all or nothing. A syntax compatible DBMS can be useful without supporting (parsing) 100% of the upstream syntax. A semantics compatible DBMS can be useful without supporting or matching behavior for 100% of upstream syntax.

Also note that semantics compatible implies syntax compatible. But protocol compatible implies neither.

Elsewhere in DBMS land

While this post is about MySQL and PostgreSQL, compatibility is growing in popularity elsewhere:
  • MariaDB provides an Oracle compatible mode that provides syntax but not protocol or semantics compatibility. 
  • EnterpriseDB provides an Oracle compatibility product that I don't know much about. 
  • Amazon will soon open source Babelfish that is protocol compatible with SQL Server.
  • Amazon DocumentDB is protocol, syntax and semantics compatible with (an older version of) MongoDB. It supports some (much) of the MongoDB 4.0 API as of October, 2021 per Wikipedia. Public statements suggest this was built on top of, or reusing, the Aurora PostgreSQL code.
Updates
  • added TimescaleDB to Team Postgres
  • added Redshift to Team Postgres
  • added SingleStore to Team MySQL
  • added ClickHouse to Team MySQL
  • added CrateDB to Team Postgres
  • added Dolt to Team MySQL
  • added Team MongoDB
  • added YellowBrick and Greenplum to Team Postgres
  • added Materialize to Team Postgres
  • Building a DBMS to be compatible with MySQL costs more than for Postgres. The Team Postgres projects can reuse BSD licensed PG code while the Team MySQL projects would have to respect the GPL.
  • I have a vague memory of this but to be JDBC compliant the MySQL JDBC driver does a few queries at connect time (either via tables or session/global variables). My JDBC-related bugs are here.

Friday, October 1, 2021

The other way to compress InnoDB: outsource it

There are at least three ways to do compression for InnoDB - classic, holepunch and outsource. 

The classic approach (table compression) was used and enhanced by the FB MySQL team. It might not have been widely used elsewhere. While it works, even for read/write workloads, the implementation is complicated and it isn't clear that it has a bright future.

The holepunch approach (page compression) is much simpler than the classic approach. Alas, I am skeptical that a filesystem will be happy tracking the metadata from doing a holepunch for every (or most) pages written. I am also skeptical that unlink() response times of seconds to minutes (because of the holepunch usage) will be good for a production DBMS. I wrote a few posts about my experience with the holepunch approach: here, here, here and here.

The outsource approach is the most simple from the perspective of InnoDB - let the filesystem or storage do the compression for you. In this case InnoDB continues to do filesystem reads and writes as if pages have a fixed size and the FS/storage compresses prior to writing to storage, decompresses after reading form storage and does all of the magic to make this work. While there are log-structured filesystems in OSS that might make this possible, such filesystems aren't widely used relative to XFS and the EXT family. There is also at least one storage device on the market that supports this.

tl;dr - the outsource approach is useful when the data is sufficiently compressible and the cost of this approach (more write-amp) is greatly reduced when it provides atomic page writes.

After publishing I noticed this interesting Twitter thread on support for atomic writes.

Performance model

I have a simple performance model to understand when the outsource approach will work. As always, the performance model makes assumptions that can be incorrect. Regardless the model is a good start when comparing the space and write amplification with the outsource approach relative to uncompressed InnoDB.

Assumptions:

  • The log-structured filesystem has 1+ log segments open for writing. Compressed (variable length) pages are written to the end of an open segment. Such a write can create garbage - the previous version of the page stored elsewhere in a log segment. Garbage collection (GC) copies live data from previously written log segments into open log segments to reclaim space from old log segments that have too much garbage. 
  • The garbage in log segments is a source of space amplification. Garbage collection is a source of write amplification.
  • g - represents the average percentage of garbage (free space) in a previously written log segment. 
  • r - represents the compression rate. With r=0.5 then a 16kb page is compressed to 8kb.
  • Write and space amplification are functions of g:
    • write-amp = 100 / g
    • space-amp = 100 / (100 - g)
Risks in the assumptions:
  • Assumes that g is constant across log segments. A better perf model would allow for variance.
  • Assumes that r is constant across pages. A better perf model might allow for variance.
  • Estimates of write and space amplification might be more complicated than the formula above.
Results

Now I estimate the space-amp and write-amp for outsource relative to uncompressed InnoDB. The ratios are (value for outsource / value for uncompressed InnoDB). For space-amp when the ratio is < 1 then outsource uses less space vs uncompressed InnoDB. For write-amp when the ratio is > 1 then outsource writes more to storage vs uncompressed InnoDB. 

I show below that when the compression rate (r above) is < 0.6 then outsource provides much less space-amp without suffering from a too-large increase in write-amp. But when r is >= 0.6 the increase in write-amp, relative to uncompressed InnoDB, might be a problem.

However, whether a large increase in write-amp is a problem depends on your workload. For example, if 99% of storage IOPs are reads and 1% are writes with uncompressed InnoDB then a write-amp penalty that changes this from 1% writes to 2% writes is unlikely to be a problem.

I created a graph with Desmos to show the three ratios. The graph allows r to be adjusted. I then copied some values from the graph into a table below. The graph has 3 curves, one for each ratio:
  • s is the space-amp ratio where s = r * 100 / (100 - g).
  • w1 is the write-amp ratio assuming the doublewrite buffer is enabled where w1 = r * 100/g.
  • w2 is the write-amp ratio assuming the doublewrite buffer is disabled. This assumes that outsource provides atomic page writes for free. The formula is w2 = r/2 * 100/g.
The table below has values from the graph for r = 0.4, 0.5 and 0.6. What I see in the graph is that with r in (0.4, 0.5) and the doublewrite buffer disabled (w2) it is possible to get much of the compression benefit (see s) from outsource without a significant increase in write-amp. But the write-amp penalty can be a problem when r >= 0.6. Of course, whether or not more write-amp is an issue depends on the storage read and write rates as I explained above. 

r=0.4
g       s       w1      w2
20      0.50    2       1
30      0.57    1.33    0.67
40      0.67    1       0.5

r=0.5
g       s       w1      w2
20      0.63    2.5     1.25
30      0.71    1.67    0.83
40      0.83    1.25    0.63

r=0.6
g       s       w1      w2
20      0.75    3       1.5
30      0.86    2       1
40      1       1.5     0.75


RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...