Sunday, February 23, 2014

Use case?

It is time for a new blog. I still work on small data (OLTP) but my focus is not limited to MySQL. Fortunately the product is in great shape thanks to investment by Oracle/MySQL, SkySQL/MariaDB and Percona. The MySQL teams at Facebook are also doing amazing work and some of that will be described at the MySQL UC.

I don't think that semi-sync replication was sufficiently appreciated. Perhaps it was misunderstood but I might be biased because my team at Google did the first implementation of it. Rather than dwell on the past lets focus on the future with enhanced semi-sync. I think this provides the semantics that people hoped to get from semi-sync and will be a huge feature beyond the already big deal that fast & automated failover will become courtesy of global transaction IDs and semi-sync replication.

So we will get faster & automated failover in 2014 and might get a solution that makes the binlog 2-safe without much complexity and with very low latency. But is that enough? I think we need better support for dynamic schemas (see documents in MongoDB) and Postgres might need the same. By this I mean that a table should support the use of columns that are not declared in the schema. And then we need to support indexes on those columns. As a bonus it would be nice to avoid repeating dynamic column names in every row when stored on disk because use really short column names isn't a desirable best practice for long-lived applications with many developers. MariaDB has some support for dynamic columns and the feature continues to get better in MariaDB v10. I don't know whether there are plans for indexing.

The other feature is support for auto-sharding. This is a much simpler problem to solve for a document data model than for a relational model where there are multiple tables with PK-FK constraints.

The standard response when making feature requests is to explain the use case and that is sometimes followed up with an expression of doubt enough customers want this feature. I have memories from my time at BigDBMS company when product managers would repeat that BigDBMS wasn't losing customers to product X or BigDBMS customers weren't asking for features provided by product X. The implication was that new features in product X weren't compelling. This ignored the real problem -- customers who needed the features in product X were never becoming customers of BigDBMS and the new license revenue growth rate might have made that clear.

I think we are in a similar situation with MySQL. Customers who want some support for sharding and documents are using other products. I think we can make MySQL competitive for this market and leverage some of the awesome features that are much better in MySQL than elsewhere like InnoDB, TokuDB, monitoring, support and a lot of expertise.

No comments:

Post a Comment

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...