In a perfect world we could always afford the network latency and/or extra hardware cost from synchronous replication. But production would be much less fun in a perfect world. Even for systems that have implemented sync replication I am skeptical that the system behaves as advertised until it has aged in production for a few years
Replication can also be lossy in practice because of bugs. MySQL replication slaves were not crash-safe for too long because the replication state was maintained in a file that wasn't kept consistent with InnoDB on crash recovery. This was first fixed in the Google patch for MySQL via rpl_transaction_enabled. It was fixed many years later in upstream MySQL.
Reputations from bugs and missing features can linger long after the bugs have been fixed and the features have been implemented. MySQL suffered from this because of MyISAM but some of the damage was self inflicted. Compare the claims about integrity for atomic operations & MyISAM in an older version of the manual with the current version.
In transactional terms, MyISAM tables effectively always operate in autocommit = 1 mode. Atomic operations often offer comparable integrity with higher performance.
MongoDB had a MyISAM moment with the initial version of its storage engine that was not crash safe. Fortunately they rallied and added journalling. I think they are having another MyISAM moment with replication. The TokuMX team has an excellent series of posts and a great white paper describing performance and correctness problems that exist in MongoDB and have been fixed in TokuMX. The problems include slow elections and unexpected data loss on master failover.
The post by Aphyr is also worth reading. While there read the posts on other systems. A lot of innovative correctness testing & debugging was done to produce those results.
Will these problems be fixed before the reputation sticks? This is a good reason to consider TokuMX.
No comments:
Post a Comment