Tuesday, September 30, 2014

Sync replication in MySQL 5.X

The MySQL 5.7.5 beta is here and looks great. Group replication is available as a labs preview and provides synchronous replication, so we will soon have 2 choices for sync replication in the MySQL family (this and Galera). Descriptions of sync replication tend to focus on the details, but I prefer to understand behavior at a high level first. Sync replication has a wonderful property: there is no need for failover because every replica is a master. However, that comes at a cost, because a commit that is too slow is another form of downtime. Placing more replicas closer together reduces network round-trip time and is a workaround if you don't mind the HW cost. Lossless semi-sync replication is the alternative when you are willing to deal with failover and want a 2-safe binlog but don't want extra database servers. I am happy to see both solutions for the MySQL community.

My standard questions are:

  1. When all commits originate from the same replica, does commit processing (getting consensus) require 1 or more network round trips between replicas? AFAIK there are Paxos variations for which 1 round trip is sufficient.
  2. When commits don't all originate from the same replica, does commit processing require 2 or more than 2 network round trips? Is the expected usage to originate commits at the same replica or to let them use any replica? This matters because originate-at-any usually costs 1 extra network round trip, which hurts when replicas are far apart.
  3. What occurs between 2 commits to the same key from different transactions? Assuming 1 network round trip for commit (see #1 above), my guess is: fsync on the replica that originates the commit, send a message to the other replicas, fsync on the other replicas, send a reply to the originator. So the limit on commits per second to one key is 1 / (network-round-trip + 2 * fsync).
  4. What occurs between 2 commits to different keys from different transactions? Is there grouping or pipelining so that the commit rate here is better than for #3?
  5. Does "commit" mean that the change is accepted by or applied to a majority of replicas? If it only means the change has been accepted, then I can read from a majority of replicas after commit and not see my change.
  6. What is the workaround for a key with a lot of logical contention? Is there any support for pessimistic locking? Should I route/originate all such transactions from the same replica?
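To make the arithmetic in questions 2, 3 and 4 concrete, here is a small Python sketch. It assumes the serial sequence guessed in question 3 (fsync on the originator, message, fsync on the other replicas, reply), assumes an extra round trip for originate-at-any adds only network time and no extra fsync, and models hypothetical group commit as a batch of commits to different keys sharing one consensus round. The function names and the latency numbers are illustrative, not measurements of Galera or group replication.

```python
def commit_interval_ms(rtt_ms: float, fsync_ms: float, round_trips: int = 1) -> float:
    """Minimum time between two commits to the same key (question 3).

    Assumption: each round trip adds one network RTT, and the commit path
    always includes 2 fsyncs (originator, then the other replicas).
    """
    return round_trips * rtt_ms + 2 * fsync_ms


def max_commits_per_sec_one_key(rtt_ms: float, fsync_ms: float, round_trips: int = 1) -> float:
    """1 / (network-round-trip + 2 * fsync), converted from ms to commits/second."""
    return 1000.0 / commit_interval_ms(rtt_ms, fsync_ms, round_trips)


def max_commits_per_sec_grouped(rtt_ms: float, fsync_ms: float, batch_size: int,
                                round_trips: int = 1) -> float:
    """Hypothetical group commit (question 4): batch_size commits to different
    keys share one consensus round and one fsync per replica."""
    return batch_size * max_commits_per_sec_one_key(rtt_ms, fsync_ms, round_trips)


if __name__ == "__main__":
    # Illustrative numbers only: 1 ms fsync, replicas nearby vs far apart.
    for label, rtt in [("nearby", 0.2), ("far apart", 50.0)]:
        same_origin = max_commits_per_sec_one_key(rtt, 1.0)
        any_origin = max_commits_per_sec_one_key(rtt, 1.0, round_trips=2)
        print(f"{label}: {same_origin:.1f}/s same-origin, {any_origin:.1f}/s originate-at-any")
```

With far-apart replicas the network round trip dominates, which is why the number of round trips per commit (questions 1 and 2) and whether commits can be grouped (question 4) matter so much.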
