Tuesday, December 10, 2019

Historical - SemiSync replication

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it.

Semisync was useful to but misunderstood. Lossless semisync was awesome but perhaps arrived too late as Group Replication has a brighter future. I like Lossless semisync because it provides similar durability guarantees to GR without the overhead of running extra instances locally. Not running extra instances locally for GR means that commit will be slow courtesy of the speed of light. I hope that GR adds support for log-only voters (witnesses).

Regular semisync was misunderstood because people thought it provided extra durability. It didn't do that. It rate limited busy writers to reduce replication lag. It also limited a connection to at most one transaction that wasn't on at least one slave to reduce the amount of data that can be lost when a primary disappears.

Wei Li implemented semisync during his amazing year of work on replication. Then it was improved to lossless semisync by Zhou Zhenzing (see the first and second post and feature request) and work done upstream. Lossless semisync was widely deployed at FB courtesy of Yoshinori Matsunobu.

Introduction

Heikki Tuuri had the idea and perhaps a PoC but there wasn't much demand for it beyond me. Solid will offer their version of this later in 2007. We couldn't wait and implemented it.

The MySQL replication protocol is asynchronous. The master does not know when or whether a slave gets replication events. It is also efficient. A slave requests all replication events from an offset in a file. The master pushes events to the slave when they are ready.

Usage

We have extended the replication protocol to be semi-synchronous on demand. It is on demand because each slave registers as async or semi-sync. When semi-sync is enabled on the master, it blocks return from commit until either at least one semi-sync slave acknowledges receipt of all replication events for the transaction or until a configurable timeout expires.

Semi-synchronous replication is disabled when the timeout expires. It is automatically reenabled when slaves catch up on replication.

Configuration

The following parameters control this:
  • rpl_semi_sync_enabled configures a master to use semi-sync replication
  • rpl_semi_sync_slave_enabled configures a slave to use semi-sync replication. The IO thread must be restarted for this to take effect
  • rpl_semi_sync_timeout is the timeout in milliseconds for the master

Monitoring

The following variables are exported from SHOW STATUS:
  • Rpl_semi_sync_clients - number of semi-sync replication slaves
  • Rpl_semi_sync_status - whether semi-sync is currently ON/OFF
  • Rpl_semi_sync_slave_status - TBD
  • Rpl_semi_sync_yes_tx - how many transaction got semi-sync reply
  • Rpl_semi_sync_no_tx - how many transaction do not get semi-sync reply
  • Rpl_semi_sync_no_times - TBD
  • Rpl_semi_sync_timefunc_failures - how many gettimeofday() function fails
  • Rpl_semi_sync_wait_sessions - how many sessions are waiting for replies
  • Rpl_semi_sync_wait_pos_backtraverse - how many time we move waiting position back
  • Rpl_semi_sync_net_avg_wait_time(us) - the average network waiting time per tx
  • Rpl_semI_sync_net_wait_time - total time in us waiting for ACKs
  • Rpl_semi_sync_net_waits - how many times the replication thread waits on the network
  • Rpl_semi_sync_tx_avg_wait_time(us) - the average transaction waiting time
  • Rpl_semi_sync_tx_wait_time - TBD
  • Rpl_semi_sync_tx_waits - how many times transactions wait
  • Rpl_semi_sync_timefunc_failures - #times gettimeofday calls fail

Design Overview

Semi-sync replication blocks any COMMIT until at least one replica has acknowledged receipt of the replication events for the transaction. This ensures that at least one replica has all transactions from the master. The protocol blocks return from commit. That is, it blocks after commit is complete in InnoDB and before commit returns to the user.

This option must be enabled on a master and slaves that are close to the master. Only slaves that have this feature enabled participate in the protocol. Otherwise, slaves use the standard replication protocol.

Deployment

Semi-sync replication can be enabled/disabled on a master or slave without shutting down the database.

Semi-sync replication is enabled on demand. If there are no semi-sync replicas or they are all behind in replication, semi-sync replication will be disabled after the first transaction wait timeout. When the semi-sync replicas catch up, transaction commits will wait again if the feature is not disabled.

Implementation

The design doc is here.

Each replication event sent to a semi-sync slave has two extra bytes at the start that indicate whether the event requires acknowledgement. The bytes are stripped by the slave IO thread and the rest of the event is processed as normal. When acknowledgement is requested, the slave IO thread responds using the existing connection to the master. Acknowledgement is requested for events that indicate the end of a transaction, such as commit or an insert with autocommit enabled.


No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...