Friday, September 4, 2020

When is it OK to lose a commit?

I am thrilled that OSS DBMS are better at avoiding the loss of commits thanks to replication that can do more than async. But there are good reasons for some workloads to not pay the cost (in latency) of such features.

By losing a commit here I mean the loss of the most recent N commits, not the loss of any commit from the past. By workload I mean one of the applications using the DBMS, and there can be a variety of workloads sharing the DBMS. For some DBMS, all applications must use the same level of commit durability. But that is an implementation artifact, not something fundamental to the algorithms we implement.
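
PostgreSQL is one example of a DBMS where durability can be chosen per workload: synchronous_commit can be set per session or per transaction. A minimal sketch using psycopg2, where the connection string and the events table are placeholders I made up for illustration:

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # placeholder connection string

    with conn:
        with conn.cursor() as cur:
            # This transaction does not wait for the WAL flush or for
            # synchronous standbys, so its commit is faster but can be lost.
            cur.execute("SET LOCAL synchronous_commit = off")
            cur.execute("INSERT INTO events (payload) VALUES (%s)",
                        ("low-latency insert",))
    # Other sessions and transactions keep the server default.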

A workload for which it is OK to lose the most recent N commits is inserting into a DBMS from a log file. If this has limited concurrency then commit latency can be a problem. If commits are lost then they can be replayed. This assumes that the application 1) can determine when a commit has been lost and 2) can figure out where to begin the replay.

To determine when a commit has been lost, the application queries DBMS state when it connects and when it reconnects. One reason to reconnect is a failover, and the tradeoff for a faster commit is the potential to lose the last N commits on failover.

In the simple case there is one single-threaded application doing inserts from a log file into one DBMS table. It is feasible but not trivial to make sure there really is at most one application running in this case -- because distributed system challenges extend to applications. But I ignore that complexity for now.

The DBMS state is found either in the table to which the inserts are done or in a separate state table. If there is a way to find the last inserted row, then that tells you where to restart, assuming you can map the value of that row back to an offset in the source (log). Otherwise, a separate state table can maintain the log offset from which inserts should continue, and the state table is updated in the same transaction that does the bulk inserts from the log.
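
Here is a minimal sketch of the state-table approach, with SQLite standing in for the DBMS. The loaded and load_state tables and the byte-offset bookkeeping are assumptions for illustration, not a prescription. Because the offset commits in the same transaction as the inserts, on connect or reconnect the application can read load_state to learn where to resume, even after a failover that lost the last N commits:

    import sqlite3

    conn = sqlite3.connect("target.db")
    conn.execute("CREATE TABLE IF NOT EXISTS loaded (line TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS load_state "
                 "(id INTEGER PRIMARY KEY CHECK (id = 1), log_offset INTEGER)")
    conn.execute("INSERT OR IGNORE INTO load_state VALUES (1, 0)")
    conn.commit()

    def restart_offset(conn):
        # On connect or reconnect, ask the DBMS how far it remembers getting.
        return conn.execute(
            "SELECT log_offset FROM load_state WHERE id = 1").fetchone()[0]

    def load_batch(conn, log_path, batch=1000):
        start = restart_offset(conn)
        rows, end = [], start
        with open(log_path, "rb") as f:
            f.seek(start)
            for line in f:
                rows.append((line.decode().rstrip("\n"),))
                end += len(line)
                if len(rows) == batch:
                    break
        if rows:
            with conn:  # one transaction: inserts and new offset commit together
                conn.executemany("INSERT INTO loaded VALUES (?)", rows)
                conn.execute("UPDATE load_state SET log_offset = ? WHERE id = 1",
                             (end,))

If the target table itself has a column that maps back to a log offset, then load_state is redundant and restart_offset can instead select the max of that column.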

Update

I just remembered something that I never understood. Ingres Replicator does async replication via 2pc between the primary and a secondary -- see here and here. I never understood why. That is an expensive way to replay a log (from the primary) on a secondary. The alternative is for a secondary to track its replay state independently of the primary: replay each transaction on the secondary (commit locally on the secondary and update a state table on the secondary in the same transaction), and some time after commit, the state on the primary (the queue of transactions to be replayed) can be truncated.
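
A minimal sketch of that alternative, with made-up names (replica_state, replay_queue, seqno) and sqlite3-style connections where the with-block commits the transaction:

    def apply_on_secondary(secondary_conn, seqno, statements):
        # Replay one transaction from the primary's queue on the secondary.
        # The replayed changes and the secondary's own progress marker commit
        # together in one local transaction, so no 2pc with the primary is needed.
        with secondary_conn:
            for stmt in statements:
                secondary_conn.execute(stmt)
            secondary_conn.execute(
                "UPDATE replica_state SET last_applied = ? WHERE id = 1", (seqno,))

    def truncate_queue(primary_conn, acked_seqno):
        # Run on the primary some time later, after the secondary reports how
        # far it has applied. Losing or delaying this cleanup only delays truncation.
        with primary_conn:
            primary_conn.execute(
                "DELETE FROM replay_queue WHERE seqno <= ?", (acked_seqno,))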


1 comment:

  1. Today we have such blazing fast drives and plenty of cores that merely thinking about this is a waste of time.
