Wednesday, September 4, 2019

InnoDB repeatable read in a nutshell

InnoDB repeatable read (RR) is complicated.  This is my short description. It isn't exactly correct.

Summary

Processing of statements within a transaction for InnoDB RR:
  1. Statements that are read-only and x-lock free use snapshot isolation
  2. Other statements use read committed (RC) with gap locks
  3. All statements observe uncommitted changes from the transaction
It would have been simpler to use snapshot isolation as-is but that leads to more write-write conflicts.

Sources of Confusion

  • The repeatable read snapshot is created at the start of the first real statement (SELECT, INSERT, etc) and not by BEGIN or START TRANSACTION. It is created by START TRANSACTION WITH CONSISTENT SNAPSHOT.
  • Some RR transactions use SELECT to find rows to modify followed by UPDATE when they should have used SELECT ... FOR UPDATE. Without the FOR UPDATE a SELECT statement uses the snapshot from transaction start while the UPDATE uses RC and sees all committed changes. So SELECT might observe different rows from the UPDATE that follows.
  • I never tried to learn whether SELECT ... FOR UPDATE and UPDATE use 1) use a snapshot from statement start or 2) see any row that is committed at the time statement processing encounters that row. I assume that it is #2. The manual states "reads the latest available data".
  • The rules for gap locks and next-key locks are non-trivial. I don't claim they are more complicated than they need to be because concurrency is hard and they provide useful semantics by preventing phantom reads and more. While the academic community tends towards stronger isolation the common choices in popular systems are weaker (RC in Postgres and Oracle, RR in InnoDB, ? in SQL Server).
When does InnoDB RR implement RR?

I am not an expert on isolation. Experts have yet to review this post. I hope they do. The ANSI spec requires that RR prevent dirty read, non-repeatable read and phantom reads. Alas, InnoDB can do reads (evaluate the WHERE clause) at different points in time (see above) which can allow for some of non-repeatable reads and phantom reads unless you are careful to use the FOR UPDATE clause.

2 comments:

  1. Mark, your summary is fairly accurate. All locking reads in InnoDB are essentially READ COMMITTED. This includes reads done by UPDATE, DELETE, or SELECT…FOR UPDATE or SELECT…LOCK IN SHARE MODE.

    In early 2018, I commented on https://falseisnotnull.wordpress.com/2018/02/06/about-read-uncommitted/ and presented an example where a mix of transactions with SERIALIZABLE isolation level and REPEATABLE READ isolation level seems to lead to non-serializable execution.

    I think that it is worth mentioning that in order for REPEATABLE READ to work with distributed transactions, some kind of synchronized read view creation would be needed. In a MariaDB meeting in June 2018, we had some brainstorming around this as noted in the comments at https://jira.mariadb.org/browse/MDEV-16610 but sadly there has been no progress on that.

    Also somewhat related to this, I found a race condition in the transaction commit of MySQL 5.7 that would seem to allow conflicting exclusive record locks to be granted. While fixing that in MariaDB Server (MDEV-15326), I had to remind myself how the read view creation is supposed to work. I came to the conclusion that because there is no locking around read view creation, some nondeterminism is to be expected when a transaction commits around the time the read view is created. Example: Transactions T1 and T2 are being committed while transaction T3 is creating a read view. That read view is valid if it contains any of the following sets of committed transactions: {},{T1},{T2},{T1,T2}. In the context of MDEV-16610, the transactions would be split to multiple database instances, say a,b. Any of the following distributed read views would be valid: {},{T1a,T1b},{T2a,T2b},{T1a,T1b,T2a,T2b}. But, for example {T1a} or {T2a,T1b} (which are possible now) are not consistent distributed read views.

    ReplyDelete
    Replies
    1. MongoDB added stronger consistency/durability via async replication (see http://smalldatum.blogspot.com/2019/09/tunable-consistency-in-mongodb.html). While upstream MySQL is all in for group replication (sync log shipping).

      Assuming that MariaDB will not pursue the GR approach as that adds a lot of complexity per-engine and async replication already works, perhaps MariaDB will pursue the MongoDB approach of layering support for stronger consistency above async replication.

      That might also be useful for Spider and Vitess and could make MariaDB the preferred DBMS for Vitess. Something is needed to do cross-shard consistent reads.

      Delete

Speedb vs RocksDB on a large server

I am happy to read about storage engines that claim to be faster than RocksDB. Sometimes the claims are true and might lead to ideas for mak...