Thursday, September 5, 2019

Adapting TPC-C for MongoDB - reviewing a VLDB paper

This is a review of Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB which was published in VLDB 2019. I appreciate that MongoDB and Asya Kamsky took the time to get this published. That can be a weekend and nights project when in industry. I also appreciate that this not a benchmarketing effort. The purpose wasn't to overstate performance. The purpose was to show how to get good performance on a TPC-C like workload with MongoDB and realistic hardware and configurations. I hope for a similar effort on MongoDB with Linkbench.

My comments:
  • Work was done to reduce write-write conflicts which will be more likely given the extra commit latency from using w:majority writeConcern on a 3-node cluster. That work included 1) moving conflicting writes early in the transaction 2) moving writes before reads 3) using findAndModify instead of select/update and 4) batching writes. I wonder if non-stored procedures will be useful.
  • A small amount of denormalization was done by storing order lines in the order document. Denormalize everything isn't feasible here or in Linkbench because that leads to too-big documents.
  • Code and details were shared that will allow you to reproduce results.
  • w:majority was used on a 3-node cluster. The goal was to get realistic results, not a benchmarketing special.
I was confused by two things. First, section 3.3 states that majority write concern guarantees that a write is flushed to disk by any replica that ack'd the write. I thought this was determined by the value of the j option in writeConcern. Second, section 3.5.2 is about causal consistency and that (causal reads feature and logical clocks) seems like overkill when shards aren't used. If you want to avoid going back in time when moving from a primary to a secondary isn't it sufficient to remember the point-in-time at which primary queries are done? But maybe that is just a simple logical clock.




1 comment:

  1. Whether w: majority implies fsync is actually settable with the record long option writeConcernMajorityJournalDefault. It defaults to true! This breaks with mongodb precedent in 2 ways: 1 something can be set server side, and 2 defaulting to fsync. The reason for defaulting to fsync is that the Raft paper says nodes must persist log entries. Otoh even without fsync it is true that commits will survive failure of a minority of nodes. Which of these you consider "safe" is subjective/religious.

    "remember point in time" is exactly what causal consistency does.

    ReplyDelete

RocksDB on a big server: LRU vs hyperclock

This has benchmark results for RocksDB using a big (48-core) server. I ran tests to document the impact of the the block cache type (LRU vs ...