Comments on Small Datum: Redo logs in MongoDB and InnoDB

My memory is vague but I think Asya can give you a...

2014-10-08T09:52:51.734-07:00

My memory is vague but I think Asya can give you a great answer on https://groups.google.com/forum/#!forum/mongodb-user

From http://docs.mongodb.org/manual/core/journalin...

2014-10-08T09:45:50.171-07:00

From http://docs.mongodb.org/manual/core/journaling/: "a group commit must block all writers during the commit." Do you know whether "writers" refers to data writers or specifically to "other group committers"? Since readers are apparently not affected, is this a second locking mechanism or another state of the readers/writers lock? TIA.

No wonder the MongoDB guys asked us to write an LM...

2014-09-11T15:35:32.816-07:00

No wonder the MongoDB guys asked us to write an LMDB driver for them. LMDB is immune to torn writes.

Updated with JIRAs: https://jira.mongodb.org/brows...

2014-03-25T10:33:15.492-07:00

Updated with JIRAs:
https://jira.mongodb.org/browse/SERVER-13186
https://jira.mongodb.org/browse/SERVER-13343
https://jira.mongodb.org/browse/SERVER-13344
https://jira.mongodb.org/browse/SERVER-13345
https://jira.mongodb.org/browse/SERVER-13346

syncdelay=1 as mentioned in the article must make ...

2014-03-24T08:32:45.597-07:00

syncdelay=1, as mentioned in the article, must make the SSD vendor happy: more writes = less endurance on flash.

Not sure what you mean by soft/hard offlining. They have some detail on how the mmap is done, but maybe they could be clearer (http://docs.mongodb.org/manual/core/journaling/). There are two mmap views: one regular and the other created with MAP_PRIVATE (the private view). The private view gets all changes made before the msync call that runs every syncdelay seconds. My guess is that there is only one private view, so all changes get applied to it in sequence. At msync time something is done to copy the changes from the private view to the other view and then msync is done there (haven't read enough of that code yet). It is probably OK to copy changes from the private view over to the other view as soon as journal changes are forced to disk. Maybe that is a future optimization.
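
To make the guess above concrete, here is a minimal Python sketch of the two-view idea. It is not MongoDB's code: the function name, the in-memory `journal` list and the one-page temp file are all made up for illustration, and it assumes a Unix-like system. `mmap.ACCESS_COPY` gives a copy-on-write (MAP_PRIVATE-style) mapping and `mmap.ACCESS_WRITE` a shared one.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def demo_two_views():
    """Toy model: writes go to a private view first, then to the shared view."""
    fd, path = tempfile.mkstemp()  # hypothetical one-page data file
    try:
        os.write(fd, b"\x00" * PAGE)
        # Shared view: stores reach the file (flush() is the msync step).
        shared = mmap.mmap(fd, PAGE, access=mmap.ACCESS_WRITE)
        # Private view: copy-on-write, stores never reach the file.
        private = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)

        journal = []  # stand-in for the on-disk journal (assumption)

        # 1. A write lands in the private view and is recorded in the journal.
        private[0:5] = b"hello"
        journal.append((0, b"hello"))
        file_before = os.pread(fd, 5, 0)  # the file is still untouched

        # 2. Once the journal is considered durable, replay it on the shared view.
        for offset, data in journal:
            shared[offset:offset + len(data)] = data

        # 3. msync the shared view.
        shared.flush()
        file_after = os.pread(fd, 5, 0)

        private.close()
        shared.close()
        return file_before, file_after
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `demo_two_views()` shows that the private-view write alone never changes the file; only the replay into the shared view does.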

I don't think MongoDB has protection against torn writes (partial writes). So writes in progress during a crash may result in a corrupt database that can't be recovered from the journal. Protecting against that requires a full copy of the page to be written somewhere else first. InnoDB has that via the doublewrite buffer, but that recovery safety comes at a high cost (it doubles the write rate to storage), so you need to be careful about which storage device gets the doublewrite writes.
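
A toy sketch of the doublewrite idea, to show why the full page copy matters. This is not InnoDB's implementation: the 16-byte page, the CRC32 trailer, the `ToyStore` class and the `crash_after_bytes` knob are all assumptions made for illustration, and "disk" is just a pair of bytearrays.

```python
import zlib

PAGE_SIZE = 16  # toy page size; real pages are far larger


def make_page(payload: bytes) -> bytes:
    """Pad the payload and append a 4-byte CRC32 trailer."""
    body = payload.ljust(PAGE_SIZE - 4, b"\x00")
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_valid(page: bytes) -> bool:
    """A page is valid when its trailer matches the CRC32 of its body."""
    return len(page) == PAGE_SIZE and zlib.crc32(page[:-4]).to_bytes(4, "big") == page[-4:]


class ToyStore:
    def __init__(self, first_page: bytes):
        self.data_file = bytearray(first_page)   # page in its home location
        self.doublewrite = bytearray(PAGE_SIZE)  # separate doublewrite area

    def write_page(self, page: bytes, crash_after_bytes=None):
        # Step 1: the full page goes to the doublewrite area (assumed synced).
        self.doublewrite[:] = page
        # Step 2: the page is written in place; a crash can tear this write.
        n = PAGE_SIZE if crash_after_bytes is None else crash_after_bytes
        self.data_file[:n] = page[:n]

    def recover(self):
        # A torn page fails its checksum and is restored from the full copy.
        if not is_valid(bytes(self.data_file)) and is_valid(bytes(self.doublewrite)):
            self.data_file[:] = self.doublewrite


store = ToyStore(make_page(b"old"))
new_page = make_page(b"new value")
store.write_page(new_page, crash_after_bytes=4)  # simulate a torn write
torn = not is_valid(bytes(store.data_file))
store.recover()
```

Every page is written twice here, which is the doubled write rate mentioned above.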

given that mongo uses mmaps, would it really be co...

2014-03-23T22:39:11.852-07:00

Given that mongo uses mmaps, would it really be considered ACID in the case of soft/hard offlining of pages before they make it to the journal? I did some investigation into the usage of mmaps; from my reading, there are two mmaps used per instance.

Interesting ops note: when upgrading from 1.8 to 2.2, with the journal enabled by default, the disks had to do a lot more [even with 512 MB write-back], and being able to increase the commit interval to 499 ms at run time kept I/O wait from climbing too badly. Eventually moving to a dual hybrid RAID config allowed the journal to run at 100 ms without issues. Changing the sync delay down from 60 makes a huge difference as well. Although the flush is not *supposed* to be blocking, it is, but less now than it was in earlier releases.

http://blog.mongodb.org/post/64962828969/performance-tuning-mongodb-on-solidfire

~Alexis

Great post as usual :-) I have some WIP changes t...

2014-03-23T17:54:35.194-07:00

Great post as usual :-)

I have some WIP changes that allow any block size for the InnoDB redo logs. What we want to do is separate the log buffer writes from the log disk writes, perhaps with a separate log writer thread. That way, making it O_DIRECT or buffered could be a log writer thread property. It is all still very experimental. There was a suggestion from Taobao to improve the writes to the log buffer (see http://bugs.mysql.com/bug.php?id=70950). In our tests on 5.7.4 we didn't see any improvement, but then we didn't use O_DIRECT either. However, I don't think O_DIRECT would have improved things.
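
A rough sketch of that separation, only to illustrate the shape of the idea. It is not the WIP InnoDB code: the `LogWriter` class, the queue standing in for the log buffer, and the zero padding at shutdown are all assumptions. Callers only append records; the writer thread alone decides the block size and how the file is opened.

```python
import os
import queue
import tempfile
import threading


class LogWriter:
    """Toy log writer: append() fills a buffer, one thread does the disk writes."""

    def __init__(self, path, block_size=512):
        self.block_size = block_size
        self.records = queue.Queue()  # stand-in for the log buffer
        # How the file is opened (buffered here) is a property of the writer only.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def append(self, record: bytes):
        # Log buffer write: never touches the disk.
        self.records.put(record)

    def _run(self):
        pending = b""
        while True:
            record = self.records.get()
            if record is None:  # shutdown marker
                break
            pending += record
            # Log disk write: only whole blocks are written.
            while len(pending) >= self.block_size:
                os.write(self.fd, pending[:self.block_size])
                pending = pending[self.block_size:]
        if pending:
            # Pad the final partial block with zeros.
            os.write(self.fd, pending.ljust(self.block_size, b"\x00"))
        os.fsync(self.fd)

    def close(self):
        self.records.put(None)
        self.thread.join()
        os.close(self.fd)


log_path = os.path.join(tempfile.mkdtemp(), "toy_redo.log")
writer = LogWriter(log_path, block_size=64)
for i in range(10):
    writer.append(b"record-%02d;" % i)
writer.close()
log_size = os.path.getsize(log_path)
```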