TokuMX, MongoDB and InnoDB versus the insert benchmark with disks
I used the insert benchmark on servers that use disks in my quest to learn more about MongoDB internals. The insert benchmark is interesting for a few reasons. First while inserting a lot of data isn't something I do all of the time it is something for which performance matters some of the time. Second it subjects secondary indexes to fragmentation and excessive fragmentation leads to wasted IO and wasted disk space. Finally it allows for useful optimizations including write-optimized algorithms ( fractal tree via TokuMX , LSM vis RocksDB and WiredTiger ) or the InnoDB insert buffer . Hopefully I can move onto other workloads after this week. This test is interesting for another reason that I don't really explore here but will in a future post. While caching all or most of the database in RAM works great at eliminating reads it might not do much for avoiding random writes. So a write heavy workload with a cached database can still be limited by random write IO and this will