Monday, June 29, 2015

Examining performance for MongoDB and the insert benchmark

My previous post has results for the insert benchmark when the database fits in RAM. In this post I look at MongoDB performance as the database gets larger than RAM. I ran these tests while preparing for a talk and going on vacation, so I am vague on some of the configuration details. The summary is that RocksDB does much better than mmapv1 and the WiredTiger B-Tree when the database is larger than RAM because it is more IO efficient. RocksDB doesn't read index pages during non-unique secondary index maintenance. It also does fewer but larger writes rather than the many smaller/random writes required by a B-Tree. This is more of a benefit for servers that use disk.

Average performance

I present performance results using a variety of metrics. The first is average throughput during the test. RocksDB is much faster than WiredTiger and WiredTiger is much faster than mmapv1. But you should be careful about benchmark reports that only include the average. Read on to learn more.

Cumulative average

This displays the cumulative average. That is the average from test start to the current point in time. At test end the value is the same as the average performance. This metric is a bit more useful than the average performance because it can show some variance. In the result below RocksDB quickly reaches a steady rate while WiredTiger and mmapv1 degrade over time as the database gets larger than RAM. However this can still hide intermittent variance.

Variance

This displays throughput per 10-second interval for a subset of the test. mmapv1 has the least variance while WiredTiger and RocksDB have much more. The variance is a problem and was not visible in previous graphs.

Variance, part 2

The final two graphs show the per-interval throughput for all engines and then only for WiredTiger and RocksDB. The second graph was added to avoid compressing the results for RocksDB to the left hand side of the graph. The lines for RocksDB and WiredTiger are very wide because of the large variance in throughput.


No comments:

Post a Comment