How much extra space is used for attribute names? Does page level compression in the WiredTiger and TokuMX engines make this a non-issue? Long attribute names repeat in every document, which should be easy to compress. And the alternative, extra-short attribute names, will cause pain for anyone trying to use the database, so page level compression might be the preferred solution.
While page level compression can remove the bloat from long attribute names in compressed copies of pages, it doesn't solve the problem for uncompressed copies, so there is still a cost from dynamic schemas: fewer uncompressed pages fit in cache because of the space overhead. Perhaps one day we will get an engine that encodes long attribute names efficiently even for uncompressed pages. Note that when page level compression is used, some database pages are in cache in both compressed and uncompressed forms. I assume that the WiredTiger and TokuMX block caches only cache uncompressed pages, but I am not an expert in either engine, and the OS filesystem cache has copies of compressed pages. I am not sure what happens when direct IO is used with WiredTiger because that prevents use of the OS filesystem cache.
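To see the per-document cost, here is a minimal sketch using pymongo's bson package (assuming a recent pymongo where bson.encode is available); it encodes the same values once with descriptive names and once with 1-byte names:

```python
# Minimal sketch: how many bytes of an uncompressed BSON document are
# spent on field names. Assumes pymongo's bson package is installed.
import bson

doc = {"customer_first_name": "a", "cash_register_id": 3,
       "date_and_time": 1418000000}
# Same values, 1-byte names ("a", "b", "c") for comparison.
renamed = {chr(ord("a") + i): v for i, v in enumerate(doc.values())}

full = len(bson.encode(doc))
minimal = len(bson.encode(renamed))
print(full, minimal, full - minimal)  # the difference is mostly name bytes
```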
Results
I used iibench for MongoDB and loaded 2B documents for a few configurations: mmapv1 and WiredTiger without compression (wt-none), with snappy compression (wt-snappy), and with zlib compression (wt-zlib). To keep the documents small I edited the iibench test to use a 1-byte character field per document and disabled creation of secondary indexes. I used two versions of iibench: the first used the attribute names as-is (long) and the second used shorter versions (short) for a few of the attribute names (a sketch of the two document shapes follows the list):
- long attribute names: price, customerid, cashregisterid, dateandtime
- short attribute names: price, cuid, crid, ts
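The document shapes for the two runs look roughly like this (an illustrative sketch; the real iibench client generates random values and batches inserts, and the database/collection names here are made up):

```python
# Illustrative iibench-style documents with the 1-byte character field.
from pymongo import MongoClient

coll = MongoClient().iibench.purchases  # names are made up for this sketch

doc_long = {"price": 1.99, "customerid": 42, "cashregisterid": 7,
            "dateandtime": 1418000000, "c": "x"}   # long attribute names
doc_short = {"price": 1.99, "cuid": 42, "crid": 7,
             "ts": 1418000000, "c": "x"}           # short names, same values

coll.insert_one(doc_long)  # pymongo 3.x API; older clients use insert()
```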
The results show the database size in GB for each engine configuration. Note that WiredTiger with zlib uses about 1/8th the space compared to mmapv1, and even the uncompressed WiredTiger engine does a lot better than mmapv1. I suspect that most of the benefit for wt-none versus mmapv1 comes from avoiding the overhead of power of 2 allocation in mmapv1. As a side note, I am not sure we will be able to turn off power of 2 allocation for mmapv1 in future releases.
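For anyone repeating this, the per-collection sizes can be read back with collStats after the load (a sketch; the exact stat fields vary somewhat by engine and server version):

```python
# Sketch: read collection sizes after a load. "storageSize" is the
# on-disk size and "size" is the uncompressed data size.
from pymongo import MongoClient

db = MongoClient().iibench                      # database name is illustrative
stats = db.command("collstats", "purchases")
print(stats["size"], stats["storageSize"])
```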
As you mention, compression can eliminate much of the overhead on disk, but field names are uncompressed in the cache and reduce its effectiveness.
Forgot to re-share the earlier results from TokuMX. Are there others? http://www.tokutek.com/2013/08/tokumx-tip-create-any-field-name-you-want/
Regarding your question about turning off powerOf2, it is still available in 2.8 using the "noPadding" argument to createCollection or via the collMod command on an existing collection. It's a little buried in the release notes at the moment: http://docs.mongodb.org/v2.8/release-notes/2.8-general-improvements/#mmapv1-record-allocation-behavior-changed. powerOf2Sizes can definitely cost space in an insert-only experiment like this.
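For reference, a sketch of both forms via pymongo (extra keyword arguments are passed through to the underlying server commands; the collection name is made up):

```python
# Sketch of the 2.8 noPadding options mentioned above.
from pymongo import MongoClient

db = MongoClient().test
db.create_collection("events", noPadding=True)   # new collection, no powerOf2 padding
db.command("collMod", "events", noPadding=True)  # flip it on an existing collection
```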
Favorite topic of mine :-) Just wanted to jot down here some comments from our fb thread:
- I think you present here a worst case: values are 1 character only, an insert-only workload has no fragmentation, and I believe usePowerOf2Sizes (which you wouldn't really want to use in an insert-only workload) also amplifies the difference.
- Nevertheless, it is useful to know what the worst case can be. (Of course, you could have ridiculously long key names, like those commonly seen in Java or .Net, to show an even worse case.)
- The similar test by your brother, which you linked to in the comments, showed only a 10% difference with the mmap engine. I believe this is closer to real world metrics, but I have not done any measurements of my own. (But it seems Tim's test also has no fragmentation, so a real world situation could be even below 10%.)
- A lot of "experts" routinely advise every MongoDB user to use short key names. I believe that as a general piece of advice this is misguided. The loss in readability is a bigger problem than the overhead from longer key names. Such advice should always be accompanied by actual numbers showing the overhead that is avoided. As this blog shows, the real fix is to change the data storage so it avoids the overhead. (...where optimizing RAM consumption remains on the TODO list for MongoDB.)
- The most interesting part of your results is the observation that snappy compression in WT does very little to fix this problem. It's good to note the commentary from your previous blog post that WT is configured to use different page sizes with snappy vs zlib (as the assumed use cases are different). It would be interesting (but not that important, really) to know whether this difference is due to the compression algorithm or just the page size; a sketch of one way to separate the two follows this list.
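One way to separate the compressor from the page size, sketched with pymongo against MongoDB 3.0-style per-collection storage engine options (the configString keys are WiredTiger's; the collection names are made up):

```python
# Sketch: create collections that differ only in compressor or leaf page
# size, then load identical data into each and compare collStats.
from pymongo import MongoClient

db = MongoClient().test
db.create_collection("snappy_small", storageEngine={
    "wiredTiger": {"configString": "block_compressor=snappy,leaf_page_max=4KB"}})
db.create_collection("zlib_large", storageEngine={
    "wiredTiger": {"configString": "block_compressor=zlib,leaf_page_max=32KB"}})
```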
We need more documentation on the WT engines to make it easier to understand things like the difference between page sizes used for snappy and zlib. Maybe someone should write a book?
I find it interesting that MongoDB does not have something similar to InnoDB's unzip LRU. Is this on purpose or is it just not implemented? For IO-bound workloads it would be more efficient to keep the compressed copies and not the uncompressed ones.
With buffered IO and WiredTiger, the OS filesystem cache is the LRU for compressed pages. Not as fast as accessing them from the mongod address space, but it avoids a lot of complexity.