How much extra space is used for attribute names? Does page level compression in the WiredTiger and TokuMX engines make this a non-issue? Repeated values in every document from long attribute names seem like something that is easy to compress. And the alternative of using extra short attribute names will cause pain for anyone trying to use the database so page level compression might be the preferred solution.
While page level compression can remove the bloat from long attribute names for compressed copies of pages it doesn't solve the problem for uncompressed copies of pages. So there is still a cost from dynamic schemas. Perhaps one day we will get an engine that encodes long attribute names efficiently even for uncompressed pages. The impact from this is that fewer uncompressed pages can fit in cache because of the space overhead. Note that when page level compression is used there are some database pages that are in cache in both compressed and uncompressed forms. I assume that the WiredTiger and TokuMX block caches only cache uncompressed pages, but I am not an expert in either engine, and the OS filesystem cache has copies of compressed pages. I am not sure what happens when direct IO is used with WiredTiger because that prevents use of the OS filesystem cache.
ResultsI used iibench for MongoDB and loaded 2B documents for a few configurations: mmapv1 and WiredTiger without compression (wt-none), with snappy compression (wt-snappy) and with zlib compression (wt-zlib). To keep the documents small I edited the iibench test to use a 1-byte character field per document and disabled creation of secondary indexes. I used two versions of iibench. The first used the attribute names as-is (long) and the second used shorter versions (short) for a few of the attribute names:
- long attribute names: price, customerid, cashregisterid, dateandtime
- short attribute names: price, cuid, crid, ts
power of 2 allocations in mmapv1. As a side note, I am not sure we will be able to turn off power of 2 allocation for mmapv1 in future releases.