- Linkbench for a cached database
- Linkbench on a small server
- Linkbench for an uncached database
- Linkbench with a disk array
MySQL schema
From the MySQL Setup section in the Linkbench README file there are 4 indexes created across 3 tables:- linktable has a primary key index on (link_type, id1, id2) and a secondary index on (id1, link_type, visibility, time, id2, version, data). The secondary index is covering for the most frequent query. While it almost doubles the size of linktable it also greatly reduces the worst-case random IO required for that query from a few thousand to a few. This can be a good tradeoff even when using PCIe flash.
- counttable has a primary key index on (id, link_type)
- nodetable has a primary key index on (id)
Why 5 extra indexes?
Why does MongoDB need 5 more indexes than MySQL? From the MongoDB driver in LinkbenchX there are 3 indexes created by the user and then 3 implicit primary key indexes on _id:
- the link collection has a primary key index on _id, a unique secondary index on (link_type, id1, id2) and a non-unique secondary index on (id1, link_type, visibility, time, id2, version, data)
- the count collection has a primary key index on _id and a unique secondary index on (id, link_type)
- the node collection has a primary key index on _id
I have described 6 indexes but claimed that 9 were used. The other 3 are used for WiredTiger and RocksDB but not for mmapv1 and are indexes on DiskLoc (key is DiskLoc, value is the document). for each collection. While WiredTiger and RocksDB support a clustered primary key that is not supported (yet) in MongoDB because the storage engine API is still evolving from what was needed for mmapv1. With mmapv1 the base table is heap organized and the value in a primary key index is a DiskLoc which I assume is a filename and offset where the document resides. WiredTiger and RocksDB are copy-on-write and the location for a document will change one or more times after an update so they can't use DiskLoc to point to the one place on disk where the document exists, thus they need the extra index. Server-14569 and server-15354 are open for this.
If we get proper support for a clustered primary key index, then MongoDB would use 6 indexes for Linkbench versus 4 for MySQL. How do we reduce that from 6 to 4? The unique secondary indexes for the link and count collections are not needed in MySQL. The problem here is that MongoDB doesn't support a composite PK. One workaround is to concatenate the fields (link_type | id1 | id2 -> _id for the link collection, (id | link_type -> _id) for the count collection. I don't recommend that because it must be done by the client, and done correctly, on every insert and update. It is even more work to make the concatenated value order preserving, which is easy in code, but makes it harder to construct the _id value when manually querying the database. This also means that some fields will be stored multiple times: once in the _id field, once separately.
No comments:
Post a Comment