My test used 1 collection with 1M documents and 1 client thread. The sysbench client was changed to add extra attributes immediately after the _id attribute. The extra attributes used names of the form x${K} for K in 1 to 1, 1 to 10, 1 to 100 and 1 to 1000. Each of these attributes were assigned the value of K (x1=1, x2=2, ...). Immediately after these another attribute was added with the name y and assigned the same value as the _id attribute. Each test query matches one document and returns the values of the _id and y attributes after matching the _id and y attributes. The purpose of this test is to determine the overhead from skipping the extra attributes to find the y attribute. For N=1 the results and test was kind-of similar to what I did for a recent cached & read-only test.
This is data for the chart above. Note that TokuMX does much better for small N but MongoDB does much better for large N. I repeated the tests with the y attribute before the x${K} attributes and the results are about the same as below so my assumption was wrong about the overhead for attribute searches and the real overhead is from BSON parsing. Note that I ran extra tests for 200, 400, 500 and 750 attributes to understand the rate at which QPS decreases.
queries per second by #attributes
1 10 100 200 400 500 750 1000 number of x${K} attributes
14326 13643 10317 8638 6543 5832 4515 3306 tokumx141
9866 9398 8473 8305 7242 6903 6164 4954 mongo249
9119 9530 8547 8613 7890 7577 6602 5289 mongo260
Overhead for 1000 attributes
I looked at top when the tests were running and for all cases mongod was using ~90% of 1 CPU core and Java was using ~10 of another. So MongoDB was using CPU at the same rate as TokuMX while serving many more QPS. I then used the Linux perf utility to understand what was using CPU in mongod. The CPU overhead for TokuMX is from calls to free & malloc. If I had an easy way to get hierarchical profiles I would include that, alas I don't have that. I do have the -g option with perf but the output isn't very interesting. It would be nice if TokuMX figured out how to get more debug symbols into their binary without expecting users to download/install a separate debug package.This is the output from "perf -g" for TokuMX 1.4.1
14.56% mongod mongod [.] free
|
--- free
|
|--91.43%-- 0x7fe27c1c09d8
| (nil)
|
|--7.66%-- (nil)
--0.91%-- [...]
12.71% mongod mongod [.] malloc
|
--- malloc
|
|--76.17%-- mongo::ElementMatcher::~ElementMatcher()
| (nil)
|
|--16.90%-- operator new(unsigned long)
| 0x100000000
| 0x3031317473657462
|
|--4.22%-- (nil)
| |
| --100.00%-- 0xffb
|
|--1.37%-- 0xffffffff00000000
| 0x3031317473657462
|
--1.34%-- mongo::BSONObj::init
And this is the non-hierarchical output from Linux perf during a 5 second interval of the test.
TokuMX 141
15.83% mongod mongod [.] free
11.64% mongod mongod [.] malloc
11.23% mongod mongod [.] mongo::Projection::transform
6.42% mongod libstdc++.so.6.0.17 [.] _ZNSs4_Rep10_M_disposeERKSaIcE.part.13
6.40% mongod mongod [.] std::_Rb_tree<>::find
4.81% mongod libc-2.5.so [.] strlen
4.01% mongod libstdc++.so.6.0.17 [.] std::basic_string::basic_string
3.49% mongod libstdc++.so.6.0.17 [.] char* std::string::_S_construct
2.43% mongod libc-2.5.so [.] memcmp
2.38% mongod libc-2.5.so [.] memcpy
2.00% mongod libstdc++.so.6.0.17 [.] operator new(unsigned long)
1.93% mongod mongod [.] mongo::Projection::append
1.27% mongod libstdc++.so.6.0.17 [.] std::string::_Rep::_S_create
MongoDB 2.4.9
13.78% mongod mongod [.] mongo::Projection::transform
11.75% mongod mongod [.] mongo::UnorderedFastKeyTable::Area::find
7.95% mongod libc-2.13.so [.] __strlen_sse42
7.28% mongod mongod [.] mongo::Projection::append
4.21% mongod mongod [.] (anonymous namespace)::cpp_alloc
3.86% mongod mongod [.] mongo::BSONElement::size()
2.07% mongod mongod [.] _ZN12_GLOBAL__N_121do_free_with_callback
1.18% mongod mongod [.] mongo::KeyV1::toBson()
0.92% mongod mongod [.] mongo::BucketBasics::KeyNode::KeyNode
0.86% mongod mongod [.] boost::detail::shared_count::~shared_count()
0.71% mongod libpthread-2.13.so [.] pthread_mutex_lock
0.70% mongod libstdc++.so.6.0.16 [.] std::basic_string::~basic_string()
0.69% mongod mongod [.] tc_malloc
0.68% mongod libc-2.13.so [.] __memcpy_ssse3
MongoDB 2.6.0
14.28% mongod mongod [.] mongo::ProjectionStage::transform
10.03% mongod mongod [.] MurmurHash3_x64_128
4.45% mongod mongod [.] mongo::BSONElement::size()
3.93% mongod libc-2.13.so [.] __strlen_sse42
2.83% mongod mongod [.] _ZNKSt3tr110_HashtableIN5mongo10StringDataES2_SaIS2_ESt9_IdentityIS2_ESt8equal_toIS2_ENS2_6HasherENS_8__detail18_Mod_range_hashingENS9_20_Default_ranged_hashENS9_20_Prime_rehash_policyELb0ELb1ELb1EE12_M_find_nodeEPNS9_10_Hash_nodeIS2_Lb0EEERKS2_m.isra.162.constprop.255
2.65% mongod mongod [.] operator new(unsigned long)
1.98% mongod libc-2.13.so [.] __memcmp_sse4_1
1.74% mongod mongod [.] mongo::StringData::Hasher::operator()
1.62% mongod mongod [.] operator delete(void*)
1.46% mongod mongod [.] mongo::KeyV1::toBson()
1.20% mongod mongod [.] mongo::BucketBasics::KeyNode::KeyNode
1.06% mongod libc-2.13.so [.] __memcpy_ssse3
0.83% mongod mongod [.] tc_malloc
0.82% mongod mongod [.] mongo::ps::Rolling::access
0.78% mongod libpthread-2.13.so [.] pthread_mutex_lock
Overhead for 10 attributes
This lists the CPU profile from Linux perf during 5 seconds of the test. Unlike the 1000 attribute result above, here malloc/free or their equivalent are the top two sources for TokuMX and MongoDB.
TokuMX 1.4.1
4.53% mongod mongod [.] malloc
4.12% mongod mongod [.] free
2.89% mongod libtokufractaltree.so [.] _ZNK4toku3omtIP13klpair_structS2_Lb0EE24find_internal_plus_arrayIR9ft_searchXa
dL_Z15wrappy_fun_findIS6_XadL_ZL23heaviside_from_search_tRK10__toku_db
1.80% mongod libc-2.5.so [.] strlen
1.53% mongod mongod [.] mongo::storage::KeyV1::woCompare
1.14% mongod libpthread-2.5.so [.] pthread_mutex_lock
1.02% mongod libc-2.5.so [.] memcpy
0.99% mongod mongod [.] mongo::Projection::transform
0.96% mongod libstdc++.so.6.0.17 [.] _ZNSs4_Rep10_M_dispose
0.87% mongod libstdc++.so.6.0.17 [.] operator new(unsigned long)
0.86% mongod mongod [.] mongo::Projection::init
0.85% mongod libtokufractaltree.so [.] _Z26toku_ft_search_which_childP17__toku_descriptorPFiP9__toku_dbPK10__toku_dbt
S5_EP6ftnodeP9ft_search
0.85% mongod libc-2.5.so [.] memcmp
MongoDB 2.4.9
7.47% mongod mongod [.] cpp_alloc
3.94% mongod mongod [.] _ZN12_GLOBAL__N_121do_free_with_callback
2.24% mongod mongod [.] mongo::KeyV1::toBson()
2.21% mongod libc-2.13.so [.] __strlen_sse42
1.58% mongod mongod [.] boost::detail::shared_count::~shared_count()
1.51% mongod mongod [.] mongo::BSONElement::size()
1.48% mongod libc-2.13.so [.] __memcpy_ssse3
1.40% mongod mongod [.] mongo::BucketBasics::KeyNode::KeyNode
1.39% mongod libstdc++.so.6.0.16 [.] _ZNSs4_Rep10_M_disposeERKSaIcE
1.22% mongod libstdc++.so.6.0.16 [.] std::basic_string::~basic_string()
1.22% mongod libc-2.13.so [.] __memcmp_sse4_1
1.19% mongod mongod [.] tc_malloc
0.99% mongod mongod [.] operator new(unsigned long)
0.95% mongod mongod [.] boost::intrusive_ptr::~intrusive_ptr()
0.94% mongod libpthread-2.13.so [.] pthread_mutex_lock
0.89% mongod libstdc++.so.6.0.16 [.] std::basic_string::basic_string
0.77% mongod libstdc++.so.6.0.16 [.] std::basic_string::basic_string
0.69% mongod mongod [.] mongo::BtreeBucket::customBSONCmp
0.68% mongod mongod [.] mongo::UnorderedFastKeyTable::Area::find
0.64% mongod [ip_tables] [k] ipt_do_table
0.64% mongod mongod [.] CoveredIndexMatcher::matches
MongoDB 2.6.0
4.26% mongod mongod [.] operator new(unsigned long)
2.86% mongod mongod [.] operator delete(void*)
2.70% mongod mongod [.] mongo::KeyV1::toBson() const
1.62% mongod mongod [.] mongo::BucketBasics::KeyNode::KeyNode
1.57% mongod libc-2.13.so [.] __memcpy_ssse3
1.56% mongod libc-2.13.so [.] __strlen_sse42
1.14% mongod libc-2.13.so [.] vfprintf
1.13% mongod libpthread-2.13.so [.] pthread_mutex_lock
1.13% mongod libc-2.13.so [.] __memcmp_sse4_1
1.06% mongod mongod [.] mongo::BSONElement::size()
0.92% mongod libstdc++.so.6.0.16 [.] _ZNSs4_Rep10_M_dispose
0.87% mongod mongod [.] tc_malloc
0.84% mongod libstdc++.so.6.0.16 [.] std::basic_string::~basic_string()
I like this test because it is something probably nobody has published anything on before :-)
ReplyDeleteAre you sure this measures what you think it measures? I would expect the bson parsing code to be exactly the same in both mongo and toku. If I read your notes correctly, it seems the documents also grow for each test (ie the first test has 3 fields, the last test has 1002 fields?) If that is the case, I would interpret this to say that with 1000 fields - regardless of which fields you are fetching - you see overhead from toku de-compression (or something else, like poorer cache hit rate, probably combined with de-compression...) whereas mongodb degrades slower.
Everything is in cache -- OS filesystem cache for MongoDB, TokuMX buffer pool. For TokuMX documents are uncompressed when in the buffer pool. Repeating test with "y" attribute added before all of the x${K} attributes. Expected result in that case is better QPS as K grows.
DeleteBut I like your point that there is more overhead from BSON parsing when there are more attributes. I hadn't considered that and the overhead from that usually isn't mentioned (or is conveniently ignored) when extolling the virtues of the document data model.
DeleteWell, the document data model is great. But it's well known that it's certainly possible to optimize the "bson encoding" from what it is today. It's just good enough for most use cases, but adding 1k fields on a single level will slow you down. (The workaround is to group things into sub-levels, essentially creating your own b-tree...)
DeleteAnyway, what you wrote now as your interpretation would require you to fetch the first (and tenth) field in a document which has 1002 fields. Then you'd see the difference between overhead from seeking vs overhead from the document size.
Are you sure that is the workaround?
DeleteLet me elaborate... The street smart wisdom claims that it is the number of keys on the same level that makes seeking slow (it's a linear search). So documents like yours with 1000 keys together, or an array of 1000 elements. However, if you'd group them into groups of 10, then you only seek 10 keys per level = max 30 keys to find what you want.
DeleteIe instead of accessing { x123 : 1 } you would get the same value much more efficiently from { x1.y2.z3 : 1 }.
Is this something you know to be true because you tested it (like I do here) or just something people think is true?
DeleteIt's something people think is true, which is why I phrased it as "word on the street". However, if you read bsonspec.org, you will see that the bson structure itself makes the belief credible. Basically, each field, including a list or subdocument, starts with its length.
DeleteI have not personally verified that the libbson in fact takes advantage of this opportunity to skip unneeded sub-documents, but I find it's likely.
Btw, these results are not "kind of like" your previous read-only test. In the previous test mongo24 was clearly faster, here you start with toku leading for small documents. The difference seems large enough to be significant. (which is surprising)
ReplyDeleteIf the overhead is in malloc, it might be interesting to benchmark with documents that have a single large attribute so that the total document size matched the N=1000 document size -- that might help isolate whether it's actually the BSON parsing or just the total document size stressing the allocator in some way.
ReplyDelete