Tuesday, December 30, 2014

malloc and MongoDB performance

I used iibench to understand the impact of malloc on MongoDB performance to compare malloc from glibc 2.17, jemalloc 3.6.0 and tcmalloc/gperftools 2.2 for an insert-only workload with 10 concurrent clients. The test server has 40 cores with hyperthread enabled. Alas, I lost a few configuration details, but think I used MongoDB 2.8.0rc4 and the WiredTiger b-tree.

The summary is that tcmalloc and jemalloc provide a similar benefit and are much better than glibc malloc. I made no attempt to tune tcmalloc or jemalloc. I ran iibench 3 times for each configuration and chose the median result. There are four metrics for each configuration:

  1. Test duration in seconds
  2. Address space size per the VSZ column in ps
  3. Normalized context switch rate - the number of context switches per N documents inserted. I will leave N as undefined for now so the absolute value isn't interesting. The value can be compared between malloc implementations.
  4. Normalized CPU utilization - CPU utilization per N documents inserted. See #3.

Results

          seconds  VSZ(GB)   context-switch(relative)   CPU(relative)
tcmalloc   1943     65.1            6.6                   26510
jemalloc   1877     64.5            7.1                   27165
glibc      2251     72.3           23.0                   32120

For this test glibc uses 1.19X more time, 1.12X more address space, 3.23X more context switches and 1.18X more CPU than jemalloc and tcmalloc is similar to jemalloc. As tcmalloc is bundled with the source distribution for MongoDB I assume it is used in the binaries they distribute. This appears to be a good thing.

1 comment:

  1. Actually, that's generally not a safe assumption :-) I've seen files (I think it was related to async client server protocol) which exist in the repo but weren't used by any other parts of the code.

    ReplyDelete

Fixing some of the InnoDB scan perf regressions in a MySQL fork

I recently learned of Advanced MySQL , a MySQL fork, and ran my sysbench benchmarks for it. It fixed some, but not all, of the regressions f...