Sunday, April 22, 2018

MyRocks, malloc and fragmentation -- a strong case for jemalloc

While trying to reproduce a MyRocks performance problem I ran a test using a 4gb block cache and tried both jemalloc and glibc malloc. The test server uses Ubuntu 16.04, which has glibc 2.23 today. The table below lists the VSZ and RSS values for the mysqld process after a test table has been loaded. RSS with glibc malloc is 2.6x larger than with jemalloc. MyRocks and RocksDB are much harder on an allocator than InnoDB, and this test shows the value of jemalloc.

VSZ(gb)  RSS(gb)  malloc
 7.9      4.8     jemalloc-3.6.0
13.6     12.4     glibc-2.23
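
For anyone who wants to collect the same two numbers, here is a minimal Python sketch. It is not the method used for the table above. It assumes a Linux server where /proc/<pid>/status reports VmSize (VSZ) and VmRSS (RSS) in kB, and the helper names parse_vm_gb and read_vm_gb are made up for illustration. It also assumes "gb" means 1024*1024 kB. The last lines recompute the RSS ratio from the table.

```python
def parse_vm_gb(status_text):
    """Return (vsz_gb, rss_gb) parsed from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  8283750 kB"; take the number and
            # convert kB to gb (assumption: 1 gb = 1024 * 1024 kB).
            fields[key] = int(rest.split()[0]) / (1024 * 1024)
    return fields["VmSize"], fields["VmRSS"]


def read_vm_gb(pid):
    """Read VSZ and RSS in gb for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_gb(f.read())


# RSS ratio from the table: glibc malloc vs jemalloc
ratio = 12.4 / 4.8
print(f"glibc RSS / jemalloc RSS = {ratio:.1f}x")
```

Calling read_vm_gb with the mysqld process ID after the load finishes would return the pair of values to compare across allocators.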

I am not sure that it is possible to use a large RocksDB block cache with glibc malloc, where large means that it gets about 80% of RAM.

I previously shared results for MySQL and for MongoDB. There have been improvements over the past few years to make glibc malloc perform better on many-core servers. I don't know whether that work also made it better at avoiding fragmentation.

2 comments:

  1. Have you used hugepages via https://github.com/facebook/rocksdb/wiki/Allocating-Some-Indexes-and-Bloom-Filters-using-Huge-Page-TLB and fiddled with arena_block_size as well? Moving indexes to huge pages should reduce your fragmentation. Setting arena_block_size also helps force use of the RocksDB private allocator rather than malloc/jemalloc.

    1. I am interested in reading results from such tuning, but I won't run those tests.

      I am wary of depending on huge pages. I have smart friends who prefer we don't use them on production servers. I am also wary of tuning malloc. We already have too much tuning in RocksDB so I don't want to extend that cost to more things.

      I have yet to repeat tests to determine the impact of arena_block_size. It is 8mb on MyRocks in Percona Server 5.7.21. That seems large enough to avoid fragmentation, but I don't have time to determine the impact of changing it. Eventually we run out of time and HW for running perf tests and need systems that don't require so much tuning.
