VSZ (GB)  RSS (GB)  malloc
7.9       4.8       jemalloc-3.6.0
13.6      12.4      glibc-2.23
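The post does not say how VSZ and RSS were collected. On Linux, one way to get comparable numbers is to read the VmSize and VmRSS fields from /proc/<pid>/status. The sketch below assumes that approach; the function names are hypothetical, and it converts the kernel's kB values (which are KiB) to GiB, which may or may not match the units used in the table.

```python
def parse_status_gb(status_text):
    """Return (vsz_gb, rss_gb) parsed from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("VmSize", "VmRSS"):
            # Values look like "  8284752 kB"; the kernel reports KiB.
            fields[key] = int(value.split()[0])
    kib_per_gib = 1024 * 1024
    return fields["VmSize"] / kib_per_gib, fields["VmRSS"] / kib_per_gib


def read_process_gb(pid):
    """Read VSZ and RSS for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_gb(f.read())
```

Calling read_process_gb with the mysqld process ID at the end of a test would give one row of a table like the one above.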
I am not sure that it is possible to use a large RocksDB block cache with glibc malloc, where large means that it gets about 80% of RAM.
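To make "about 80% of RAM" concrete, here is a minimal sketch that derives a block cache size from the MemTotal line of /proc/meminfo. The function name and the percent parameter are hypothetical, and this is only one way to pick the value, not how the tests in this post were configured.

```python
def block_cache_bytes(meminfo_text, percent=80):
    """Return a block cache size in bytes as a percentage of MemTotal."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:       16309484 kB" (KiB).
            total_kib = int(line.split()[1])
            # Integer arithmetic avoids floating-point rounding.
            return total_kib * 1024 * percent // 100
    raise ValueError("MemTotal not found")
```

If the allocator fragments badly, RSS can grow well past this value, which is why a cache sized this way may not fit with glibc malloc.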
I previously shared results for MySQL and for MongoDB. There have been improvements over the past few years to make glibc malloc perform better on many-core servers. I don't know whether that work also made it better at avoiding fragmentation.
Have you used huge pages via https://github.com/facebook/rocksdb/wiki/Allocating-Some-Indexes-and-Bloom-Filters-using-Huge-Page-TLB and fiddled with arena_block_size as well? Moving indexes to huge pages should reduce your fragmentation. arena_block_size also helps force use of the RocksDB private allocator rather than malloc or jemalloc.
I am interested in reading results from such tuning, but I won't run those tests.
I am wary of depending on huge pages. I have smart friends who prefer that we not use them on production servers. I am also wary of tuning malloc. We already have too much tuning in RocksDB, so I don't want to extend that cost to more things.
I have yet to repeat tests to determine the impact of arena_block_size. It is 8 MB on MyRocks in Percona Server 5.7.21. That seems large enough to avoid fragmentation, but I don't have time to determine the impact of changing it. Eventually we run out of time and hardware for running perf tests and need systems that don't require so much tuning.