Pineapple and ham work great together on pizza. RocksDB and glibc malloc don't work great together. The primary problem is that for RocksDB processes the RSS with glibc malloc is much larger than with jemalloc or tcmalloc. I have written about this before -- see here and here. RocksDB is a stress test for an allocator.
tl;dr
- For a process using RocksDB, RSS with glibc malloc is much larger than with jemalloc or tcmalloc, so there will be more crashes from the OOM killer with glibc malloc.
Benchmark
The benchmark is explained in a previous post.
The insert benchmark was run in the IO-bound setup, where the database is larger than memory.
The benchmark used a c2-standard-30 server from GCP with Ubuntu 22.04, 15 cores, hyperthreads disabled, 120G of RAM and 1.5T of storage from RAID 0 over 4 local NVMe devices with XFS.
The benchmark is run with 8 clients and 8 tables (one client per table). It is a sequence of steps:
- l.i0
- insert 500 million rows per table
- l.x
- create 3 secondary indexes. I usually ignore performance from this step.
- l.i1
- insert and delete another 100 million rows per table with secondary index maintenance. Inserts are done at the head of the table and deletes are done from the tail, so the number of rows per table at the end of this step matches the number at the start (a sketch of this pattern follows the list).
- q100, q500, q1000
- do queries as fast as possible while background writers do 100, 500 and 1000 inserts/s/client, with deletes done at the same rate. Each of these steps runs for 1800 seconds.
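A minimal sketch of the l.i1 insert+delete pattern, using a std::deque as a stand-in for a table (the real benchmark uses SQL clients and much larger counts):

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

// Sketch of the l.i1 step: every insert at the head of the table is paired
// with a delete from the tail, so the row count is the same at the end of
// the step as at the start. The counts here are tiny stand-ins.
int main() {
  const int64_t kInitialRows = 1000;  // stands in for the 500M rows from l.i0
  const int64_t kChurnRows = 100;     // stands in for the 100M insert+delete pairs

  std::deque<int64_t> table;
  for (int64_t pk = 0; pk < kInitialRows; pk++) table.push_back(pk);

  int64_t next_pk = kInitialRows;
  for (int64_t i = 0; i < kChurnRows; i++) {
    table.push_back(next_pk++);  // insert a new row at the head
    table.pop_front();           // delete the oldest row from the tail
  }

  std::cout << "rows at end: " << table.size()
            << ", rows at start: " << kInitialRows << "\n";
  return 0;
}
```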
Configurations
The benchmark was run with 2 my.cnf files, c5 and c7, both edited to use a 40G RocksDB block cache. The difference between them is that c5 uses the LRU block cache (the older code) while c7 uses the Hyper Clock cache.
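The c5/c7 difference maps to which block cache implementation RocksDB constructs. MyRocks selects this via my.cnf, but as a hedged sketch the same choice looks like this with the RocksDB C++ API (the 40G capacity matches both configs; the estimated_entry_charge value is an assumption):

```cpp
#include <memory>

#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

int main() {
  const size_t kCacheBytes = 40ULL << 30;  // 40G block cache, as in both configs

  // c5-style: the older LRU block cache.
  std::shared_ptr<rocksdb::Cache> lru_cache = rocksdb::NewLRUCache(kCacheBytes);

  // c7-style: the Hyper Clock cache. In recent RocksDB releases an
  // estimated_entry_charge of 0 asks for automatic sizing; older releases
  // expect an estimate close to the block size.
  rocksdb::HyperClockCacheOptions hcc_opts(kCacheBytes,
                                           /*estimated_entry_charge=*/0);
  std::shared_ptr<rocksdb::Cache> clock_cache = hcc_opts.MakeSharedCache();

  // Either cache is attached to the block-based table factory.
  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.block_cache = clock_cache;  // or lru_cache for the c5-style setup

  rocksdb::Options options;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return 0;
}
```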
Malloc
The test was repeated with 4 malloc implementations:
- je-5.2.1 - jemalloc 5.2.1, the version provided by Ubuntu 22.04
- je-5.3.0 - jemalloc 5.3.0, the current jemalloc release, built from source
- tc-2.9.1 - tcmalloc 2.9.1, the version provided by Ubuntu 22.04
- glibc 2.35 - the version provided by Ubuntu 22.04
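None of this requires rebuilding mysqld. A common way to swap the allocator is to set LD_PRELOAD before starting the server; the sketch below does that from a small launcher (the library path and my.cnf path are assumptions, check where your distro installs the allocator):

```cpp
#include <cstdlib>
#include <unistd.h>

// Launches mysqld with jemalloc preloaded. The paths below are assumptions:
// adjust them for your distro and install. Point LD_PRELOAD at the tcmalloc
// library to test tcmalloc; leave it unset to get glibc malloc.
int main() {
  setenv("LD_PRELOAD", "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2", 1);
  execlp("mysqld", "mysqld", "--defaults-file=/path/to/my.cnf",
         static_cast<char*>(nullptr));
  return 1;  // only reached if exec failed
}
```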
Results
I measured the peak RSS during each benchmark step.
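A hedged sketch of one way to read peak RSS on Linux, via the VmHWM line in /proc/<pid>/status. VmHWM is the lifetime high-water mark, so a per-step peak needs either sampling VmRSS during the step or resetting the counter between steps; the actual monitoring scripts may differ.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Returns the VmHWM value (peak resident set size, in kB) for a pid, or -1
// if it cannot be read. Pass "self" to read the calling process.
long PeakRssKb(const std::string& pid) {
  std::ifstream status("/proc/" + pid + "/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmHWM:", 0) == 0) {
      return std::stol(line.substr(6));  // the value is reported in kB
    }
  }
  return -1;
}

int main(int argc, char** argv) {
  const std::string pid = (argc > 1) ? argv[1] : "self";
  std::cout << "peak RSS (kB) for " << pid << ": " << PeakRssKb(pid) << "\n";
  return 0;
}
```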
The benchmark completed for all malloc implementations using the c5 config, but had some benchmark steps run for longer, there would have been an OOM with glibc. All of the configs used a 40G RocksDB block cache.
The benchmark completed for jemalloc and tcmalloc using the c7 config but failed with an OOM for glibc on the q1000 step. Had the l.i1, q100 and q500 steps run for more time, the OOM would have happened sooner.