Insert performance for RocksDB and LevelDB did not improve beyond one thread when sync on commit was disabled because of contention on the mutex. It could improve with more threads when sync on commit was enabled courtesy of group commit. Write-only workloads are rare for a DBMS that uses a B-Tree because read-modify-write of leaf nodes is done for updates and inserts. Write-only is more common with RocksDB thanks to the merge operator and because non-unique secondary index maintenance doesn't need RMW. We expect to make this more common via SQL features in MyRocks.
We continue to make RocksDB better and today I show how the concurrent insert rate improves by 3X with sync on commit disabled and 2X with it enabled. Hopefully we will get a wiki entry to explain the code changes, until then thank you Nathan. The new behavior is enabled by setting two configuration options to true: allow_concurrent_memtable_write and enable_write_thread_adaptive_yield.
These tests were done using a server with 24 cores (48 HW threads) and enough RAM to cache the database. The test script is here. The workload is write-only with threads doing Put operations on randomly selected keys.
Results with sync on commit disabled
This section has results with sync on commit disabled. The first graph shows the rate in inserts/second for RocksDB with the concurrent memtable enabled (New) and disabled (Old). The second graph shows the ratio of (New/Old) and the speedup is about 3X at high concurrency.
Results with sync on commit enabled
This section has results with sync on commit enabled. The first graph shows the rate in inserts/second for RocksDB with the concurrent memtable enabled (New) and disabled (Old). The second graph shows the ratio of (New/Old) and the speedup is about 2X at high concurrency.
Speedup for MyRocks
I used a smaller server (20 cores instead of 24) to measure the benefit from this change for loading linkbench tables in parallel. The linkbench load step was run with generate_nodes=false to disable the single-threaded load of the node table and maxid1=30M. The database is small enough to fit in cache. The speedup is 1.4X at high concurrency. I will be vague but there is another configuration change that improves the speedup to 2.4X.