Tuesday, June 9, 2015

RocksDB & ForestDB via the ForestDB benchmark: cached database

For this test I use a database smaller than RAM so it should remain cached even after space amplification occurs. Tests were repeated with both a disk array and an SSD because the database still needs to be made durable and some engines do more random IO for that. Tests were also run with N=10 and N=20, where N is the number of threads used for the tests with concurrent readers or writers. The test server has 24 hyperthread cores. All tests used a database with 56M documents and a ~100G block cache, and all set the ForestDB compaction threshold to 25%.

Disk array, 10 threads

This configuration used a disk array and 10 user threads (N=10) for the concurrency tests. Unlike the IO-bound/disk test, the load here was faster for RocksDB. Had more documents been inserted, ForestDB would eventually have become faster.

RocksDB continues to be faster on the write-only tests (ows.1, ows.n, owa.1, owa.n). I did not spend much time trying to explain the difference.

For the point query tests, ForestDB is faster at 1 thread but much slower at 10 threads. I think the problem is mutex contention on the block cache, and I present stack traces at the end of the post to explain this. For the range query tests, RocksDB is always faster because ForestDB has to do more work to get the data while RocksDB benefits from a clustered index.

operations/second for each step
        RocksDB  ForestDB
load     133336     86453
ows.1      4649      2623
ows.n     11479      8339
pqw.1     63387    102204
pqw.n    531653    397048
rqw.1    503364     24404
rqw.n   4062860    205627
pq.1      99748    117481
pq.n     829935    458360
rq.1     774101    292723
rq.n    5640859   1060490
owa.1     75059     28082
owa.n     73922     27092

The command lines are below. The config used 8 files for ForestDB.

bash rall.sh 56000000 log /disk 102400 64 10 600 3600 1000 1 rocksdb 20 no 1
bash rall.sh 56000000 log /disk 102400 64 10 600 3600 1000 1 fdb 20 no 8
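
To make the arguments easier to read, here is my annotated reading of the same command line. Only the values explained in this post are labeled; the labels are my assumptions and I have not checked them against rall.sh, so the unlabeled positions are left as-is.

# Annotated version of the command line above. Labels are guesses based on
# values described in this post; unknown positions are left unlabeled.
ndocs=56000000      # 56M documents
dbpath=/disk        # database directory (/ssd1 for the SSD tests)
cache_mb=102400     # ~100G block cache
nthreads=10         # N, user threads for the concurrency tests
engine=rocksdb      # rocksdb or fdb (ForestDB)
nfiles=1            # database files: 1 for RocksDB, 8 for ForestDB

bash rall.sh $ndocs log $dbpath $cache_mb 64 $nthreads 600 3600 1000 1 $engine 20 no $nfiles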

SSD, 10 threads

This configuration used an SSD and 10 user threads (N=10) for the concurrency tests. The results are similar to those above for the disk array, with a few exceptions. RocksDB does worse here on the load and write-only tests because the disk array has more IO throughput than the SSD.

operations/second for each step
        RocksDB  ForestDB
load      46895     86328
ows.1      2899      2131
ows.n     10054      6665
pqw.1     63371    102881
pqw.n    525750    389205
rqw.1    515309     23648
rqw.n   4032487    203822
pq.1      99894    115806
pq.n     819258    454507
rq.1     756546    294490
rq.n    5708140   1074295
owa.1     30469     22305
owa.n     29563     20671

The command lines are below. The config used 8 files for ForestDB.

bash rall.sh 56000000 log /ssd1 102400 64 10 600 3600 1000 1 rocksdb 20 no 1
bash rall.sh 56000000 log /ssd1 102400 64 10 600 3600 1000 1 fdb 20 no 8

SSD, 20 threads

This configuration used an SSD and 20 user threads (N=20) for the concurrency tests. RocksDB makes better use of the extra concurrency in the workload. In some cases ForestDB throughput was already limited by mutex contention at N=10 and did not improve here.

operations/second for each step
        RocksDB  ForestDB
load      46357     85053
ows.1      2987      2082
ows.n     13595      7263
pqw.1     62684    102648
pqw.n    708154    354919
rqw.1    510009     24122
rqw.n   5958109    253565
pq.1     100403    117666
pq.n    1227031    387373
rq.1     761143    294078
rq.n    8337013   1013277
owa.1     30487     22219
owa.n     28972     21487

The command lines are below. The config used 8 files for ForestDB.

bash rall.sh 56000000 log /ssd1 102400 64 20 600 3600 1000 1 rocksdb 20 no 1
bash rall.sh 56000000 log /ssd1 102400 64 20 600 3600 1000 1 fdb 20 no 8

Stack traces and nits

I used PMP to get stack traces to explain performance for some tests. I have traces for mutex contention and one other problem. I ended up reproducing one problem by reloading 1B docs into a database.
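
PMP here is the poor man's profiler: attach gdb, dump all thread stacks, then count the most frequent stacks. Below is a minimal sketch of that pipeline, assuming gdb is installed and $pid is the process being sampled; it is not the exact script I used for these tests.

# Minimal poor man's profiler: dump all thread stacks and count the most
# common ones. Assumes $pid is the benchmark process; this is a sketch,
# not the exact script used here.
gdb -batch -ex "set pagination 0" -ex "thread apply all bt" -p $pid 2>/dev/null | \
awk '
  BEGIN { s = "" }
  /^Thread/ { if (s != "") print s; s = "" }
  /^#/      { if (s != "") s = s "," $4; else s = $4 }
  END       { if (s != "") print s }
' | sort | uniq -c | sort -rn | head -20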

This thread stack shows mutex contention while creating iterators for range queries. This stack trace was common during the range query tests.

This thread stack shows mutex contention on the block cache. I am not certain, but I think this was from the point query tests.

This has 3 stack traces to show the stall on the commit code path where disk reads are done. That was a problem for ows.1, ows.n, owa.1 and owa.n.

