The test used a database with 600M documents. I reduced the compaction threshold for ForestDB from 50% to 25% to cut the worst-case space amplification from 2 to 4/3 and fit more data into the test database. This change isn't reflected in the configuration template I published on GitHub. RocksDB was configured with an 8G block cache versus 16G for ForestDB. Otherwise the configuration was similar to the IO-bound/disk test.
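For context on those numbers: if the compaction threshold is interpreted as the fraction of dead data that triggers compaction, then a database file can grow to 1/(1 - threshold) times its live data before being compacted, which is where the worst-case space amplification of 2 (at 50%) and 4/3 (at 25%) come from. A minimal Python sketch of that arithmetic, under that interpretation of the threshold:

# Worst-case space amplification when compaction is triggered once the
# fraction of dead data in a file reaches the threshold: the file holds
# at most live / (1 - threshold) bytes at that point.
def worst_case_space_amp(threshold):
    return 1.0 / (1.0 - threshold)

print(worst_case_space_amp(0.50))  # 2.0  -> 2X, the original setting
print(worst_case_space_amp(0.25))  # 1.33 -> the 4/3 bound used for this test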
Results
The difference in load performance is much wider here than on the disk array. I assume that write-amplification was the problem for RocksDB.
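Write-amplification is the ratio of bytes written to storage to bytes written by the application. A rough, hypothetical model of how write-amplification bounds load throughput; the device bandwidth, document size and write-amp values below are illustrative, not measurements from this test:

# Hypothetical model: sustained load rate is bounded by device write
# bandwidth divided by (document size * write-amplification).
# None of these numbers are measurements from this test.
def max_load_rate(device_mb_per_sec, doc_kb, write_amp):
    return device_mb_per_sec * 1024.0 / (doc_kb * write_amp)

print(max_load_rate(400, 1.0, 10))  # ~41k docs/s with a write-amp of 10
print(max_load_rate(400, 1.0, 2))   # ~205k docs/s with a write-amp of 2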
The difference in ows.1 and ows.n here is smaller than on the disk array. If ForestDB is doing random disk reads on the commit code path, then the impact is much smaller on SSD because read latency is lower. But RocksDB is still much faster for ows.1, ows.n, owa.1 and owa.n.
RocksDB continues to be faster for the point query tests (pqw.1, pqw.n, pq.1, pq.n). The difference is larger for the single-threaded tests, and I assume that ForestDB continues to do more disk reads per query. RocksDB is still much faster on the range query tests, as explained in the previous post.
Unlike the test with the disk array, the ForestDB tests with 1 writer thread were able to sustain 1000 writes/second as configured via the rate limit.
operations/second for each step
        RocksDB  ForestDB
load      24540     81297
ows.1      3616      1387
ows.n     10727      2029
pqw.1      3601      1805
pqw.n     22448     14069
rqw.1     30477      1419
rqw.n    214060     13134
pq.1       3969      2878
pq.n      24562     19133
rq.1      30621      3805
rq.n     230673     23009
owa.1     24742      1967
owa.n     22692      2319
I had to repeat this test several times to find good values for the number of documents in the database and the compaction threshold for ForestDB. I started with a threshold of 50% and the database was doubling in size. That doubling, 2X space amplification, is expected with the threshold set to 50%, so I reduced it to 25%, which should have a worst-case space amp of 4/3.
Unfortunately, with 64 database files and one compaction thread the worst case space amplification can be worse than theory predicts. All database files can trigger compaction at the same point in time, but only one will be compacted at a time by the one compaction thread. So others will get much more dead data than configured by the threshold.
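A small simulation of that effect. The assumptions are mine, not measured: all 64 files hit the 25% threshold at the same time, every file accumulates dead data equal to 5% of its live data during each compaction, and the single thread compacts the files one after another without revisiting any of them:

# Sketch of why one compaction thread and many files can exceed the
# per-file worst case of 1/(1 - threshold). All rates are hypothetical.
N_FILES = 64
THRESHOLD = 0.25
LIVE = 1.0                                            # live data per file, normalized
DEAD_AT_TRIGGER = LIVE * THRESHOLD / (1 - THRESHOLD)  # dead data when compaction triggers
DEAD_PER_SLOT = 0.05 * LIVE                           # dead data added per file per compaction slot

# Database size just before the last queued file is compacted.
total = 0.0
for i in range(N_FILES):
    if i < N_FILES - 1:
        # compacted during slot i, then accumulated dead data again while the queue drained
        total += LIVE + (N_FILES - 1 - i) * DEAD_PER_SLOT
    else:
        # still queued: triggering dead data plus everything added while it waited
        total += LIVE + DEAD_AT_TRIGGER + (N_FILES - 1) * DEAD_PER_SLOT

print("space amp with one compaction thread: %.2f" % (total / (N_FILES * LIVE)))  # ~2.6
print("per-file bound from the threshold:    %.2f" % (1 / (1 - THRESHOLD)))       # 1.33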
I expect ForestDB to do much better when it supports concurrent compaction threads. The results below show the database size per test. There is more variance in the database size with ForestDB, especially during the ows.1 and ows.n tests. This can make it harder to use most of the available space on a storage device.
Size in GB after each step
        RocksDB  ForestDB
load        151       228
ows.1       149       340
ows.n       155       353
pqw.1       155       316
pqw.n       155       290
rqw.1       156       262
rqw.n       156       277
pq.1        156       276
pq.n        156       277
rq.1        156       277
rq.n        156       277
owa.1       166       282
owa.n       177       288
Command lines
Command lines for the tests are:
bash rall.sh 600000000 log /ssd1 8192 64 10 600 3600 1000 1 rocksdb 20 no 1
bash rall.sh 600000000 log /ssd1 16384 64 10 600 3600 1000 1 fdb 20 no 64