Monday, February 15, 2021

Not finding CPU regressions across RocksDB releases

I looked for CPU regressions in recent RocksDB releases and was happy to not find any.

My workload was low concurrency (1 or 2 threads) and in-memory, run on a small server. I tested RocksDB versions 6.4 and 6.11 through 6.17. My servers use Ubuntu 20.04, which has g++ version 9.3. I wasn't able to compile versions prior to 6.4 because of compiler errors that I didn't try to resolve. Such is the cost of using modern C++.

Running the tests

I used the all3.sh, run3.sh and rep_all3.sh scripts from my github repo. The scripts do two things for me. First, they handle changes to the db_bench options across RocksDB versions. Second, all3.sh runs tests in a sequence that is interesting to me. I need to update all3.sh for changes in the 6.X branch with respect to the db_bench options; a sketch of that kind of version gating is below.
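
This is only a sketch of the idea, not code from all3.sh; the version check and the flag name are illustrative:

# Choose db_bench options based on the RocksDB version under test.
# The flag below is a made-up example of a version-specific option.
ver=$1    # e.g. 611 for RocksDB 6.11
extra=""
if [[ $ver -ge 615 ]]; then
  extra="--hypothetical_new_option=1"
fi
./db_bench --benchmarks=fillrandom --num=10000000 $extra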

I forgot to do ulimit -n 50000 prior to running the tests and repeated them after fixing that. I have forgotten to do that many times in the past.
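
A guard at the top of a test script would make that mistake fail fast. This is a sketch, not something from my scripts; 50000 matches the value above, and the limit matters because RocksDB with max_open_files=-1 keeps a file descriptor open per SST file:

# Fail fast when the open-file limit is too low for the test.
need=50000
have=$(ulimit -n)
if [[ $have -lt $need ]]; then
  ulimit -n $need || { echo "raise the fd limit: ulimit -n $need"; exit 1; }
fi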

I ran the tests in two modes: not-cached and cached. By not-cached I mean that the database is larger than the RocksDB block cache; by cached I mean that the database fits in the block cache. In both cases all data is in the OS page cache.

For the benchmarks: 10M KV pairs were inserted in the initial load, each test step was run for 300 seconds, and the LSM tree was made small (8M write buffer, 32M L1) to get more levels in the tree.
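
In db_bench terms the setup roughly maps to standard options like these. This is a sketch for the cached mode; the scripts compute the real values from the run3.sh arguments shown below:

# 10M keys, 300 seconds per test step, a small memtable and L1, and a
# block cache sized larger (cached) or smaller (not-cached) than the DB.
./db_bench --benchmarks=fillrandom --num=10000000 --duration=300 \
  --write_buffer_size=$(( 8 * 1024 * 1024 )) \
  --max_bytes_for_level_base=$(( 32 * 1024 * 1024 )) \
  --cache_size=$(( 10 * 1024 * 1024 * 1024 ))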

Command lines that are useful to me, though the scripts might be inscrutable to you:

# To run for cached
bash run3.sh 100000000 64 300 8 32 $(( 10 * 1024 * 1024 ))
# To run for not-cached
bash run3.sh 100000000 64 300 8 32 $(( 1 * 1024 * 1024 ))
# To generate summaries for response time and throughput
bash rep_all3.sh v64 v611 v612 v613 v614 v615 v616 v617

Results

While there are many test steps (a test step == one run of db_bench), the most interesting are the first two (fillrandom, overwrite) and the last five (readwhilewriting, then seekrandomwhilewriting with different range sizes). Results can be misleading for the read-only tests run in between these because their performance depends on the shape of the LSM tree. The amount of data in the memtable, L0 and L1 isn't deterministic and can have a large impact on the CPU overhead for queries. In MyRocks I reduce the impact from this by flushing the memtable and compacting the L0, but db_bench doesn't have options for that (yet). It does have an option to do a full compaction, but that is too much for me; a sketch is below.
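
A full compaction can be requested by listing compact as a benchmark step before the read-only steps. This sketch uses standard db_bench options, but it compacts the entire database, which is more than I want:

# "compact" runs a full manual compaction on the existing database
# before the read-only steps.
./db_bench --use_existing_db=1 --num=10000000 --duration=300 \
  --benchmarks=compact,readrandom,seekrandom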

So I will share the results for all test steps but focus on the first two and the last five. There aren't significant regressions from v64 to v617. Results with a larger font, and with numbers for both response time and throughput, are in github for cached and not-cached.

This has the QPS from the cached test:

v64     v611    v612    v613    v614    v615    v616    v617    test
357763  339811  329584  342793  339205  347598  339990  348218  fillrandom
344379  344567  340444  330834  325129  333669  331436  347029  overwrite
2817650 2779196 2877354 2886832 2887711 2774030 2710212 2823124 readseq
159063  129527  121106  123190  121325  134453  120146  137377  readrandom
77047   73334   64849   71194   49592   93552   64414   98480   seekrandom
3760630 3319862 3436593 3424777 3416794 3468542 3348936 3419843 readseq
216189  168233  170755  167659  177929  194189  170752  197241  readrandom
77176   74307   67099   73279   83014   95671   66752   97165   seekrandom
76688   73207   65771   71620   80992   93068   64924   94364   seekrandom
67994   65093   59306   65372   72698   80427   59296   83996   seekrandom
35619   33270   32388   34049   36375   38317   32091   38652   seekrandom
155204  151360  150730  151218  149980  150653  150261  148748  readwhilewriting
57080   54777   56317   55931   55271   55564   55581   56334   seekrandomwhilewriting
56184   53540   54838   54445   54450   54633   54465   54410   seekrandomwhilewriting
51143   49391   50092   50338   49548   50055   49486   50481   seekrandomwhilewriting
29553   27373   28491   28734   28586   28242   27853   28267   seekrandomwhilewriting


And this has the QPS from the not-cached test:

v64     v611    v612    v613    v614    v615    v616    v617    test
349918  349072  341224  347164  348470  340888  347850  334909  fillrandom
344040  327776  334852  332857  336480  343888  339678  332415  overwrite
2888170 2704291 2869560 2838685 2708847 2630220 2743535 2634374 readseq
167660  133981  130999  112923  120657  120273  87018   121126  readrandom
79615   58025   66542   49269   71643   71525   94862   71959   seekrandom
3784203 3284938 3411521 3404096 3414857 3409997 3448335 3366118 readseq
222893  165198  172372  175113  169132  174096  190337  166636  readrandom
80397   59224   67565   83540   73345   73666   94354   73855   seekrandom
78815   58232   65396   81865   72491   71938   92648   72689   seekrandom
70153   52933   60468   73907   64593   64626   80768   65317   seekrandom
36654   29389   32587   36881   34236   33753   38028   33618   seekrandom
154127  150561  150021  151168  148856  149113  149967  150643  readwhilewriting
57050   55440   55498   55576   55258   56255   55440   55162   seekrandomwhilewriting
56178   54348   55160   54893   54251   54699   54177   54770   seekrandomwhilewriting
51651   49415   50537   50627   49488   49281   49268   50172   seekrandomwhilewriting
29567   27303   28346   28359   28376   27969   27932   28204   seekrandomwhilewriting
