Small Datum: Sysbench, IO-bound, small server: MyRocks over time

In this post I compare five MyRocks builds from February to October using IO-bound sysbench and a small server. The goal is to understand where we have made MyRocks faster and slower this year. I previously shared results for in-memory sysbench with MyRocks and IO-bound sysbench with InnoDB. Tests were done for builds of MyRocks from February 10, April 14, June 16, August 15 and October 16.

tl;dr

There is more variance in QPS on IO-bound sysbench than on in-memory sysbench. I didn't try to determine how much of that is caused by storage devices and how much by MyRocks.
Not much QPS is lost when compression is used.
A typical result is a loss of 10% of QPS from February 10 to October 16.
Full-scan might have lost 15% of throughput from February 10 to October 16.
Full-scan throughput is between 1.2X and 1.6X better when filesystem readahead is enabled.
Some read-heavy tests run after write-heavy tests lose more QPS in the October 16 build than in the February 10 build when compared to the same test run before write-heavy tests. This was also seen on in-memory sysbench.
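One way to sanity-check the "typical" loss is to take the median of the Oct16 relative QPS values. The sketch below is an illustration rather than the way the summary above was derived. It assumes the median is a reasonable summary and uses the oct16.none ratios for the i5 NUC copied from the per-test tables later in this post. The dictionary keys are just labels for those tables.

```python
from statistics import median

# Relative QPS (oct16.none vs feb10.none) on the i5 NUC, copied from the
# per-test tables in this post. One entry per test.
oct16_none_i5 = {
    "full-scan (before)": 0.85,
    "full-scan (after)": 0.85,
    "update-inlist": 1.00,
    "update-one": 0.93,
    "update-index": 0.99,
    "update-nonindex": 0.98,
    "delete": 0.97,
    "read-write range=100": 0.92,
    "read-write range=10000": 0.89,
    "read-only range=100": 0.92,
    "read-only.pre range=10000": 0.92,
    "read-only range=10000": 0.89,
    "point-query.pre": 0.91,
    "point-query": 0.91,
    "random-points.pre": 0.77,
    "random-points": 0.89,
    "hot-points": 1.01,
    "insert": 0.95,
}

# The median is less sensitive to outliers such as random-points.pre.
typical = median(oct16_none_i5.values())
loss_pct = (1 - typical) * 100
print(f"median relative QPS: {typical:.2f}, loss: {loss_pct:.0f}%")
```

For these 18 values the median is 0.92, a loss of about 8%, which is in line with the ~10% stated above.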

Configuration

The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using FB MySQL as of February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:

February 10 - FB MySQL f3019b, RocksDB c2ca7a
April 14 - FB MySQL e28823, RocksDB 9300ef
June 16 - FB MySQL 52e058, RocksDB 7e5fac
August 15 - FB MySQL 0d76ae, RocksDB 50a969
October 16 - FB MySQL 1d0132, RocksDB 019aa7

All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all builds, but a few options were added or changed over that time. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log.

Sysbench is run with 2 tables, 80M rows/table on the i3 NUC and 160M rows/table on the i5 NUC. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database is much larger than RAM.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and a faster CPU than the i3 NUC. I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance, so the difference in CPU performance between these servers is smaller than it would otherwise be.

Tests are repeated for MyRocks without compression and then with LZ4 for the middle levels of the LSM tree and zstandard for the max level.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The graphs show the relative QPS, which is the QPS for a configuration divided by the QPS for the base case. The base case is the Feb10 build without compression. When the relative QPS is less than 1 the base case is faster. The tables that follow have the absolute and relative QPS. The tests are explained here.
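The relative QPS calculation can be sketched in a few lines. This is a minimal example, and the function name `relative_qps` is made up for illustration. The numbers are the update-one results for the i3 NUC from the table later in this post.

```python
def relative_qps(qps: float, base_qps: float) -> float:
    """QPS of a configuration divided by the QPS of the base case."""
    return qps / base_qps


# update-one on the i3 NUC, values copied from the table in this post.
base = 8514  # feb10.none, the base case
results = {"oct16.none": 7823, "oct16.zstd": 7704}

for name, qps in results.items():
    ratio = relative_qps(qps, base)
    # A ratio below 1 means the base case is faster.
    verdict = "base case is faster" if ratio < 1 else "base case is not faster"
    print(f"{name}: {ratio:.2f} ({verdict})")
```

Both configurations come out below 1 (0.92 and 0.90), which matches the ratio column in the update-one table.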

Graphs

The graphs have the QPS relative to the Feb10 build without compression. i3-none and i5-none are results for the i3 and i5 NUCs without compression. i3-zstd and i5-zstd are results for the i3 and i5 NUCs with zstandard compression.

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. The results within each group are not as similar as for the in-memory tests, so I provide extra graphs here. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graphs are for update-nonindex and update-index. To keep this from getting out of hand I save the analysis for the per-test sections.

For write-heavy most of the results have a relative QPS of ~0.9 on the Oct16 builds that don't use compression. There is more variance on the i3 NUC as seen below for i3-none.

The scan-heavy group includes a full scan of the PK index, read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graphs are for read-write with range-size=100 and read-only with range-size=10,000. The largest regression comes after Feb10 or Apr14. From the graphs below the QPS decrease was larger on the i3 NUC.

The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests. The largest regression comes after Apr14. The Oct16 builds without compression have a relative QPS of ~0.9.

The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests.

full-scan

This section and the ones that follow list the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for feb10.none. Values are provided for the i3 and i5 NUC.

The full scan of the PK index is done before and after the write-heavy tests, and throughput is reported as Mrps (millions of rows per second). There is a regression on full scan throughput for the i5 NUC without compression. Otherwise there is a lot of variance.

QPS in the Oct16 build relative to Feb10:

On the i3 NUC throughput gets better for the scan done before the write-heavy tests and worse for the scan done after them.
On the i5 NUC throughput gets worse for both the before and after scans. The reduction for the scan done after the write-heavy tests in oct16.none on both the i3 and i5 NUC might be worth debugging as it is ~15%.

I repeated the Jun16 test with an option to make filesystem readahead more likely and that increased throughput by between 1.2X and 1.6X - see jun16.none.ra and jun16.zstd.ra. This option, rocksdb_advise_random_on_open=0, isn't safe to set for general purpose workloads.

before write-heavy
i3 NUC i5 NUC
Mrps ratio Mrps ratio engine
0.796 1.00 1.454 1.00 feb10.none
1.019 1.28 1.409 0.97 apr14.none
0.879 1.10 1.194 0.82 jun16.none
1.927 2.42 2.318 1.59 jun16.none.ra
0.860 1.08 1.198 0.82 aug15.none
0.898 1.13 1.230 0.85 oct16.none
-
0.714 0.90 0.916 0.63 feb10.zstd
0.761 0.96 0.930 0.64 apr14.zstd
0.714 0.90 0.860 0.59 jun16.zstd
1.006 1.26 1.280 0.88 jun16.zstd.ra
0.737 0.93 0.833 0.57 aug15.zstd
0.747 0.94 0.876 0.60 oct16.zstd

after write-heavy
i3 NUC i5 NUC
Mrps ratio Mrps ratio engine
0.698 1.00 1.327 1.00 feb10.none
0.758 1.09 1.280 0.96 apr14.none
0.610 0.87 1.126 0.85 jun16.none
0.969 1.39 2.133 1.61 jun16.none.ra
0.620 0.89 1.081 0.81 aug15.none
0.597 0.86 1.134 0.85 oct16.none
-
0.653 0.94 0.886 0.67 feb10.zstd
0.575 0.82 0.881 0.66 apr14.zstd
0.477 0.68 0.816 0.61 jun16.zstd
0.963 1.38 1.212 0.91 jun16.zstd.ra
0.522 0.75 0.804 0.61 aug15.zstd
0.522 0.75 0.814 0.61 oct16.zstd
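The ~15% reduction mentioned above for the scan done after the write-heavy tests can be reproduced from the table. This is a small sketch using the feb10.none and oct16.none rows. The helper name `pct_change` is an assumption for illustration.

```python
def pct_change(new: float, old: float) -> float:
    """Percent change from old to new; negative means a regression."""
    return (new - old) / old * 100.0


# Full-scan throughput in Mrps after the write-heavy tests, no compression.
full_scan_after = {
    "i3": {"feb10.none": 0.698, "oct16.none": 0.597},
    "i5": {"feb10.none": 1.327, "oct16.none": 1.134},
}

changes = {
    server: pct_change(rows["oct16.none"], rows["feb10.none"])
    for server, rows in full_scan_after.items()
}

for server, change in changes.items():
    print(f"{server} NUC: {change:+.1f}% from Feb10 to Oct16")
```

Both servers lose between 14% and 15% of full-scan throughput, which is why this case stands out.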

update-inlist

QPS in the Oct16 build relative to Feb10:

For the i3 NUC is better
For the i5 NUC is unchanged for oct16.none and better for oct16.zstd

i3 NUC i5 NUC
QPS ratio QPS ratio engine
375 1.00 403 1.00 feb10.none
477 1.27 492 1.22 apr14.none
445 1.19 430 1.07 jun16.none
449 1.20 488 1.21 aug15.none
455 1.21 405 1.00 oct16.none
-
344 0.92 443 1.10 feb10.zstd
374 1.00 466 1.16 apr14.zstd
363 0.97 458 1.14 jun16.zstd
376 1.00 437 1.08 aug15.zstd
372 0.99 463 1.15 oct16.zstd

update-one

QPS in the Oct16 build relative to Feb10 is worse in all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
8514 1.00 9287 1.00 feb10.none
7854 0.92 8972 0.97 apr14.none
7656 0.90 8508 0.92 jun16.none
7470 0.88 8377 0.90 aug15.none
7823 0.92 8655 0.93 oct16.none
-
8280 0.97 9180 0.99 feb10.zstd
7884 0.93 9270 1.00 apr14.zstd
7774 0.91 8749 0.94 jun16.zstd
7596 0.89 8517 0.92 aug15.zstd
7704 0.90 8512 0.92 oct16.zstd

update-index

QPS in the Oct16 build relative to Feb10 is slightly worse for oct16.none and the same or better for oct16.zstd.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
2515 1.00 3057 1.00 feb10.none
1570 0.62 3084 1.01 apr14.none
2477 0.98 3004 0.98 jun16.none
2460 0.98 3008 0.98 aug15.none
2411 0.96 3038 0.99 oct16.none
-
2295 0.91 2704 0.88 feb10.zstd
2279 0.91 2787 0.91 apr14.zstd
2296 0.91 2778 0.91 jun16.zstd
2242 0.89 2779 0.91 aug15.zstd
2294 0.91 2799 0.92 oct16.zstd

update-nonindex

QPS in the Oct16 build relative to Feb10 is worse for oct16.none and better for oct16.zstd.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
2393 1.00 2987 1.00 feb10.none
2265 0.95 3115 1.04 apr14.none
1391 0.58 2888 0.97 jun16.none
1403 0.59 2893 0.97 aug15.none
1445 0.60 2938 0.98 oct16.none
-
2257 0.94 2562 0.86 feb10.zstd
2279 0.95 2839 0.95 apr14.zstd
2237 0.93 2715 0.91 jun16.zstd
2266 0.95 2680 0.90 aug15.zstd
2265 0.95 2725 0.91 oct16.zstd

delete

QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i5 NUC.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
7924 1.00 9076 1.00 feb10.none
7810 0.99 9602 1.06 apr14.none
7666 0.97 8790 0.97 jun16.none
7566 0.95 8806 0.97 aug15.none
7505 0.95 8802 0.97 oct16.none
-
7373 0.93 8079 0.89 feb10.zstd
7222 0.91 9002 0.99 apr14.zstd
7281 0.92 8268 0.91 jun16.zstd
6955 0.88 8313 0.92 aug15.zstd
7000 0.88 8397 0.93 oct16.zstd

read-write with range-size=100

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
2992 1.00 3360 1.00 feb10.none
2831 0.95 3316 0.99 apr14.none
2565 0.86 3126 0.93 jun16.none
2608 0.87 3092 0.92 aug15.none
2595 0.87 3105 0.92 oct16.none
-
2543 0.85 2988 0.89 feb10.zstd
2572 0.86 3008 0.90 apr14.zstd
2517 0.84 2901 0.86 jun16.zstd
2472 0.83 2780 0.83 aug15.zstd
2514 0.84 2887 0.86 oct16.zstd

read-write with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
168 1.00 226 1.00 feb10.none
163 0.97 223 0.99 apr14.none
146 0.87 202 0.89 jun16.none
147 0.88 205 0.91 aug15.none
149 0.89 202 0.89 oct16.none
-
142 0.85 175 0.77 feb10.zstd
134 0.80 170 0.75 apr14.zstd
132 0.79 163 0.72 jun16.zstd
132 0.79 161 0.71 aug15.zstd
136 0.81 163 0.72 oct16.zstd

read-only with range-size=100

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
2866 1.00 3257 1.00 feb10.none
2677 0.93 3137 0.96 apr14.none
2464 0.86 3011 0.92 jun16.none
2528 0.88 3069 0.94 aug15.none
2531 0.88 3011 0.92 oct16.none
-
2569 0.90 3142 0.96 feb10.zstd
2581 0.90 3003 0.92 apr14.zstd
2406 0.84 2779 0.85 jun16.zstd
2419 0.84 2777 0.85 aug15.zstd
2476 0.86 2819 0.87 oct16.zstd

read-only.pre with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
150 1.00 189 1.00 feb10.none
150 1.00 195 1.03 apr14.none
137 0.91 174 0.92 jun16.none
137 0.91 176 0.93 aug15.none
136 0.91 173 0.92 oct16.none
-
118 0.79 145 0.77 feb10.zstd
117 0.78 143 0.76 apr14.zstd
112 0.75 138 0.73 jun16.zstd
112 0.75 136 0.72 aug15.zstd
114 0.76 139 0.74 oct16.zstd

read-only with range-size=10000

QPS in the Oct16 build relative to Feb10 is worse for all cases except oct16.zstd on the i3 NUC.

The QPS here is lower than for the same test in the previous section. The tests in the previous section are run before the write-heavy tests while the tests here are run after them. It costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM.

The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
129 1.00 184 1.00 feb10.none
102 0.79 181 0.98 apr14.none
102 0.79 166 0.90 jun16.none
95 0.74 166 0.90 aug15.none
101 0.78 164 0.89 oct16.none
-
101 0.78 142 0.77 feb10.zstd
108 0.84 138 0.75 apr14.zstd
105 0.81 132 0.72 jun16.zstd
104 0.81 130 0.71 aug15.zstd
107 0.83 132 0.72 oct16.zstd
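To make the before and after comparison concrete, the sketch below computes the Oct16 to Feb10 ratio for the read-only.pre table and for the table above, without compression. The variable and function names are assumptions for illustration. The QPS values are copied from the two tables.

```python
# read-only with a range size of 10,000, no compression.
# Each tuple is (feb10.none QPS, oct16.none QPS).
before_writes = {"i3": (150, 136), "i5": (189, 173)}
after_writes = {"i3": (129, 101), "i5": (184, 164)}


def oct_vs_feb(pair: tuple[int, int]) -> float:
    """Ratio of the Oct16 QPS to the Feb10 QPS for one server."""
    feb10, oct16 = pair
    return oct16 / feb10


for server in before_writes:
    pre = oct_vs_feb(before_writes[server])
    post = oct_vs_feb(after_writes[server])
    print(f"{server} NUC: before={pre:.2f} after={post:.2f}")
```

On both servers the ratio is lower for the test run after the write-heavy tests, and the gap is much wider on the i3 NUC.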

point-query.pre

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
4435 1.00 4900 1.00 feb10.none
4596 1.04 4994 1.02 apr14.none
4177 0.94 4370 0.89 jun16.none
4137 0.93 4494 0.92 aug15.none
4226 0.95 4438 0.91 oct16.none
-
3422 0.77 4370 0.89 feb10.zstd
3439 0.78 4325 0.88 apr14.zstd
3354 0.76 3969 0.81 jun16.zstd
3293 0.74 3992 0.81 aug15.zstd
3305 0.75 3962 0.81 oct16.zstd

point-query

QPS in the Oct16 build relative to Feb10 is worse for all cases.

The QPS here is lower than for the same test in the previous section, which is expected for read-heavy tests that follow write-heavy tests. But the decrease is huge for the i3 NUC. I didn't debug that.

The decrease in QPS from Feb10 to Oct16 is larger here than in the previous section. That is similar to the result on in-memory sysbench.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
2735 1.00 4420 1.00 feb10.none
2858 1.04 4261 0.96 apr14.none
2361 0.86 3966 0.90 jun16.none
2452 0.90 3995 0.90 aug15.none
2346 0.86 4022 0.91 oct16.none
-
2764 1.01 4117 0.93 feb10.zstd
2638 0.96 3958 0.90 apr14.zstd
2742 1.00 3707 0.84 jun16.zstd
2667 0.98 3721 0.84 aug15.zstd
2628 0.96 3731 0.84 oct16.zstd
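The size of the drop from point-query.pre to point-query can be measured as the fraction of QPS retained after the write-heavy tests. This is a rough sketch using the feb10.none and oct16.none rows from both tables. The dictionary names and the `retained` structure are made up for illustration.

```python
# Point-query QPS without compression, copied from the two tables.
point_query_pre = {
    "i3": {"feb10.none": 4435, "oct16.none": 4226},
    "i5": {"feb10.none": 4900, "oct16.none": 4438},
}
point_query_post = {
    "i3": {"feb10.none": 2735, "oct16.none": 2346},
    "i5": {"feb10.none": 4420, "oct16.none": 4022},
}

# Fraction of the pre-write QPS that remains after the write-heavy tests.
retained = {
    server: {
        build: point_query_post[server][build] / pre_qps
        for build, pre_qps in builds.items()
    }
    for server, builds in point_query_pre.items()
}

for server, builds in retained.items():
    for build, fraction in builds.items():
        print(f"{server} NUC {build}: retains {fraction:.2f} of pre-write QPS")
```

The i5 NUC keeps about 90% of its QPS while the i3 NUC keeps only about 55% to 62%, which is the large decrease noted above.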

random-points.pre

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
68 1.00 70 1.00 feb10.none
73 1.07 65 0.93 apr14.none
65 0.96 57 0.81 jun16.none
65 0.96 65 0.93 aug15.none
64 0.94 54 0.77 oct16.none
-
52 0.76 65 0.93 feb10.zstd
52 0.76 65 0.93 apr14.zstd
50 0.74 61 0.87 jun16.zstd
50 0.74 60 0.86 aug15.zstd
50 0.74 61 0.87 oct16.zstd

random-points

QPS in the Oct16 build relative to Feb10 is worse for all cases. What I wrote in the point-query section is mostly true here, especially the part about QPS being worse for the test run after write-heavy tests.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
50 1.00 56 1.00 feb10.none
44 0.88 54 0.96 apr14.none
36 0.72 62 1.11 jun16.none
40 0.80 63 1.13 aug15.none
40 0.80 50 0.89 oct16.none
-
43 0.86 62 1.11 feb10.zstd
44 0.88 62 1.11 apr14.zstd
41 0.82 57 1.02 jun16.zstd
40 0.80 55 0.98 aug15.zstd
37 0.74 57 1.02 oct16.zstd

hot-points

While this is an IO-bound benchmark the hot-points test is always in-memory. But the results here have more variance than on in-memory sysbench. I didn't debug that.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
1437 1.00 1327 1.00 feb10.none
1263 0.88 1456 1.10 apr14.none
1000 0.70 1125 0.85 jun16.none
1162 0.81 1307 0.98 aug15.none
1288 0.90 1339 1.01 oct16.none
-
1311 0.91 1417 1.07 feb10.zstd
1399 0.97 1450 1.09 apr14.zstd
1117 0.78 1088 0.82 jun16.zstd
1139 0.79 1391 1.05 aug15.zstd
1310 0.91 1378 1.04 oct16.zstd

insert

QPS in the Oct16 build relative to Feb10 is worse for all cases.

i3 NUC i5 NUC
QPS ratio QPS ratio engine
8056 1.00 8654 1.00 feb10.none
8233 1.02 9403 1.09 apr14.none
7867 0.98 8652 1.00 jun16.none
7930 0.98 8864 1.02 aug15.none
7398 0.92 8236 0.95 oct16.none
-
7922 0.98 8540 0.99 feb10.zstd
8386 1.04 8981 1.04 apr14.zstd
7828 0.97 8299 0.96 jun16.zstd
7637 0.95 8538 0.99 aug15.zstd
6194 0.77 8075 0.93 oct16.zstd

1 comment:

Mark Callaghan, November 21, 2017 at 1:37 PM
I am not sure. I ran the insert benchmark on work HW and the i3, i5 NUCs for MongoDB 3.0, 3.2 and 3.4. I have plenty of data to share. After a few more weeks of publishing MySQL results (tpcc, linkbench still coming) I will need a break.

But the summary is that MySQL does much better than MongoDB on the insert benchmark -- both in-memory and IO-bound. But I have to explain why, so maybe I am to blame.

Tuesday, November 21, 2017