Thursday, November 16, 2017

A new optimization in gcc 5.x and MySQL

We were testing the impact of changing from gcc 4.9 to 5.x and from older to newer jemalloc. For one of the in-memory sysbench tests the QPS at high-concurrency dropped by 10%. The test was read-only with range-size set to 1000. If the test was limited to the order-ranges query then the QPS dropped by ~25%. This wasn't good.

I repeated the test with newer jemalloc and gcc 4.9 and there was no loss of QPS. So now it looked like a problem with gcc 5.x and not jemalloc. I then did a build with tcmalloc, but at startup mysqld would get an illegal free error from tcmalloc. After finding what I think is a RocksDB bug and then another problem, I found a workaround and soon discovered that for tcmalloc and glibc malloc there was also a decrease in QPS for gcc 5.x but not 4.9. Now I was worried that I might lose the debugging expertise of the internal jemalloc team at work, but fortunately they found the problem while collaborating with the MyRocks team.

AFAIK this isn't a MyRocks-only problem because it comes from the allocation done to sort for the order by clause. But I am tired of running tests and won't test it for InnoDB. Good news for the rest of the world. This is an issue for MySQL 5.6 but probably not for 5.7 and 8.x.

The problem is a gcc5 optimization (see gcc issues 67618 and 83022) that transforms the call sequence malloc, memset into a call to calloc. This appears to be done even for 2mb allocations (my.cnf had sort_buffer_size=2m). The output from perf wasn't clear about the problem this creates. For jemalloc it reported a new CPU overhead from smp_call_function_interrupt calling flush_tlb_func and all of that is kernel code. I was told that was from jemalloc zero-ing pages. A workaround that doesn't require a code change is to compile with -fno-builtin-malloc. There are workarounds that require code changes that I won't list here.

Here are performance results from mysqld compiled with gcc 4.9 vs 5.x and linked with different allocators (jemalloc, tcmalloc, glibc malloc). The test is in-memory sysbench read-only with range-size=1000, 8 tables and 1M rows/table. The test uses 48 concurrent connections and the server has 48 HW threads (24 cores, HT enabled).

QPS     configuration
78622   gcc4.9, jemalloc
76787   gcc4.9, tcmalloc
73340   gcc4.9, glibc malloc
-
65673   gcc5.x, jemalloc
58958   gcc5.x, tcmalloc
48750   gcc5.x, glibc malloc
-
78028   gcc5.x, jemalloc, -fno-builtin-malloc
78135   gcc5.x, tcmalloc, -fno-builtin-malloc
78207   gcc5.x, glibc malloc, -fno-builtin-malloc


Tuesday, November 14, 2017

Sysbench, in-memory, small server: MyRocks over time

In this post I compare four MyRocks releases from February to October using in-memory sysbench and a small server. The goal is understand where we have made MyRocks faster and slower this year.

tl;dr
  • For many tests there is no decrease in QPS from February to October
  • For some tests the QPS decreased by 3% to 8%
  • The largest regression is for read-heavy tests that run after write-heavy tests. Querying an LSM tree after many updates usually costs more in CPU and/or IO compared to querying it before the updates. But the CPU overhead might have increased since February.

Configuration

The tests used MyRocks from FB MySQL which is currently based on 5.6.35. Builds were done using HEAD from February 10, April 14, June 16, August 15 and October 16. The git hashes for these builds are:

  • February 10 - FB MySQL f3019b, RocksDB c2ca7a
  • April 14 - FB MySQL e28823, RocksDB 9300ef
  • June 16 - FB MySQL 52e058, RocksDB 7e5fac
  • August 15 - FB MySQL 0d76ae, RocksDB 50a969
  • October 16 - FB MySQL 1d0132, RocksDB 019aa7
All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines but there are a few new & changed options in that time. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. Compression was not used.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is smaller.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. The results below have the QPS for the test with 1 client relative to the QPS for the February 10 build. The tests are explained here.

Graphs

No graphs this time. The results aren't that interesting.

update-inlist

Here and the sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the test with 1 client relative to the QPS for the February 10 build. Values are provided for the i3 and i5 NUC.

There is no regression.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1713    1.00    2002    1.00    feb10
1826    1.07    2133    1.07    apr14
1605    0.94    1987    0.99    jun16
1698    0.99    2017    1.01    aug15
1761    1.03    2087    1.04    oct16

update-one

QPS decreased by ~8%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8375    1.00    9295    1.00    feb10
8036    0.96    9151    0.98    apr14
7774    0.93    8602    0.93    jun16
7705    0.92    8555    0.92    aug15
7732    0.92    8620    0.93    oct16

update-index

There is no regression on the i3 NUC and QPS has decreased by 7% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
5981    1.00    6861    1.00    feb10
5799    0.97    6722    0.98    apr14
5678    0.98    6300    0.92    jun16
5809    1.00    6306    0.92    aug15
6022    1.04    6392    0.93    oct16

update-nonindex

QPS decreased by 3% on the i3 NUC and 6% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
6521    1.00    7184    1.00    feb10
6346    0.97    7127    0.99    apr14
5913    0.91    6516    0.91    jun16
6066    0.93    6565    0.91    aug15
6310    0.97    6724    0.94    oct16

delete

QPS decreased by 7% on the i3 NUC and 5% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15301   1.00    16361   1.00    feb10
14714   0.96    16552   1.01    apr14
13973   0.91    15736   0.96    jun16
14216   0.93    15447   0.94    aug15
14233   0.93    15515   0.95    oct16

read-write with range-size=100

QPS decreased by 7%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8101    1.00    8583    1.00    feb10
7700    0.95    8162    0.95    apr14
7333    0.91    7812    0.91    jun16
7366    0.91    7747    0.90    aug15
7532    0.93    7967    0.93    oct16

read-write with range-size=10000

QPS decreased by 5%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
262     1.00    308     1.00    feb10
253     0.97    300     0.97    apr14
241     0.92    290     0.94    jun16
246     0.94    285     0.93    aug15
249     0.95    294     0.95    oct16

read-only with range-size=100

QPS decreased by 8% on the i3 NUC and 21% on the i5 NUC. I suspect that the 21% regression on i5 NUC is an outlier and unlikely to repeat, but I will find out when I test a new build. I think it is an outlier because there is variance with MyRocks for read-heavy tests that follow write-heavy tests. The state of the LSM tree (number of entries in the memtable, number of files in L0) is not deterministic across test runs and that impacts read performance. This is made worse because the LSM tree state can remain in that state for the duration of the read-only test. I prefer to run read-write tests when evaluating MyRocks to avoid this variance.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8610    1.00    9272    1.00    feb10
7818    0.91    8223    0.89    apr14
7506    0.87    8463    0.91    jun16
7604    0.88    7660    0.93    aug15
7920    0.92    7321    0.79    oct16

read-only.pre with range-size=10000

QPS decreased by 4% on the i3 NUC and 3% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
215     1.00    261     1.00    feb10
213     0.99    256     0.98    apr14
203     0.94    261     1.00    jun16
208     0.97    256     0.98    aug15
207     0.96    254     0.97    oct16

read-only with range-size=100000

QPS decreased by 6% on the i3 NUC and 8% on the i5 NUC. The decrease here is larger than for the previous test. The difference is that this test is run after write-heavy tests while the previous test is run before them. It costs more to search the LSM structures after random updates (compare the QPS here with the QPS in the previous test), and that cost may have increased. I have written more about mistakes to avoid when doing a benchmark with an LSM and if you only do read-only tests before fragmenting the LSM tree you might be an optimist.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
214     1.00    257     1.00    feb10
203     0.95    242     0.94    apr14
194     0.91    240     0.93    jun16
197     0.92    230     0.89    aug15
201     0.94    237     0.92    oct16

point-query.pre

QPS decreased by 3% on the i3 NUC and 1% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15775   1.00    16190   1.00    feb10
15314   0.97    16035   0.99    apr14
14504   0.92    15218   0.94    jun16
14627   0.93    15462   0.96    aug15
15277   0.97    16022   0.99    oct16

point-query

QPS decreased by 7%. The decrease here is larger than for the previous test. See the comment two sections about about read-only tests that follow write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
15326   1.00    15900   1.00    feb10
14126   0.92    14556   0.92    apr14
13612   0.89    15030   0.95    jun16
13557   0.88    13721   0.86    aug15
14328   0.93    14801   0.93    oct16

random-points.pre

QPS decreased by 3% on the i3 NUC and 2% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1450    1.00    1527    1.00    feb10
1459    1.01    1499    0.98    apr14
1301    0.90    1360    0.89    jun16
1374    0.95    1394    0.91    aug15
1401    0.97    1502    0.98    oct16

random-points

QPS decreased by 8% on the i3 NUC and 12% on the i5 NUC. See the comment two sections about about read-only tests that follow write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1063    1.00    1151    1.00    feb10
 928    0.87     940    0.82    apr14
 952    0.90     947    0.82    jun16
 962    0.90     847    0.74    aug15
 973    0.92    1008    0.88    oct16

hot-points

QPS decreased by 15%. This is like random-points except it fetches the same values for every query. It is run after the write-heavy tests. The regression is similar to random-points.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
1565    1.00    1809    1.00    feb10
1384    0.88    1531    0.85    apr14
1239    0.79    1422    0.79    jun16
1334    0.85    1341    0.74    aug15
1329    0.85    1535    0.85    oct16

insert

QPS decreased by 8% on the i3 NUC and 7% on the i5 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8337    1.00    9102    1.00    feb10
8377    1.00    9086    1.00    apr14
7871    0.94    8723    0.96    jun16
8074    0.97    8785    0.97    aug15
7650    0.92    8446    0.93    oct16

Monday, November 13, 2017

Sysbench, in-memory, small server: InnoDB & MyRocks

In this post I compare MyRocks and InnoDB using in-memory sysbench and a small server. Previous posts explained performance for MySQL from 5.0 to 8.0 for MyISAM and InnoDB. In this post I limit the results to MySQL 5.6.35 for MyRocks and both 5.6.35 and 5.7.17 for InnoDB.

tl;dr - the QPS for MyRocks relative to InnoDB in MySQL 5.6 where 1.0 means they are equal
  • For write heavy tests MyRocks gets ~0.80 vs InnoDB except on the update-index test where it benefits from read-free secondary index maintenance
  • For scan heavy tests MyRocks gets between 0.70 and 0.94 vs InnoDB
  • For point query tests MyRocks gets between 0.71 and 0.80 vs InnoDB
  • For in-list query tests MyRocks gets between 0.26 and 0.45 vs InnoDB
Not really tl;dr
  • Range scans are much faster for InnoDB in 5.7 than in 5.6
  • QPS for InnoDB in MySQL 5.7 is frequently worse than in 5.6 because there is more CPU overhead per query. I assume this is the cost from more features. Most of the overhead appears to be in code above the storage engine (optimizer, parser, etc). This test uses low concurrency and doesn't highlight great improvements at high concurrency in InnoDB 5.7. The random-points section below has more details.
  • MyRocks read performance suffers more than InnoDB when the database is subject to a write-heavy workload prior to the read-heavy tests. It costs more CPU time to search the LSM structures after random changes. This is more true for in-memory workloads than for IO-bound. Note that many bulk-loads done by benchmarks don't show this cost because they insert data in key order. Benchmarks that do bulk-load and then query can be misleading, especially if the bulk-load is in key order.

From the ratios above you might ask why I like MyRocks. Some of the performance differences are things we need to fix and work has been in progress, so better results are coming. But I like it because it is more efficient than InnoDB for two reasons. First, it uses less space than InnoDB (less space-amplification) so you need less SSD to store the same data. Second, it writes less to storage per transaction so SSD devices last longer. InnoDB, WiredTiger and mmapv1 are all increasing the rate at which I must replace SSDs on my test servers. Finally, this workload is in-memory and MyRocks is usually better for IO-bound than for in-memory. 

Configuration

The tests used InnoDB from upstream MySQL 5.6.35 and 5.7.17 and then MyRocks from FB MySQL 5.6.35. The MyRocks build is from August 15 with git hash 0d76ae. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines including: disabled SSL for MySQL 5.7, used the same charset and collation, set innodb_purge_threads=1 to reduce mutex contention. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. Compression was not used for InnoDB or MyRocks.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller.

Results

All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. For all tests but scan the result has the QPS for the test with 1 client relative to the QPS for InnoDB in MySQL 5.6.35. The tests are explained here.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. Fortunately the results within each group are similar and one graph per group is sufficient. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. For write-heavy InnoDB does better than MyRocks on all of the tests except update-index where MyRocks benefits from read-free secondary index maintenance. The workload is low-concurrency so InnoDB in 5.7 suffers from more CPU overhead without getting a return from its high-concurrency improvements. The write-heavy tests do a read-modify-write per query, InnoDB is more efficient for the reads so it gets better QPS than MyRocks. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6.
  • update-inlist - 0.60 & 0.77
  • update-one - 0.78 & 0.80
  • update-index - 1.15 & 1.85
  • update-nonindex - 0.65 & 0.77
  • delete - 0.78 & 0.97
  • insert - 0.78 & 0.86
For scan-heavy InnoDB does better on all of the tests and given the range scan improvements to InnoDB in MySQL 5.7 the comparisons will get harder. MyRocks suffers from more CPU overhead in the merge iterator. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for read-write with range-size=100.
  • read-write with range-size=100 - 0.67 & 0.75
  • read-write with range-size=10000 - 0.85 & 0.87
  • read-only with range-size=100 - 0.69 & 0.79
  • read-only.pre with range-size=10000 - 0.94 & 0.92
  • read-only with range-size=10000 - 0.85 & 0.89
The point-query group includes the point-query test run before and then after the write-heavy tests. InnoDB does better on both of the tests because MyRocks uses more CPU per query. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for the test run after the write-heavy tests.
  • point-query run before - 0.75 & 0.80
  • point-query run after - 0.71 & 0.73
The inlist-query group includes the hot-points test and the random-points tests run before and after the write-heavy tests. InnoDB does better on all tests because MyRocks uses more QPS per query. There are two numbers for each test - the first is for the i5 NUC, the second is for the i3 NUC and each is the QPS for MyRocks relative to InnoDB 5.6. The graph is for the test run after the write-heavy tests.
  • hot-points - 0.32 & 0.36
  • random-points run before - 0.44 & 0.45
  • random-points run after - 0.26 & 0.32
update-inlist

Here and the sections that follow have the QPS and relative QPS. The relative QPS is the QPS for the engine relative to the QPS for InnoDB in MySQL 5.6.35 from the result with one client. Values are provided for the i3 and i5 NUC.

InnoDB loses ~5% of QPS from 5.6 to 5.7. InnoDB does better than MyRocks.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
2201    1.00    3349    1.00    InnoDB-5.6
2050    0.93    3192    0.95    InnoDB-5.7.17
1698    0.77    2017    0.60    MyRocks-5.6

update-one

InnoDB loses ~12% of QPS from 5.6 to 5.7. InnoDB does better than MyRocks.

i3 NUC          i5 NUC

QPS     ratio   QPS     ratio   engine
9650    1.00    10969   1.00    InnoDB-5.6
8097    0.84     9648   0.88    InnoDB-5.7.17
7705    0.80     8555   0.78    MyRocks-5.6

update-index

MyRocks does much better than InnoDB because secondary index maintenance is read free.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3134    1.00     5489   1.00    InnoDB-5.6
2983    0.95     5861   1.07    InnoDB-5.7.17
5809    1.85     6306   1.15    MyRocks-5.6

update-nonindex

See comments in the update-index section above. MyRocks QPS here is similar to the QPS for update-index, but InnoDB QPS is ~2X better here than for update-index.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
7860    1.00    10148   1.00    InnoDB-5.6
6337    0.81     9135   0.90    InnoDB-5.7.17
6066    0.77     6565   0.65    MyRocks-5.6

delete

Not much to add here.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
14590   1.00    19900   1.00    InnoDB-5.6
12758   0.87    17621   0.89    InnoDB-5.7.17
14216   0.97    15447   0.78    MyRocks-5.6

read-write with range-size=100

Range scans are more efficient in InnoDB 5.7 than in 5.6. MyRocks gets less QPS because it uses more CPU per query

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 9844   1.00    11591   1.00    InnoDB-5.6
10152   1.03    12290   1.06    InnoDB-5.7.17
 7366   0.75     7747   0.67    MyRocks-5.6

read-write with range-size=10000

Range scans are more efficient in InnoDB 5.7 than in 5.6 and the benefit is larger here than the previous test because the range scans are larger (10,000 rows vs 100). MyRocks gets less QPS because it uses more CPU per query

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
283     1.00    335     1.00    InnoDB-5.6
442     1.56    500     1.49    InnoDB-5.7.17
246     0.87    285     0.85    MyRocks-5.6

read-only with range-size=100

Range scans are more efficient in InnoDB 5.7 than in 5.6. MyRocks gets less QPS because it uses more CPU per query.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 9644   1.00    11120   1.00    InnoDB-5.6
11369   1.18    12247   1.10    InnoDB-5.7.17
 7604   0.79     7660   0.69    MyRocks-5.6

read-only.pre with range-size=10000

Range scans are more efficient in InnoDB 5.7 than in 5.6 and the benefit is larger here than the previous test because the range scans are larger (10,000 rows vs 100). MyRocks gets less QPS because it uses more CPU per query.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
226     1.00    272     1.00    InnoDB-5.6
358     1.58    394     1.45    InnoDB-5.7.17
208     0.92    256     0.94    MyRocks-5.6

read-only with range-size=100000

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 5% to 10% less because it costs more to search the LSM structures after random updates. I have written more about mistakes to avoid when doing a benchmark with an LSM and if you only do read-only tests before fragmenting the LSM tree you might be an optimist.

One thing I didn't notice until now is that QPS for InnoDB on read-write is larger than for read-only for tests with range-size=10000 but not for range-size=100. I don't know why yet.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
221     1.00    272     1.00    InnoDB-5.6
359     1.62    393     1.44    InnoDB-5.7.17
197     0.89    230     0.85    MyRocks-5.6

point-query.pre

InnoDB uses more CPU/query in 5.7 than in 5.6 so QPS is smaller. MyRocks has the same problem.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
18329   1.00    20505   1.00    InnoDB-5.6
16592   0.91    18126   0.88    InnoDB-5.7.17
14627   0.80    15462   0.75    MyRocks-5.6

point-query

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 7% to 11% less because it costs more to search the LSM structures after random updates.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
18554   1.00    19257   1.00    InnoDB-5.6
16883   0.91    17723   0.92    InnoDB-5.7.17
13557   0.73    13721   0.71    MyRocks-5.6

random-points.pre

The QPS decrease for InnoDB from 5.6 to 5.7 is much smaller here than for point-query.pre above. I assume that most of the QPS decrease from 5.6 to 5.7 is from code above InnoDB (parse, optimize, etc) and the query here fetches 100 rows by PK using an in-list. So the per-query overhead here is amortized over 100 rows versus 1 row above.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3040    1.00    3203    1.00    InnoDB-5.6
2931    0.96    3121    0.97    InnoDB-5.7.17
1374    0.45    1394    0.44    MyRocks-5.6

random-points

See the comment in the previous section. This test was run after the write-heavy tests. The test for the previous section was run before the write-heavy tests. QPS for InnoDB here and above is similar, but QPS for MyRocks here is 30% to 40% less because it costs more to search the LSM structures after random updates. The cost here is greater than the cost above for read-only and point-query because this test spends a larger fraction of the per-query time in the storage engine, so it is more sensitive to storage engine overheads. This is the same reason that the 5.6 to 5.7 regression is smaller here for InnoDB than for the tests that preceded random-points.pre.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3036    1.00    3223    1.00    InnoDB-5.6
2947    0.97    3123    0.97    InnoDB-5.7.17
 962    0.32     847    0.26    MyRocks-5.6

hot-points

This is like random-points except it fetches the same values. It is run after the write-heavy tests. It is always in-memory. InnoDB can benefit from the adaptive hash index. But I am curious why the QPS here for MyRocks is closer to the random-points test run before the write-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3666    1.00    4242    1.00    InnoDB-5.6
3458    0.94    3898    0.92    InnoDB-5.7.17
1334    0.36    1341    0.32    MyRocks-5.6

insert

InnoDB loses 12% to 15% of the insert rate in MySQL 5.7 because the per-insert CPU overhead is larger. MyRocks suffers the same problem.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
9340    1.00    11244   1.00    InnoDB-5.6
7853    0.84     9892   0.88    InnoDB-5.7.17
8074    0.86     8785   0.78    MyRocks-5.6


Friday, November 10, 2017

Sysbench: in-memory, InnoDB and a small server, MySQL 5.0 to 8.0

This has results for in-memory sysbench with InnoDB and MySQL versions 5.0, 5.1, 5.5, 5.6, 5.7 and 8.0. The previous post covered MyISAM. There are four types of tests: write-heavy, scan-heavy, point-query and inlist-query. For MyISAM the results for the tests within each group were similar. That is less true for InnoDB.

tl;dr - because the other tl;dr are too long
  • InnoDB in 5.5 frequently has lousy performance for write-heavy tests. I didn't debug it.
  • InnoDB 5.7 and 8.0 have great scan performance, are much better for high-concurrency workloads (to be covered in a future post), but suffer from more CPU overhead for low-concurrency workloads.

tl;dr - from 5.0.96 to 8.0.3
  • For update-nonindex QPS decreased 41% on the i5 NUC and 52% on the i3 NUC. Results for update-one and update-inlist were similar. The regression was smaller for update-index and insert. For delete 8.0.3 gets more QPS than 5.0.96.
  • For scan-heavy tests the QPS for MySQL 8.0.3 is between 1.05X and 1.26X better than for 5.0.96 on all tests except read-only with range-size=100 where it is 2% to 5% worse. 
  • For point-query the QPS decreased 36% on the i5 NUC and 46% on the i3 NUC
  • For the inlist-heavy tests the QPS decreased ~22%.
  • The QPS regression is larger for point-query tests than for inlist-heavy. New MySQL releases bring new features which brings new CPU overheads. For point-query there is 1 row fetched per SELECT versus 100 per select for the inlist-heavy tests, so the inlist-heavy amortize that overhead over more rows.

tl;dr - from 5.6.35 to 8.0.3
  • For update-nonindex QPS decreased 23% on the i5 NUC and 30% on the i3 NUC. Results for update-one and update-inlist were similar. The regression was smaller for update-index. Unlike the 5.0 to 8.0 result QPS decreased by ~25% for delete and ~25% for insert.
  • For scan-heavy tests InnoDB 5.7.17 has the best QPS. While there have been regressions since then, scan performance in InnoDB 8.0.3 is still better than in 5.6.35 by up to 1.5X.
  • For point-query the QPS decreased ~11% on the i5 NUC and 17% on the i3 NUC
  • For the inlist-heavy tests the QPS decreased ~7%

Configuration

The tests used InnoDB from upstream MySQL versions 5.0.96, 5.1.72, 5.5.51, 5.6.35, 5.7.17 and 8.0.3. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. My use of sysbench is described here. The my.cnf files are here for the i3 NUC and i5 NUC. I tried to tune my.cnf for all engines including: disabled SSL for MySQL 5.7, used the same charset and collation. For all tests the binlog was enabled but fsync was disabled for the binlog and database redo log. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream. The built-in InnoDB is used for MySQL 5.0 and 5.1.

Sysbench is run with 2 tables and 2M rows per table. Each test is repeated for 1 and 2 clients. Each test runs for 600 seconds except for the insert-only test which runs for 300 seconds. The database fits in RAM as the i3 NUC has 8gb of RAM and the i5 NUC has 16gb.

I repeat tests on an i5 NUC and i3 NUC. The i5 NUC has more RAM, a faster SSD and faster CPU than the i3 NUC, but I disabled turbo boost on the i5 NUC many months ago to reduce variance in performance and with that the difference in CPU performance between these servers is much smaller. Assuming I will always disable turbo boost in the future I might as well stick with an i3 NUC for my next purchase.

Results


All of the data for the tests is on github for the i3 NUC and the i5 NUC. Results for each test are listed separately below. For all tests but scan the result has the QPS for the test with 1 client relative to the QPS for InnoDB in MySQL 5.0.96. The tests are explained here.

Graphs

There are 4 types of tests and I provided a graph for each type: write-heavy, scan-heavy, point-query, inlist-query. Fortunately the results within each group are similar and one graph per group is sufficient. The tests are explained here.

The write-heavy group includes update-inlist, update-one, update-index, update-nonindex, delete and insert. The graph is for update-nonindex using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96.
  • From MySQL 5.0.96 to 8.0.3 QPS decreased 41% on the i5 NUC and 52% on the i3 NUC
  • From MySQL 5.6.35 to 8.0.3 QPS decreased 23% on the i5 NUC and 30% on the i3 NUC
  • For the update-index test the QPS decreased ~20% from 5.0.96 to 8.0.3. From 5.6.35 to 8.0.3 it decreased 6% for the i5 NUC and 13% for the i3 NUC.
  • For the delete test 8.0.3 gets more QPS than 5.0.96. But from 5.6.35 to 8.0.3 the QPS decreased by ~25%.
  • For the insert test the QPS decreased 9% for the i5 NUC and 15% for the i3 NUC from 5.0.96 to 8.0.3. From 5.6.35 to 8.0.3 it decreased 22% for the i5 NUC and 27% for the i3 NUC. 
The scan-heavy group includes read-write with range-size set to 100 and 10,000 and then read-only with range-size set to 100 and 10,000. The graph is for read-write with range-size=100 using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. 
  • The QPS for MySQL 8.0.3 is between 1.05X and 1.26X better than for 5.0.96 on all tests except read-only with range-size=100 where it is 2% to 5% worse. 
  • InnoDB 5.7.17 has the best range scan performance. While there have been regressions since then, scan performance in InnoDB 8.0.3 is still better than in 5.6.35 and up to 1.5X better.
The point-query group includes the point-query test run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decreased 36% on the i5 NUC and 46% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 the decreased ~11% on the i5 NUC and 17% on the i3 NUC. 
The inlist-query group includes the hot-points test and the random-points tests run before and then after the write-heavy tests. The graph is for the test run after the write-heavy tests using the QPS for each MySQL version relative to the QPS for MySQL 5.0.96. From MySQL 5.0.96 to 8.0.3 the QPS decreased ~22%. From MySQL 5.6.35 to 8.0.3 the QPS decreased ~7%.
update-inlist

Here and the sections that follow have the QPS and QPS ratio for each MySQL release on the i3 and i5 NUC. The QPS ratio is the QPS for the release relative to the QPS for MySQL 5.0.96 using the test with 1 client.

From MySQL 5.0.96 to 8.0.3 QPS decreased by 38% on the i5 NUC and 41% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 20% on the i5 NUC and 11% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3306    1.00    4335    1.00    5.0.96
2718    0.82    4051    0.93    5.1.72
 645    0.20    1155    0.27    5.5.51
2201    0.67    3349    0.77    5.6.35
2050    0.62    3192    0.74    5.7.17
2037    0.62    3086    0.71    8.0.1
1960    0.59    2709    0.62    8.0.2
1963    0.59    2693    0.62    8.0.3

update-one

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 26% on the i5 NUC and 33% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
13748   1.00    15689   1.00    5.0.96
12084   0.88    13269   0.85    5.1.72
 9065   0.66    10343   0.66    5.5.51
 9650   0.70    10969   0.70    5.6.35
 8097   0.59     9648   0.61    5.7.17
 7552   0.55     8876   0.57    8.0.1
 6491   0.47     8222   0.52    8.0.2
 6468   0.47     8147   0.52    8.0.3

update-index

From MySQL 5.0.96 to 8.0.3 QPS decreased by 17% on the i5 NUC and 20% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 6% on the i5 NUC and 13% on the i3 NUC. The regression here is much smaller than for the other update-heavy tests.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 3414   1.00    6185    1.00    5.0.96
 3323   0.97    5394    0.87    5.1.72
 3265   0.96    5173    0.84    5.5.51
 3134   0.92    5489    0.89    5.6.35
 2983   0.87    5861    0.95    5.7.17
 2910   0.85    5494    0.89    8.0.1
 2798   0.82    5170    0.84    8.0.2
 2721   0.80    5160    0.83    8.0.3

update-nonindex

From MySQL 5.0.96 to 8.0.3 QPS decreased by ~50%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 24% on the i5 NUC and 31% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
11321   1.00    13146   1.00    5.0.96
 9965   0.88    11392   0.87    5.1.72
 4543   0.40     7294   0.55    5.5.51
 7860   0.69    10148   0.77    5.6.35
 6337   0.56     9135   0.69    5.7.17
 6295   0.56     8514   0.65    8.0.1
 5499   0.49     7812   0.59    8.0.2
 5398   0.48     7745   0.59    8.0.3

delete

From MySQL 5.0.96 to 8.0.3 QPS is 1.1X better on the i5 NUC and ~1.5X better on the i3 NUC. But from MySQL 5.6.35 to 8.0.3 QPS decreased by ~25%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 7484   1.00    13819   1.00    5.0.96
 7547   1.01    12725   0.92    5.1.72
 9950   1.33    14830   1.07    5.5.51
14590   1.95    19900   1.44    5.6.35
12758   1.70    17621   1.28    5.7.17
12143   1.62    16539   1.20    8.0.1
11104   1.48    15260   1.10    8.0.2
11085   1.48    15202   1.10    8.0.3

read-write with --range-size=100

From MySQL 5.0.96 to 8.0.3 QPS is 1.04X better on the i5 NUC and 1.17X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 1% on the i5 NUC and 2% on the i3 NUC. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 7% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
 8208   1.00    11038   1.00    5.0.96
 7875   0.96    10126   0.92    5.1.72
 8904   1.08    11275   1.02    5.5.51
 9844   1.20    11591   1.05    5.6.35
10152   1.24    12290   1.11    5.7.17
 9694   1.18    11698   1.06    8.0.1
 9578   1.17    11570   1.05    8.0.2
 9610   1.17    11481   1.04    8.0.3

read-write with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is ~1.27X better. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 3% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
330     1.00    387     1.00    5.0.96
316     0.96    376     0.97    5.1.72
293     0.89    363     0.94    5.5.51
283     0.86    335     0.87    5.6.35
442     1.34    500     1.29    5.7.17
434     1.32    489     1.26    8.0.1
423     1.28    491     1.27    8.0.2
421     1.28    486     1.26    8.0.3

read-only with --range-size=100

From MySQL 5.0.96 to 8.0.3 QPS decreased by 5% on the i5 NUC and 2% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.07X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 3% on the i5 NUC and 8% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
10748   1.00    12490   1.00    5.0.96
 9858   0.92    11816   0.95    5.1.72
10100   0.94    11396   0.91    5.5.51
 9644   0.90    11120   0.89    5.6.35
11369   1.06    12247   0.98    5.7.17
10516   0.98    11750   0.94    8.0.1
10398   0.97    11980   0.96    8.0.2
10509   0.98    11874   0.95    8.0.3

read-only.pre with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is 1.22X better on the i5 NUC and 1.3X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 2% on the i5 NUC and 5% on the i3 NUC.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
264     1.00    317     1.00    5.0.96
255     0.97    305     0.96    5.1.72
239     0.91    294     0.93    5.5.51
226     0.86    272     0.86    5.6.35
358     1.36    394     1.24    5.7.17
351     1.33    392     1.24    8.0.1
341     1.30    384     1.21    8.0.2
341     1.30    387     1.22    8.0.3

read-only with --range-size=10000

From MySQL 5.0.96 to 8.0.3 QPS is 1.22X better on the i5 NUC and 1.28X better on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS is ~1.5X better. From the peak in MySQL 5.7.17 to 8.0.3 QPS decreased by 7% on the i5 NUC and 5% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
268     1.00    316     1.00    5.0.96
256     0.96    303     0.96    5.1.72
239     0.89    295     0.93    5.5.51
221     0.82    272     0.86    5.6.35
359     1.34    393     1.24    5.7.17
352     1.31    387     1.22    8.0.1
345     1.29    388     1.23    8.0.2
343     1.28    385     1.22    8.0.3

point-query.pre

From MySQL 5.0.96 to 8.0.3 QPS decreased ~40%. From MySQL 5.6.35 to 8.0.3 QPS decreased ~30%.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
28360   1.00    27125   1.00    5.0.96
22444   0.79    23049   0.85    5.1.72
18938   0.67    20004   0.74    5.5.51
18329   0.65    20505   0.76    5.6.35
16592   0.59    18126   0.67    5.7.17
16010   0.56    16402   0.60    8.0.1
15031   0.53    16175   0.60    8.0.2
15208   0.54    16188   0.60    8.0.3

point-query

From MySQL 5.0.96 to 8.0.3 QPS decreased ~40%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 11% on the i5 NUC and 17% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
28733   1.00    26883   1.00    5.0.96
21941   0.76    23487   0.83    5.1.72
18541   0.65    20287   0.75    5.5.51
18554   0.65    19257   0.72    5.6.35
16883   0.59    17723   0.66    5.7.17
16163   0.56    16828   0.63    8.0.1
15005   0.52    16626   0.62    8.0.2
15492   0.54    17121   0.64    8.0.3

random-points.pre

From MySQL 5.0.96 to 8.0.3 QPS decreased ~20%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~6%. The reduction for point-query tests above is larger than the reduction for random-points here. I assume that point-query suffers more from the new CPU overhead in new MySQL releases (more features == more code in network, parse and optimize). The point query test pays that overhead cost per row fetched. For the random-points test here, 100 rows are fetched per SELECT statement.

This test is run before the write-heavy tests. The next section has results for the test run after write-heavy tests. They get similar QPS which means that b-tree fragmentation isn't an issue here for an in-memory workload.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3665    1.00    3801    1.00    5.0.96
3516    0.96    3707    0.98    5.1.72
3267    0.89    3468    0.91    5.5.51
3040    0.83    3203    0.84    5.6.35
2931    0.80    3121    0.82    5.7.17
2810    0.77    3006    0.79    8.0.1
2817    0.77    3030    0.80    8.0.2
2832    0.77    3023    0.80    8.0.3

random-points

From MySQL 5.0.96 to 8.0.3 QPS decreased ~21%. From MySQL 5.6.35 to 8.0.3 QPS decreased by 7%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
3693    1.00    3793    1.00    5.0.96
3498    0.95    3684    0.97    5.1.72
3269    0.89    3434    0.91    5.5.51
3036    0.82    3223    0.85    5.6.35
2947    0.80    3123    0.82    5.7.17
2801    0.76    3009    0.79    8.0.1
2839    0.77    3042    0.80    8.0.2
2833    0.77    2998    0.79    8.0.3

hot-points

From MySQL 5.0.96 to 8.0.3 QPS decreased 32%. From MySQL 5.6.35 to 8.0.3 QPS decreased by ~10%.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
4918    1.00    5525    1.00    5.0.96
4388    0.89    5036    0.91    5.1.72
3931    0.80    4595    0.83    5.5.51
3666    0.75    4242    0.77    5.6.35
3458    0.70    3898    0.71    5.7.17
3314    0.67    3753    0.68    8.0.1
3335    0.68    3764    0.68    8.0.2
3338    0.68    3764    0.68    8.0.3

insert

From MySQL 5.0.96 to 8.0.3 QPS decreased 9% on the i5 NUC and 15% on the i3 NUC. From MySQL 5.6.35 to 8.0.3 QPS decreased by 22% on the i5 NUC and 27% on the i3 NUC.

i3 NUC          i5 NUC
QPS     ratio   QPS     ratio   engine
8027    1.00     9681   1.00    5.0.96
7726    0.96     9215   0.95    5.1.72
6932    0.86     8851   0.91    5.5.51
9340    1.16    11244   1.16    5.6.35
7853    0.98     9892   1.02    5.7.17
7413    0.92     9257   0.96    8.0.1
6941    0.86     8567   0.88    8.0.2
6829    0.85     8822   0.91    8.0.3