Friday, January 12, 2018

Meltdown vs MySQL part 2: in-memory sysbench and a core i5 NUC

This is my second performance report for the Meltdown patch using in-memory sysbench and a small server. In this test I used a core i5 NUC with the 4.13 and 4.8 kernels. In the previous test I used a core i3 NUC with the 4.4 kernel.
  • results for 4.13 are mixed -- sometimes there is more QPS with the fix enabled, sometimes there is more with the fix disabled. The typical difference is small, about 2%.
  • QPS for 4.8, which doesn't have the Meltdown fix, are usually better than with 4.13, the largest difference is ~10% and the difference tend to be larger at 1 client than at 2 or 8.

Configuration

My usage of sysbench is described here. The servers are described here. For this test I used the core i5 NUC (NUC7i5bnh) with Ubuntu 16.04. I have 3 such servers and ran tests with the fix enabled (kernel 4.13.0-26), the fix disabled via pti=off (kernel 4.13.0-26) and the old kernel (4.8.0-36) that doesn't have the fix. From cat /proc/cpuinfo I see pcid. This server uses the HWE kernels to make wireless work. I repeated tests after learning that 4.13 doesn't support the nobarrier mount option for XFS. My workaround was to switch to ext4 and the results here are from ext4.

The servers have 2 cores and 4 HW threads. I normally use them for low-concurrency benchmarks with 1 or 2 concurrent database clients. For this test I used 1, 2 and 8 concurrent clients to determine whether more concurrency and more mutex contention would cause more of a performance loss.

The sysbench test was configured to use 1 table with 4M rows and InnoDB. The InnoDB buffer pool was large enough to cache the table. The sysbench client runs on the same host as mysqld.

I just noticed that all servers had the doublewrite buffer and binlog disabled. This was leftover from debugging the XFS nobarrier change.

Results

My usage of sysbench is described here which explains the tests that I list below. Each test has QPS for 1, 2 and 8 concurrent clients. Results are provided for
  • pti enabled - kernel 4.13.0-26 with the Meltdown fix enabled
  • pti disabled - kernel 4.13.0-26 with the Meltdown fix disabled via pti=off
  • old kernel, no pti - kernel 4.8.0-36 which doesn't have the Meltdown fix
After each of the QPS sections, there are two lines for QPS ratios. The first line compares the QPS for the kernel with the Meltdown fix enabled vs disabled. The second line compares the QPS for the kernel with the Meltdown fix vs the old kernel. A value less than one means that MySQL gets less QPS with the Meltdown fix.

update-inlist
1       2       8       concurrency
5603    7546    8212    pti enabled
5618    7483    8076    pti disabled
5847    7613    8149    old kernel, no pti
-----   -----   -----
0.997   1.008   1.016   qps ratio: pti on/off
0.958   0.991   1.007   qps ratio: pti on / old kernel

update-one
1       2       8       concurrency
11764   18880   16699   pti enabled
12074   19475   17132   pti disabled
12931   19573   16559   old kernel, no pti
-----   -----   -----
0.974   0.969   0.974   qps ratio: pti on/off
0.909   0.964   1.008   qps ratio: pti on / old kernel

update-index
1       2       8       concurrency
7202    12688   16738   pti enabled
7197    12581   17466   pti disabled
7443    12926   17720   old kernel, no pti
-----   -----   -----
1.000   1.000   0.958   qps ratio: pti on/off
0.967   0.981   0.944   qps ratio: pti on / old kernel

update-nonindex
1       2       8       concurrency
11103   18062   22964   pti enabled
11414   18208   23076   pti disabled
12395   18529   22168   old kernel, no pti
-----   -----   -----
0.972   0.991   0.995   qps ratio: pti on/off
0.895   0.974   1.035   qps ratio: pti on / old kernel

delete
1       2       8       concurrency
19197   30830   43605   pti enabled
19720   31437   44935   pti disabled
21584   32109   43660   old kernel, no pti
-----   -----   -----
0.973   0.980   0.970   qps ratio: pti on/off
0.889   0.960   0.998   qps ratio: pti on / old kernel

read-write range=100
1       2       8       concurrency
11956   20047   29336   pti enabled
12475   20021   29726   pti disabled
13098   19627   30030   old kernel, no pti
-----   -----   -----
0.958   1.001   0.986   qps ratio: pti on/off
0.912   1.021   0.976   qps ratio: pti on / old kernel

read-write range=10000
1       2       8       concurrency
488     815     1080    pti enabled
480     768     1073    pti disabled
504     848     1083    old kernel, no pti
-----   -----   -----
1.016   1.061   1.006   qps ratio: pti on/off
0.968   0.961   0.997   qps ratio: pti on / old kernel

read-only range=100
1       2       8       concurrency
12089   21529   33487   pti enabled
12170   21595   33604   pti disabled
11948   22479   33876   old kernel, no pti
-----   -----   -----
0.993   0.996   0.996   qps ratio: pti on/off
1.011   0.957   0.988   qps ratio: pti on / old kernel

read-only.pre range=10000
1       2       8       concurrency
392     709     876     pti enabled
397     707     872     pti disabled
403     726     877     old kernel, no pti
-----   -----   -----
0.987   1.002   1.004   qps ratio: pti on/off
0.972   0.976   0.998   qps ratio: pti on / old kernel

read-only range=10000
1       2       8       concurrency
394     701     874     pti enabled
389     698     871     pti disabled
402     725     877     old kernel, no pti
-----   -----   -----
1.012   1.004   1.003   qps ratio: pti on/off
0.980   0.966   0.996   qps ratio: pti on / old kernel

point-query.pre
1       2       8       concurrency
18490   31914   56337   pti enabled
19107   32201   58331   pti disabled
18095   32978   55590   old kernel, no pti
-----   -----   -----
0.967   0.991   0.965   qps ratio: pti on/off
1.021   0.967   1.013   qps ratio: pti on / old kernel

point-query
1       2       8       concurrency
18212   31855   56116   pti enabled
18913   32123   58320   pti disabled
17907   32941   55430   old kernel, no pti
-----   -----   -----
0.962   0.991   0.962   qps ratio: pti on/off
1.017   0.967   1.012   qps ratio: pti on / old kernel

random-points.pre
1       2       8       concurrency
3043    5940    8131    pti enabled
2944    5681    7984    pti disabled
3030    6015    8098    old kernel, no pti
-----   -----   -----
1.033   1.045   1.018   qps ratio: pti on/off
1.004   0.987   1.004   qps ratio: pti on / old kernel

random-points
1       2       8       concurrency
3053    5930    8128    pti enabled
2949    5756    7981    pti disabled
3058    6011    8116    old kernel, no pti
-----   -----   -----
1.035   1.030   1.018   qps ratio: pti on/off
0.998   0.986   1.001   qps ratio: pti on / old kernel

hot-points
1       2       8       concurrency
3931    7522    9500    pti enabled
3894    7535    9214    pti disabled
3914    7692    9448    old kernel, no pti
-----   -----   -----
1.009   0.998   1.031   qps ratio: pti on/off
1.004   0.977   1.005   qps ratio: pti on / old kernel

insert
1       2       8       concurrency
12469   21418   25158   pti enabled
12561   21327   25094   pti disabled
13045   21768   21258   old kernel, no pti
-----   -----   -----
0.992   1.004   1.002   qps ratio: pti on/off
0.955   0.983   1.183   qps ratio: pti on / old kernel

2 comments:

  1. Thanks for sharing.
    I was wondering what the host context switching rate per second ("sar -w 5" (or "sar -w" to view the history if sadc is running) , cs from "vmstat 5") , or the process/thread rates ( "pidstat -wt | sort -nrk6 | head" ) are when the (most contended) tests are running?

    ReplyDelete
    Replies
    1. From the point-query test where InnoDB does ~18k, ~32k, ~56k QPS at 1,2,8 threads the cs rates from vmstat are ~69k, ~121k, ~107k. The number of context switches per query is ~4, ~4, ~2 for 1,2,8 threads.

      Delete

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...