Friday, January 12, 2018

Meltdown vs MySQL part 1: in-memory sysbench and a core i3 NUC

This is my first performance report for the Meltdown patch using in-memory sysbench and a small server.
  • the worst case overhead was ~5.5%
  • a typical overhead was ~2%
  • QPS was similar between the kernel with the Meltdown fix disabled and the old kernel
  • the overhead with too much concurrency (8 clients) wasn't worse than the overhead without too much concurrency (1 or 2 clients)
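Assuming the overhead figures above are the relative QPS loss, they map onto the QPS ratios in the results tables as follows, where r is QPS with the Meltdown fix enabled divided by QPS with it disabled:

```latex
\text{overhead} = (1 - r) \times 100\%, \qquad
r = \frac{\mathrm{QPS}_{\text{pti on}}}{\mathrm{QPS}_{\text{pti off}}}

r \approx 0.945 \;\Rightarrow\; \text{overhead} \approx 5.5\%, \qquad
r \approx 0.980 \;\Rightarrow\; \text{overhead} \approx 2\%
```

A ratio above 1 corresponds to a negative overhead, meaning the patched kernel measured faster for that test.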

Configuration

My usage of sysbench is described here. The servers are described here. For this test I used the core i3 NUC (NUC5i3ryh) with Ubuntu 16.04. I have 3 such servers and ran tests with the fix enabled (kernel 4.4.0-109), the fix disabled via pti=off (kernel 4.4.0-109) and the old kernel (4.4.0-38) that doesn't have the fix. From cat /proc/cpuinfo I see pcid.
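A minimal sketch of the two checks mentioned above, assuming an x86 Linux host: reading the `flags` line of /proc/cpuinfo for `pcid`, and scanning /proc/cmdline for the boot parameter that disables the fix (`pti=off`, or its documented alias `nopti`). The helper names are hypothetical. Note that the `pcid` flag only says the CPU supports PCID; whether the kernel uses it is a separate question, and the command-line check only shows what was requested at boot, not whether the running kernel has the fix.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo (x86)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pti_disabled_on_cmdline(cmdline_text):
    """True if the kernel command line asks for PTI to be off (pti=off or nopti).

    This only inspects boot parameters; an old kernel such as 4.4.0-38
    has no PTI at all regardless of what the command line says.
    """
    tokens = cmdline_text.split()
    return "pti=off" in tokens or "nopti" in tokens


# Only touch /proc when it exists, so the helpers can be imported elsewhere.
if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("CPU advertises pcid:", "pcid" in cpu_flags(f.read()))
    with open("/proc/cmdline") as f:
        print("pti disabled on cmdline:", pti_disabled_on_cmdline(f.read()))
```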

The servers have 2 cores and 4 HW threads. I normally use them for low-concurrency benchmarks with 1 or 2 concurrent database clients. For this test I used 1, 2 and 8 concurrent clients to determine whether more concurrency and more mutex contention would cause more of a performance loss.

The sysbench test was configured to use 1 table with 4M rows and InnoDB. The InnoDB buffer pool was large enough to cache the table. The sysbench client runs on the same host as mysqld.
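The post does not show its sysbench command lines, and the author's own usage of sysbench is described elsewhere. As a rough sketch only, assuming stock sysbench 1.0 and its bundled `oltp_read_only` script, the 1-table, 4M-row setup at 1, 2 and 8 clients could be expressed as below. The helper `sysbench_cmd`, the test name and the `--time` value are assumptions, not the author's harness. Connection options are left at their defaults.

```python
import shlex

# Setup from the post: 1 table, 4M rows, 1/2/8 concurrent clients.
CLIENTS = (1, 2, 8)
TABLES = 1
TABLE_SIZE = 4_000_000


def sysbench_cmd(test, threads, seconds=60):
    """Build a sysbench 1.0 command line (hypothetical helper).

    `seconds` is a placeholder duration; the post does not state one.
    """
    return [
        "sysbench", test,
        f"--tables={TABLES}",
        f"--table-size={TABLE_SIZE}",
        f"--threads={threads}",
        f"--time={seconds}",
        "run",
    ]


for n in CLIENTS:
    print(" ".join(shlex.quote(arg) for arg in sysbench_cmd("oltp_read_only", n)))
```

With sysbench 1.0 the table would first be loaded using the same `--tables` and `--table-size` options with `prepare` in place of `run`.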

Results

My usage of sysbench is described here, which explains the tests listed below. Each test has QPS for 1, 2 and 8 concurrent clients. Results are provided for:
  • pti enabled - kernel 4.4.0-109 with the Meltdown fix enabled
  • pti disabled - kernel 4.4.0-109 with the Meltdown fix disabled via pti=off
  • old kernel, no pti - kernel 4.4.0-38 which doesn't have the Meltdown fix
After each of the QPS sections, there are two lines of QPS ratios. The first line compares QPS for the kernel with the Meltdown fix enabled vs disabled. The second line compares QPS for the kernel with the Meltdown fix enabled vs the old kernel. A value less than one means that MySQL gets less QPS with the Meltdown fix.
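The ratio lines can be reproduced from the QPS rows. Spot checks suggest the published ratios are truncated rather than rounded to three decimals (for update-inlist at 2 clients, 2238 / 2449 = 0.9138..., shown as 0.913), so this sketch truncates. The helper name `qps_ratio` is hypothetical; the inputs are the update-inlist rows from the first table.

```python
import math


def qps_ratio(numer, denom):
    """Ratio of two QPS values, truncated (not rounded) to 3 decimals, as a string."""
    return f"{math.floor(numer / denom * 1000) / 1000:.3f}"


# update-inlist QPS at 1, 2 and 8 clients, copied from the results.
pti_on = [2039, 2238, 2388]
pti_off = [2049, 2449, 2369]
old_kernel = [2059, 2199, 2397]

on_off = [qps_ratio(a, b) for a, b in zip(pti_on, pti_off)]
on_old = [qps_ratio(a, b) for a, b in zip(pti_on, old_kernel)]

print("   ".join(on_off), "  qps ratio: pti on/off")
print("   ".join(on_old), "  qps ratio: pti on / old kernel")
```

This reproduces the two ratio lines published for update-inlist.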

update-inlist
1       2       8       concurrency
2039    2238    2388    pti enabled
2049    2449    2369    pti disabled
2059    2199    2397    old kernel, no pti
-----   -----   -----
0.995   0.913   1.008   qps ratio: pti on/off
0.990   1.017   0.996   qps ratio: pti on / old kernel

update-one
1       2       8       concurrency
8086    11407   9498    pti enabled
8234    11683   9748    pti disabled
8215    11708   9755    old kernel, no pti
-----   -----   -----
0.982   0.976   0.974   qps ratio: pti on/off
0.984   0.974   0.973   qps ratio: pti on / old kernel

update-index
1       2       8       concurrency
2944    4528    7330    pti enabled
3022    4664    7504    pti disabled
3020    4784    7555    old kernel, no pti
-----   -----   -----
0.974   0.970   0.976   qps ratio: pti on/off
0.974   0.946   0.970   qps ratio: pti on / old kernel

update-nonindex
1       2       8       concurrency
6310    8688    12600   pti enabled
6103    8482    11900   pti disabled
6374    8723    12142   old kernel, no pti
-----   -----   -----
1.033   1.024   1.058   qps ratio: pti on/off
0.989   0.995   1.037   qps ratio: pti on / old kernel

delete
1       2       8       concurrency
12348   17087   23670   pti enabled
12568   17342   24448   pti disabled
12665   17749   24499   old kernel, no pti
-----   -----   -----
0.982   0.985   0.968   qps ratio: pti on/off
0.974   0.962   0.966   qps ratio: pti on / old kernel

read-write range=100
1       2       8       concurrency
 9999   14973   21618   pti enabled
10177   15239   22088   pti disabled
10209   15249   22153   old kernel, no pti
-----   -----   -----
0.982   0.982   0.978   qps ratio: pti on/off
0.979   0.981   0.975   qps ratio: pti on / old kernel

read-write range=10000
1       2       8       concurrency
430     762     865     pti enabled
438     777     881     pti disabled
439     777     882     old kernel, no pti
-----   -----   -----
0.981   0.980   0.981   qps ratio: pti on/off
0.979   0.980   0.980   qps ratio: pti on / old kernel

read-only range=100
1       2       8       concurrency
10472   19016   26631   pti enabled
10588   20124   27587   pti disabled
11290   20153   27796   old kernel, no pti
-----   -----   -----
0.989   0.944   0.965   qps ratio: pti on/off
0.927   0.943   0.958   qps ratio: pti on / old kernel

read-only.pre range=10000
1       2       8       concurrency
346     622     704     pti enabled
359     640     714     pti disabled
356     631     715     old kernel, no pti
-----   -----   -----
0.963   0.971   0.985   qps ratio: pti on/off
0.971   0.985   0.984   qps ratio: pti on / old kernel

read-only range=10000
1       2       8       concurrency
347     621     703     pti enabled
354     633     716     pti disabled
354     638     716     old kernel, no pti
-----   -----   -----
0.980   0.981   0.988   qps ratio: pti on/off
0.980   0.973   0.981   qps ratio: pti on / old kernel

point-query.pre
1       2       8       concurrency
16104   29540   46863   pti enabled
16716   30052   49404   pti disabled
16605   30392   49872   old kernel, no pti
-----   -----   -----
0.963   0.982   0.948   qps ratio: pti on/off
0.969   0.971   0.939   qps ratio: pti on / old kernel

point-query
1       2       8       concurrency
16240   29359   47141   pti enabled
16640   29785   49015   pti disabled
16369   30226   49530   old kernel, no pti
-----   -----   -----
0.975   0.985   0.961   qps ratio: pti on/off
0.992   0.971   0.951   qps ratio: pti on / old kernel

random-points.pre
1       2       8       concurrency
2756    5202    6211    pti enabled
2764    5216    6245    pti disabled
2679    5130    6188    old kernel, no pti
-----   -----   -----
0.997   0.997   0.994   qps ratio: pti on/off
1.028   1.014   1.003   qps ratio: pti on / old kernel

random-points
1       2       8       concurrency
2763    5177    6191    pti enabled
2768    5188    6238    pti disabled
2701    5076    6182    old kernel, no pti
-----   -----   -----
0.998   0.997   0.992   qps ratio: pti on/off
1.022   1.019   1.001   qps ratio: pti on / old kernel

hot-points
1       2       8       concurrency
3414    6533    7285    pti enabled
3466    6623    7287    pti disabled
3288    6312    6998    old kernel, no pti
-----   -----   -----
0.984   0.986   0.999   qps ratio: pti on/off
1.038   1.035   1.041   qps ratio: pti on / old kernel

insert
1       2       8       concurrency
7612    10051   11943   pti enabled
7713    10150   12322   pti disabled
7834    10243   12514   old kernel, no pti
-----   -----   -----
0.986   0.990   0.969   qps ratio: pti on/off
0.971   0.981   0.954   qps ratio: pti on / old kernel

4 comments:

  1. Thank you Mark very much for all these benchmarks.

    "From cat /proc/cpuinfo I see pcid." - does it really mean that the kernel uses PCID? I assumed it just means that the CPU supports PCID, and then it's up to the kernel whether to use it.

    1. Asking my local experts now. All I have so far is https://groups.google.com/forum/#!topic/mechanical-sympathy/L9mHTbeQLNU

    2. It seems that our system (CentOS 7/RHEL 7) also includes the backported patch with PCID, even though its kernel version is much older (3.10 ...). Thank you for that link, it was very useful for me.

  2. Interesting post about benchmarks, thank you for all these benchmarks.
