Tuesday, September 19, 2017

Linkbench: IO-bound, large server

This has results for IO-bound Linkbench on a large server. I previously shared results for in-memory & low-concurrency Linkbench.

tl;dr
  • Compression doesn't hurt MyRocks performance
  • MyRocks matches or beats InnoDB performance with much better efficiency

Configuration

I used my Linkbench repo and helper scripts to run Linkbench with maxid1=1B, loaders=4 and requestors=16 so there will be 4+1 concurrent connections doing the load (4 for link/count tables, 1 for node table) and 16 connections running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and results with that change are labeled new config while results without the change are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 50gb of RAM available to the OS page cache and database.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on August 15 with git hash 0d76ae. The block cache was set to 10gb. Tests were done without compression and then with compression: none for L0/L1/L2, LZ4 from L3 through the next-to-last level and Zstandard for the max level.
  • InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2.  The buffer pool was set to 35gb. The performance schema was enabled. Compression was not used.
  • TokuDB was from Percona Server 5.7.17. The db cache was set to 10gb. Tests were done without compression and then zlib.
Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config.
  • MyRocks has the best insert rate and compression doesn't hurt the rate
  • There is a regression for InnoDB from 5.7 to 8.0.2
  • Write efficiency (wKB/i) is best for TokuDB and similar between MyRocks and InnoDB
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB
  • Space efficiency (size) is best for MyRocks and TokuDB. Uncompressed MyRocks uses much less space than uncompressed InnoDB

ips     wKB/i   Mcpu/i  size    rss     rMB/s   wMB/s   cpu     engine
142812  2.03     79      948     7.5    0.3     290.3   11.3    myrocks.none
144232  2.55     87      439     5.0    1.9     367.7   12.6    myrocks.zstd
132297  1.04     75     15xx    42.7    0       136.6    9.9    inno5635
138218  1.04     81     15xx    39.4    0.1     144.0   11.2    inno5717
122495  1.05     96     15xx    39.5    0.1     128.8   11.8    inno802
 36737  1.68    246     12xx    11.1    0.1      61.6    9.0    toku5717.none
 37032  1.18    257      442    11.6    0.1      43.8    9.5    toku5717.zlib

legend:
* ips - inserts/second
* wKB/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s, rMB/s - iostat write MB/s & read MB/s
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns
* rss - mysqld RSS in GB
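
The derived columns can be checked from the base columns. This is a sketch and the formulas are my assumption (the post doesn't spell them out): wKB/i as the iostat write rate over the insert rate, and Mcpu/i as the vmstat CPU percentage scaled by one million over the insert rate. Both reproduce the myrocks.none row.

```python
# Check the derived columns from the base columns for the myrocks.none row.
# Assumed formulas: wKB/i = write KB/s / ips, Mcpu/i = (us+sy) * 1e6 / ips.
ips = 142812    # inserts/second
wMBps = 290.3   # iostat wMB/s
cpu = 11.3      # vmstat us + sy

print(round(wMBps * 1000 / ips, 2))  # 2.03, the wKB/i column
print(round(cpu * 1e6 / ips))        # 79, the Mcpu/i column
```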

Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them. 
  • MyRocks and InnoDB have the best transaction rates
  • Write efficiency (wKB/t) is much better for MyRocks than for InnoDB
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Space efficiency (size) is best for MyRocks and TokuDB (see here)
  • MyRocks has the best response times and compression doesn't hurt it

tps     r/t     rKB/t   wKB/t   Mcpu/t  ul      gl      rMB/s   wMB/s   cpu     engine
34701   1.16     22.41   1.27    659      1      0.9     777.7   44.2   22.9    myrocks.none
34622   1.08     13.12   1.27    758      1      0.9     454.4   44.1   26.2    myrocks.zstd
25775   1.16     18.50  12.05    672      6      4       476.8  310.5   17.3    inno5635
34539   1.15     18.39  11.91    609      3      2       635.0  411.4   21.0    inno5717
33578   1.14     18.23  11.97    641      3      2       612.1  401.9   21.5    inno802
 8982   3.07    188.97   5.64   2685     13     12      1697.3   50.7   24.1    toku5717.none
12448   1.20     17.42   2.84   2644      6      3       216.9   35.4   32.9    toku5717.zlib

legend:
* tps - transactions/second
* r/t - iostat reads per transaction
* rKB/t, wKB/t - iostat KB read & written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* ul, gl - 99th percentile response time in millisecs for the UpdateList and
           GetLinkedList transactions
* rMB/s, wMB/s - iostat read MB/s & write MB/s
* cpu - average CPU utilization from vmstat us & sy

Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.

Wednesday, September 13, 2017

Making it easier to read Linux perf output

This is a short note to myself the next time I need to remember things I had to do to enhance the readability of Linux perf output.
  • trim trailing spaces -> perf report --stdio | awk 'sub(/ *$/, "")'
  • don't forget the children when using perf record -g -> perf report --no-children


Friday, September 8, 2017

Linkbench - in-memory, low-concurrency

After a few weeks of sysbench it is now time for more complex workloads and the first one is Linkbench with a cached database and low-concurrency. I prefer to start with cached & low-concurrency configurations before trying IO-bound & high-concurrency.

tl;dr:
  • InnoDB from MySQL 5.6 had the best throughput
  • CPU efficiency is similar for MyRocks and InnoDB
  • There is a CPU regression from MySQL 5.6 to 5.7 to 8.x
  • Write efficiency was similar for all engines on the load test but much better for MyRocks and TokuDB on the transaction test.

Configuration

I used my Linkbench repo and helper scripts to run Linkbench with maxid1=10M, loaders=1 and requestors=1 so there will be 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and results with that change are labeled new config while results without the change are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256gb of RAM.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here but the database cache was made large enough to cache the ~10gb database.
  • MyRocks was compiled on August 15 with git hash 0d76ae. Compression was not used.
  • InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2.  The performance schema was enabled.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.

Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. The results for the old config are similar.
  • InnoDB 5.6 has the best insert rate but there is a regression from InnoDB in 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/i) is similar for all engines
  • CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips     wkb/i   Mcpu/i  size    wMB/s   cpu     engine
 54283  1.60     83     14       86.7    4.5    myrocks
 64402  1.02     72     16       65.6    4.6    inno5635
 56414  1.03     77     16       58.2    4.3    inno5717
 42954  1.02     97     16       45.1    4.2    inno802
 21611  1.42    179     14       30.7    3.9    toku5717

legend:
* ips - inserts/second
* wkb/i - iostat KB written per insert
* Mcpu/i - normalized CPU time per insert
* wMB/s - iostat write MB/s, average
* size - database size in GB at test end
* cpu - average value of vmstat us + sy columns


Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them. The results for the old config are similar.
  • InnoDB 5.6 has the best transaction rate but there is a regression from 5.6 to 5.7 to 8.0.2
  • Write efficiency (wkb/t) is much better for TokuDB and MyRocks than for InnoDB
  • CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
  • Response times are similar between MyRocks and InnoDB

tps     wkb/t   Mcpu/t  size  un    gn    ul    gl    wMB/s  engine
5489    0.78     642    15    0.3   0.1   0.5   0.5    4.3   myrocks
7239    5.15     524    26    0.3   0.1   0.4   0.2   37.3   inno5635
6463    5.17     580    26    0.3   0.1   0.5   0.3   33.4   inno5717
5855    5.25     623    25    0.3   0.1   0.6   0.3   30.7   inno802
3333    0.08    1043    18    0.6   0.2   1.0   0.8   10.0   toku5717

legend:
* tps - transactions/second
* wkb/t - iostat KB written per transaction
* Mcpu/t - normalized CPU time per transaction
* size - database size in GB at test end
* un, gn, ul, gl - 99th percentile response time in millisecs for UpdateNode,
                   GetNode, UpdateList and GetLinkedList transactions
* wMB/s - iostat write MB/s, average


Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.


Thursday, September 7, 2017

IO-bound table scan performance for MyRocks, InnoDB and TokuDB

I used sysbench to compare IO-bound table scan performance for MyRocks, InnoDB and TokuDB. Tests were run on a large server with fast SSD, Intel NUCs with SSD and an Intel NUC with disk. I call this IO-bound because for all tests the table was larger than RAM.

tl;dr
  • MyRocks can be more than 2X slower than InnoDB.
  • InnoDB in 5.7 does better than in 5.6
  • TokuDB without compression is comparable to InnoDB without compression and does much better than InnoDB when prefetching is enabled.
  • Compression usually has a small impact on scan performance for MyRocks with zstd and a much larger impact for TokuDB with zlib. I wonder how much of this is a measure of zstd vs zlib.
  • Scans were usually slower for all engines after fragmentation but the impact was larger for MyRocks and TokuDB than for InnoDB.

Configuration

I used my sysbench helper scripts with my sysbench branch. For tests with X tables there was 1 connection per table doing a full scan and when X > 1 the scans were concurrent. The scan time was measured twice -- first immediately after the load and index step and then after many updates have been applied. The second measurement was done to show the impact of fragmentation on scan performance.

I repeated tests on different hardware:
  • 48core.ssd - server has 48 HW threads, fast SSD and 50gb of RAM. Tests were done with 8 tables and 100M rows/table and then 1 table with 800M rows.
  • i3.ssd - a core i3 Intel NUC with Samsung 850 SSD, 8gb of RAM and 4 HW threads. The test used 2 tables and 80M rows/table.
  • i3.disk - a core i3 Intel NUC with 1 disk, 8gb of RAM and 4 HW threads. The test used 1 table with 160M rows.
I repeated tests for MyRocks, InnoDB and TokuDB:
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression (myrocks.none, myrocks.zstd). I did one test for MyRocks with a binary that did not use special instructions to make crc32 faster (myrocks.none.slowcrc) and learned that fast crc doesn't make a difference on this test. It would be a bigger deal for an IO-bound test doing point queries.
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF (toku5717.none, toku5717.none.prefetch), but I have been setting this to ON for my OLTP benchmarks because enabling it ruined some OLTP results.
  • I used InnoDB from upstream 5.6.35 and 5.7.17. The performance_schema was enabled. The InnoDB tests did not use compression. 

Results

The results below list the number of seconds to scan the table(s) and the time relative to InnoDB from MySQL 5.6.35. For the relative time a value greater than 1 means the engine is slower than InnoDB. These values are reported for pre and post where pre is the measurement taken immediately after loading the table and creating the secondary index and post is the measurement taken after applying random updates to the table(s).

See tl;dr above for what I learned from these results.
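
The relative time is just the engine's scan seconds divided by the InnoDB 5.6.35 scan seconds. Checking two rows from the pre column of the 48core.ssd 8t x 100m table:

```python
# Relative scan time = engine seconds / InnoDB 5.6.35 seconds, checked
# against the pre column of the 48core.ssd 8t x 100m table.
base = 96   # inno5635 pre seconds

print(round(221 / base, 3))  # 2.302 for myrocks.none
print(round(75 / base, 3))   # 0.781 for inno5717
```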

Large server


These are results from 8 tables with 100M rows/table and then 1 table with 800M rows on the large server.

48core.ssd - 8t x 100m
pre     pre     post    post    engine
secs    ratio   secs    ratio
221     2.302   246     2.256   myrocks.none
201     2.093   211     1.935   myrocks.zstd
 96     1.000   109     1.000   inno5635
 75     0.781    86     0.788   inno5717
 67     0.697    94     0.862   toku5717.none
 39     0.406    69     0.633   toku5717.none.prefetch
190     1.979   224     2.055   toku5717.zlib

48core.ssd - 1t x 800m
pre     pre     post    post    engine
secs    ratio   secs    ratio
 638    1.065   1032    1.627   myrocks.none
 916    1.529   1063    1.676   myrocks.zstd
 599    1.000    634    1.000   inno5635
 434    0.724    449    0.708   inno5717
 513    0.856    735    1.159   toku5717.none
 249    0.415    502    0.791   toku5717.none.prefetch
1525    2.545   1776    2.801   toku5717.zlib

Intel NUC

These are results from the Intel NUC using SSD and then a disk.

i3.ssd - 2t x 80m
pre     pre     post    post    engine
secs    ratio   secs    ratio
181     1.448   192     1.560   myrocks.none
182     1.456   189     1.536   myrocks.none.slowcrc
219     1.752   238     1.934   myrocks.zstd
125     1.000   123     1.000   inno5635
114     0.912   107     0.869   inno5717

i3.disk - 1t x 160m
pre     pre     post    post    engine
secs    ratio   secs    ratio
330     1.304   348     1.343   myrocks.none
432     1.707   451     1.741   myrocks.zstd
253     1.000   259     1.000   inno5635
257     1.015   261     1.007   inno5717

Charts

Below are charts from the large server tests for 8 tables & 100M rows/table and then 1 table with 800M rows.

Wednesday, September 6, 2017

Timing load & index for sysbench tables

This post compares MyRocks, InnoDB and TokuDB on the time required to load and index a table for sysbench.

tl;dr
  • MyRocks, InnoDB and TokuDB have similar load performance although there is a regression for InnoDB from 5.6 to 5.7 to 8.x
  • InnoDB create index is much faster starting in 5.7

Configuration

I used my sysbench helper scripts with my sysbench branch and configured it to create 1 table with 800M rows. The binlog was enabled but sync on commit was disabled for the binlog and database log. The sysbench client shared the host with mysqld. The host has 48 HW threads, 50gb of RAM for the OS and MySQL and fast SSD. The test table is larger than RAM but it will take me a few days to get details on that. The test was repeated for MyRocks, InnoDB and TokuDB. I continue to use the IO-bound setup as described previously.
  • I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression. 
  • I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF, but I have been setting this to ON for my OLTP benchmarks. 
  • I used InnoDB from upstream 5.6.35, 5.7.17, 8.0.1 and 8.0.2. For 8.x I used latin1/latin1_swedish_ci charset/collation. The performance_schema was enabled. The InnoDB tests did not use compression. 

A sample command line for sysbench is:
bash all.sh 1 800000000 180 300 180 innodb 1 0 /bin/mysql none /sysbench10 /dbdir

Results

The load is in PK order and there are no secondary indexes. Engines have similar performance although there is a steady regression for InnoDB with each new release and a big regression from 8.0.1 to 8.0.2 which I hope will be fixed as 8.x approaches GA. The ratio is the time to load for the engine divided by the time to load for InnoDB from MySQL 5.6.35.

load    load    engine
secs    ratio
 7266   1.000   inno5635
 7833   1.078   inno5717
 8286   1.140   inno801
10516   1.447   inno802
 7640   1.051   myrocks.none
 7810   1.074   myrocks.zstd
 7558   1.040   toku5717.none
 7494   1.031   toku5717.none.prefetch
 7726   1.063   toku5717.zlib 

Create index performance has more diversity. The table is larger than RAM, some of it will be read from storage and engines with compression (MyRocks.zstd, toku5717.zlib) suffer from decompression latency. The prefetch option doesn't help TokuDB in this case. InnoDB create index performance got much faster starting in 5.7. The ratio is the create index time for the engine divided by the time for InnoDB from MySQL 5.6.35.

create  create  engine
secs    ratio
 3565   1.000   inno5635
 1904   0.534   inno5717
 1961   0.550   inno801
 1966   0.551   inno802
 3321   0.931   myrocks.none
 3802   1.066   myrocks.zstd
 9817   2.753   toku5717.none
 9855   2.764   toku5717.none.prefetch
10731   3.010   toku5717.zlib

Charts

These charts have the same data as the tables above.

Write-heavy workloads with MyRocks

MyRocks is based on RocksDB and RocksDB is write-optimized, so why don't write-heavy workloads always run faster on MyRocks than on InnoDB? I will start with a summary of MyRocks versus InnoDB: MyRocks has better space & write efficiency, frequently has better write latency and sometimes has better read latency. Better space & write efficiency means you use less SSD and it will last longer.

Don't forget that better write and space efficiency with MyRocks can lead to better read latency. When a database engine does fewer writes to storage there is more IO capacity available for reads which is a big deal with disk and can be a big deal with slower SSD. When a database engine uses less space in storage then it is likely to cache more data in RAM and have better cache hit rates on reads. Many of the benchmarks that I run use uniform distribution for key generation and won't show the benefit from better cache hit rates.

With RocksDB a write usually means a call to Put, and Put is a blind-write. As shown by many benchmark results, a blind-write can be very fast with RocksDB - insert data into the memtable, optionally flush the WAL to the OS page cache, optionally force the WAL to persistent storage. But SQL update and insert statements usually need much more than a blind-write, and the reads done by MyRocks can explain why some write-heavy workloads are faster with InnoDB. Things that get in the way include:
  • Pluggable storage engine APIs are slow to adopt blind-write optimizations
  • The modified row count must be returned from an update statement 
  • Enforcement of the PK and unique constraints
  • Secondary index maintenance
Pluggable storage engine APIs are slow to adopt blind-write optimizations. I don't blame MySQL and MongoDB for this because such optimizations are limited to write-optimized engines. But MyRocks and MongoRocks are here and I expect that WiredTiger/MongoDB will eventually have an LSM in production. M*Rocks engines can use the merge operator for this. I don't expect blind-write optimizations to ever be implemented for an update-in-place b-tree.
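
The merge operator idea can be sketched without RocksDB. In this minimal Python sketch (it mimics the concept, not the RocksDB API) a write appends an operand without reading the current value, and the operands are combined only when the key is read:

```python
# Sketch of a merge-operator-style counter: writes append an operand
# (read-free, like a blind write); operands are combined at read time.
from collections import defaultdict

class MergeStore:
    def __init__(self):
        self.ops = defaultdict(list)   # key -> pending merge operands

    def merge(self, key, delta):
        # blind write: append an operand, never read the current value
        self.ops[key].append(delta)

    def get(self, key):
        # combine the operands at read time
        return sum(self.ops[key])

s = MergeStore()
for _ in range(3):
    s.merge("clicks", 1)   # three increments, zero reads
print(s.get("clicks"))     # 3
```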

The modified row count must be returned for an update statement and that requires evaluation of the where clause. For RocksDB this requires reads -- from the memtable, maybe from the LSM tree, OS page cache and storage. Reads from storage and the OS page cache might require decompression. This is a lot more work than a blind-write. The usage of blind-write optimizations will result in statements that are update-like and insert-like because the application programmers must be aware of the semantic differences -- modified row count won't be returned, constraint violations won't be acknowledged. I think it is worth doing.

Unique constraints must be enforced for the PK and unique secondary indexes. With the exception of an auto-increment column, this requires a read from the index to confirm the value does not exist. In the best case updates & inserts are in key order and the structures to be searched are cached in RAM -- otherwise this requires reads from the OS page cache and/or storage and might require decompression.
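
A sketch of that difference, using a plain dict as a stand-in for the index: the unique-constraint path must probe the index before writing, while a blind write skips the read and, with it, the check.

```python
# A plain dict stands in for the index. The unique-constraint path must read
# the index before writing; a blind write skips the read and the check.
index = {}

def insert_unique(key, value):
    if key in index:          # the read that a blind write avoids
        raise ValueError("duplicate key")
    index[key] = value

def blind_put(key, value):
    index[key] = value        # no read, but also no constraint enforcement

insert_unique(1, "a")
blind_put(1, "b")             # silently overwrites; no violation is reported
print(index[1])               # b
```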

Secondary indexes must be maintained as part of the update & insert statement processing. For an insert this means that a new index entry will be inserted. For an update that requires index maintenance the existing index entry will be deleted and a new one will be inserted. With a b-tree the leaf page(s) will be read (possibly from storage), made dirty in the buffer pool and eventually those pages will be written back to storage. With RocksDB non-unique secondary index maintenance is read-free and blind-writes are done to put the new index entry and possibly delete-mark the old index entry. See the previous paragraph if there is a unique secondary index.
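
The read-free non-unique index maintenance described above can be sketched as two blind writes, assuming the statement already knows the old and new values of the indexed column (the helper below is illustrative, not the MyRocks code):

```python
# The update issues two blind writes: delete-mark the old index entry and put
# the new one. No index pages are read first.
def update_secondary_index(batch, row_id, old_val, new_val):
    # index entries are keyed by (column value, row id); a tombstone
    # delete-marks the old entry instead of reading and erasing it in place
    batch.append(("delete-mark", (old_val, row_id)))
    batch.append(("put", (new_val, row_id)))

batch = []
update_secondary_index(batch, row_id=7, old_val="blue", new_val="red")
print(batch)   # two blind writes, zero index reads
```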

Tuesday, September 5, 2017

In-memory sysbench and a small server

I repeated the in-memory sysbench test on a small server and the results are similar to what I measured on the large server.

tl;dr
  • MyRocks does worse than InnoDB and the difference is larger on read-heavy tests.
  • InnoDB QPS tends to decrease after 5.6.35
  • InnoDB range scans are more efficient after 5.6.35

Configuration

I used core i5 Intel NUC servers with an SSD, my sysbench helper scripts and sysbench branch. The test was run with 4 tables and 1M rows/table. Somewhat accurate my.cnf files are here. The test was run for MyRocks compiled on August 15 with git hash 0d76ae and InnoDB from upstream 5.6.35, 5.7.17 and 8.0.2. The binlog was enabled but sync on commit was disabled for the binlog and database log. InnoDB 8.0.2 used latin1 charset and latin1_swedish_ci collation to match earlier releases. The sysbench client and mysqld shared the host.

Results

All of the data is here. Below I share the QPS ratios to compare QPS for MyRocks and InnoDB 5.7.17 with InnoDB 5.6.35. The results are similar to the results from in-memory sysbench on a large server.

QPS ratio:
* rocks = QPS for MyRocks / QPS for InnoDB 5.6.35
* inno = QPS for InnoDB 5.7.17 / QPS for InnoDB 5.6.35

1 connection
rocks   inno
0.391   0.966   update-inlist
0.723   0.911   update-one
0.965   1.042   update-index
0.592   0.908   update-nonindex
0.651   0.915   update-nonindex-special
0.700   0.893   delete-only
0.628   1.122   read-write.range100
0.838   1.495   read-write.range10000
0.680   1.155   read-only.range100
0.826   1.438   read-only.range10000
0.655   0.882   point-query
0.247   0.966   random-points
0.287   0.944   hot-points
0.696   0.884   insert-only
-----   -----
0.634   1.037   average

Friday, September 1, 2017

In-memory sysbench, a larger server and contention - part 2

In this post I document performance problems in MyRocks, InnoDB and TokuDB using in-memory sysbench on a large server. I previously shared results for in-memory sysbench with less and more contention. In this post I explain the tests where QPS drops significantly when moving from a test with 8 tables to 1 table. In a future post I will repeat the analysis for IO-bound sysbench. Note that while I don't include InnoDB from MySQL 8.0.2 in this analysis, it is similar to 5.7.17.

Which engines lose QPS because of contention on the sysbench 1 table tests at high concurrency?
  • all engines lose QPS on the update-one test
  • InnoDB and TokuDB lose QPS on the random-points test. MyRocks does not.
  • all engines lose QPS on the hot-points test
  • InnoDB and TokuDB lose QPS on the insert-only test. MyRocks does not.
While I previously wrote that in-memory sysbench is the worst-case for MyRocks, it is interesting to find a few cases where MyRocks does better than InnoDB.

Configuration

I compare results from in-memory sysbench using 8 tables and 1 table. There is more contention for internal data structures and rows when sysbench uses 1 table rather than 8 for tests run with mid and high concurrency. I explained these tests in previous posts on sysbench with 8 tables and 1 table. I repeated tests using 1 to 64 clients on a server with 48 HW threads and I consider 32 or more clients to be high concurrency, 8 clients to be mid concurrency and 1 client to be low concurrency.

I run many (~10) sysbench tests (microbenchmarks) because modern sysbench makes that easy with Lua (thanks Alexey). Here I show tests where QPS at high concurrency suffers for tests with 1 table because with fewer tables there is more contention on internal data structures, database pages and rows. The tests for which contention is a problem are update-one, random-points, hot-points and insert-only. My usage of sysbench is explained here but I will briefly describe these tests:
  • update-one - all updates are to the same row in each table (the row with id=1). For the test with 1 table there is only one row that gets all updates which becomes a hot spot.
  • random-points - each query is a SELECT statement with an in-list that matches 100 rows by equality on the PK. The test used uniform distribution to generate the keys to find so there are no row hot spots, but there is a table hot spot when sysbench is run with one table.
  • hot-points - like random-points but this searches for the same 100 keys in every query. So this has row hot spots.
  • insert-only - inserts are done in PK order. Secondary index maintenance is required and values for that column (k) are inserted in random order. There can be hot spots on the right-hand side of the PK index.
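
The difference between random-points and hot-points is only in key generation. A sketch with hypothetical helpers (the real tests are sysbench Lua scripts):

```python
import random

N_ROWS = 8_000_000   # 1 table with 8M rows, as in this test

def random_points_keys(rng):
    # uniform distribution: a fresh set of 100 PK values for every query,
    # so no row hot spots (but still one table hot spot)
    return rng.sample(range(1, N_ROWS + 1), 100)

def hot_points_keys():
    # the same 100 PK values in every query -> row hot spots
    return list(range(1, 101))

rng = random.Random(1)
assert hot_points_keys() == hot_points_keys()              # always the same rows
assert random_points_keys(rng) != random_points_keys(rng)  # spread over the table
```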

Guide to results

Below I share QPS for each test at low, mid and high concurrency where low is 1 connection, mid is 8 connections and high is 48 connections. The database is cached and sysbench shares the server with mysqld. There is no think time in the sysbench client when running a test, there are no stalls for reads from storage because all data can fit in the database cache and there are a few chances for stalls on writes.

For each test I list the QPS at 1, 8 and 48 connections twice - first for sysbench run with 8 tables and then for it run with 1 table. When using 8 tables there are 1M rows/table and with 1 table there are 8M rows in that table. I used MyRocks based on MySQL 5.6.35, InnoDB from upstream 5.6.35 and 5.7.17 and then TokuDB from Percona Server 5.7.17.

After the QPS results there is a section that lists QPS ratios where I highlight how QPS drops when moving from 8 tables to 1 table. When the QPS ratio is less than 1.0 there might be a performance problem.

update-one

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test all engines have a problem at mid and high concurrency as the QPS ratios are less than 0.5. Can I be happy that MyRocks suffers the least? This is a hard problem to fix because updates to one row must be serialized. For all tests the binlog was enabled and sync-on-commit was disabled for the binlog and database log. I hope that commutative updates are eventually supported in MyRocks to improve QPS for concurrent updates to a few rows.

QPS
1       8       48      concurrency/engine
- 8 tables
 8672   43342   39902   myrocks
10472   49717   52468   inno5635
 9670   51181   62626   inno5717
 2912   13736   19551   toku5717
- 1 table
 9072   17348   13055   myrocks
10521   17092   13288   inno5635
 9535   14411   13019   inno5717
 2926    3254    3077   toku5717

QPS ratio
rocks   inno56  inno57  toku
1.046   1.004   0.986   1.004   1 connection - low concurrency
0.400   0.343   0.281   0.236   8 connections - mid concurrency
0.327   0.253   0.207   0.157   48 connections - high concurrency

random-points

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test MyRocks does not have a problem for 1 table while InnoDB and TokuDB have a small problem at mid concurrency and a big problem at high concurrency. PMP output for TokuDB with 1 table & 48 connections is here and shows mutex contention. PMP output for InnoDB with 1 table & 48 connections is here and shows contention on rw-locks.

QPS
1       8       48      concurrency/engine
- 8 tables
 897     6871   23189   myrocks
2028    12693   16358   inno5635
1872    13925   47773   inno5717
1529    11824   36786   toku5717
- 1 table
 972     7411   25003   myrocks
1910    10313   12239   inno5635
1764    11931   17690   inno5717
1400     8669    8401   toku5717

QPS ratio
rocks   inno56  inno57  toku
1.083   0.941   0.942   0.915   1 connection - low concurrency
1.078   0.812   0.856   0.733   8 connections - mid concurrency
1.078   0.748   0.370   0.228   48 connections - high concurrency

hot-points

For this test the QPS ratio section is different than the above. The section here has two sets of numbers -- one for 8 tables and one for 1 table. The values are the QPS for the test divided by the QPS for the random-points test at 8 tables. When the value is less than one the engine gets less QPS than expected for this test.

For both 8 tables and 1 table all engines get less QPS on the hot-points test than on the random-points test. The loss is much greater for the 1 table test than the 8 table test. I filed issue 674 for MyRocks to make this better, but it really is an issue with RocksDB and mutex contention in the sharded LRU. PMP output for TokuDB with 1 table and 48 connections is here and it looks like the same problem as for random-points. PMP output for InnoDB with 1 table and 48 connections is here and the problem might be the same as in random-points.
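
A sketch of the sharded LRU problem: each block maps to one shard, so a fixed hot set keeps hammering the same few shard mutexes while uniform keys spread the load. The shard count and the modulo mapping below are illustrative; RocksDB hashes the key and picks its own shard count.

```python
# Why hot-points stresses a sharded LRU: shard = key % SHARDS here, so a
# fixed hot set concentrates lookups (and mutex traffic) on a few shards
# while uniformly distributed keys spread across all of them.
import random
from collections import Counter

SHARDS = 64   # illustrative; not the RocksDB default

def busiest_shard_share(keys):
    # fraction of lookups absorbed by the busiest shard's mutex
    counts = Counter(k % SHARDS for k in keys)
    return max(counts.values()) / len(keys)

hot = list(range(1, 101)) * 1000    # hot-points: the same 100 keys forever
rng = random.Random(42)
uniform = [rng.randint(1, 8_000_000) for _ in range(100_000)]

print(busiest_shard_share(hot))      # 0.02: one shard takes 2% of all lookups
print(busiest_shard_share(uniform))  # close to 1/64
```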

QPS
1       8       48      concurrency/engine
- 8 tables
1376    10256   28762   myrocks
2863    13588   15630   inno5635
2579    17899   50430   inno5717
1989    14091   36737   toku5717
- 1 table
1577     8489    8691   myrocks
2845     8787   10947   inno5635
2574    11904   16505   inno5717
1802     7318    7788   toku5717

QPS ratio for 8 tables
rocks   inno56  inno57  toku
1.534   1.411   1.377   1.300   1 connection - low concurrency
1.492   1.070   1.285   1.191   8 connections - mid concurrency
1.240   0.955   1.055   0.998   48 connections - high concurrency

QPS ratio for 1 table
rocks   inno56  inno57  toku
1.758   1.402   1.375   1.178   1 connection - low concurrency
1.235   0.692   0.854   0.618   8 connections - mid concurrency
0.374   0.669   0.345   0.211   48 connections - high concurrency

insert-only

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test MyRocks does not lose QPS while InnoDB and TokuDB do. For all tests the binlog was enabled and sync-on-commit was disabled for the binlog and database log. While I used PMP to explain the performance problems above I won't do that here for TokuDB and InnoDB.

QPS
1       8       48      concurrency/engine
- 8 tables
 9144   46466    65777  myrocks
12317   59811    59971  inno5635
10539   61522   115598  inno5717
 3199   17164    34043  toku5717
- 1 table
 9329   47629    67704  myrocks
12273   55445    37180  inno5635
10529   61235    59690  inno5717
 3156   17193    25754  toku5717

QPS ratio
rocks   inno56  inno57  toku
1.020   0.996   0.999   0.986   1 connection - low concurrency
1.025   0.927   0.995   1.001   8 connections - mid concurrency
1.029   0.619   0.516   0.756   48 connections - high concurrency
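As a sanity check, the 48-connection ratios above can be recomputed from the QPS tables; tiny differences versus the published numbers come from rounding. A quick sketch:

```python
# Recompute the 48-connection insert-only ratios from the QPS tables above:
# ratio = QPS with 1 table / QPS with 8 tables, so < 1.0 means the engine
# loses throughput when all of the load hits a single table.
qps_8_tables = {"myrocks": 65777, "inno5635": 59971,
                "inno5717": 115598, "toku5717": 34043}
qps_1_table = {"myrocks": 67704, "inno5635": 37180,
               "inno5717": 59690, "toku5717": 25754}

for engine in qps_8_tables:
    ratio = qps_1_table[engine] / qps_8_tables[engine]
    print(f"{engine}: {ratio:.3f}")
```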

All things RocksDB at Percona Live Europe 2017

There is plenty of content about MyRocks, MongoRocks and RocksDB at Percona Live Europe. Go here for registration and use the code SeeMeSpeakPLE17 for a discount.

Thursday, August 31, 2017

In-memory sysbench, a larger server and contention - part 1

Yesterday I shared results for in-memory sysbench on a large server. Today I have more results from a similar test but with more contention. The tests yesterday used 8 tables with 1M rows/table while the test here uses 1 table with 8M rows. In some ways the results are similar to yesterday's, but there are some interesting differences that I will explain in part 2 (another post).

tl;dr
  • MyRocks does worse than InnoDB 5.6.35 at low and mid concurrency for all tests except update-index. It suffers more on the read-heavy tests.
  • MyRocks and TokuDB are more competitive at mid and high concurrency than at low. MyRocks is faster than InnoDB 5.6.35 for many of the high concurrency tests.
  • InnoDB QPS at low concurrency tends to decrease after 5.6 for tests heavy on point queries but something in MySQL 5.7 made InnoDB range scans much faster.
  • InnoDB QPS at mid and high concurrency tends to increase after 5.6

Configuration

Everything is the same as explained in the previous post except I used 1 table with 8M rows here and 8 tables with 1M rows/table there. Using 1 table instead of 8 means there can be more contention in the database engine on things like the InnoDB per-index mutex. A command line is:
bash all.sh 1 8000000 180 300 180 innodb 1 0 /orig5717/bin/mysql none /sysbench.new

Results without charts

All of the data is here. Below I share the QPS ratios to compare the QPS for one engine with the QPS from InnoDB from MySQL 5.6.35. The engine is slower than InnoDB 5.6.35 when the ratio is less than 1.0.

QPS ratio:
* rocks = myrocks / inno5635
* inno = inno5717 / inno5635
* toku = toku5717 / inno5635
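To make the ratio definitions concrete, here is a tiny illustration in Python with made-up QPS numbers (the real per-test QPS values are in the linked data):

```python
# Illustration of the ratio definitions above, using made-up QPS numbers.
# Each engine's QPS is divided by InnoDB 5.6.35's QPS for the same test,
# so a ratio below 1.0 means the engine is slower than inno5635.
qps = {"inno5635": 10000, "myrocks": 7540, "inno5717": 10590, "toku5717": 5620}
base = qps["inno5635"]
ratios = {eng: round(v / base, 3) for eng, v in qps.items() if eng != "inno5635"}
print(ratios)  # {'myrocks': 0.754, 'inno5717': 1.059, 'toku5717': 0.562}
```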

1 connection
rocks   inno    toku
0.460   0.946   0.195   update-inlist
0.862   0.906   0.278   update-one
1.496   1.239   0.569   update-index
0.674   0.906   0.248   update-nonindex
0.756   0.921   0.240   update-nonindex-special
0.793   0.866   0.319   delete-only
0.701   1.027   0.612   read-write.range100
0.812   1.657   1.033   read-write.range10000
0.701   1.089   0.737   read-only.range100
0.804   1.676   1.281   read-only.range10000
0.675   0.904   0.731   point-query
0.508   0.923   0.732   random-points
0.554   0.904   0.633   hot-points
0.760   0.857   0.257   insert-only
-----   -----   -----
0.754   1.059   0.562   average

8 connections
rocks   inno    toku
0.968   1.587   0.293   update-inlist
1.014   0.843   0.190   update-one
1.837   2.183   0.608   update-index
0.879   1.090   0.307   update-nonindex
0.928   1.094   0.312   update-nonindex-special
0.968   1.068   0.340   delete-only
0.722   1.045   0.560   read-write.range100
0.814   1.626   1.108   read-write.range10000
0.714   1.126   0.825   read-only.range100
0.811   1.639   1.255   read-only.range10000
0.690   0.914   0.727   point-query
0.718   1.156   0.840   random-points
0.966   1.354   0.832   hot-points
0.859   1.104   0.310   insert-only
-----   -----   -----
0.921   1.274   0.608   average

48 connections
rocks   inno    toku
1.679   3.087   0.788   update-inlist
0.982   0.979   0.231   update-one
1.222   1.986   0.606   update-index
1.379   1.947   0.886   update-nonindex
1.387   1.936   0.854   update-nonindex-special
1.189   1.876   0.578   delete-only
0.826   1.148   0.514   read-write.range100
0.840   1.316   0.953   read-write.range10000
0.743   1.112   0.740   read-only.range100
0.850   1.342   1.034   read-only.range10000
0.941   1.368   1.066   point-query
2.042   1.445   0.686   random-points
0.793   1.507   0.711   hot-points
1.820   1.605   0.692   insert-only
-----   -----   -----
1.192   1.618   0.739   average
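The average rows are just the arithmetic mean of the per-test ratios in each column. For example, the MyRocks column at 48 connections can be recomputed from the table above:

```python
# Recompute the "average" value for the rocks column at 48 connections
# from the 14 per-test ratios listed above.
rocks_48 = [1.679, 0.982, 1.222, 1.379, 1.387, 1.189, 0.826,
            0.840, 0.743, 0.850, 0.941, 2.042, 0.793, 1.820]
print(round(sum(rocks_48) / len(rocks_48), 3))  # 1.192, matching the table
```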

Results with charts

Charts using the data from the previous section. For some of them I truncate the x-axis to make it easier to see differences between engines.





Wednesday, August 30, 2017

How to build MongoRocks for MongoDB 3.4

With advice from Igor I compiled MongoRocks from source for MongoDB 3.4. I previously shared how to do this for MongoDB 3.2. I did this to use a new version of RocksDB, 5.7.3 in this case, as the team continues to make RocksDB better. The instructions below are for Ubuntu 16.04. There are slightly different instructions here.

# Install many of the dependencies for MongoRocks
# These are Debian/Ubuntu package names; the build also links zstd below
sudo apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev
sudo apt-get install scons g++ git

# Unpack MongoDB 3.4.7 source in $MONGOSRC

# Directory in which git repos are created
mkdir ~/git

# Get MongoRocks engine for MongoDB 3.4.7
cd ~/git
git clone https://github.com/mongodb-partners/mongo-rocks.git
cd mongo-rocks
git checkout tags/r3.4.7 -b r347

# get and build RocksDB libraries for version 5.7.3
cd ~/git
git clone https://github.com/facebook/rocksdb.git
cd rocksdb
git checkout tags/v5.7.3 -b v573
DISABLE_JEMALLOC=1 USE_RTTI=1 CFLAGS=-fPIC make static_lib -j 4

# prepare source build with support for RocksDB
cd $MONGOSRC
mkdir -p src/mongo/db/modules/
ln -sf ~/git/mongo-rocks src/mongo/db/modules/rocks

# build mongod & mongo binaries
# if you need to debug and see command lines then add --debug=presub
scons CPPPATH=/home/mdcallag/git/rocksdb/include \
      LIBPATH=/home/mdcallag/git/rocksdb \
      LIBS="lz4 zstd" mongod mongo

# install mongod
mkdir -p ~/b/m347
cd ~/b/m347
mkdir data
mkdir bin
cp $MONGOSRC/build/opt/mongo/mongod bin
cp $MONGOSRC/build/opt/mongo/mongo bin

# create mongo.conf file with the text that follows. You must
# change $HOME and consider changing the value for cacheSizeGB
---
processManagement:
  fork: true
systemLog:
  destination: file
  path: $HOME/b/m347/log
  logAppend: true
storage:
  syncPeriodSecs: 600
  dbPath: $HOME/b/m347/data
  journal:
    enabled: true
  rocksdb:
    cacheSizeGB: 4
operationProfiling:
  slowOpThresholdMs: 2000
replication:
  oplogSizeMB: 4000
---

# start mongod, consider using numactl --interleave=all
bin/mongod --config mongo.conf --master --storageEngine rocksdb

# confirm RocksDB is there
$ head -1 data/db/LOG

In-memory sysbench and a large server

Today I share worst-case performance results for MyRocks -- in-memory sysbench and a small database. I like MyRocks because it reduces space and write amplification, but I don't show results for that here. Besides, there isn't much demand for better compression from such a small database. This is part 1 with results from a large server.

tl;dr
  • MyRocks does worse than InnoDB 5.6.35 at low and mid concurrency for all tests except update-index. It suffers more on the read-heavy tests.
  • MyRocks and TokuDB are more competitive at mid and high concurrency than at low
  • InnoDB QPS at low concurrency tends to decrease after 5.6
  • InnoDB QPS at mid and high concurrency tends to increase after 5.6

Configuration

I use my sysbench fork and helper scripts, release specific my.cnf files and a server with 48 HW threads, fast SSD and 256gb of RAM. The binlog was enabled and sync-on-commit was disabled for the binlog and database log. I remembered to disable SSL.

I tested MyRocks, TokuDB and InnoDB, with buffered IO and a 180gb database cache for MyRocks/TokuDB, and O_DIRECT and a 180gb buffer pool for InnoDB. The server was shared by the sysbench client and mysqld. For MyRocks I used a build from August 15 with git hash 0d76ae. For TokuDB I used Percona Server 5.7.17-12. For InnoDB I used upstream 5.6.35, 5.7.17 and 8.0.2. For InnoDB 8.0.2 I used the latin1 charset and latin1_swedish_ci collation. Compression was not used for any engine. More details are in the release-specific my.cnf files, and I used the same my.cnf for InnoDB 8.0.1 and 8.0.2. All mysqld processes used jemalloc.

The test used 8 tables with 1M rows/table. My use of sysbench is explained here. Tests are run in an interesting pattern -- load, write-heavy, read-only, insert-only. On the large server each test is run for 1, 2, 4, 8, 16, 24, 32, 40, 48 and 64 concurrent connections for either 3 or 5 minutes per concurrency level. So each test runs for either 30 or 50 minutes total and I hope that is long enough to get the database into a steady state. An example command line to run the test with my helper scripts is:
bash all.sh 8 1000000 180 300 180 innodb 1 0 /orig5717/bin/mysql none /sysbench.new
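The schedule above works out as stated. With 10 concurrency levels at either 3 or 5 minutes each:

```python
# Sanity check of the run schedule described above: each test runs once per
# concurrency level, for 3 or 5 minutes per level, so 30 or 50 minutes total.
concurrency_levels = [1, 2, 4, 8, 16, 24, 32, 40, 48, 64]
for minutes_per_level in (3, 5):
    print(len(concurrency_levels) * minutes_per_level)  # 30, then 50
```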

Results without charts

All of the data is here. Below I share the QPS ratios to compare the QPS for one engine with the QPS from InnoDB from MySQL 5.6.35. The engine is slower than InnoDB 5.6.35 when the ratio is less than 1.0. Things that I see include:
  • MyRocks does worse than InnoDB 5.6.35 at low and mid concurrency for all tests except update-index. It suffers more on the read-heavy tests.
  • MyRocks and TokuDB are more competitive at mid and high concurrency than at low
  • InnoDB QPS at low concurrency tends to decrease after 5.6
  • InnoDB QPS at mid and high concurrency tends to increase after 5.6

QPS ratio:
* rocks = myrocks / inno5635
* inno = inno5717 / inno5635
* toku = toku5717 / inno5635

1 connection
rocks   inno    toku
0.476   0.959   0.201   update-inlist
0.828   0.923   0.278   update-one
1.146   1.090   0.458   update-index
0.648   0.901   0.243   update-nonindex
0.740   0.898   0.236   update-nonindex-special
0.836   0.917   0.347   delete-only
0.711   1.041   0.586   read-write.range100
0.809   1.671   1.038   read-write.range10000
0.664   1.096   0.753   read-only.range100
0.801   1.657   1.263   read-only.range10000
0.641   0.905   0.735   point-query
0.442   0.923   0.753   random-points
0.480   0.900   0.694   hot-points
0.742   0.855   0.259   insert-only
-----   -----   -----
0.711   1.052   0.560   average of the above

8 connections
rocks   inno    toku
0.966   1.611   0.308   update-inlist
0.871   1.029   0.276   update-one
1.201   1.467   0.417   update-index
0.858   1.093   0.315   update-nonindex
0.898   1.090   0.314   update-nonindex-special
0.949   1.058   0.338   delete-only
0.710   1.039   0.534   read-write.range100
0.811   1.621   1.128   read-write.range10000
0.675   1.098   0.851   read-only.range100
0.810   1.639   1.263   read-only.range10000
0.648   0.910   0.746   point-query
0.541   1.097   0.931   random-points
0.754   1.317   1.037   hot-points
0.776   1.028   0.286   insert-only
-----   -----   -----
0.819   1.221   0.625   average of the above

48 connections
rocks   inno    toku
1.649   3.127   0.922   update-inlist
0.760   1.193   0.372   update-one
1.316   2.236   0.700   update-index
1.360   1.982   0.937   update-nonindex
1.374   1.965   0.995   update-nonindex-special
1.126   1.845   0.566   delete-only
0.804   1.129   0.507   read-write.range100
0.838   1.310   0.956   read-write.range10000
0.711   1.098   0.866   read-only.range100
0.823   1.305   1.034   read-only.range10000
0.932   1.347   1.084   point-query
1.417   2.920   2.248   random-points
1.840   3.226   2.350   hot-points
1.096   1.927   0.567   insert-only
-----   -----   -----
1.146   1.901   1.007   average of the above

Results with charts

The charts below have the same data from the previous section - the QPS for the engine relative to the QPS for InnoDB from MySQL 5.6.35.