Small Datum: September 2017

Tuesday, September 19, 2017

Linkbench: IO-bound, large server

This has results for IO-bound Linkbench on a large server. I previously shared results for in-memory & low-concurrency Linkbench.

tl;dr

Compression doesn't hurt MyRocks performance
MyRocks matches or beats InnoDB performance with much better efficiency

Configuration

I used my Linkbench repo and helper scripts to run sysbench with maxid1=1B, loaders=4 and requestors=16 so there will be 4+1 concurrent connections doing the load (4 for link/count tables, 1 for node table) and 16 connections running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and results with that change are labeled new config while results without the change are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 50gb of RAM available to the OS page cache and database.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.

MyRocks was compiled on August 15 with git hash 0d76ae. The block cache was set to 10gb. Tests were done without compression and then with no compression for L0/L1/L2, LZ4 for L3 to the next to last level and then Zstandard for the max level.
InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2. The buffer pool was set to 35gb. The performance schema was enabled. Compression was not used.
TokuDB was from Percona Server 5.7.17. The db cache was set to 10gb. Tests were done without compression and then zlib.

Load Results

All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config.

MyRocks has the best insert rate and compression doesn't hurt the rate
There is a regression for InnoDB from 5.7 to 8.0.2
Write efficiency (wKB/i) is best for TokuDB and similar between MyRocks and InnoDB
CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB
Space efficiency (size) is best for MyRocks and TokuDB. Uncompressed MyRocks uses much less space than uncompressed InnoDB

ips wKB/i Mcpu/i size rss rMB/s wMB/s cpu engine

142812 2.03 79 948 7.5 0.3 290.3 11.3 myrocks.none

144232 2.55 87 439 5.0 1.9 367.7 12.6 myrocks.zstd

132297 1.04 75 15xx 42.7 0 136.6 9.9 inno5635

138218 1.04 81 15xx 39.4 0.1 144.0 11.2 inno5717

122495 1.05 96 15xx 39.5 0.1 128.8 11.8 inno802

36737 1.68 246 12xx 11.1 0.1 61.6 9.0 toku5717.none

37032 1.18 257 442 11.6 0.1 43.8 9.5 toku5717.zlib

legend:

* ips - inserts/second

* wKB/i - iostat KB written per insert

* Mcpu/i - normalized CPU time per insert

* wMB/s, rMB/s - iostat write MB/s & read MB/s

* size - database size in GB at test end

* cpu - average value of vmstat us + sy columns
* rss - mysqld RSS in GB

Transaction Results

These are results from the 12th 1-hour loop of the transaction phase. All of the data is here and note that I did not adjust metrics computed from iostat to account for iostat overstating bytes written by up to 2X for RocksDB. The table below has a subset of the results for the new config. I will explain them.

MyRocks and InnoDB have the best transaction rates
Write efficiency (wKB/t) is much better for MyRocks than for InnoDB
CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
Space efficiency (size) is best for MyRocks and TokuDB (see here)
MyRocks has the the best response times and compression doesn't hurt it

tps r/t rKB/t wKB/t Mcpu/t ul gl rMB/s wMB/s cpu engine

34701 1.16 22.41 1.27 659 1 0.9 777.7 44.2 22.9 myrocks.none

34622 1.08 13.12 1.27 758 1 0.9 454.4 44.1 26.2 myrocks.zstd

25775 1.16 18.50 12.05 672 6 4 476.8 310.5 17.3 inno5635

34539 1.15 18.39 11.91 609 3 2 635.0 411.4 21.0 inno5717

33578 1.14 18.23 11.97 641 3 2 612.1 401.9 21.5 inno802

8982 3.07 188.97 5.64 2685 13 12 1697.3 50.7 24.1 toku5717.none

12448 1.20 17.42 2.84 2644 6 3 216.9 35.4 32.9 toku5717.zlib

legend:

* tps - transactions/second
* r/t - iostat reads per transaction

* rKB/t, wKB/t - iostat KB read & written per transaction

* Mcpu/t - normalized CPU time per transaction

* size - database size in GB at test end

* un, gn, ul, gl - 99th percentile response time in millisecs for UpdateNode,

GetNode, UpdateList and GetLinkedList transactions

* rMB/s, wMB/s - iostat read MB/s & write MB/s
* cpu - average CPU utilization from vmstat us & sy

Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.

Wednesday, September 13, 2017

Making it easier to read Linux perf output

This is a short note to myself the next time I need to remember things I had to do to enhance the readability of Linux perf output.

trim trailing spaces -> perf report --stdio | awk 'sub(/ *$/, "")'
don't forget the children when using perf record -g -> perf report --no-children

Friday, September 8, 2017

Linkbench - in-memory, low-concurrency

After a few weeks of sysbench it is now time for more complex workloads and the first one is Linkbench with a cached database and low-concurrency. I prefer to start with cached & low-concurrency configurations before trying IO-bound & high-concurrency.

tl;dr:

InnoDB from MySQL 5.6 had the best throughput
CPU efficiency is similar for MyRocks and InnoDB
There is a CPU regression from MySQL 5.6 to 5.7 to 8.x
Write efficiency was similar for all engines on the load test but much better for MyRocks and TokuDB on the transaction test.

Configuration

I used my Linkbench repo and helper scripts to run sysbench with maxid1=10M, loaders=1 and requestors=1 so there will be 2 concurrent connections doing the load and 1 connection running transactions after the load finishes. My linkbench repo has a recent commit that changes the Linkbench workload and results with that change are labeled new config while results without the change are labeled old config. I share both. The test pattern is 1) load and 2) transactions. The transactions were run in 12 1-hour loops and I share results from the last hour. The test server has 48 HW threads, fast SSD and 256gb of RAM.

Tests were run for MyRocks, InnoDB from upstream MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here but the database cache was made large enough to cache the ~10gb database.

MyRocks was compiled on August 15 with git hash 0d76ae. Compression was not used.
InnoDB was from upstream 5.6.35, 5.7.17 and 8.0.2. The performance schema was enabled.
TokuDB was from Percona Server 5.7.17. Compression was not used.

InnoDB 5.6 has the best insert rate but there is a regression from InnoDB in 5.6 to 5.7 to 8.0.2
Write efficiency (wkb/i) is similar for all engines
CPU efficiency (Mcpu/i) is similar for MyRocks and InnoDB

ips wkb/i Mcpu/i size wMB/s cpu engine

54283 1.60 83 14 86.7 4.5 myrocks

64402 1.02 72 16 65.6 4.6 inno5635

56414 1.03 77 16 58.2 4.3 inno5717

42954 1.02 97 16 45.1 4.2 inno802

21611 1.42 179 14 30.7 3.9 toku5717

legend:

* ips - inserts/second

* wkb/i - iostat KB written per insert

* Mcpu/i - normalized CPU time per insert

* wMB/s - iostat write MB/s, average

* size - database size in GB at test end

* cpu - average value of vmstat us + sy columns

Transaction Results

InnoDB 5.6 has the best transaction rate but there is a regression from 5.6 to 5.7 to 8.0.2
Write efficiency (wkb/t) is much better for TokuDB and MyRocks than for InnoDB
CPU efficiency (Mcpu/t) is similar for MyRocks and InnoDB
Response times are similar between MyRocks and InnoDB

tps wkb/t Mcpu/t size un gn ul gl wMB/s engine

5489 0.78 642 15 0.3 0.1 0.5 0.5 4.3 myrocks

7239 5.15 524 26 0.3 0.1 0.4 0.2 37.3 inno5635

6463 5.17 580 26 0.3 0.1 0.5 0.3 33.4 inno5717

5855 5.25 623 25 0.3 0.1 0.6 0.3 30.7 inno802

3333 0.08 1043 18 0.6 0.2 1.0 0.8 10.0 toku5717

legend:

* tps - transactions/second

* wkb/t - iostat KB written per transaction

* Mcpu/t - normalized CPU time per transaction

* size - database size in GB at test end

* un, gn, ul, gl - 99th percentile response time in millisecs for UpdateNode, GetNode, UpdateList and GetLinkedList transactions

* wMB/s - iostat write MB/s, average

Charts

Charts for load and transaction throughput using the ips and tps columns from the tables above.

Thursday, September 7, 2017

IO-bound table scan performance for MyRocks, InnoDB and TokuDB

I used sysbench to compare IO-bound table scan performance for MyRocks, InnoDB and TokuDB. Tests were run on a large server with fast SSD, Intel NUCs with SSD and an Intel NUC with disk. I call this IO-bound because for all tests the table was larger than RAM.

tl;dr

MyRocks can be more than 2X slower than for InnoDB.
InnoDB in 5.7 does better than in 5.6
TokuDB without compression is comparable to InnoDB without compression and does much better than InnoDB when prefetching is enabled.
Compression usually has a small impact on scan performance for MyRocks with zstd and a much larger impact for TokuDB with zlib. I wonder how much of this is a measure of zstd vs zlib.
Scans were usually slower for all engines after fragmentation but the impact was larger for MyRocks and TokuDB than for InnoDB.

Configuration

I used my sysbench helper scripts with my sysbench branch. For tests with X tables there was 1 connection per table doing a full scan and when X > 1 the scans were concurrent. The scan time was measured twice -- first immediately after the load and index step and then after many updates have been applied. The second measurement was done to show the impact of fragmentation on scan performance.

I repeated tests on different hardware:

48core.ssd - server has 48 HW threads, fast SSD and 50gb of RAM. Tests were done with 8 tables and 100M rows/table and then 1 table with 800M rows.
i3.ssd - a core i3 Intel NUC with Samsung 850 SSD, 8gb of RAM and 4 HW threads. The test used 2 tables and 80M rows/table.
i3.disk - a core i3 Intel NUC with 1 disk, 8gb of RAM and 4 HW threads. The test used 1 table and 160M rows/table.

I repeated tests for MyRocks, InnoDB and TokuDB:

I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression (myrocks.none, myrocks.zstd). I did one test for MyRocks with a binary that did not use special instructions to make crc32 faster (myrocks.none.slowcrc) and learned that fast crc doesn't make a difference on this test. It would be a bigger deal for an IO-bound test doing point queries.
I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF (toku5717.none, toku5717.none.prefetch), but I have been setting this to ON for my OLTP benchmarks because enabling it ruined some OLTP results.
I used InnoDB from upstream 5.6.35 and 5.7.17. The performance_schema was enabled. The InnoDB tests did not use compression.

Results

The results below list the number of seconds to scan the table(s) and the time relative to InnoDB from MySQL 5.6.35. For the relative time a value greater than 1 means the engine is slower than InnoDB. These values are reported for pre and post where pre is the measurement taken immediately after loading the table and creating the secondary index and post is the measurement taken after applying random updates to the table(s).

See tl;dr above for what I learned from these results.

Large server

These are results from 8 tables with 100M rows/table and then 1 table and 800M rows/table on the large server.

48core.ssd - 8t x 100m
pre pre post post engine
secs ratio secs ratio
221 2.302 246 2.256 myrocks.none
201 2.093 211 1.935 myrocks.zstd
96 1.000 109 1.000 inno5635
75 0.781 86 0.788 inno5717
67 0.697 94 0.862 touk5717.none
39 0.406 69 0.633 toku5717.none.prefetch
190 1.979 224 2.055 toku5717.zlib

48core.ssd - 1t x 800m
pre pre post post engine
secs ratio secs ratio
638 1.065 1032 1.627 myrocks.none
916 1.529 1063 1.676 myrocks.zstd
599 1.000 634 1.000 inno5635
434 0.724 449 0.708 inno5717
513 0.856 735 1.159 toku5717.none
249 0.415 502 0.791 toku5717.none.prefetch
1525 2.545 1776 2.801 toku5717.zlib

Intel NUC

These are results from the Intel NUC using SSD and then a disk.

i3.ssd - 2t x 80m
pre pre post post engine
secs ratio secs ratio
181 1.448 192 1.560 myrocks.none
182 1.456 189 1.536 myrocks.none.slowcrc
219 1.752 238 1.934 myrocks.zstd
125 1.000 123 1.000 inno5635
114 0.912 107 0.869 inno5717

i3.disk - 1t x 160m
pre pre post post engine
secs ratio secs ratio
330 1.304 348 1.343 myrocks.none
432 1.707 451 1.741 myrocks.zstd
253 1.000 259 1.000 inno5635
257 1.015 261 1.007 inno5717

Charts

Below are charts from the large server tests for 8 tables & 100M rows/table and then 1 table with 800M rows.

Wednesday, September 6, 2017

Timing load & index for sysbench tables

This post compares MyRocks, InnoDB and TokuDB on the time required to load and index a table for sysbench.

tl;dr

MyRocks, InnoDB and TokuDB have similar load performance although there is a regression for InnoDB from 5.6 to 5.7 to 8.x
InnoDB create index is much faster starting in 5.7

Configuration

I used my sysbench helper scripts with my sysbench branch and configured it to create 1 table with 800M rows. The binlog was enabled but sync on commit was disabled for the binlog and database log. The sysbench client shared the host with mysqld. The host has 48 HW threads, 50gb of RAM for the OS and MySQL and fast SSD. The test table is larger than RAM but it will take me a few days to get details on that. The test was repeated for MyRocks, InnoDB and TokuDB. I continue to use the IO-bound setup as described previously.

I compiled MyRocks on August 15 with git hash 0d76ae. The MyRocks tests were done without compression and with zstd compression.
I used TokuDB from Percona Server 5.7.17. The TokuDB tests were done without compression and with zlib compression. I tried tokudb_disable_prefetching ON and OFF, but I have been setting this to ON for my OLTP benchmarks.
I used InnoDB from upstream 5.6.35, 5.7.17, 8.0.1 and 8.0.2. For 8.x I used latin1/latin1_swedish_ci charset/collation. The performance_schema was enabled. The InnoDB tests did not use compression.

A sample command line for sysbench is:

bash all.sh 1 800000000 180 300 180 innodb 1 0 /bin/mysql none /sysbench10 /dbdir

Results

The load is in PK order and there are no secondary indexes. Engines have similar performance although there is a slow regression for InnoDB with each new release and there is a big regression from 8.0.1 to 8.0.2 which I hope will be fixed when 8.x approaches GA. The ratio is the time to load for the engine divided by the time to load for InnoDB from MySQL 5.6.35.

load load engine
secs ratio
7266 1.000 inno5635
7833 1.078 inno5717
8286 1.140 inno801
10516 1.447 inno802
7640 1.051 myrocks.none
7810 1.074 myrocks.zstd
7558 1.040 toku5717.none
7494 1.031 toku5717.none.prefetch
7726 1.063 toku5717.zlib

Create index performance has more diversity. The table is larger than RAM, some of it will be read from storage and engines with compression (MyRocks.zstd, toku5717.zlib) suffer from decompression latency. The prefetch option doesn't help TokuDB in this case. InnoDB create index performance got much faster starting in 5.7. The ratio is the create index time for the engine divided by the time for InnoDB from MySQL 5.6.35.

create create engine
secs ratio
3565 1.000 inno5635
1904 0.534 inno5717
1961 0.550 inno801
1966 0.551 inno802
3321 0.931 myrocks.none
3802 1.066 myrocks.zstd
9817 2.753 toku5717.none
9855 2.764 toku5717.none.prefetch
10731 3.010 toku5717.zlib

Charts

These charts have the same data as the tables above.

Write-heavy workloads with MyRocks

MyRocks is based on RocksDB and RocksDB is write-optimized, so why don't write-heavy workloads always run faster on MyRocks than on InnoDB? I will start with a summary of MyRocks versus InnoDB: MyRocks has better space & write efficiency, frequently has better write latency and sometimes has better read latency. Better space & write efficiency means you use less SSD and it will last longer.

Don't forget that better write and space efficiency with MyRocks can lead to better read latency. When a database engine does fewer writes to storage there is more IO capacity available for reads which is a big deal with disk and can be a big deal with slower SSD. When a database engine uses less space in storage then it is likely to cache more data in RAM and have better cache hit rates on reads. Many of the benchmarks that I run use uniform distribution for key generation and won't show the benefit from better cache hit rates.

With RocksDB write usually means a call to Put and Put is a blind-write. As shown by many benchmark results, a blind write can be very fast with RocksDB - insert data into memtable, optionally flush the WAL to the OS page cache, optionally force the WAL to persistent storage. But SQL update and insert statements usually need much more than a blind-write and the reads done by MyRocks can explain why some write-heavy workloads are faster with InnoDB. Things that get in the way include:

Pluggable storage engine APIs are slow to adopt blind-write optimizations
The modified row count must be returned from an update statement
Enforcement of the PK and unique constraints
Secondary index maintenance

Pluggable storage engine APIs are slow to adopt blind-write optimizations. I don't blame MySQL and MongoDB for this because such optimizations are limited to write-optimized engines. But MyRocks and MongoRocks are here and I expect that WiredTiger/MongoDB will eventually have an LSM in production. M*Rocks engines can use the merge operator for this. I don't expect blind-write optimizations to ever be implemented for an update-in-place b-tree.

The modified row count must be returned for an update statement and that requires evaluation of the where clause. For RocksDB this requires reads -- from the memtable, maybe from the LSM tree, OS page cache and storage. Reads from storage and the OS page cache might require decompression. This is a lot more work than a blind-write. The usage of blind-write optimizations will result in statements that are update-like and insert-like because the application programmers must be aware of the semantic differences -- modified row count won't be returned, constraint violations won't be acknowledged. I think it is worth doing.

Unique constraints must be enforced for the PK and unique secondary indexes. With the exception of an auto-increment column, this requires a read from the index to confirm the value does not exist. In the best case updates & inserts are in key order and the structures to be searched are cached in RAM -- otherwise this requires reads from the OS page cache and/or storage and might require decompression.

Secondary indexes must be maintained as part of the update & insert statement processing. For an insert this means that a new index entry will be inserted. For an update that requires index maintenance the existing index entry will be deleted and a new one will be inserted. With a b-tree the leaf page(s) will be read (possibly from storage), made dirty in the buffer pool and eventually those pages will be written back to storage. With RocksDB non-unique secondary index maintenance is read-free and blind-writes are done to put the new index entry and possibly delete-mark the old index entry. See the previous paragraph if there is a unique secondary index.

Tuesday, September 5, 2017

In-memory sysbench and a small server

I repeated the in-memory sysbench test on a small server and the results are similar to what I measured on the large server.

tl;dr

MyRocks does worse than InnoDB and the difference is larger on read-heavy tests.
InnoDB QPS tends to decrease after 5.6.35
InnoDB range scans are more efficient after 5.6.35

Configuration

I used core i5 Intel NUC servers with an SSD, my sysbench helper scripts and sysbench branch. The test was run with 4 tables and 1M rows/table. Somewhat accurate my.cnf files are here. The test was run for MyRocks compiled on August 15 with git hash 0d76ae and InnoDB from upstream 5.6.35, 5.7.17 and 8.0.2. The binlog was enabled but sync on commit was disabled for the binlog and database log. InnoDB 8.0.2 used latin1 charset and latin1_swedish_ci collation to match earlier releases. The sysbench client and mysqld shared the host.

Results

All of the data is here. Below I share the QPS ratios to compare QPS for MyRocks and InnoDB 5.7.17 with InnoDB 5.6.35. The results are similar to the results from in-memory sysbench on a large server.

QPS ratio:
* rocks = QPS for MyRocks / QPS for InnoDB 5.6.35
* inno = QPS for InnoDB 5.7.17 / QPS for InnoDB 5.6.35

1 connection
rocks inno
0.391 0.966 update-inlist
0.723 0.911 update-one
0.965 1.042 update-index
0.592 0.908 update-nonindex
0.651 0.915 update-nonindex-special
0.700 0.893 delete-only
0.628 1.122 read-write.range100
0.838 1.495 read-write.range10000
0.680 1.155 read-only.range100
0.826 1.438 read-only.range10000
0.655 0.882 point-query
0.247 0.966 random-points
0.287 0.944 hot-points
0.696 0.884 insert-only
----- -----
0.634 1.037 average

Friday, September 1, 2017

In-memory sysbench, a larger server and contention - part 2

In this post I document performance problems in MyRocks, InnoDB and TokuDB using in-memory sysbench on a large server. I previously shared results for in-memory sysbench with less and more contention. In this post I explain the tests where QPS drops significantly when moving from a test with 8 tables to 1 table. In a future post I will repeat the analysis for IO-bound sysbench. Note that while I don't include InnoDB from MySQL 8.0.2 in this analysis, it is similar to 5.7.17.

Which engines lose QPS because of contention on the sysbench 1 table tests at high concurrency?

all engines lose QPS on the update-one test
InnoDB and TokuDB lose QPS on the random-points test. MyRocks does not.
all engines lose QPS on the hot-points test
InnoDB and TokuDB lose QPS on the insert-only test. MyRocks does not.

While I previously wrote that in-memory sysbench is the worst-case for MyRocks, it is interesting to find a few cases where MyRocks does better than InnoDB.

Configuration

I compare results from in-memory sysbench using 8 tables and 1 table. There is more contention for internal data structures and rows when sysbench uses 1 table rather than 8 for tests run with mid and high concurrency. I explained these tests in previous posts on sysbench with 8 tables and 1 table. I repeated tests using 1 to 64 clients on a server with 48 HW threads and I consider 32 or more clients to be high concurrency, 8 clients to be mid concurrency and 1 client to be low concurrency.

I run many (~10) sysbench tests (microbenchmarks) because modern sysbench makes that easy with Lua (thanks Alexey). Here I show tests where QPS at high concurrency suffers for tests with 1 table because with fewer tables there is more contention to internal data structures, database pages and rows. The tests for which contention is a problem are update-one, random-points, hot-points and insert-only. My usage of sysbench is explained here but I will briefly describe these tests:

update-one - all updates are to the same row in each table (the row with id=1). For the test with 1 table there is only one row that gets all updates which becomes a hot spot.
random-points - each query is a SELECT statement with an in-list that matches 100 rows by equality on the PK. The test used uniform distribution to generate the keys to find so there are no row hot spots, but there is a table hot spot when sysbench is run with one table.
hot-points - like random-points but this searches for the same 100 keys in every query. So this has a row hot spots.
insert-only - inserts are done in PK order. Secondary index maintenance is required and values for that column (k) are inserted in random order. There can be hot spots on the right-hand side of the PK index.

Guide to results

Below I share QPS for each test at low, mid and high concurrency where low is 1 connection, mid is 8 connections and high is 48 connections. The database is cached and sysbench shares the server with mysqld. There is no think time in the sysbench client when running a test, there are no stalls for reads from storage because all data can fit in the database cache and there are a few chances for stalls on writes.

For each test I list the QPS at 1, 8 and 48 connections twice - first for sysbench run with 8 tables and then for it run with 1 table. When using 8 tables there are 1M rows/table and with 1 table there is 8M rows in that table. I used MyRocks based on MySQL 5.6.35, InnoDB from upstream 5.6.35 and 5.7.17 and then TokuDB from Percona Server 5.7.17.

After the QPS results there is a section that lists QPS ratios where I highlight how QPS drops when moving from 8 tables to 1 table. When the QPS ratio is less than 1.0 there might be a performance problem.

update-one

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test all engines have a problem at mid and high concurrency as the QPS ratios are less than 0.5. Can I be happy that MyRocks suffers the least? This is a hard problem to fix because updates to one row must be serialized. For all tests the binlog was enabled and sync-on-commit was disabled for the binlog and database log. I hope that commutative updates are eventually supported in MyRocks to improve QPS for concurrent updates to a few rows.

QPS
1 8 48 concurrency/engine

- 8 tables

8672 43342 39902 myrocks

10472 49717 52468 inno5635

9670 51181 62626 inno5717

2912 13736 19551 toku5717

- 1 table

9072 17348 13055 myrocks

10521 17092 13288 inno5635

9535 14411 13019 inno5717

2926 3254 3077 toku5717

QPS ratio

rocks inno56 inno57 toku

1.046 1.004 0.986 1.004 1 connection - low concurrency

0.400 0.343 0.281 0.236 8 connections - mid concurrency

0.327 0.253 0.207 0.157 48 connections - high concurrency

random-points

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test MyRocks does not have a problem for 1 table while InnoDB and TokuDB have a small problem at mid concurrency and a big problem at high concurrency. PMP output for TokuDB with 1 table & 48 connections is here and shows mutex contention. PMP output for InnoDB with 1 table & 48 connections is here and shows contention on rw-locks.

QPS

1 8 48 concurrency/engine

- 8 tables

897 6871 23189 myrocks

2028 12693 16358 inno5635

1872 13925 47773 inno5717

1529 11824 36786 toku5717

- 1 table

972 7411 25003 myrocks

1910 10313 12239 inno5635

1764 11931 17690 inno5717

1400 8669 8401 toku5717

QPS ratio

rocks inno56 inno57 toku

1.083 0.941 0.942 0.915 1 connection - low concurrency

1.078 0.812 0.856 0.733 8 connections - mid concurrency

1.078 0.748 0.370 0.228 48 connections - high concurrency

hot-points

For this test the QPS ratio section is different than the above. The section here has two sets of numbers -- one for 8 tables and one for 1 table. The values are the QPS for the test divided by the QPS for the random-points test at 8 tables. When the value is less than one the engine gets less QPS than expected for this test.

For both 8 tables and 1 table all engines get less QPS on the hot-points test than on the random-points test. The loss is much greater for the 1 table test than the 8 table test. I filed issue 674 for MyRocks to make this better, but it really is an issue with RocksDB and mutex contention in the sharded LRU. PMP output for TokuDB with 1 table and 48 connections is here and it looks like the same problem as for random-points. PMP output for InnoDB with 1 table and 48 connections is here and the problem might be the same as in random-points.

QPS

1 8 48 concurrency/engine
- 8 tables

1376 10256 28762 myrocks

2863 13588 15630 inno5635

2579 17899 50430 inno5717

1989 14091 36737 toku5717

- 1 table

1577 8489 8691 myrocks

2845 8787 10947 inno5635

2574 11904 16505 inno5717

1802 7318 7788 toku5717

QPS ratio for 8 tables

rocks inno56 inno57 toku

1.534 1.411 1.377 1.300 1 connection - low concurrency

1.492 1.070 1.285 1.191 8 connections - mid concurrency

1.240 0.955 1.055 0.998 48 connections - high concurrency

QPS ratio for 1 table

rocks inno56 inno57 toku

1.758 1.402 1.375 1.178 1 connection - low concurrency

1.235 0.692 0.854 0.618 8 connections - mid concurrency

0.374 0.669 0.345 0.211 48 connections - high concurrency

insert-only

For this test the QPS ratio section has the QPS for the engine at 1 table divided by the QPS for the engine at 8 tables. For this test MyRocks does not lose QPS while InnoDB and TokuDB do. For all tests the binlog was enabled and sync-on-commit was disabled for the binlog and database log. While I used PMP to explain the performance problems above I won't do that here for TokuDB and InnoDB.

QPS

1 8 48 concurrency/engine
- 8 tables

9144 46466 65777 myrocks

12317 59811 59971 inno5635

10539 61522 115598 inno5717

3199 17164 34043 toku5717

- 1 table

9329 47629 67704 myrocks

12273 55445 37180 inno5635

10529 61235 59690 inno5717

3156 17193 25754 toku5717

QPS ratio

rocks inno56 inno57 toku

1.020 0.996 0.999 0.986 1 connection -low concurrency

1.025 0.927 0.995 1.001 8 connections - mid concurrency

1.029 0.619 0.516 0.756 48 connections - high concurrency

All things RocksDB at Percona Live Europe 2017

There is plenty of content about MyRocks, MongoRocks and RocksDB at Percona Live Europe. Go here for registration and use the code SeeMeSpeakPLE17 for a discount:

MyRocks Internals and Production Deployment at Facebook - I am not sure whether this is a tutorial or a talk but the presenter is Yoshinori Matsunobu and the content will be awesome.
MyRocks at Facebook and a Roadmap - What is next for MyRocks? I won't spoil the surprise but this is a keynote by Yoshinori Matsunobu.
MyRocks Engineering: Deploying a New MySQL Storage Engine to Production at Facebook - Writing a new storage engine is hard, deploying it is harder. This explains how MyRocks replaced InnoDB for a web-scale workload without too much worry.
Building a MongoDB Storage Engine with the RocksDB LSM Tree - MongoRocks is part of Percona Server for MongoDB. From following the email list they do a lot more than just compile it. I am a happy user of it -- for performance comparisons.
What's New in CMongo Tencent's Branch of MongoDB - MongoRocks is web-scale! I will attend to learn more about how they are using and making better MongoRocks.
Improvements to MongoRocks in 2017 - I am the speaker. I will explain how we continue to make MongoRocks better by making RocksDB better.
RocksDB Static Sorted Table Formats: Considerations when Building a Search Engine - Tarek Sheasha explains a web-scale use of RocksDB at Booking.com