Monday, December 18, 2017

MyRocks, InnoDB and TokuDB: a summary

This has links to all performance reports from my recent series. I wanted to publish all of this by year end because I am taking a break from running and documenting MyRocks and MongoRocks performance.

Small server

Results from Intel NUC servers I have at home.

Insert benchmark:

Sysbench:

Linkbench:


Sysbench: IO-bound and a fast server, part 2

This post has results for sysbench with an IO-bound workload and a fast server. It uses 1 table with 800M rows. The previous post had results for 8 tables with 100M rows/table. The goal is to understand the impact from more contention. Before that I published a report comparing sysbench with 8 tables vs 1 table for an in-memory workload.

All of the data is on github. With the exception of the scan section, the graphs below have the absolute QPS for 48 concurrent clients using 8 tables and 1 table.

tl;dr

  • MyRocks scans are up to 2X faster with readahead
  • update-one - all engines do worse. This is expected.
  • update-index - InnoDB 5.6 & 5.7 do worse. This doesn't repeat on the in-memory benchmark.
  • read-write - InnoDB 5.6 does better on the 1 table test. This is odd.
  • hot-points - all engines do worse. This is expected. MyRocks can do 3X better by switching from LRU to Clock once that feature is GA.
  • insert - InnoDB gets ~50% of the QPS with 1 table vs 8 tables.

scan

This scan test uses one connection to scan the 800M row table. For the previous post there were 8 connections each scanning a different table with 100M rows/table. Fortunately the results here are similar to the results from the previous test.
  • InnoDB 5.6 is faster than MyRocks without readahead.
  • MyRocks with readahead is faster than InnoDB 5.6 but slower than InnoDB 5.7/8.0


update-one

All engines lose QPS with 1 table, the result is expected given all updates go to one row on this test. The result here is similar to the in-memory benchmark.
update-index

MyRocks and InnoDB 8.0 do great. InnoDB 5.6/5.7 lose QPS with 1 table but I did not debug that.

read-write range=100

Most engines do the same for 1 vs 8 tables. InnoDB 5.6 does much better with 1 table but I did not debug that.

hot-points

All engines lose QPS with 1 table and this is similar to the in-memory benchmark. That is expected because the test fetches the same 100 rows per query and this stays in-memory. The QPS for MyRocks can be ~3X better by switching from using LRU to Clock for the block cache but that feature might not be GA today.

insert

InnoDB gets ~2X more QPS with 8 tables than with 1. That result is similar to the in-memory benchmark. I didn't debug the source of contention.

Sysbench: IO-bound and a fast server

In this post I share results for in-memory sysbench on a fast server using MyRocks, InnoDB and TokuDB.

tl;dr
  • MyRocks is more space efficient than InnoDB. InnoDB uses ~2.2X more space than compressed MyRocks and ~1.1X more space than uncompressed MyRocks.
  • MyRocks is more write effcient than InnoDB. InnoDB writes ~7.9X more to storage per update than MyRocks on the update-index test.
  • For full index scans InnoDB 5.6 is ~2X faster than MyRocks. But with readahead enabled, uncompressed MyRocks is ~2X faster than InnoDB 5.6 and comparable to InnoDB 5.7/8.0.
  • MyRocks is >= InnoDB 5.7/8.0 for 3 of the four update-only tests. update-one is the only test on which it isn't similar or better and that test has a cached working set.
  • MyRocks is similar to InnoDB 5.6 on the insert only test.
  • MyRocks matches InnoDB 5.7/8.0 for read-write with range-size=100 (the default). It does worse with range-size=10000.
  • MyRocks is similar to InnoDB 5.6 for read-only with range-size=100 (the default). It does worse with range-size=10000. InnoDB 5.7/8.0 do better than InnoDB 5.6.
  • Results for point-query are mixed. MyRocks does worse than InnoDB 5.6 while InnoDB 5.7/8.0 do better and worse than InnoDB 5.6.
  • Results for random-points are also mixed and similar to point-query.

Configuration

My usage of sysbench is described here. The test server has 48 HW threads, fast SSD and 50gb of RAM. The database block cache (buffer pool) was 10gb for MyRocks and TokuDB and 35gb for InnoDB. MyRocks and TokuDB used buffered IO while InnoDB used O_DIRECT. Sysbench was run with 8 tables and 100M rows/table. Tests were repeated for 1, 2, 4, 8, 16, 24, 32, 40, 48 and 64 concurrent clients. At each concurrency level the read-only tests run for 180 seconds, the write-heavy tests for 300 seconds and the insert test for 180 seconds.

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Tests were repeated with and without compression. The configuration without compression is called MySQL.none in the rest of this post. The configuration with compression is called MySQL.zstd and used zstandard for the max level, no compression for L0/L1/L2 and lz4 for the other levels. 
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Tests were done without compression and then with zlib compression.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.

Results

All of the data for the tests is on github. Graphs for each test are below. The graphs show the QPS for a test relative to the QPS for InnoDB 5.6.35 and a value > 1 means the engine gets more QPS than InnoDB 5.6.35. The graphs have data for tests with 1, 8 and 48 concurrent clients and I refer to these as low, mid and high concurrency. The tests are explained here and the results are in the order in which the tests are run except where noted below. The graphs exclude results for InnoDB from FB MySQL to improve readability.

space and write efficiency

MyRocks is more space and write efficient than InnoDB.
  • InnoDB uses 2.27X more space than compressed MyRocks and 1.12X more space than uncompressed MyRocks.
  • InnoDB writes ~7.9X more to storage per update than MyRocks on the update index test.


scan

This has data on the time to do a full scan of the PK index before and after the write-heavy tests:
  • InnoDB 5.6 is ~2X faster than MyRocks without readahead. I don't think the InnoDB PK suffers from fragmentation with sysbench. Had that been a problem then the gap would have been smaller.
  • MyRocks with readahead is almost 2X faster than InnoDB 5.6.
  • InnoDB 5.7/8.0 is faster than InnoDB 5.6
  • For MyRocks and InnoDB 5.6 the scans before and after write-heavy tests have similar performance but for InnoDB 5.7/8.0 the scan after the write-heavy tests was faster. The scan was faster after because the storage read rate was much better as show in the second graph. But the CPU overhead/row was larger for the scan after write-heavy tests. This is a mystery.



update-inlist

The workload updates 100 rows per statement via an in-list and doesn't need index maintenance. Interesting results:
  • MyRocks is similar to InnoDB at low concurrency
  • MyRocks is the best at mid and high concurrency. It benefits from read-free secondary index maintenance.
  • InnoDB 5.7 & 8.0 are better than InnoDB 5.6 at mid and high concurrency.


update-one

While the configuration is IO-bound, the test updates only one row so the working set is cached. Interesting results:
  • MyRocks suffers at high concurrency. I did not debug this.
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency and better at high. New non-InnoDB code overhead hurts at low but InnoDB code improvements help at high.

update-index

The workload here needs secondary index maintenance. Interesting results:
  • MyRocks is better here than on other write-heavy tests relative to InnoDB because non-unique secondary index maintenance is read-free.
  • InnoDB 5.7/8.0 do much better than InnoDB 5.6 at mid and high concurrency
  • Relative to InnoDB 5.6, the other engines do better at mid than at high concurrency. I did not debug this.

update-nonindex

The workload here doesn't require secondary index maintenance. Interesting results:
  • MyRocks is similar to InnoDB 5.7/8.0. All are better than InnoDB 5.6.

read-write range=100

Interesting results:
  • MyRocks is similar to InnoDB 5.7/8.0. Both are slightly better than InnoDB 5.6 at mid concurrency and much better at high. I didn't debug the difference at high concurrency. Possible problems for InnoDB 5.6 include: write-back stalls, mutex contention.

read-write range=10000

Interesting results:
  • MyRocks is better than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than MyRocks. Long range scans are more efficient for InnoDB starting in 5.7 and this test scans 10,000 rows per statement versus 100 rows in the previous section.

read-only range=100

This test scans 100 rows/query. Interesting results:
  • MyRocks is similar or better than InnoDB 5.6 except at high concurrency. For the read-write tests above MyRocks benefits from faster writes, but this test is read-only.
  • InnoDB 5.7/8.0 are similar or better than InnoDB 5.6. Range scan improvements offset the cost of new code.

read-only.pre range=10000

This test scans 10,000 rows/query and is run before the write heavy tests. Interesting results:
  • MyRocks is similar to slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than InnoDB 5.6

read-only range=10000

This test scans 10,000 rows/query and is run after the write heavy tests. Interesting results:
  • MyRocks is similar to slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are better than InnoDB 5.6 at mid and high concurrency
  • The differences with InnoDB 5.6 here are smaller than in the previous test above that is run before the write-heavy tests.

point-query.pre

This test is run before the write heavy tests. Interesting results:
  • MyRocks is slightly worse than InnoDB 5.6
  • InnoDB 5.7/8.0 are slightly better than InnoDB 5.6 

point-query

This test is run after the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7/8.0 are slightly worse than InnoDB 5.6


random-points.pre

This test is run before the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7/8.0 do better than InnoDB 5.6 at low concurrency but their advantage decreases at mid and high concurrency.


random-points

This test is run after the write heavy tests. Interesting results:

  • MyRocks and InnoDB 5.7 are slightly worse than InnoDB 5.6
  • InnoDB 8.0 is slightly better than InnoDB 5.6



hot-points

The working set for this test is cached and the results are similar to the in-memory benchmark.


insert

Interesting results:
  • MyRocks is worse than InnoDB 5.6 at low/mid concurrency and slightly better at high.
  • InnoDB 5.7/8.0 are similar or worse than InnoDB 5.6 at low/mid concurrency and better at high concurrency.

Monday, December 11, 2017

Sysbench: in-memory and a fast server, part 2

This post has results for in-memory sysbench with 1 table and 8M rows/table. My previous post was for in-memory sysbench with 8 tables and 1M rows/table. The goal is to understand whether performance is lost when there is more contention for a table, and in some cases more contention for the same rows. This repeats tests I ran in September and the results are similar.

There are four tests for which QPS with 1 table is much worse than with 8 tables:
  • update-one - all engines do worse. This is expected.
  • random-points - InnoDB and TokuDB do worse
  • hot-points - all engines do worse
  • insert - InnoDB and TokuDB do worse

All of the data is on github. The graphs below have the absolute QPS for 48 concurrent clients.

update-one

All engines do worse with 1 table. I am not surprised because only 1 row gets all updates while the updates are spread across 8 rows in the 8-table test.

random-points

InnoDB and TokuDB do worse with 1 table. My previous post shows where the contention occurs.


hot-points

All engines do worse with 1 table. My previous post shows where the contention occurs. Issue 674 is open to make this better in MyRocks.

insert

InnoDB and TokuDB do worse with 1 table. I have yet to debug this.



Sysbench: in-memory and a fast server

In this post I share results for in-memory sysbench on a fast server using MyRocks, InnoDB and TokuDB. To save time I share throughput results at low, mid and high concurrency but skip the HW efficiency metrics that I derive from vmstat and iostat output.

tl;dr - for in-memory sysbench
  • MyRocks does worse than InnoDB for most tests, sometimes a lot worse
  • MyRocks gets up to 2X more QPS for write-heavy tests with the binlog disabled. The cost from the binlog is larger for it than for InnoDB. This is an opportunity to make MyRocks better.
  • InnoDB 5.7 and 8.0 tend to do better than InnoDB 5.6 at high concurrency and worse at low concurrency. 
  • For mid concurrency InnoDB 5.7 and 8.0 tend to do better than InnoDB 5.6 for write-heavy but worse for read-heavy except for range queries
  • InnoDB 5.7 and 8.0 benefit from improvements to range scan efficiency and a reduction in mutex contention. But InnoDB 5.7/8.0 has more overhead from code above the storage engine and that costs up to 20% of QPS.

Configuration

My usage of sysbench is described here. The test server has 48 HW threads, fast SSD and 256gb of RAM. The database block cache (buffer pool) was large enough to cache all tables. Sysbench was run with 8 tables and 1M rows/table. Tests were repeated for 1, 2, 4, 8, 16, 24, 32, 40, 48 and 64 concurrent clients. At each concurrency level the read-only tests run for 180 seconds, the write-heavy tests for 300 seconds and the insert test for 180 seconds.

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log, but tests were repeated for MyRocks with the binlog disabled. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058.
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.

Results

All of the data for the tests is on github. Graphs for each test are below. The graphs show the QPS for a test relative to the QPS for InnoDB 5.6.35 and a value > 1 means the engine gets more QPS than InnoDB 5.6.35. The graphs have data for tests with 1, 8 and 48 concurrent clients and I refer to these as low, mid and high concurrency. The tests are explained here and the results are in the order in which the tests are run except where noted below. The graphs exclude results for InnoDB from FB MySQL to improve readability.

For the write heavy tests I provide results for MyRocks with the binlog enabled (MyRocks) and with it disabled (MyRocks.nobl). MyRocks gets up to 2X more write QPS with the binlog disabled. It suffers much more than InnoDB when the binlog is enabled. Work is in progress to make that better.

update-inlist

Interesting results:
  • MyRocks is lousy at low and mid concurrency
  • MyRocks gets up to 2X more QPS with the binlog disabled
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency and better at mid & high concurrency


update-one

Interesting results:
  • MyRocks is worse than InnoDB
  • MyRocks gets up to 2X more QPS with the binlog disabled
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency and slightly better at high concurrency. But new code overheads limit the difference.

update-index

Interesting results:
  • MyRocks is better here than on other write-heavy tests relative to InnoDB because non-unique secondary index maintenance is read-free.
  • MyRocks gets up to 2X more QPS with the binlog disabled
  • InnoDB 5.7/8.0 are similar to InnoDB 5.6 at low concurrency and better at mid/high concurrency

update-nonindex

Interesting results:
  • MyRocks is worse than InnoDB 5.6 except at high concurrency
  • MyRocks gets up to 2X more QPS with the binlog disabled
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency, similar at mid and better at high concurrency. The difference between InnoDB 5.6 and 5.7/8.0 is smaller here than for update-inlist because this spends a larger fraction of time in optimizer/parser.

read-write range=100

Interesting results:
  • MyRocks is worse than InnoDB
  • InnoDB 5.7/8.0 are similar to InnoDB 5.6


read-write range=10000

Interesting results:
  • MyRocks is worse than InnoDB
  • InnoDB 5.7/8.0 are better than InnoDB 5.6 because range scans were improved

read-only range=100

Interesting results:
  • MyRocks is worse than InnoDB
  • InnoDB 5.7/8.0 are similar to InnoDB 5.6. Range scan improvements offset cost of new code.

read-only.pre range=10000

This test is run before the write heavy tests and the InnoDB B-Tree might be less fragmented as a result. Interesting results:
  • MyRocks is worse than InnoDB
  • InnoDB 5.7/8.0 are better than InnoDB 5.6 because range scans were improved and the range scan here is longer than in the previous section.

read-only range=10000

This test is run after the write heavy tests. Interesting results:
  • MyRocks is worse than InnoDB
  • InnoDB 5.7/8.0 are better than InnoDB 5.6 because range scans were improved

point-query.pre

This test is run before the write heavy tests and the InnoDB B-Tree might be less fragmented as a result. Interesting results:
  • MyRocks is worse than InnoDB except at high concurrency
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 except at high concurrency

point-query

This test is run after the write heavy tests. Interesting results:
  • MyRocks is worse than InnoDB except at high concurrency
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 except at high concurrency

random-points.pre

This test is run before the write heavy tests and the InnoDB B-Tree might be less fragmented as a result. Interesting results:
  • MyRocks is worse than InnoDB except at high concurrency
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 except at high concurrency

random-points

This test is run after the write heavy tests. Interesting results:
  • MyRocks is worse than InnoDB except at high concurrency. The gap with InnoDB is larger here than for random-points.pre. I assume that RocksDB suffers more than a B-Tree from the LSM equivalent of a fragmented search tree.
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 except at high concurrency

hot-points

The results here are similar to the results for random-points. The hot-points test is similar to random-points except there is more data contention. But as that is split across 8 tables it isn't significant. It will be significant for the test that uses 1 table.


insert

Interesting results:
  • MyRocks is worse than InnoDB except at high concurrency
  • MyRocks gets up to 2X more QPS with the binlog disabled
  • InnoDB 5.7/8.0 are worse than InnoDB 5.6 at low concurrency, similar at mid and better at high concurrency.

Thursday, December 7, 2017

Insert benchmark: IO-bound, high-concurrency, fast server, part 2

This is similar to the previous insert benchmark result for IO-bound and high-concurrency except it uses 1 table rather than 16 to determine how a storage engine behaves with more contention.

tl;dr
  • Inserts are much faster for MyRocks
  • The InnoDB PK uses 2X more space for the 1 table test than the 16 table test. I filed bug 88827.
  • MyRocks secondary index scans have a similar performance to InnoDB
  • MyRocks PK scans are ~2X slower than InnoDB 5.6 on the 16 table test but ~3X faster on the 1 table test. This might also be bug 88827.

Configuration

Start by reading my previous post. The test still uses 2B rows but there is only one table here when the previous test used 16 tables. The load still uses 16 concurrent clients. The read-write test still uses 16 read clients and 16 write clients. But the scan test uses 1 client here versus 16 clients on the previous test and the scan test takes longer to finish.

While I have results for InnoDB from FB MySQL I exclude them from the graphs to improve readability.

Results

All of the data for the 1-table tests is here and for the 16-table tests is here. I adjusted iostat bytes written metrics for MyRocks because it currently counts bytes trimmed as bytes written which is an issue for RocksDB but my adjustment is not exact.

For most of the results below I compare rates for this test with rates for the 16-table test and skip the graphs that show HW efficiency metrics.

Size

This graph shows the database size when the load ends for the 16 table and 1 table tests. For MyRocks and TokuDB the database size is similar for both tests. The InnoDB result is odd because the size is almost 1.25X larger for the 1 table test. From SHOW TABLE STATUS the data_length was about 2X larger for the 1 table test. From iostat output the PK scan for the 1 table test reads ~205gb while the 16 table test reads ~125gb. So the PK uses almost 2X more space than it should when there are concurrent inserters to the same table. The inserts are multi-row and the PK is auto-inc so the inserts grow the b-tree to the right. I filed bug 88827 for this.


Load

This graph shows the insert rate for the 16 and 1 table tests. Some engines get more inserts/second with 1 table, others get more with 16 tables:
  • More with 16 tables: ~1.1X more for MyRocks, ~2X more for TokuDB
  • More with 1 table: 1.3X more for InnoDB 5.6, 1.4X more for InnoDB 5.7 and 8.0



Scan

These graphs show the scan times relative to the scan time for InnoDB 5.6.35. A value > 1 means the engine is slower than InnoDB. The first graph is from the 16 table test and the second is from the 1 table test. In both cases the MyRocks secondary index scan is about as fast as InnoDB. But the PK scan MyRocks is more than 2X slower than InnoDB in the 16 table test and 2X to 3X faster than InnoDB for the 1 table test.

What might explain the difference in PK scan times? MyRocks was ~2X slower than InnoDB in the 16 table test and ~3X faster than InnoDB in the 1 table test. That is a change of 6X. The output from vmstat and iostat can help for the 16 table and 1 table tests. This is in the q5 section which is the second scan of the PK and I will compare myrocks.jun16.none with inno5635.

  • First, the InnoDB PK uses ~2X more space for the 1 table test, so there is 2X more data to scan. But with hand waving that should explain only 2X of the 6X change.
  • On the 16 table test InnoDB 5.6 gets ~3X more MB/s of storage reads compared to MyRocks: 2353.7 vs ~828.1. But on the 1 table test InnoDB 5.6 gets less storage read MB/s than MyRocks: 67.4 vs 94.8.
  • One reason for getting less read MB/s from storage is using more CPU and that appears true in this case. The Mcpu/o column has the CPU overhead per row read. For the 16 table test it is 1.278 for InnoDB 5.6 vs 1.100 for MyRocks. On the 1 table test it is 3.547 for InnoDB 5.6 vs 1.807 for MyRocks. So InnoDB is only using ~1.2X more CPU than MyRocks on the 16 table test but ~2X more CPU on the 1 table test.