Friday, December 1, 2017

tpcc-mysql, in-memory, low-concurrency: MyRocks and InnoDB

This post has results for tpcc-mysql with a low-concurrency workload and a database that fits in memory. Tests were run for MyRocks, InnoDB and TokuDB.

tl;dr:
  • InnoDB from upstream MySQL 5.6.35 has the best throughput
  • There is a large drop in throughput from upstream MySQL 5.6.35 to 8.0.3
  • MyRocks has the best write efficiency. InnoDB writes ~3X more to storage per transaction.
  • Uncompressed MyRocks has the best space efficiency. InnoDB uses ~1.5X more space.

Disclaimer

For many reasons this isn't TPC-C.

Read committed was used for MyRocks while the official benchmark requires repeatable read for some of the transaction types. When MyRocks supports gap locks this test can be repeated using repeatable read for it. Repeatable read was used for InnoDB and TokuDB.

Configuration

I used tpcc-mysql from Percona with my helper scripts. The test was run with 100 warehouses and 1 client connection - so the database is small and there is no concurrency. The test server has 48 HW threads, fast SSD and 256gb of RAM.

The test pattern is a load followed by 12 hours of transactions run 1 hour at a time. Results are reported for the last hour of transactions. I think I used 1000 milliseconds as the response time SLA.
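The load and run pattern looks roughly like the sketch below. The flag names are an assumption based on the tpcc_load and tpcc_start binaries in a recent Percona tpcc-mysql, not a copy of my helper scripts, and they may differ by version.

# load 100 warehouses, then run 12 one-hour rounds with 1 client connection
./tpcc_load -h 127.0.0.1 -d tpcc -u root -p "" -w 100
for hour in $(seq 1 12); do
  ./tpcc_start -h 127.0.0.1 -d tpcc -u root -p "" -w 100 -c 1 -r 0 -l 3600 > trx.hour${hour}.log
done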

Tests were run for MyRocks, InnoDB from upstream MySQL, InnoDB from FB MySQL and TokuDB. The binlog was enabled but sync on commit was disabled for the binlog and database log. All engines used jemalloc. Mostly accurate my.cnf files are here.
  • MyRocks was compiled on October 16 with git hash 1d0132. Compression was not used.
  • Upstream 5.6.35, 5.7.17, 8.0.1, 8.0.2 and 8.0.3 were used with InnoDB. SSL was disabled and 8.x used the same charset/collation as previous releases.
  • InnoDB from FB MySQL 5.6.35 was compiled on June 16 with git hash 52e058. 
  • TokuDB was from Percona Server 5.7.17. Compression was not used.
The performance schema was enabled for upstream InnoDB and TokuDB. It was disabled at compile time for MyRocks and InnoDB from FB MySQL because FB MySQL 5.6 has user & table statistics for monitoring.
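The durability and isolation settings described above correspond roughly to this my.cnf fragment. It is a sketch, not the actual files linked above, and the MyRocks option name is an assumption:

# binlog enabled, sync on commit disabled for the binlog and the database log
log_bin
sync_binlog = 0
innodb_flush_log_at_trx_commit = 2
# for MyRocks the equivalent is assumed to be rocksdb_flush_log_at_trx_commit = 2
# repeatable read for InnoDB and TokuDB, read committed for MyRocks
transaction-isolation = REPEATABLE-READ
# enabled for upstream InnoDB and TokuDB, disabled at compile time elsewhere
performance_schema = ON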

Graphs

The first graph has the TPMC for an engine relative to the TPMC for InnoDB from upstream MySQL 5.6.35. InnoDB from 5.6.35 does the best. There is a regression from 5.6 to 5.7 to 8.x. I assume that is from code above the storage engine.

The second graph has the CPU overhead per TPMC relative to the value for InnoDB from upstream MySQL 5.6.35. A value > 1 means that the engine uses more CPU/transaction than the base case. InnoDB from 5.6.35 again does the best. There is a regression from 5.6 to 8.x, and CPU overhead (mostly) explains the differences in transaction throughput.
The third graph shows the KB written to storage per TPMC, relative to InnoDB from upstream MySQL 5.6.35. Unfortunately the base case (InnoDB 5.6.35) is an outlier, so all other values on the graph are much smaller than 1. MyRocks write efficiency is excellent: InnoDB 5.7 and 8.0 write ~3X more to storage per transaction. I assume that InnoDB 5.6.35 is an outlier because of furious flushing; the my.cnf used for this test is the same one used for InnoDB 5.6 in my other tests.
The fourth graph shows the size of the database after the load and after 12 hours of transactions. Rows are added during the transaction test, so the database is expected to grow, and the growth is a function of the transaction rate. Fortunately the engines here have similar transaction rates, so it is mostly fair to compare database sizes.

Uncompressed InnoDB uses ~1.5X more space than uncompressed MyRocks.

Results

All of the data is here. I adjusted the iostat metrics for MyRocks because iostat currently counts bytes trimmed as bytes written, which is an issue for RocksDB, but my adjustment is not exact. The database sizes are rounded to the nearest 1gb above 10gb -- I need to fix my scripts.
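For reference, this is roughly how bytes written over an interval can be measured from /proc/diskstats, which is what iostat reads. The device name is an assumption, and whether trim (discard) bytes end up in the sectors-written field depends on the kernel and iostat versions:

dev=nvme0n1   # assumed device name
w0=$(awk -v d=$dev '$3 == d {print $10}' /proc/diskstats)   # sectors written so far
# ... run one hour of transactions ...
w1=$(awk -v d=$dev '$3 == d {print $10}' /proc/diskstats)
echo "GB written this hour: $(( (w1 - w0) * 512 / 1024 / 1024 / 1024 ))"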

This table has the database size in GB when the load finishes. Note that the size for MyRocks here is larger than it is after 1 hour of transactions. That is a bit odd; I assume that data not yet removed by compaction explains the difference, or maybe rounding to the nearest GB is the problem. See also my disclaimer on database size in the graphs section above.

size    engine
11      MyRocks
 9.2    FbInno5635
 9.4    Inno5635
 9.4    Inno5717
 9.4    Inno801
 8.9    Toku5717

This table has throughput and efficiency metrics for the first hour of transactions. InnoDB from upstream MySQL 5.6.35 is an outlier for wkb/t and the cause might be furious flushing. It also sustains the best throughput (tpmc). MyRocks has the best write efficiency (wkb/t).

run, hour=1
wMB/s  wkb/t  tpmc  cpu/s  Mcpu/t  size  engine
  6.6   0.91  7266  3.5    479     10    MyRocks
-
 23.3   2.43  9567  3.7    390     11    FbInno5635
140.7  14.21  9902  4.1    415     11    Inno5635
 22.5   2.50  8976  3.9    430     11    Inno5717
 21.9   2.61  8403  3.8    458      9    Inno801
 20.7   2.77  7469  3.8    512      9    Inno802
 19.9   2.62  7573  3.8    506     11    Inno803
-
 14.1   2.36  5981  3.4    577     13    Toku5717

legend:
* wMB/s - iostat write MB/s, average
* wkb/t - iostat KB written per tpmc
* tpmc - transaction rate
* cpu/s - average value of vmstat us + sy columns
* Mcpu/t - normalized CPU time per tpmc
* size - database size in GB at test end
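The derived columns appear to follow wkb/t = wMB/s * 1000 / tpmc and Mcpu/t = cpu/s * 1,000,000 / tpmc. That is inferred from the numbers above rather than taken from the helper scripts, and small differences come from rounding in the inputs:

# recompute wkb/t and Mcpu/t for Inno5635 at hour 1 (expect ~14.21 and ~415)
echo "140.7 4.1 9902" | awk '{ printf "wkb/t=%.2f Mcpu/t=%.0f\n", $1 * 1000 / $3, $2 * 1000000 / $3 }'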

This table has throughput and efficiency metrics for the 12th hour of transactions. InnoDB from upstream MySQL 5.6.35 is an outlier for wkb/t and the cause might be furious flushing. It also sustains the best throughput (tpmc). MyRocks has the best write efficiency (wkb/t).

run, hour=12
wMB/s  wkb/t  tpmc  cpu/s  Mcpu/t  size  engine
  8.1   1.13  7136  3.4    481     15    MyRocks
-
 31.8   3.30  9616  3.7    390     22    FbInno5635
148.5  15.12  9821  4.2    424     24    Inno5635
 29.9   3.36  8899  3.9    436     23    Inno5717
 27.6   3.32  8322  3.8    462     22    Inno801
 25.6   3.45  7415  3.9    521     20    Inno802
 26.0   3.47  7495  3.8    512     21    Inno803
-
 13.6   2.34  5792  3.4    592     17    Toku5717

legend:
* wMB/s - iostat write MB/s, average
* wkb/t - iostat KB written per tpmc
* tpmc - transaction rate
* cpu/s - average value of vmstat us + sy columns
* Mcpu/t - normalized CPU time per tpmc
* size - database size in GB at test end
