Friday, February 24, 2017

Part 2 - sysbench, MyRocks, InnoDB and a small server

This is part 2 of the performance report for sysbench, MyRocks, InnoDB and a small server. The first part covered an IO-bound workload where the database was larger than RAM. This part covers a cached workload where the database fits in the MyRocks block cache and InnoDB buffer pool.

tl;dr
  • MyRocks is faster than InnoDB in the write-heavy tests.
  • InnoDB in MySQL 5.6 is faster than MyRocks in the read-heavy tests. I can't reproduce this on the newer/faster CPUs I use at work. For this test I used an Intel NUC with a 5th generation core i3 CPU. I am still trying to figure this out.
  • InnoDB in MySQL 5.6 is faster than in 5.7 for most write-heavy tests.
  • InnoDB in MySQL 5.7 is faster than in 5.6 for most read-heavy tests. I am curious why this depends on read-heavy vs write-heavy.
Details

The previous blog post has all of the details and and shared results for the IO-bound test that used 4 tables with 40M rows per table. This test used 4 tables with 1M rows per table. Tests are run for 3 storage engines - MyRocks from FB MySQL merged to upstream MySQL 5.6.35 (myrocks-5635), InnoDB from upstream MySQL 5.6.35 (innodb-5635) and InnoDB from upstream MySQL 5.7.10 (innodb-5710). The sequence of tests is the same as described in the previous blog post.

A sample command line for the test is:
bash all_small.sh 4 1000000 600 600 300 innodb 1 0 \
    ~/b/orig5710/bin/mysql none ~/b/sysbench/share/sysbench


Update-only with secondary index maintenance

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
6795    10738   13212   myrocks-5635
3907     5912    7323   innodb-5635
3286     4820    6012   innodb-5710
- optimized my.cnf
6787    10735   13103   myrocks-5635
3616     6021    7393   innodb-5635
3122     4733    6131   innodb-5710

Summary:
  • MyRocks is faster than InnoDB because secondary index maintenance is write-only (read-free). On the IO-bound test that avoided stalls from page reads. Here it saves on CPU.
  • The optimized my.cnf doesn't change performance
  • InnoDB in MySQL 5.6 is faster than in 5.7

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
7373    11510   14762   myrocks-5635
8050    10783   13185   innodb-5635
6528     8971   11338   innodb-5710
- optimized my.cnf
7335    11374   14609   myrocks-5635
8809    12240   15179   innodb-5635
6644     8935   11580   innodb-5710

Summary:
  • InnoDB in MySQL 5.6 is slightly faster than MyRocks. The update statement in this test doesn't require secondary index maintenance so MyRocks loses that benefit.
  • The optimized my.cnf helps InnoDB in MySQL 5.6
  • InnoDB in MySQL 5.6 is much faster than in 5.7

Delete

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       concurrency
- default my.cnf
8303    myrocks-5635
7848    innodb-5635
6461    innodb-5710
- optimized my.cnf
8265    myrocks-5635
7819    innodb-5635
6215    innodb-5710

Summary:
  • MyRocks is faster than InnoDB
  • The optimized my.cnf doesn't change performance
  • InnoDB in MySQL 5.6 is faster than in 5.7

Write-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
12547   19685   25236   myrocks-5635
11209   15429   18387   innodb-5635
 9701   13035   15308   innodb-5710
- optimized my.cnf
12197   19241   24533   myrocks-5635
13076   17780   21097   innodb-5635
10401   14496   17402   innodb-5710

Summary:
  • MyRocks is faster than InnoDB at concurrency >= 2
  • The optimized my.cnf helps InnoDB
  • InnoDB in MySQL 5.6 is faster than in 5.7

Read-write with --range-size=100


The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
5404    10126   13214   myrocks-5635
8507    12658   16867   innodb-5635
8983    13311   16806   innodb-5710
- optimized my.cnf
7426    12116   16076   myrocks-5635
10084   15236   20895   innodb-5635
10480   15666   20830   innodb-5710

Summary:
  • InnoDB is faster than MyRocks because there is more CPU overhead in MyRocks for range queries
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB has similar performance for MySQL 5.6 and 5.7

Read-write with --range-size=10000

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
153     286     311     myrocks-5635
273     484     652     innodb-5635
434     733     863     innodb-5710
- optimized my.cnf
185     330     531     myrocks-5635
278     523     684     innodb-5635
449     784     902     innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks  because of the CPU overhead for range queries
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB in MySQL 5.7 is much faster than in 5.6. I think something was done to make range queries more efficient in 5.7.

Read-only with --range-size=10

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
11260   19129   27348   myrocks-5635
13846   25056   37479   innodb-5635
14533   25412   37048   innodb-5710
- optimized my.cnf
13778   22240   31544   myrocks-5635
15348   27860   42859   innodb-5635
15320   27187   42294   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. On my work servers with newer/faster CPUs I don't reproduce this and am still trying to understand the cause.
  • The optimized my.cnf helps MyRocks and InnoDB
  • InnoDB has similar performance for MySQL 5.6 and 5.7

Read-only with --range-size=100

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
4393     8245   10191   myrocks-5635
8943    16784   25504   innodb-5635
11288   19771   28019   innodb-5710
- optimized my.cnf
8134    14686   20881   myrocks-5635
9847    18361   26953   innodb-5635
11778   20905   30065   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • InnoDB in MySQL 5.7 is faster than 5.6. I think range queries were made more efficient in 5.7

Read-only with --range-size=10000


The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
106     209     242     myrocks-5635
214     411     536     innodb-5635
357     624     711     innodb-5710
- optimized my.cnf
199     380     486     myrocks-5635
227     424     556     innodb-5635
374     648     732     innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • InnoDB in MySQL 5.7 is faster than 5.6. I think range queries were made more efficient in 5.7

Point-select

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
16482   29111   36510   myrocks-5635
15840   30280   37841   innodb-5635
16347   29451   47411   innodb-5710
- optimized my.cnf
16411   28628   35344   myrocks-5635
18481   34911   43390   innodb-5635
19169   31806   53018   innodb-5710

Summary:
  • InnoDB is a lot faster than MyRocks. See summary for read-only with --range-size=10
  • It isn't clear whether InnoDB is faster in MYSQL 5.6 or 5.7

Insert-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
8003    11722   13323   myrocks-5635
9548    11549   13323   innodb-5635
8065     9541    9816   innodb-5710
- optimized my.cnf
8271    12312   13804   myrocks-5635
9712    12732   14917   innodb-5635
8357    10716   11985   innodb-5710

Summary:
  • InnoDB in MySQL 5.6 is faster than MyRocks
  • The optimized my.cnf helps performance
  • InnoDB in MySQL 5.6 is faster than in 5.7

Thursday, February 23, 2017

Using modern sysbench to compare MyRocks and InnoDB on a small server

I used sysbench to compare MyRocks and InnoDB on a small server. I ran tests for an IO-bound and in-memory configuration and share results for the IO-bound setup here.

tl;dr
  • On some workloads InnoDB is faster than MyRocks but this does not reproduce on the faster/newer CPUs that I use at work. I wasn't able to explain this but it looks like the memory-latency bound functions in RocksDB account for more of the CPU overhead on my home servers. Maybe I need to try Intel vTune, I just wish it didn't cost $899.
  • Performance with the default my.cnf is very good for all of the engines. The optimized my.cnf is more useful on the larger servers I use at work. One change in the optimized my.cnf increases the RocksDB page size from 4kb to 16kb which can hurt throughput on IO-bound workloads.
  • MyRocks did much better on inserts, updates and deletes.
  • InnoDB did much better on range scans and better on point selects. There are some changes in MySQL 5.7 that makes range scans faster. The goal for MyRocks is much better space & write efficiency than InnoDB (use less SSD, SSD lasts longer) with similar or good-enough read efficiency. We achieved the space and write efficiency goals. There is more work to be done for read efficiency.
  • There isn't much of a difference for InnoDB between MySQL 5.6.35 and 5.7.10. The difference is more obvious on workloads with more concurrency using larger servers.
  • There is little overhead from compression with MyRocks. There is a lot with InnoDB, especially on write-heavy tests where uncompressed InnoDB is much faster than compressed InnoDB. 
Details

The small server is an Intel NUC (NUC5i3ryh) with a 5th generation core i3 that runs at 2.1GHz. I use these at home because they are small, quiet, efficient and affordable. The server has 2 CPU cores, 4 with HT enabled, 8G of RAM, a 110gb Samsung 850 EVO m.2 SSD and a 7200 RPM disk. The OS is Ubuntu 16.04.

Tests are run for three storage engines. The first is myrocks-5635 which is MyRocks from FB MySQL merged to upstream MySQL 5.6.35. The second is innodb-5635 which is InnoDB from upstream MySQL 5.6.35. The third is innodb-5710 which is InnoDB from upstream MySQL 5.7.10. Command lines for cmake to build each binary are here.

The test was repeated three times for each engine: with the default my.cnf, with an optimized my.cnf and no compression and with an optimized my.cnf and zlib compression. This test is the IO-bound configuration with 4 tables and 40M rows per table. The database is larger than RAM but my test scripts didn't record the database size (need to fix that).
  • myrocks-5635 - the my.cnf files for MyRocks are here. Snappy compression is used for all levels in the default my.cnf. For the optimized my.cnf with zlib compression there is no compression for L0-L2, LZ4 compression for L3 to the next to last level and then zlib for the last level of the LSM tree. Prior to the insert-only tests the database was ~34gb without compression and ~18gb with zlib compression. Compaction IO stats are here, but suffers from a bug that is fixed but not yet in my build.
  • innodb-5635 - the my.cnf files for InnoDB in MySQL 5.6.35 are here
  • innodb-5710 - the my.cnf files for InnoDB in MySQL 5.7.10 are here.
The benchmark client is modern sysbench run by a helper script that uses a sequence of  workloads via bundled lua scripts. Tests are run in this order:
  • prepare - this isn't a test, it creates and loads the tables. For the IO-bound test there were 4 tables with 40M rows per table. The sysbench table has one secondary index.
  • oltp_update_index.lua - update-only, secondary index maintenance is required
  • oltp_update_non_index.lua - update-only, secondary index maintenance is not required
  • oltp_delete.lua - a sequence of delete row, re-insert row operations. It doesn't use transactions (yet) so it can't be run with more than 1 thread.
  • oltp_write_only.lua - all of the writes but none of the reads from oltp_read_write.lua
  • oltp_read_write.lua - run first with --range-size=100 and then again with --range-size=10000 (short and then long range scans)
  • oltp_read_only.lua - all of the reads but none of the writes from oltp_read_write.lua. Run four times with --range-size in 10, 100, 1000 and 10000.
  • oltp_point_select.lua - each operation fetches all columns in one row by primary key
  • oltp_insert.lua - insert-only

Sample command lines:
# myrocks
bash all_small.sh 4 40000000 600 600 300 rocksdb 1 0 ~/b/myrocks/bin/mysql \
    none ~/b/sysbench/share/sysbench

# InnoDB in MySQL 5.6.35 without compression
bash all_small.sh 4 40000000 600 600 300 innodb 1 0 ~/b/orig5635/bin/mysql \
    none ~/b/sysbench/share/sysbench

# InnoDB in MySQL 5.6.35 with zlib compression
bash all_small.sh 4 40000000 600 600 300 innodb 1 0 ~/b/orig5635/bin/mysql \
    key_block_size=8 ~/b/sysbench/share/sysbench

Update-only with secondary index maintenance


The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
4162    6655    8585    myrocks-5635
 543     712     810    innodb-5635
 610     714     816    innodb-5710
- optimized my.cnf without compression
4236    6762    8394    myrocks-5635
 552     687     810    innodb-5635
 579     699     791    innodb-5710
- optimized my.cnf with zlib compression
3540    5665    7148    myrocks-5635
 218     403     533    innodb-5635
 237     389     469    innodb-5710

Summary:
  • MyRocks is best because non-unique secondary index maintenance is write-only (read-free)
  • The optimized my.cnf didn't help
  • Compressed InnoDB is worse than uncompressed InnoDB. The per-index mutex might be the largest problem. This isn't explained by (de)compression latency.
  • InnoDB in MySQL 5.6 and 5.7 have similar performance

Update-only without secondary index maintenance

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
4596    7726    10100   myrocks-5635
2224    2892     3295   innodb-5635
2327    2886     3176   innodb-5710
- optimized my.cnf without compression
4578    7202     9616   myrocks-5635
2887    3835     4364   innodb-5635
2694    3419     4008   innodb-5710
- optimized my.cnf with zlib compression
4031    6320     7998   myrocks-5635
 484     768      888   innodb-5635
 500     687      815   innodb-5710

Summary:
  • MyRocks is best but the difference with InnoDB is smaller because no engines have to do secondary index maintenance for this test.
  • The optimized my.cnf helped InnoDB but hurt MyRocks
  • Compressed InnoDB is much worse than uncompressed InnoDB. The per-index mutex might be the largest problem.
  • InnoDB in MySQL 5.6 is slightly faster than in 5.7

Delete

The numbers in the table are the QPS for 1 thread. I can't use more threads until I update the test script to use transactions for each delete/reinsert pair.

1       concurrency
- default my.cnf
6294    myrocks-5635
 963    innodb-5635
1010    innodb-5710
- optimized my.cnf without compression
5821    myrocks-5635
1077    innodb-5635
1090    innodb-5710
- optimized my.cnf with zlib compression
5453    myrocks-5635
 644    innodb-5635
 588    innodb-5710

Summary:
  • MyRocks is best. That might be because secondary index maintenance is write-only (read-free).
  • The optimized my.cnf didn't make a big difference
  • Compressed InnoDB is much worse than uncompressed InnoDB. The per-index mutex might be the largest problem.
  • InnoDB in MySQL 5.6 and 5.7 have similar performance

Write-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
8813    12937   16351   myrocks-5635
1808     2151    2344   innodb-5635
1895     2172    2340   innodb-5710
- optimized my.cnf without compression
8062    12565   16010   myrocks-5635
1886     2158    2644   innodb-5635
1830     2160    2493   innodb-5710
- optimized my.cnf with zlib compression
7250    10977   13965   myrocks-5635
 791     1030    1214   innodb-5635
 724      972    1088   innodb-5710

Summary:
  • MyRocks is best. That might be because secondary index maintenance is write-only (read-free).
  • The optimized my.cnf didn't make a big difference
  • Compressed InnoDB is much worse than uncompressed InnoDB. The per-index mutex might be the largest problem.
  • In some cases InnoDB in MySQL 5.6 is slightly faster than 5.7

Read-write with --range-size=100

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
4740    8490    11516   myrocks-5635
2800    3649     4445   innodb-5635
3392    4424     5036   innodb-5710
- optimized my.cnf without compression
5480    9508    13005   myrocks-5635
3370    4729     5555   innodb-5635
3573    4814     5614   innodb-5710
- optimized my.cnf with zlib compression
5157    8525    11883   myrocks-5635
1756    2415     2961   innodb-5635
1637    2324     2769   innodb-5710

Summary:
  • MyRocks is best because it processes writes faster than InnoDB and the writes dominate the workload
  • The optimized my.cnf helped
  • Compressed InnoDB is much worse than uncompressed InnoDB, perhaps because of the per-index mutex.
  • InnoDB in MySQL 5.6 and 5.7 have similar performance

Read-write with --range-size=10000

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
174     305     416     myrocks-5635
208     348     484     innodb-5635
346     550     723     innodb-5710
- optimized my.cnf without compression
207     370     500     myrocks-5635
241     425     596     innodb-5635
374     607     792     innodb-5710
- optimized my.cnf with zlib compression
194     363     469     myrocks-5635
168     298     416     innodb-5635
213     369     484     innodb-5710

Summary:
  • Compared to oltp_read_write.lua with --range-size=100, the bottleneck here switches from writes to reads and InnoDB is faster than MyRocks for long range scans.
  • The optimized my.cnf helps MyRocks
  • Compressed InnoDB is much worse than uncompressed InnoDB. The largest problem might be the per-index mutex but another problem is (de)compression latency.
  • InnoDB in MySQL 5.7 is much faster than in 5.6.

Read-only with --range-size=10

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
 8407   13967   19378   myrocks-5635
 7725   14452   21220   innodb-5635
 9962   17029   24023   innodb-5710
- optimized my.cnf without compression
 8089   13189   16243   myrocks-5635
11219   19683   27638   innodb-5635
11536   19198   26926   innodb-5710
- optimized my.cnf with zlib compression
 8076   12884   18263   myrocks-5635
 7549   12009   17570   innodb-5635
 7380   11700   17173   innodb-5710

Summary:
  • InnoDB is faster than MyRocks because of the read overhead from an LSM.
  • MyRocks QPS on the read-only tests has extra variance because the state of the memtable (number of keys) and L0 (number of files) has more variance when the read-only test follows write heavy tests.
  • On this test InnoDB does better than MyRocks on my older/slower CPU (5th generation core i3) that I have at home while MyRocks does better on newer Intel Xeon CPUs I get at work.
  • Compressed InnoDB is worse than uncompressed InnoDB. I assume this is decompression latency.
  • InnoDB in MySQL 5.6 and 5.7 has similar performance

Read-only with --range-size=100

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
5130     8834   12320   myrocks-5635
5818    10236   15274   innodb-5635
7682    13208   18709   innodb-5710
- optimized my.cnf without compression
5683     9477   12781   myrocks-5635
7455    13236   19261   innodb-5635
8514    14517   20507   innodb-5710
- optimized my.cnf with zlib compression
5445     8872   12927   myrocks-5635
5140     8150   12456   innodb-5635
5432     8463   12911   innodb-5710

Summary:
  • See summary for oltp_read_only.lua with --range-size=10
  • The difference between InnoDB and MyRocks gets larger as the range scan gets larger

Read-only with --range-size=10000

The numbers in the table are the QPS for 1, 2 and 4 threads.
1       2       4       concurrency
- default my.cnf
138     240     325     myrocks-5635
175     314     445     innodb-5635
306     519     643     innodb-5710
- optimized my.cnf without compression
161     285     363     myrocks-5635
208     378     524     innodb-5635
324     520     685     innodb-5710
- optimized my.cnf with zlib compression
156     290     377     myrocks-5635
146     270     365     in-5636.zlib.opt
191     349     430     innodb-5710

Summary:
  • See summary for oltp_read_only.lua with --range-size=10
  • The difference between InnoDB and MyRocks gets larger as the range scan gets larger
  • InnoDB in MySQL 5.7 is faster than in 5.6. Something was done to make range scans faster.

Point-select

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
10839   18506   25871   myrocks-5635
 8992   16318   24090   innodb-5635
11332   19621   27239   innodb-5710
- optimized my.cnf without compression
10157   16694   22271   myrocks-5635
12944   23440   33489   innodb-5635
13123   22321   31288   innodb-5710
- optimized my.cnf with zlib compression
 9790   16486   22853   myrocks-5635
 7878   13071   18917   innodb-5635
 7484   12358   17911   innodb-5710

Summary:
  • InnoDB is faster than MyRocks
  • The optimized my.cnf helps InnoDB but hurts MyRocks
  • Compressed InnoDB is slower than uncompressed InnoDB. I assume this is decompression latency.
  • InnoDB in MySql 5.6 and 5.7 has similar performance

Insert-only

The numbers in the table are the QPS for 1, 2 and 4 threads.

1       2       4       concurrency
- default my.cnf
7351    10654   12527   myrocks-5635
2120     6343    8264   innodb-5635
2694     6337    7784   innodb-5710
- optimized my.cnf without compression
7556    11341   12556   myrocks-5635
2356     4437    4540   innodb-5635
2227     4615    4807   innodb-5710
- optimized my.cnf with zlib compression
7210    10816   11273   myrocks-5635
3837     4732    5370   innodb-5635
2979     3885    4528   innodb-5710

Summary:
  • MyRocks is faster perhaps because secondary index maintenance is write-only (read-free).
  • The optimized my.cnf hurts InnoDB
  • Compressed InnoDB is slower than uncompressed InnoDB. I assume this is from both the per-index mutex and (de)compression latency.
  • InnoDB in MySQL 5.6 and 5.7 has similar performance

Wednesday, February 8, 2017

MyRocks, InnoDB and in-memory insert performance

This is yet another result to understand an insert-only workload. In this case the database is small enough to be cached by the storage engine. Previous results were explained here, here and here. From several tests it is apparent that MyRocks and InnoDB from MySQL 5.6 have similar throughput while InnoDB from MySQL 5.7 does much better for cached & insert-only when the server has fast SSD.

In this result I changed a few options to reduce the chance that compaction stalls will occur with MyRocks. I increased level0_slowdown_writes_trigger to 20 and level0_stop_writes_trigger to 30. I also ran tests with universal compaction. We probably have some work to do here -- improving default option values and making compaction throttling less spiky.

tl;dr
  • MyRocks was faster than InnoDB in MySQL 5.6 and 5.7 at 1 and 2 concurrent clients
  • MyRocks and InnoDB/MySQL-5.6 have similar throughput at 4+ concurrent clients
  • InnoDB/MySQL-5.7 is fastest at 8+ concurrent clients
  • Disabling the binlog helps MyRocks more than InnoDB. We have work in progress to make binlog processing faster for MyRocks.
  • MyRocks with leveled compaction and the binlog is limited by compaction stalls in moving data from level 0 to level 2.
Note that MyRocks did great when able to avoid the bottlenecks from binlog processing and compaction stalls. That is apparent in the result for universal compaction with the binlog disabled. 

Test

For this test I inserted 400M rows using the insert benchmark. The test was run for 1, 2, 4, 8, 12 and 16 concurrent clients. There was a table per client and each table has 3 secondary indexes. The inserts are in PK order. The database cache was large enough to cache the database.

Tests were repeated with the binlog enabled and disabled. For MyRocks I used a build from November 21 to match what was used in previous tests. For MyRocks I also ran tests with leveled and universal compaction. This is the first time I tried universal compaction with MyRocks.

Results

The numbers in the table below are the average insert rate, larger is better.

1       2       4       8       12      16      concurrency
34465   61843   106185  131536  131926  121988  innodb-5.6.26, binlog
35217   63857   108666  133200  133023  125196  innodb-5.6.26, no binlog
31461   60855   114058  201918  258398  281690  innodb-5.7.10, binlog
32962   62490   117028  206292  263678  282486  innodb-5.7.10, no binlog

leveled compaction
38465   71467   113186  125392  125865  125984  myrocks, binlog
44092   81950   132275  142298  139519  134998  myrocks, no binlog

universal compaction
37474   71162   115042  133556  135181  134003  myrocks, binlog
43006   79888   134590  198511  238379  252207  myrocks, no binlog

Graphs

These graphs show the row insert rate per 5 second interval for the test with 16 clients. The graph for InnoDB-5.7 ends earliest because it finishes the load first. The graph for MyRocks with leveled compaction has the most variance when ingest is faster than compaction from level 0 to level 2. Ignoring the variance, MyRocks throughput declines less over time than InnoDB.

The next graph is for InnoDB-5.7. While throughput declines over time there is not much variance.
The next graph is for InnoDB-5.6. It has more variance and loses more throughput over time compared to InnoDB-5.7. MySQL has been busy making InnoDB better.
The next graphs are for MyRocks with leveled and then universal compaction. There are more stalls with leveled compaction. The average throughput is similar and binlog processing is the bottleneck. We have work in progress to improve leveled compaction and the overhead from the binlog.


Thursday, February 2, 2017

Why MyRocks?

This is yet another attempt to explain why MyRocks is interesting. MySQL continues to get more popular and should soon become #1 on the db-engines ranking. Popularity isn't the only thing about MySQL that is improving -- performance, manageability and availability are also much better and cloud vendors, especially Amazon, are doing remarkable work to make MySQL in the cloud much easier than it used to be.
While Oracle has been a great owner of MySQL we still benefit from external contributions to the core DBMS. But new storage engines are unlikely, even ones that had great promise like TokuDB and PBXT, because the storage engine API is hard to implement. I am surprised and thrilled by the progress we are making with MyRocks. It is in production for us today and I expect it to be in production elsewhere within the next 12 months. I am grateful to work with talented teams (MyRocks, MySQL & RocksDB) and understanding management.

Efficient performance is the reason for MyRocks. We want to provide performance similar to InnoDB but with much better storage efficiency. The technical message is that MyRocks has less space and write amplification than InnoDB without sacrificing too much read amplification. For a workload I care about it uses 1/2 the space and writes at 1/10 the rate of InnoDB. The less technical message is that with MyRocks a deployment needs less SSD and the SSD will last longer.

The other important question is when to use MyRocks. My goal is for MyRocks to be an alternative to InnoDB for any workload where the database is larger than RAM. While we have optimizations for in-memory workloads with RocksDB I don’t focus on that today. I also assume that too many workloads use too much RAM today. If you have fast storage with SSD and the database fits in RAM then you probably have too much RAM. So my pitch is to use more fast storage and less RAM and use less space, less power and possibly fewer servers.

Monday, January 30, 2017

Compaction stalls: something to make better in RocksDB


In previous results that I shared for the insert benchmark it was obvious that MyRocks throughput is steady when the workload transitions from in-memory to IO-bound. The reason is that non-unique secondary index maintenance is read-free for MyRocks so there are no stalls for storage reads of secondary index pages. Even with the change buffer, InnoDB eventually is slowed by storage reads and by page writeback.

It was less obvious that MyRocks has more variance on both the in-memory and IO-bound insert benchmark tests. I try to be fair when explaining storage engine performance so I provide a few more details here and results for InnoDB in MySQL 5.7.10 & 5.6.26 along with MyRocks from our fork of MySQL 5.6. The binlog was enabled for all tests, fsync-on-commit was disabled and 16 clients inserted 500m or 2b rows into 16 tables in PK order. Each table has 3 secondary indexes. For MyRocks I made one configuration change from the earlier result. I changed level0_slowdown_writes_trigger from 10 to 20 to reduce compaction stalls. This has a potential bad side effect of making queries slower, but this test was insert-only.

For both graphs the y-axis is the average insert rate per 5-second interval and the x-axis is the interval number. I used mstat to collect the data.

The goal is to match InnoDB in MySQL 5.7 in quality of service while providing much better throughput. We have some work to do. On the bright side, this is an opportunity for someone to make RocksDB better.

In-memory

The first graph is for the in-memory workload where all data is cached by the storage engine, nothing is read from storage, but many storage writes are done to persist the changes. InnoDB with MySQL 5.7 is much faster than with 5.6. It also has the least variance. MyRocks has the most variance and that is largely from compaction stalls when there are too many files in level 0 of the LSM tree.


IO-bound

The second graph is for the IO-bound workload. The graph ends first for MyRocks because it sustains the highest average insert rate. But it also has a thick line because of variance. InnoDB from MySQL 5.6 also has a lot of variance.

Monday, January 23, 2017

The value of write efficiency for the insert benchmark

I shared results last week for the insert benchmark. The test server for that result had fast SSD courtesy of several NVMe NAND flash devices. I repeated tests on a server with slower SSD and the results are interesting. Regardless of the storage device, MyRocks did better when the database was larger than RAM and InnoDB 5.7.10 did better when the database fit in RAM. But the difference between MyRocks and InnoDB 5.7.10 for the in memory workload is much smaller on slow SSD than it is on fast SSD. It is always risky to ignore the context from benchmark results and the important context here is the performance of the storage device.

Slower SSD hurts InnoDB more than it hurts MyRocks because MyRocks is more efficient for writes. Spending less IO on writes saves IO to do more writes or more reads as I previously demonstrated for Linkbench.

When the database fits in RAM there are no reads from storage. But there are many writes to storage whether or not the database is larger than RAM and the insert benchmark workload is always IO heavy. From looking at PMP stacks InnoDB throughput is limited by the rate at which the redo log is written and InnoDB in 5.6.26 has more CPU overhead from mutex contention than in 5.7.10. Note that the binlog was enabled but sync-on-commit was disabled so there was more stress on redo log throughput.

Workload

The workload is fully described in my previous post. 500m rows are inserted for the in memory test and 2b rows for the larger than memory test. All tests use 16 clients, each client inserts to a different table and each table has 3 secondary indexes to maintain.

The slower SSD provides about 10k page reads second versus more than 100k/second provided by the fast SSD server.  Both servers have similar CPUs and RAM.

By in memory I really mean that the database working set fits in memory but it is easier to ignore that and just claim the database fits in memory. Sorry for the confusion.

Throughput

The throughput listed below is the average insert rate during the test. Things that I notice in the results include:
  • InnoDB loses more throughput when going from fast to slow SSD and from in memory to larger than memory workloads.
  • Restating the previous point, MyRocks throughput is more stable when going from fast to slow SSD and from in memory to larger than memory workloads.
  • InnoDB in MySQL 5.7.10 scales much better than 5.6.26 for in memory write-heavy workloads

Throughput for in memory database
            slow-SSD  fast-SSD
MyRocks        83766    102712
InnoDB-5.7    124782    268873
InnoDB.5.6     66251    111111

Throughput for larger than memory database

            slow-SSD  fast-SSD
MyRocks        86401   115234
InnoDB-5.7     38207    65604
InnoDB-5.6     14784    52812

IO Efficiency

Things that I notice in the results include:
  • the value for wKB/i for MyRocks is somewhat smaller than InnoDB for the in memory database and much smaller than InnoDB for the larger than memory database. I expected this.
  • InnoDB in MySQL 5.7 is able to sustain higher IO rates than in 5.6 thanks to many performance improvements in InnoDB.
  • Assuming the slow SSD saturates around 400 MB/s of IO, then writing less per insert for MyRocks means it gets more inserts/second when IO saturates.

Legend:
* IPS - average inserts/second
* rMB/s, wMB/s - read & write MB/s via iostat
* rKB/i, wKB/i - read & write KB per insert via iostat
* note that iostat usually overstates bytes written by 2X

  for RocksDB because it counts bytes trimmed as bytes written

IO metrics for in memory database
             IPS     rMB/s   wMB/s   rKB/i   wKB/i
- slow SSD
MyRocks       83766    0      280     0       3.34
InnoDB-5.7   124782    0      533     0       4.27
InnoDB-5.6    66251    0      281     0       4.24
- fast SSD
MyRocks      102712    0      339     0       3.30
InnoDB-5.7   268873    0     1163     0       4.35
InnoDB-5.6   111111    0      451     0       4.05

IO metrics for larger than memory database
             IPS     rMB/s   wMB/s   rKB/i   wKB/i
- slow SSD
MyRocks       86401   86      346     1.00    4.00
InnoDB-5.7    38207   42      385     1.21   10.08
InnoDB-5.6    14784   36      169     2.44   11.41
- fast SSD
MyRocks      115234  118      457     1.03    3.96
InnoDB-5.7    65604   75      878     1.14   13.38
InnoDB-5.6    52812   71      525     1.34    9.94

Tuesday, January 17, 2017

Insert benchmark, MyRocks and InnoDB

I haven't been sharing many performance results on my blog as I have been saving results for my conference talks. Later this year I expect to blog more and travel less. Today I will share two results from the insert benchmark to compare MyRocks and InnoDB.

tl;dr - both are expected
  • MyRocks insert performance doesn't degrade when the database is larger than memory
  • InnoDB in MySQL 5.7 scales much better than in MySQL 5.6 for in-memory workloads
  • InnoDB in MySQL 5.7 wins for in-memory, MyRocks wins for io-bound

Configuration

This test does inserts in PK order into a table with 3 secondary indexes. Without the secondary indexes the workload is friendly to a b-tree. But secondary index for a b-tree is read-modify-write and the reads from and writes to storage will make InnoDB slower when the working set doesn't fit into RAM. InnoDB is likely to do better than other b-trees because of the insert buffer. With MyRocks, non-unique secondary index maintenance is read-free so performance doesn't drop much when the working set no longer fits in RAM.

I ran the insert benchmark using this script with two configurations. Both configurations used 16 clients, 16 tables and a client per table. For the io-bound configuration there was 50gb of RAM for the database and OS and 2B rows were inserted.  For the in-memory configuration there was 256gb of RAM for the database and OS and all data was cached by the database cache.

The test server has 48 cores with HT enabled and fast NVMe flash. For MyRocks I used the FB MySQL repo. For InnoDB I used MySQL 5.6.26 and 5.7.10. I tried to use a good my.cnf for all engines and the MyRocks settings are similar to what we suggest. For the io-bound configurations I used a 10gb RocksDB block cache and 35gb InnoDB buffer pool. For the in-memory configurations all database caches were set to 190gb. The binlog was enabled for all tests, but sync on commit was disabled for the database redo log & binlog.

I disabled compression for all engines in this test. InnoDB still uses much more space on disk, probably because of fragmented leaf pages.

Average insert rates

The average insert rates for each configuration. It is odd that the insert rate for io-bound MyRocks is better than for in-memory and I am not sure I will debug it. There isn't much difference between InnoDB from MySQL 5.6 and 5.7 for the io-bound configuration. There is a huge difference between them for the in-memory configuration. I assume the difference is all of the performance work done upstream (thank you). MyRocks is much faster than InnoDB for the io-bound configuration. InnoDB in MySQL 5.7 is much faster than MyRocks for the in-memory configuration. I expect MyRocks to do better on the in-memory setup in the future.

Legend for the table:
  • io-bound - average inserts/second for the io-bound configuration
  • in-memory - average inserts/second for the in-memory configuration
  • size - database size in GB at test end
  • engine - database engine

io-bound  in-memory  size   engine
115234    103088     226    MyRocks

 65604    267523     362    InnoDB-5.7.10
 52812    111259     362    InnoDB-5.6.26

IO-bound results

I have two graphs to show the insert rates over time. This is for data from one of the 16 clients rather than the global rate because my test script forgot to collect that. The first graph uses log scale for the x-axis. The second graph does not. The first graph makes it easier to see how throughput drops for InnoDB as the database gets larger. While it is odd that the throughput rate for InnoDB 5.7 picks up at test end, that can occur if the thread I monitored ran longer than other threads and had less competition for HW resources near test end. The graph for MyRocks stops earlier than the InnoDB graphs because it finishes the insert of 2B rows much earlier.

As a bonus for MyRocks, it writes about 1/4 the amount to storage per inserted row when compared to InnoDB. That makes SSD last longer and using less IO for writes saves more IO for reads.