tl;dr
- MyRocks gets better insert rates while using less HW per insert: ~8X faster insert rates on the i3 NUC and ~5X faster on the i5 NUC.
- InnoDB writes ~20X more to storage per insert compared to MyRocks.
- Uncompressed InnoDB uses ~1.6X more space than uncompressed MyRocks
- MyRocks is up to 1.3X slower than InnoDB for secondary index scans and up to 1.7X slower for PK scans. The gap is larger for PK indexes because InnoDB secondary indexes suffer from fragmentation in this test. MyRocks suffers from using more CPU per row, so it can't drive storage as fast as InnoDB.
- InnoDB suffers more write stalls on the read-write test
- MyRocks does much better for reads on the read-write tests because it is able to cache the secondary indexes. This is a benefit of being more space efficient. In the future I need to make the database larger to avoid comparing cached MyRocks vs not-cached InnoDB on this benchmark.
Configuration
The tests used a MyRocks build from FB MySQL 5.6.35 and August 15 with git hash 0d76ae and then upstream InnoDB 5.6.35 and 5.7.17. All tests used jemalloc with mysqld. The i3 and i5 NUC servers are described here. The insert benchmark is described here. The my.cnf files are here for the i3 NUC and i5 NUC. The i5 NUC has more RAM, faster CPUs and faster storage than the i3 NUC. I tried to tune my.cnf for all engines including: disabled SSL for MySQL 5.7, use the same charset and collation, set innodb_purge_threads=1 to reduce mutex contention on the small servers. For all tests the binlog was enabled but fsync was disabled for the binlog and I set flush_log_at_trx_commit=2 for both engines. I compiled all of the MySQL versions on the test servers and did not use binaries from upstream.
The database is larger than RAM. The i3 NUC has 8gb of RAM and the i5 NUC has 16gb. The insert benchmark used 1 table and loaded 250M rows on the i3 NUC and 500M rows on the i5 NUC, then did a full scan of each index on the table (PK, 3 secondary, PK again), then two read-write tests. The first read-write test tries to insert 1000 rows/second with one client while the other client does short range scans as fast as possible. The second read-write test is similar except the insert rate limit is 100/second.
The MyRocks tests were done without compression and with zstd compression for the max LSM level. These are labeled MyRocks.none and MyRocks.zstd below.
Results
All of the data for the graphs is here. Note that Linux iostat can overstate bytes written by RocksDB by 2X because it counts bytes trimmed as bytes written. The data in github does not adjust for that. The numbers here for wKB/i does by dividing the measured value by 2 -- which is an estimate. Note that even if my adjustment were exact this value is still missing bytes written by flash GC and when I measured that in the past it was much worse for InnoDB.
Results: load
This graph has the insert rate for each engine relative to the rate for MyRocks. I am not surprised that MyRocks has a better insert rate than InnoDB because it does less work per insert than InnoDB. MyRocks secondary index maintenance is read free and MyRocks has much less write amplification. For the in-memory insert benchmark there is a regression in the insert rate for InnoDB from 5.6 to 5.7 but that does not occur here.
- InnoDB writes ~20X more to storage per insert which helps my SSD devices last longer.
- InnoDB reads between 10X and 20X more from storage per insert because it reads leaf pages for secondary index maintenance while MyRocks reads for compaction but secondary index maintenance is read free.
- InnoDB uses more CPU per insert.
- InnoDB uses ~1.6X more space than MyRocks when neither use compression. InnoDB suffers from B-Tree fragmentation for the secondary indexes and MyRocks benefits from key prefix compression.
Absolute values
IPS CPU/i wKB/i rKB/i size(GB)
MyRocks.none 22122 2699 1.79 0.25 28
MyRocks.zstd 20169 3243 1.36 0.14 13
InnoDB-5.6 2665 4934 39.08 6.10 46
InnoDB-5.7 3242 3526 34.24 2.54 46
Values relative to MyRocks
IPS CPU/i wKB/i rKB/i size(GB)
MyRocks.none 1.00 1.00 1.00 1.00 1.00
MyRocks.zstd 0.91 1.20 0.76 0.56 0.46
InnoDB-5.6 0.12 1.83 21.83 24.40 1.64
InnoDB-5.7 0.15 1.31 19.13 10.16 1.64
These tables have more data for the i5 NUC. The insert rate difference is smaller here than for the i3 NUC. InnoDB suffers more from the slower storage device on the i3 NUC. Similar to the i3 NUC, InnoDB writes ~20X more to storage per insert and ~1.6X more space compared to uncompressed MyRocks.
Absolute values
IPS CPU/i wKB/i rKB/i size(GB)
MyRocks.none 26837 2278 1.75 0.12 56
MyRocks.zstd 24811 2770 1.52 0.06 25
InnoDB-5.6 5978 4265 44.50 2.10 90
InnoDB-5.7 5634 4690 44.92 1.26 90
Values relative to MyRocks
IPS CPU/i wKB/i rKB/i size(GB)
MyRocks.none 1.00 1.00 1.00 1.00 1.00
MyRocks.zstd 0.92 1.22 0.87 0.50 0.45
InnoDB-5.6 0.22 1.87 25.43 17.50 1.61
InnoDB-5.7 0.21 2.06 25.67 10.50 1.61
Results: scan
This section has results for the index scans. There are 5 scans: 1 is the PK, 2/3/4 are the secondary indexes and then 5 is the PK again. The 2+3+4 column has the time to scan all secondary indexes. The gap between MyRocks and InnoDB is much smaller here than it was on the in-memory full index scan. I assume the CPU overhead from MyRocks is less significant here where the scan is IO-bound.
I wrote above that I used a MyRocks build from August 15. Unfortunately for the scan test I used one for June 16 with git hash 52e058.
The result below includes results (MyRocks.none.ra, MyRocks.zstd.ra) that used a RocksDB option to make it more likely that filesystem readahead would be enabled. The option is rocksdb_advise_random_on_open and when set to 0 RocksDB uses POSIX_FADV_NORMAL rather than POSIX_FADV_RANDOM when opening SST files for user reads. I don't think this option is ready for production, but it can be used on benchmarks.
Some notes from the results:
- Full-index scans are faster with InnoDB. The gap is larger for the PK index because InnoDB secondary indexes suffer from fragmentation on this test. MyRocks suffers from using more CPU per row, so it can't drive storage as fast as InnoDB. MyRocks is up to 1.3X slower than InnoDB for secondary index scans and up to 1.7X slower for PK scans.
- The 2+3+4 result for MyRocks is close to the time for InnoDB in MySQL 5.6 when MyRocks uses filesystem readahead. I assume we can do more to get better readahead. Index scans on a fragmented B-Tree generate a lot of random reads. A full index scan from an LSM can do large reads from each LSM level to avoid that.
- Something was done for InnoDB in MySQL 5.7 to make range scans faster. Thank you to the InnoDB team.
1 2 3 4 5 2+3+4 index/engine
219 202 222 224 207 648 MyRocks.none
182 189 203 197 168 589 MyRocks.none.ra
241 206 230 228 230 664 MyRocks.zstd
241 195 215 212 201 622 MyRocks.zstd.ra
292 181 194 212 127 587 InnoDB-5.6
419 148 157 203 99 508 InnoDB-5.7
Results for the i5 NUC are similar to the i3 NUC.
#seconds for each index scan
1 2 3 4 5 2-4 index/engine
394 345 381 366 342 1092 MyRocks.none
331 318 344 331 285 993 MyRocks.none.ra
385 346 383 386 372 1115 MyRocks.zstd
359 327 360 356 341 1043 MyRocks.zstd.ra
475 290 302 314 225 906 InnoDB-5.6
417 250 267 249 164 766 InnoDB-5.7
Next are hardware efficiency metrics that help explain the performance differences. There are many interesting things in here. The columns are explained here.
- InnoDB gets more read IO throughput than MyRocks. I need to understand this better but part of the problem is that MyRocks uses more CPU per row so it isn't able to consume data from storage as fast as InnoDB.
- The InnoDB PK scan gets more read IO throughput than the InnoDB secondary index scan because the secondary index is fragmented and the PK is not courtesy of the workload (PK inserts are in PK order, secondary index inserts are not).
- Filesystem readahead gets ~10% more read IO throughput for MyRocks.
- InnoDB in MySQL 5.7 uses much less CPU than in MySQL 5.6 (thanks InnoDB team). I have yet to confirm whether the work done to make InnoDB scans faster in 5.7 are InnoDB-only or will benefit other engines.
- The gap between InnoDB and MyRocks is larger for the PK scan than the secondary index scan. This might occur because of secondary index fragmentation for InnoDB.
- The CPU overhead from zstd compression for MyRocks is small.
Metrics for q4: the scan of a secondary index
secs rMB/s rKB/o rGB MCPU/o engine
secs rMB/s rKB/o rGB MCPU/o engine
366 32.3 0.024 12 16.872 MyRocks.none
331 35.7 0.024 12 17.044 MyRocks.none.ra
386 13.1 0.010 5 18.684 MyRocks.zstd
356 14.0 0.010 5 17.764 MyRocks.zstd.ra
314 44.2 0.028 13 10.748 InnoDB-5.6
249 56.0 0.028 14 7.592 InnoDB-5.7
Metrics for q5 - the second scan of the PK
secs rMB/s rKB/o rGB MCPU/o
342 72.7 0.051 24 16.064 MyRocks.none
285 87.3 0.051 24 14.612 MyRocks.none.ra
372 30.6 0.023 11 17.820 MyRocks.zstd
341 33.3 0.023 11 17.396 MyRocks.zstd.ra
225 147.0 0.067 32 14.732 InnoDB-5.6
164 198.1 0.067 32 8.900 InnoDB-5.6
And a graph of the results from the 2+3+4 column because some readers like graphs.
Disclaimer - MyRocks is able to cache the secondary indexes during this test while InnoDB is not. I didn't plan this and in the future I need to prevent that from happening. While being more efficient with space is a benefit of MyRocks, even without compression, there is a step function in performance when the database is cached that distorts performance differences. MyRocks needs less space for secondary indexes because it doesn't suffer fragmentation like a B-Tree and it uses key prefix compression.
This section has results for the read-write tests where the writer does 1000 inserts/second. The tables below have the most interesting data. The metrics are explained here. All of the engines were able to sustain the target insert rate on average (ips.av). The value is 999 rather than 1000 because of implementation artifacts. There were more write stalls on InnoDB than MyRocks based on the 99th percentile insert rate as ips.99 is lower for InnoDB. QPS is much better for MyRocks but see the disclaimer above. InnoDB writes ~20X more to storage per insert compared to MyRocks.
Absolute values for the i3 NUC
IPS.av IPS.99 wKB/i QPS.av QPS.99 rKB/q CPU/q
MyRocks.none 999 998 1.30 3349 2377 0.37 7912
MyRocks.zstd 999 998 1.70 3450 2506 0.25 7986
InnoDB-5.6 999 977 26.20 258 156 101.61 37834
InnoDB-5.7 999 977 30.70 311 238 98.68 26829
Absolute values for the i5 NUC
IPS.av IPS.99 wKB/i QPS.av QPS.99 rKB/q CPU/q
MyRocks.none 999 998 2.23 3818 2395 0.47 6945
MyRocks.zstd 999 998 1.96 3986 2858 0.28 6788
InnoDB-5.6 999 987 73.09 992 626 59.45 16798
InnoDB-5.7 999 988 71.22 1042 738 61.07 16115
Results: read-write, 100 writes/second
See the disclaimer in the previous section.
This section has results for the read-write tests where the writer does 100 inserts/second. The metrics are explained here. The analysis from the previous section is relevant here.
Absolute values for the i3 NUC
IPS.av IPS.99 QPS.av QPS.99 rKB/q CPU/q
MyRocks.none 100 100 3419 3184 0.01 7173
MyRocks.zstd 100 100 3549 3270 0.01 6969
InnoDB-5.6 100 100 827 534 66.49 13675
InnoDB-5.7 100 100 900 566 69.56 10269
Absolute values for the i5 NUC
IPS.av IPS.99 QPS.av QPS.99 rKB/q CPU/q
MyRocks.none 100 100 3911 3740 0 6202
MyRocks.zstd 100 100 4090 3841 0.01 5938
InnoDB-5.6 100 100 1334 951 50.42 11546
InnoDB-5.7 100 100 1383 996 52.50 10807
I did not include a graph. See the disclaimer above.
No comments:
Post a Comment