Posts

Showing posts from November, 2016

MyRocks: use less IO on writes to have more IO for reads

Holiday is almost here and I wrote a long blog post  on write-efficiency yesterday so this one will be short. A longer version of this is in progress because this is an interesting result for me to explain. We assume that an LSM is less efficient for reads because it is more efficient for writes and it is hard to be optimal for all of read, write & space efficiency. For real workloads it is complicated and for now I include benchmarks in "real workloads".  Here is one interesting result from my IO-bound tests of Linkbench. The summary is that when you spend less on IO to write back changes then you can spend more on IO to handle user queries. That benefit is more apparent on slower storage (disk array) than on faster storage (MLC NAND flash) because slower storage is more likely to be the bottleneck. IO-bound Linkbench means that I used a server with 50G of RAM and ran Linkbench with maxid1=1B (1B nodes). The MyRocks database was ~400G and the InnoDB database was ~1.

Why is MyRocks more write-efficient than InnoDB?

Image
This year I shared results where InnoDB wrote between 10X and 20X more data to storage than MyRocks for the same workload. I use KB written to storage per transaction as a measure of write efficiency and I usually compute this with data from the benchmark client and iostat . I get KB written/second from iostat, average transaction/second from the benchmark client and divide the former by the latter to compute KB written/transaction. When using SSD this excludes the writes done by SSD firmware and I previously reported that the overhead was worse for InnoDB than for RocksDB on one vendor's device. An engine that writes less to storage per transaction is more write efficient. It is a good thing if MyRocks writes 10X less to storage than InnoDB for the same workload. This might enable MyRocks to use lower-endurance SSD for workloads where InnoDB required higher-endurance SSD. This might enable MyRocks to use SSD for workloads in which the device would not last with InnoDB. This als

Sysbench, InnoDB, transaction isolation and the performance schema

I used sysbench to understand the impact of transaction isolation and the performance schema for InnoDB from upstream MySQL 5.6.26. The test server has 24 CPU cores, 48 HW threads with hyperthreading enabled, 256G of RAM and fast SSD. For sysbench I used the 1.0 version with support for Lua. Tests were run in two configurations -- cached and IO-bound. For the cached configuration I used 8 tables, 1M rows/table and the database cache was large enough to cache all data. For the IO-bound configuration I used 8 tables, 10M rows/table, a 2G database cache and buffered IO so that all data was in the OS page cache. The database was ~2G for the cached configuration and ~20G for the IO-bound configuration. InnoDB table compression was not used and jemalloc was used. The binlog was enabled but sync-on-commit was disabled for the binlog and InnoDB redo log. With 8 tables and 1M rows per table the database is very small -- a few GB. I am wary of drawing too many conclusions from sysbench resu