Posts

Showing posts from January, 2016

MyRocks vs InnoDB with Linkbench over 7 days

After feedback from my previous post on Linkbench I repeated it for 7 days as the previous test ran for 1 day. The results are the same: MyRocks sustains more QPS, is more IO efficient and provides better compression on IO-bound Linkbench. Ask your MySQL vendor when they will support MyRocks. The summary is: InnoDB writes between 8X and 14X more data to SSD per transaction than RocksDB; RocksDB sustains about 1.5X more QPS; compressed/uncompressed InnoDB uses 2X/3X more SSD space than RocksDB. I encourage others to use long running benchmark tests and present IO efficiency metrics in addition to performance results.
Configuration
I used the same configuration as described in the previous post with one difference. For this test I ran 168 iterations of the query step and each step ran for 1 hour. The test ran for 7 days while the previous test ran for 1 day. What I describe as QPS below is TPS (transactions/second) and when I use per query below I mean per transaction.
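An IO efficiency metric such as data written per transaction can be derived by dividing the storage write rate by the transaction rate over the same interval. The following is a minimal sketch, assuming the write rate comes from a tool like iostat and the TPS comes from the benchmark client. The function name and all sample numbers are hypothetical and are not results from this test.

```python
def kb_written_per_transaction(write_kb_per_sec: float, tps: float) -> float:
    """Return KB written to storage per transaction.

    write_kb_per_sec: average storage write rate for the interval (assumed
        to come from something like iostat).
    tps: average transactions/second for the same interval.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return write_kb_per_sec / tps


# Hypothetical samples for two storage engines (made-up numbers).
engine_a = kb_written_per_transaction(write_kb_per_sec=120_000.0, tps=10_000.0)
engine_b = kb_written_per_transaction(write_kb_per_sec=60_000.0, tps=15_000.0)

# How many times more data engine A writes per transaction than engine B.
relative_write_rate = engine_a / engine_b
print(engine_a, engine_b, relative_write_rate)
```

Normalizing by TPS matters because an engine that sustains more transactions will also write more in absolute terms, so raw write rates alone can hide the difference in efficiency.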

How to build MongoRocks for MongoDB 3.2

This explains how I built MongoDB 3.2 from source with support for RocksDB thanks to help from Igor Canadi. There are more details here and here. My server uses Fedora.
    # Install many of the dependencies for MongoRocks
    sudo yum install snappy-devel zlib-devel bzip2-devel lz4-devel
    sudo yum install scons gcc-c++ git
    # Unpack MongoDB 3.2 source in $MONGOSRC
    # Directory in which git repos are created
    mkdir ~/git
    # Get MongoRocks engine
    cd ~/git
    git clone https://github.com/mongodb-partners/mongo-rocks.git
    cd mongo-rocks
    git checkout --track origin/v3.2 -b v32
    # get and build RocksDB libraries
    git clone https://github.com/facebook/rocksdb.git
    cd rocksdb
    git checkout --track origin/4.4.fb -b 44fb
    make static_lib
    # prepare source build with support for RocksDB
    cd $MONGOSRC
    mkdir -p src/mongo/db/modules/
    ln -sf ~/git/mongo-rocks src/mongo/db/modules/rocks
    # build mongod & mongo binaries
    # if you have zstd installed then use LIBS="lz4 zstd"

MyRocks vs InnoDB, the insert benchmark and a disk array

This compares MyRocks and InnoDB using the insert benchmark. The test server has a disk array. The workload generates a lot of secondary index maintenance which can stress the server's random IO capacity.
tl;dr - the average insert rate is much better for MyRocks than for InnoDB
Configuration
The test server has 2 sockets, 8 cores (16 HW threads) per socket, 64GB of RAM and 14 disks with SW RAID 0 and a 2MB RAID stripe. I tested MyRocks versus InnoDB from MySQL 5.6.26 and 5.7.10 from Oracle. Two configurations were tested for MyRocks. The first is the regular configuration described here. The second is the configuration optimized for load performance and called load-optimized. The load-optimized configuration sets rocksdb_bulk_load=1 to disable unique index checks and uses a smaller memtable to reduce the number of comparisons per insert. The command line to run the insert benchmark for MyRocks is here. These are links to the my.cnf files for default MyRocks, load-opti

The advantages of an LSM vs a B-Tree

The log structured merge tree (LSM) is an interesting algorithm. It was designed for disks yet has been shown to be effective on SSD. Not all algorithms grow better with age. A long time ago I met one of the LSM co-inventors, Patrick O'Neil, at the first job I had after graduate school. He was advising my team on bitmap indexes. He did early and interesting work on both topics. I went on to maintain bitmap index code in the Oracle RDBMS for a few years. Patrick O'Neil made my career more interesting. Performance evaluations are hard. It took me a long time to get expertise in InnoDB, then I repeated that for RocksDB. Along the way I made many mistakes. Advice on doing benchmarks for RocksDB is here and here.
tl;dr - the MyRocks advantage is better compression and less write-amplification
The MyRocks Advantage
There are many benefits of the MyRocks LSM relative to a B-Tree. If you want to try MyRocks the source is on github, there is a wiki with notes on buildin

MyRocks vs InnoDB via Linkbench with a disk array

Previously I evaluated MyRocks and InnoDB for an IO-bound workload using a server with fast storage. Here I evaluate them for an IO-bound workload using a server with a disk array. MyRocks sustains higher load and query rates than InnoDB on a disk array because it does less random IO on writes which saves more random IO for reads. This was the original motivation for the LSM algorithm. MyRocks does better on SSD because it writes less data to disk per commit. It compressed data 2X better than InnoDB which helps on both disk and SSD courtesy of improving the cache hit ratio. While the LSM algorithm was designed for disk arrays it also works great on SSD thanks to a better compression rate and better write efficiency. The LSM algorithm has aged well.
Configuration
This test used a server with two sockets, 8 cores (16 HW threads) per socket, 40GB of RAM and a disk array with 15 disks and SW RAID 0 using a 2MB RAID stripe. Compared to the previous result, I used maxid1=200M in

Faster loads for MyRocks

In my previous post I evaluated Linkbench performance for MyRocks and InnoDB and the insert rate during the load was faster for InnoDB. Here I show that with tuning MyRocks can load as fast as InnoDB on SSD. Tuning was required for SSD. On disk arrays loading is already much faster with MyRocks than InnoDB and I will publish those results soon. The largest tuning benefit comes from setting the rocksdb_bulk_load session variable to disable checks for unique index constraints. A smaller tuning benefit comes from using a smaller value for write_buffer_size, 32MB rather than the 128MB used in the previous test. The benefit from a smaller memtable is fewer compares per insert. The number of load threads is configurable for Linkbench via the loaders variable and I have been using loaders=20. But that is only for the threads that load the link table. The node table in Linkbench is also large and always loaded by a single thread. I hope to make it multi-threaded but until then using too many threa

Even more write amplification when InnoDB meets flash GC

Yesterday I shared a benchmark report to compare MyRocks and InnoDB for Linkbench on a server with PCIe flash. One of the results was that InnoDB writes much more to storage per query, between 8X and 11X more. This is not a good thing because flash devices have a finite endurance and writing too much can lead to replacing the device too soon. The results from yesterday used iostat to measure how much InnoDB writes to storage. The results from today use the counters on the storage device and the data from iostat. The total writes reported by the storage device will be larger than the value reported by iostat because flash GC runs in the background to make flash blocks ready for writing. The ratio of device writes divided by iostat writes is the flash WAF (write amplification factor) and is >= 1. On my storage device the flash WAF is much larger with InnoDB than with MyRocks. The flash WAF is about 1.4X for compressed InnoDB, 1.7X for uncompressed InnoDB and 1.03X for MyRocks.
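The flash WAF described above is a simple ratio, and it compounds with the per-query write rate measured by iostat. The following is a minimal sketch of that arithmetic. The function names and sample numbers are hypothetical and are not measurements from this test, and how the device counter is read depends on the storage vendor.

```python
def flash_waf(device_bytes_written: float, iostat_bytes_written: float) -> float:
    """Flash write amplification factor.

    device_bytes_written: writes reported by the storage device counters,
        which include writes done by flash GC.
    iostat_bytes_written: writes reported by iostat for the same interval.
    """
    if iostat_bytes_written <= 0:
        raise ValueError("iostat_bytes_written must be positive")
    return device_bytes_written / iostat_bytes_written


def device_bytes_per_query(iostat_bytes_per_query: float, waf: float) -> float:
    """Bytes written to flash per query after accounting for flash GC."""
    return iostat_bytes_per_query * waf


# Hypothetical counters sampled over the same interval (made-up numbers).
waf = flash_waf(device_bytes_written=150.0, iostat_bytes_written=100.0)
per_query = device_bytes_per_query(iostat_bytes_per_query=8_192.0, waf=waf)
print(waf, per_query)
```

Because the two factors multiply, an engine that both writes more per query and has a larger flash WAF wears out the device faster than either number alone suggests.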

RocksDB vs InnoDB via Linkbench : performance and efficiency

MyRocks can reduce by half the hardware, or at least the storage hardware, required to run Linkbench compared to InnoDB. That is kind of a big deal. A significant performance problem was recently fixed in MyRocks courtesy of the SingleDelete optimization. With this optimization RocksDB removes tombstones faster so that queries encounter fewer tombstones and waste less time on them. We hope to get the same feature into MongoRocks. I have been waiting a few months for this change and started another round of Linkbench tests when it arrived. Performance and efficiency for MyRocks look great relative to InnoDB. We are far from done but I am amazed we reached this state so fast.
The performance summary from my recent tests with IO-bound Linkbench and PCIe flash: Uncompressed InnoDB loads faster than MyRocks and MyRocks loads faster than compressed InnoDB. I hope to figure out how to make MyRocks load faster than uncompressed InnoDB. MyRocks uses about half the disk space compared to