Friday, December 30, 2016

The MyRocks community in 2016

The MyRocks community grew in 2016 with significant contributions from early evaluators. I appreciate their help and have been busy documenting the problems that they reported. I look forward to more contributions in 2017.

MariaDB Corporation and Percona announced plans to include MyRocks in their distributions. You will be able to get MyRocks with expert support. Equally big have been the contributions from Sergey Pertunya to MyRocks.

The MySQL team at Alibaba reported poor performance from range scans with sysbench. The problem was memory system contention on a performance counter and the fix made long range scans 2X faster on concurrent workloads.

I have been using sysbench more and created a script to run a sequence of tests with it. During my tests I noticed too much variance during the read-only tests and the problem is non-determinism from the state of the memtable and level 0 of the LSM tree. The overhead for a search of the LSM tree depends on the amount of data in the memtable and the number of files in level 0. This state is not deterministic when a read-only test follows a read-write test. I filed issue 427 to see if we can get RocksDB to adapt and flush data earlier than normal from the memtable and level 0 when a workload becomes read-only.

Sysbench is extremely useful even though the workloads provided by it are synthetic -- especially the read-only tests with a small (multi-GB) database and most tests use a uniform distribution for keys. There is a talk on sysbench at FOSDEM 2017 and I look forward to using it more next year.

Justin Swanhart pointed out the lousy performance MyRocks provided with a default configuration. As a result we improved the default configuration by increasing the memtable to 64M and block cache to 512M and I will share results at FOSDEM. See issues 369375, and 441.

Justin Swanhart and Rick Pizzi reported problems with too-large transactions leading to lousy performance and OOM. MyRocks buffers in memory all changes from a transaction until commit and at commit time the changes are briefly double buffered when copied into the memtable. Until we make this better the workaround is to use rocksdb_commit_in_the_middle for bulk loads. Even after we make this better that option will be useful for bulk loads. Today we have a poor limit on the max size of a transaction via rocksdb_max_row_locks. This is a limit on the number of locked rows rather than on the memory used by a transaction. Even worse, the default is 1B. We are changing the limit to be on the amount of memory used by a transaction. Issues for this include 341, 347, 348 and 365.

The MySQL team at Alibaba reported a performance regression from MyRocks group commit when running sysbench. They also reported a correctness problem that I ran into with sysbench. Auditing the code found another bug. A related problem is that rocksdb_disable_2pc was enabled by default, meaning engine & binlog crash safety wasn't guaranteed. We have changed the default configuration to be crash safe and renamed the option to avoid double negatives. We also used YSCB and sysbench to reproduce the performance regression and have work in progress to make this better. That requires collaboration between the MyRocks and RocksDB teams. It is fun to watch the teams solve problems. Issues for this problem include 474481, 482488.

2 comments:

  1. Hi Mark

    Happy New Year!

    The past 2016 is very meaningful for us (Alibaba Database Team). First, we keep on improve AliSQL(Alibaba's branch of MySQL). Besides, we also do thorough research and tests on MyRocks/RocksDB, I think MyRocks will give us more choice and benefits beyond InnoDB. To be honest, MyRocks/RocksDB is not so mature compared to InnoDB, I think we can do more contribute on it:)

    We will share the experiments on MyRocks/RocksDB at this year's percona live (MyRocks: best practice at Alibaba, 26 April - 2:00 PM - 2:50 PM at Ballroom C), see you!

    he dengcheng

    ReplyDelete
  2. I look forward to more contributions from the Alibaba team and the talk at Percona Live. MyRocks gets better faster when it has a community of users.

    ReplyDelete

RocksDB on a big server: LRU vs hyperclock

This has benchmark results for RocksDB using a big (48-core) server. I ran tests to document the impact of the the block cache type (LRU vs ...