Thursday, September 15, 2016

Peak benchmarketing season for MySQL

Maybe this is my XKCD week. With Oracle Open World and Percona Live Amsterdam we are approaching peak benchmarketing season for MySQL. I still remember when MySQL 4.0 was limited to about 10k QPS on 4 and 8 core servers back around 2005, so the 1M QPS results we see today are a reminder of the great progress that has been made thanks to investments by upstream and the community.

In General

But getting 1.5M QPS today compared to 1M QPS last year isn't at the top of the list for many (potential) users of MySQL. I use performance, usability, mangeability, availability and efficiency to explain what matters for web-scale DBMS users. My joke is that each of these makes a different group happy: performance -> marketing, usability -> developers, manageability -> operations, availability -> end users, efficiency -> management.

The benchmarketing results mostly focus on performance. Whether InnoDB does a bit more QPS than Amazon Aurora isn't going to make Aurora less popular. Aurora might have excellent performance but I assume people are deploying it for other reasons. I hope we make it easier to market usability, manageability, availability and efficiency in the MySQL community. MongoDB has gone a long way by marketing and then delivering usability and manageability.

Even when limited to performance we need to share more than peak QPS. Efficiency and quality-of-service (QoS) are equally important. QPS without regard to response time is frequently a bogus metric. I get more IOPs from a disk by using a too large queue depth. But more IOPs at the cost of 100 millisecond disk read response times is an expensive compromise. Even when great QPS is accompanied by a good average response time I want to know if there is lousy QoS from frequent stalls leading to lousy 99th percentile response times. Percona has built their business in part by being excellent at documenting and reducing stalls in InnoDB that occur on benchmarks and real workloads.

I have been guilty of sharing too many benchmark reports in the past that ignored efficiency and QoS. I have been trying to change that this year and hope that other providers of MySQL performance results do the same. This is an example of a result that includes performance, efficiency and QoS.

MyRocks and RocksDB

A lot of the RocksDB marketing message has been about performance. Database access is faster with an embedded database than client/server because you avoid network latency. The MyRocks message has been about efficiency. The target has been better compression and less write amplification than InnoDB so you can use less SSD and lower-endurance SSD. For a workload I care about we see 2X better compression and 1/10 the write rate to storage. This is a big deal.

When starting the project we had many discussions about the amount of performance loss (reduced QPS, higher response time) we could tolerate to get more efficiency. While we were vague the initial goal was to get similar QPS and response time to InnoDB for real workloads, but we were willing to accept some regressions. It turned out that there was no regression and similar performance with much better efficiency is a big deal.

But benchmarks aren't real workloads and there will soon be more benchmark results. Some of these will repeat what I have claimed, others will not. I don't expect to respond to every result that doesn't match my expectations. I will consult when possible.

One last disclaimer. If you care about read-mostly/in-memory workloads then InnoDB is probably an excellent choice. MyRocks can still be faster than InnoDB for in-memory workloads. That is more likely when the bottleneck for InnoDB is page write-back performance. So write-heavy/in-memory can still be a winner for MyRocks.

Seriously, this is the last disclaimer. While we are bickering about benchmark results others are focusing on usability and manageability and getting all of the new deployments.


  1. When you compare MyRocks to InnoDB, do you mean InnoDB with or without compression?

    1. I usually compare to both and do my best to explain that in the perf reports. I haven't shared many benchmark results this year as I didn't want to start any benchmarketing battles.

  2. I meant specifically in this case, when you are talking about one particular in-memory workload being comparable to InnoDB. Did you mean compressed or un-compressed InnoDB?

    I am going to test MyRocks, InnoDB compressed, and TokuDB on the SSB benchmark w/ shard-query when the data size is significantly larger than the buffer pool size. I've only published the results of TokuDB in memory, and that was awhile ago. I got surprisingly good SSB results from TokuDB when data was larger than memory. This benchmark isn't like to start any wars, as it doesn't represent the typical workload unless you are using Shard-Query, Spark (which Percona recently blogged about), or some other tool that can do parallel scans of partitions. Only Shard-Query can push down all operations including aggregation, joins, and the finest possible grain of filtering.

    1. We see 2X better compression and less than 1/10 the write rate to storage for MyRocks vs compressed InnoDB using both linkbench and production, if that is what you are asking about.

    2. >MyRocks can still be faster than InnoDB for in-memory workloads. That is more likely when the bottleneck for InnoDB is page write-back performance. So write-heavy/in-memory can still be a winner for MyRocks.

      I wanted to clarify if this was with or without compression in InnoDB. I generally wouldn't use compression for InnoDB for an in-memory workload, but I assumed this would be slower than non-compressed (uses CPU and disk). I wondered if MyRocks in this particular instance was as fast as even non-compressed InnoDB, because IO is so much slower than in-memory operations, and InnoDB does a lot of "extra" IO in that workload. It is impressive that a compressed engine can compete in all of there areas that you speak about: QOS, efficiency, and QPS.

    3. With InnoDB you usually get compression or performance, you rarely get both. In an odd case I might try to use compressed InnoDB for in-memory -- if that lets me fit all data in memory.

      With MyRocks you don't have to worry as much as enabling compression doesn't have a huge impact on performance. When we also get zstandard into production then the impact from compression is even less.

      How big is a database with SSB? I used the star schema benchmark many years ago when I worked on bitmap indexes at Oracle. I met Patrick O'Neil many years before that when I worked at Informix. Now I work on LSM. Prof. O'Neil made major contributions to LSM, bitmap indexes, and the star schema benchmark.

    4. Have you seen this fun hack, embedded a bitmap indexed based RDBMS through the UDF interface:

    5. Didn't know it was open source. I read about bitmap index work LBL was doing when I was at Oracle. LBL did word-aligned, Oracle citations implied byte-aligned and the person behind the work at Oracle was highly productive -

      I never got to evaluate word vs byte aligned. My memory is that word aligned saves CPU (faster queries), byte aligned saves disk space. Once again the RUM Conjecture applies -- can't be optimal for read, write and space.

    6. You mentioned using the Rocks API for new engines. Would it be possible to instead add a new index type to RocksDB?

    7. Good question. Our focus is on OLTP, but maybe someone in the community will add it. The API is too simple today to take advantage of bitmap-and, bitmap-or, etc so that would have to be designed.

  3. The SSB is based on the TPC-H so it is has a "scale factor". Scale factor 1 is 512MB of data, scale factor 20 is 10GB, etc. I used scale factor one to find an interesting different in performance of the benchmark when the defaults changed between versions:

    The point of the SSB is to test join and filtering performance of databases. Star schema are particularly bad for nested-loop joins, and proprietary RDBMS like Oracle have specialized join strategies for star schema.

    I generally test it with a MySQL capable of hash joins and excellent compression (ICE). The queries that Shard-Query generates are simple enough to avoid any bugs in ICE, and there are workarounds for any other bugs I encountered in it, so it is an excellent choice for data marts. The only downside is that it doesn't support partitioning, thus there is currently a need to shard data over multiple schema for shard-query parallelism.

    It is intended normally to restart the database and flush the filesystem buffers between each query run, but I don't generally do this for most of my tests, because I'm simply comparing parallel to serial performance, or in the case of redshift, you really can't flush anything.