Wednesday, September 14, 2016

zlib vs zstd for MyRocks running Linkbench

I used an IO-heavy configuration to determine the impact of zstandard vs zlib compression for MyRocks. There was about 1 read from SSD per transaction and decompression is done after each page read from the OS page cache and storage.

The results are impressive. Zstandard compresses like zlib level 1 but uses much less CPU.
  • zstandard reduces CPU by 45% vs zlib level 1 for the load test
  • zstandard reduces CPU by 11% vs zlib level 1 for the query test
  • zstandard gets 8% more TPS vs zlib level 1 for the query test


Configuration for MyRocks is still complex. The templates for the MyRocks my.cnf files for Linkbench and general usage are explained on the wiki. I used no compression for L0, L1, L2, then lz4 for all but the max level and then one of zlib level 1, zlib level 6 or zstd for the max level. The tests used an Aug5 build of MyRocks, so this used kZSTDNotFinalCompression as the build preceded the 1.0 release of zstandard.

The test host has 50G of RAM available to userland, fast storage (5TB of NVMe MLC) and 24 CPU cores with 48 HW threads. The RocksDB block cache was set to 10G, the binlog was disabled but sync-on-commit was disabled for the binlog and RocksDB. Linkbench is run with maxid1=1B, the load test uses 2 clients and the query tests use 16 clients. Query tests are run as 24 1-hour loops and I report metrics from the 24th hour. I used my branch of linkbench and support scripts.


The results for zstandard are impressive. I look forward to using this in production. Thanks Yann.

  • ips/tps - inserts & transactions per second
  • r/i, r/t - iostat reads per insert and per transaction
  • wKB/i, wKB/t - iostat KB written per insert and per transaction
  • Mcpu/i, Mcpu/t - usecs of CPU time per insert and per transaction
  • size - database size in GB
  • rss - mysqld RSS size in GB
  • un, gn, ul, gl - p99 response time in milliseconds for the most frequent transactions (Update Node, Get Node, Update Link, Get Link List)

Results for the load

ips     r/i     rKB/i   wKB/i   Mcpu/i  size    rss     engine
61543   0       0       0.98     81     324     3.1     zstd
61504   0       0       0.98    146     331     2.0     zlib-1
61457   0       0       0.97    153     312     2.2     zlib-6

Results for the 24th hour of the query test

tps    r/t   rKB/t   wKB/t  Mcpu/t  size  rss   un    gn   ul  gl   engine
39366  1.00  10.38   2.36    878    377   12.2  0.6   0.6  1   0.8  zstd
36524  1.00  10.47   2.45    992    381   12.1  0.7   0.6  1   0.9  zlib-1
37233  0.97   9.76   2.30   1002    360   12.0  0.7   0.7  1   0.9  zlib-6


  1. We have done a bunch of testing with column compression using a predefined dictionary with some great results. Are there any plans for MyRocks to support predefined dictionaries?

    1. I think there has been work-in-progress on that for RocksDB. Not sure whether it gets committed, and whether MyRocks can use it.

  2. I was wondering why, for the load using zstd, the mysqld RSS is 3.1G?
    Is the zstd lib using a custom allocator?

    1. I don't know yet. But the result I care most about is RSS from the query test and that is similar between zstd and zlib.