Tuesday, September 20, 2016

MyRocks and InnoDB with large objects and compression

I ran tests to explain the difference between MyRocks and InnoDB when storing large objects and data with varying amounts of compressibility.

Compression in MyRocks is simpler than in InnoDB. You should expect the database to use about 1.1X times the size of the compressed output. When rows compress to 60% of their original size and are 10kb before compression, then each row should use about 6.6kb in the database. The 1.1X adjustment is for space-amplification from leveled compaction.

Predicting the space used for InnoDB is harder. First, large LOB column are not stored inline and overflow pages are not shared. Second, disk pages have a fixed size and you risk using too much space or getting too many page splits when searching for a good value for key_block size. More details are here.

I ran two tests for two types of data. The first test is an insert only workload in PK-order for data with varying amounts of compressibility. The second test determined how fast point queries could be done on that data while rate-limited inserts were in progress. By varying amounts of compressibility I mean that there was one large varchar column per row and that 20%, 45%, 75% or 95% of the data in the column was random and the remainder was constant and easily compressed. Both tests used one connection for inserts. The query test also used one connection for queries.

The test pattern was run twice. In both cases the large column was a varchar. In the first case it had a length between 10,000 and 20,000 characters. In the second case it had a length between 100 and 1000 characters. The database block size was 16kb for MyRocks and InnoDB.

Insert only


For the insert-only workload the space used for MyRocks can be predicted from the compressibility of the data. That is much less true for InnoDB. For example compressed InnoDB uses about the same amount of space for pctRand in 20, 45 and 75.

MyRocks used the least amount of space. InnoDB used much more space when the column was larger (10,000 to 20,000 vs 100 to 1000). Overflow pages are the root cause.

The insert rates are better for MyRocks than for InnoDB. They were also stable for MyRocks and uncompressed InnoDB independent of the compressibility. Rates for uncompressed InnoDB are better than compressed InnoDB. While this wasn't a performance benchmark, it matches many other results I get. It is hard to get performance and compression from InnoDB.  The CPU overhead per insert was similar between MyRocks and uncompressed InnoDB. CPU overheads were mostly larger for compressed InnoDB.

Legend for the data:
  • ips - inserts per second
  • size - database size in GB at test end
  • Mcpu - microseconds of CPU per insert
  • pctRand - percentage of random data in large column
  • engine - rx.zlib-6 is MyRocks with zlib level 6 compression. i6n is InnoDB in MySQL 5.6.26 without compression. i6c is InnoDB in MySQL 5.6.26 with compression.

column up to 20,000       column up to 1000
ips     size    Mcpu      ips     size    Mcpu    pctRand engine
5489      7.7    1090     34468   11      151     20      rx.zlib-6
5540     16      1127     34824   19      149     45
5532     24      1307     34517   27      166     75
5523     30      1467     34701   33      160     95

ips     size    Mcpu      ips     size    Mcpu    pctRand engine
3995     87       933     23470   66      173     20      i6n
3981     87       928     23704   66      174     45
3981     86       917     23487   66`     175     75
3995     88       914     23658   66      176     95

ips     size    Mcpu      ips     size    Mcpu    pctRand engine
3339     36      1064     13429   33      262     20      i6c
2779     32      1278     13124   33      271     45
2133     35      1750      8767   30      392     75
1757     50      2061      7228   38      461     95

Point queries


MyRocks provides the best compression, the best query throughput, and the east CPU overhead per query. My conclusions for InnoDB space consumption are similar to the results from the insert-only workload.

Legend for the data:
  • qps - queries per second
  • size - database size in GB at test end
  • Mcpu - microseconds of CPU per query
  • pctRand - percentage of random data in large column
  • engine - rx.zlib-6 is MyRocks with zlib level 6 compression. i6n is InnoDB in MySQL 5.6.26 without compression. i6c is InnoDB in MySQL 5.6.26 with compression.

qps     size    Mcpu      qps     size    Mcpu    pctRand engine
 984      9.3    4308     2214    11      1585    20      rx.zlib-6
 910     19      4532     2113    19      1627    45
 846     30      4952     2102    27      1601    75
 795     37      5598     2051    33      1691    95

qps     size    Mcpu      qps     size    Mcpu    pctRand engine
 628    113      6240     1302    62      2527    20      i6n
 624    110      6226     1300    63      2501    45
 624    114      6312     1302    63      2536    75
 628    115      6218     1305    66      2474    95

qps     size    Mcpu      qps     size    Mcpu    pctRand engine
 708     38      5560      770    34      4450    20      i6c
 629     39      6643      687    34      4895    45
 513     44      8494      589    30      6046    75
 418     57     10619      576    39      6599    95


1 comment:

  1. Yes - same host but workload was different. Here it is modified insert benchmark client with 1 insert thread and optionally 1 query thread

    ReplyDelete

Speedb vs RocksDB on a large server

I am happy to read about storage engines that claim to be faster than RocksDB. Sometimes the claims are true and might lead to ideas for mak...