Friday, December 4, 2020

Tuning for the Insert Benchmark: MyRocks

This presents performance and efficiency results for MyRocks on the Insert Benchmark using MySQL 5.6.35 and a small server with 4 CPU cores, 16G of RAM and a fast SSD. The goal is to understand the impact of my.cnf options and I have results for 17 different configurations. 

Summary

I use scripts to generate the performance reports. The reports are here for the in-memoryless IO-bound and more IO-bound database sizes. My summary of the results is below. Some of the conclusions are specific to this workload and HW so be careful about applying this advice.

A summary of the results is below. This one is easy.

  • Configurations have little impact on throughput for the in-memory and less IO-bound databases
  • Throughput is lousy with the minimal configuration for the more IO-bound database size

My focus in on throughput. There are larger impacts on HW efficiency metrics that I ignore. The impact of configuration should be more interesting for the benchmark run on larger HW with more concurrency, but that is for a future post.

Tests

The insert benchmark was run for three database sizes that I call in-memory, less IO-bound and more IO-bound. The database was cached in memory for the in-memory size but not for the others. The initial load size was 20M rows for in-memory, 100M rows for less IO-bound and 500M rows for more IO-bound. Additional rows were inserted after the initial load, but most of the inserts were done by the initial load.

Tests were run at most 2 threads/connections -- one for inserts and another for queries -- and I consider that to be 1 insert benchmark client.

The benchmark has several steps as described here -- initial load, create 3 secondary indexes, load more data, read+write. Each read+write step is run for 30 minutes with a query thread that runs as fast as possible and a writer that that is rate limited to 100, 100, 200, 200, 400, 400, 600, 600, 800, 800, 1000 and 1000 inserts/s. The load more data step inserts 20M, 10M and 10M more rows for the in-memory, less IO-bound and more IO-bound workloads.

In the performance reports, the steps are named:

  • l.i0 - initial load
  • l.x - create 3 secondary indexes
  • l.i1 - load more data 
  • q100.2, q200.2, ..., q1000.2 - read+write where the insert rate limit is 100, 200, ..., 1000 and the .2 means the second run at each rate. The reports focus on the second run.

Configurations

All tests were done with fsync-on-commit disabled to avoid becoming an fsync latency benchmark. The option names below leave off rocksdb_ to save space. I also leave off my.cnf. when naming config files.

  • my.cnf.cx0 - minimal configuration
  • my.cnf.cx1 - cx0 + block_cache_size=10G
  • my.cnf.cx3 - cx1 +
    • max_background_jobs=2, default_cf_options=level_compaction_dynamic_level_bytes=true
  • my.cnf.cx4a - cx3 +
    • default_cf_options += block_based_table_factory= {cache_index_and_filter_blocks=1}
  • my.cnf.cx4b - cx3 +
    • default_cf_options += block_based_table_factory= {cache_index_and_filter_blocks=1; filter_policy=bloomfilter:10: false; whole_key_filtering=1}; optimize_filters_for_hits=true
  • my.cnf.cx5a - cx4b + 
    • default_cf_options += max_write_buffer_number=4; write_buffer_size=64m; max_bytes_for_level_base=256m
  • my.cnf.cx5b - cx4b +
    • default_cf_options += max_write_buffer_number=4; write_buffer_size=128m; max_bytes_for_level_base=512m
  • my.cnf.cx5c - cx4b +
    • default_cf_options += max_write_buffer_number=4; write_buffer_size=256m; max_bytes_for_level_base=1024m
  • my.cnf.cx6a - cx5c +
    • default_cf_options += compression_per_level=kNoCompression 
  • my.cnf.cx6b - cx5c +
    • default_cf_options += compression_per_level=kNoCompression; bottommost_compression=kLZ4Compression
  • my.cnf.cx6c - cx5c +
    • default_cf_options += compression_per_level=kNoCompression; bottommost_compression=kZSTD
  • my.cnf.cx8 - cx6b + default_cf_options += format_version=5
  • my.cnf.cx9a - cx6b + block_size=8192
  • my.cnf.cx9b - cx6b + block_size=16384
  • my.cnf.cx10 - cx6b + bytes_per_sync=1M, wal_bytes_per_sync=1M
  • my.cnf.cx11 - cx6b + delayed_write_rate=32M
  • my.cnf.cx12 - cx6b + max_open_files=-1

































No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...