
Showing posts from July, 2023

Tuning MyRocks for the Insert Benchmark on a large server

I used the Insert Benchmark on a large server to see if I could improve the configuration (my.cnf) I have been using.

tl;dr
- The good: performance is much better when subcompactions and the hyper clock cache are enabled.
- The not as good: peak RSS is larger for configs that enable the hyper clock cache. The problem here is not as large as it is on pure open source builds that I test because the version of jemalloc used here is tuned differently. Understanding this is a work in progress.
- The select min(transactionid) from TABLE query used at the start of benchmark steps can take up to 20 seconds when it should finish in a few milliseconds. I have yet to try to fix this via tuning.

Updates: added values from the RocksDB perf context counters to explain the slow select min() query.

Builds: I used MyRocks from FB MySQL 8.0.28 with source from June 2023 at git hash ef5b9b101.

Benchmark: the insert benchmark was run in three configurations: cached by RocksDB - all tables fit in the RocksDB block cache …
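A minimal my.cnf sketch of the two changes named in the tl;dr. The hyper clock cache variable is named in these posts; rocksdb_max_subcompactions is my assumption for the variable that maps to RocksDB's max_subcompactions option, and the values are illustrative, not a tested config:

    # illustrative my.cnf fragment (assumed variable name for subcompactions)
    # enable the hyper clock cache implementation for the RocksDB block cache
    rocksdb_use_hyper_clock_cache=ON
    # allow large compactions, including L0->L1, to be split across threads
    rocksdb_max_subcompactions=4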

Tuning MyRocks for the Insert Benchmark on a small server

I used the Insert Benchmark on a small server to see if I could improve the configuration (my.cnf) I have been using.

tl;dr
- With jemalloc the peak RSS for mysqld is larger with rocksdb_use_hyper_clock_cache=ON, so I reduce the value of rocksdb_block_cache_size from 8G to 6G for some configurations. This isn't fully explained but experts are working on it.
- The base config (a0) is good enough and the other configs don't provide a significant improvement. This isn't a big surprise: while the hyper clock cache and subcompactions are a big deal on larger servers, the server in this case is small and the workload has low concurrency.
- In some cases the a3 config that disables intra-L0 compaction hurts write throughput. This result is similar to what I measured on a larger server.

Builds: I used MyRocks from FB MySQL 8.0.28 using the rel_native_lto build with source from June 2023 at git hash ef5b9b101.

Benchmark: the insert benchmark was run in three configurations: cached by …
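A minimal sketch of the my.cnf change from the tl;dr above, using the variable names and sizes mentioned in the post; shown only to illustrate the change, not as a recommendation:

    # illustrative my.cnf fragment
    rocksdb_use_hyper_clock_cache=ON
    # reduced from 8G because peak RSS grows with the hyper clock cache under jemalloc
    rocksdb_block_cache_size=6G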

How do you limit MVCC GC debt?

Too much MVCC GC debt usually means bad & weird performance along with too much space-amplification. And too much space-amp means the database will be much larger than needed. How do you limit MVCC GC debt for MyRocks, InnoDB and Postgres?

This can be an issue for write-heavy workloads, especially when the writes aren't slowed by doing reads from storage (because the working set is cached) or by fsync on commit (because fsync is fast or disabled). This problem appeared recently for both InnoDB and Postgres while doing write-heavy tests (Insert Benchmark) with a database that could be cached (assuming there isn't too much MVCC GC debt).

Limits are set via:

MyRocks
- Tune soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit to limit the total amount of compaction debt (see the sketch after this list). I think the hard/soft pending limits would be easier to use if they were relative rather than absolute, but at least something exists.
- Tune level0_slowdown_writes_trigger and level0_stop_writes_trigger …
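A minimal sketch of how these RocksDB options might be set for MyRocks, assuming they are passed through the rocksdb_default_cf_options variable; the values shown are RocksDB's defaults (byte limits in bytes) and are only there to illustrate the syntax:

    # illustrative my.cnf fragment (assumed mechanism; values are RocksDB defaults)
    rocksdb_default_cf_options="soft_pending_compaction_bytes_limit=68719476736;hard_pending_compaction_bytes_limit=274877906944;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=36"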

Insert benchmark perf over time for MyRocks 5.6 on a large server

I used the Insert Benchmark to compare performance for MyRocks builds from March 2022 through today to understand how performance changes over time. I was unable to go back further in time because of conflicts between old code and a new compiler toolchain.

tl;dr
- A build from March 2022 gets ~15% more inserts/s and ~5% more queries/s when compared to recent builds.

Builds: All builds use MyRocks 5.6.35 but from different points in time. I used the same compiler toolchain (gcc) for all builds. The builds are:
- fbmy5635_202203072101 - from 7 March 2022 at git hash 84ce624a with RocksDB 6.28.2
- fbmy5635_202304122154 - from 12 April 2023 at git hash f2161d019 with RocksDB 7.10.2
- fbmy5635_202305292102 - from 29 May 2023 at git hash 509203f4 with RocksDB 8.2.1
- fbmy5635_jun23_7e40af67 - from 23 June 2023 at git hash 7e40af67 with RocksDB 8.2.1

Benchmark: the insert benchmark was run in two configurations:
- cached by RocksDB - the RocksDB block cache caches all tables
- IO-bound - the database is large …

MyRocks, InnoDB and Postgres as a queue

As part of my changes to the Insert Benchmark to support longer-running benchmark steps without growing the size of the benchmark tables, I added a feature to optionally do deletes at the same rate as inserts. With this the table is like a queue -- inserts at one end, deletes from the other end. The benchmark steps that use deletes must run a query when they start to determine the minimum value of the PK column (transactionid) currently in the table, and in some cases that query is very slow for InnoDB, somewhat slow for MyRocks and not really slow for Postgres. The query is: select min(transactionid) from table.

tl;dr
- Postgres does great, MyRocks and InnoDB have problems
- I can tune away the problem for InnoDB via the innodb_max_purge_lag my.cnf options but that comes at a big cost -- less write throughput
- I don't know yet whether I can tune the problem away for MyRocks

One response to this post is that doing a queue in a SQL DBMS is an anti-pattern. One response to that response is …
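A minimal SQL sketch of the queue pattern described above; the table name, the non-PK column and the values are hypothetical and only illustrate the shape of the workload:

    -- the query run at the start of a benchmark step; this is the one that can be slow
    SELECT MIN(transactionid) FROM pi1;

    -- the table behaves like a queue: inserts at one end ...
    INSERT INTO pi1 (transactionid, price) VALUES (1000001, 9.99);

    -- ... and deletes at the other end, done at the same rate as the inserts
    DELETE FROM pi1 WHERE transactionid <= 100;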

Tuning InnoDB for the insert benchmark on a medium server, part 2

This continues work I shared in June for InnoDB with the Insert Benchmark on a medium server. In this post I have results for more my.cnf variations with MySQL 8.0.33.

tl;dr
- I am certain I don't understand how innodb_io_capacity and innodb_io_capacity_max are supposed to work in MySQL 8. I suspect I am not alone.
- MySQL 8.0.33 is almost always faster than 5.6.51 on a medium server with medium concurrency. This result is different from what I get on my small servers where the benchmarks are run with low concurrency.
- Does GCP provide any way of detecting a slowly failing SSD? I'd rather not scrape dmesg output to figure this out. I have several benchmark runs that are in doubt because of a failing SSD (those that use the a17, a18 and a19 configs).
- I am still chasing a workaround for stalls from the page cleaner falling behind. This also affects queries because they might get stalled by doing single-page flushes to make pages clean for reuse. One workaround is Percona Server. …
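For reference, a minimal my.cnf sketch of the two variables named in the tl;dr; the values are illustrative only, since the point of the post is that their behavior in MySQL 8 is not well understood:

    # illustrative my.cnf fragment: rate hints for InnoDB background flushing
    innodb_io_capacity=10000
    innodb_io_capacity_max=20000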

my.cnf options that influence create index performance

I measured the time to create indexes for InnoDB and MyRocks using data and schemas from the insert benchmark.

tl;dr
- For InnoDB, changing the my.cnf options was more significant when the secondary indexes were created via one alter table statement
- Increasing innodb_ddl_threads helps, up to a point, when all indexes are created at once
- Increasing innodb_ddl_buffer_size helps, up to a point, when all indexes are created at once
- Increasing rocksdb_merge_combine_read_size didn't help

Benchmarks: All of my scripts and output files are here and the spreadsheet with all results is here. The table was created with a PK index, then loaded with N rows (N=1B, 2B, 4B, 8B) and then I measured the time to create three secondary indexes. The secondary indexes were created using two methods (see the sketch below):
- one at a time - create each index separately (three alter table statements)
- three at once - create all indexes at once (one alter table statement)

Example my.cnf files are here for InnoDB and for MyRocks.
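A minimal SQL sketch of the two methods, using a hypothetical table t with columns a, b and c; the innodb_ddl_threads and innodb_ddl_buffer_size settings apply only to InnoDB and the values are illustrative:

    -- one at a time: three alter table statements
    ALTER TABLE t ADD INDEX ix_a (a);
    ALTER TABLE t ADD INDEX ix_b (b);
    ALTER TABLE t ADD INDEX ix_c (c);

    -- three at once: one alter table statement
    SET SESSION innodb_ddl_threads = 8;               -- illustrative value
    SET SESSION innodb_ddl_buffer_size = 1073741824;  -- 1G, illustrative value
    ALTER TABLE t ADD INDEX ix_a (a), ADD INDEX ix_b (b), ADD INDEX ix_c (c);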

Keeping up with the SQL DBMS market

Sometimes it is easier to talk about tech when you can name things. There is much innovation in progress in the SQL DBMS market. The same is true for NoSQL, but I focus on SQL. My current attempt to classify SQL DBMS products is:

TradSQL - traditional SQL DBMS solutions that arrived long before cloud native was a thing. They can be used as the nodes in ShardSQL. They don't provide ACID across shards, although brave people use XA to get atomic writes across shards. Examples are Oracle, Postgres and MySQL.

ShardSQL - run many TradSQL DBMS in one cluster, provide some way to figure out where data exists, and you have ShardSQL. This might involve but doesn't require a proxy or middleware. Examples include roll your own, Vitess and CitusDB. These have been popular with web-scale companies and I supported roll-your-own deployments at Google and Facebook. These provide limited support for cross-shard semantics -- perhaps XA for atomic writes. It will be interesting to see what happens with …

Insert+delete benchmark, medium server and MyRocks, part 2

This has more results for MyRocks vs the insert benchmark on a medium server. It expands on work from my previous post by finding a few changes to the config file (my.cnf) that improve performance.

tl;dr
- Variance during the l.i1 benchmark step is much better for MyRocks than for Postgres or InnoDB. The challenge for a b-tree is the read-modify-write cycle during secondary index maintenance; MyRocks does that maintenance via blind writes (RocksDB Put operations) and is read-free.
- Query rates at 1-second intervals have an interesting sawtooth pattern. I assume this is the CPU overhead from searching more data as the write buffer and/or L0 fill up and then empty at regular intervals.

Things that help performance
- Enabling the hyper clock cache improves performance but has a cost. With it enabled the peak RSS of mysqld is larger when using jemalloc. I have enlisted help from experts to figure that out.
- Enabling subcompactions reduces the time for L0->L1 compactions, which means there will …