Tuesday, July 25, 2023

How do you limit MVCC GC debt?

Too much MVCC GC debt usually means bad & weird performance along with too much space-amplification. And too much space-amp means the database will be much larger than needed. How do you limit MVCC GC debt for MyRocks, InnoDB and Postgres? This can be an issue for write-heavy workloads especially when the writes aren't slowed by doing reads from storage (because the working-set is cached) or by fsync on commit (because fsync is fast or disabled).

This problem appeared recently for both InnoDB and Postgres while doing write-heavy tests (Insert Benchmark) with a database that could be cached (assuming there isn't too much MVCC GC debt).

Limits are set via:

  • MyRocks
  • InnoDb
    • tune purge. Note that innodb_max_purge_lag is 0 by default meaning there is no limit and your workload is free to create too much MVCC GC debt. This is easy to notice with file-per-table enabled when purge falls behind -- table *.ibd files will be much larger than expected if purge isn't removing delete-marked rows as fast as they are being created. The Innodb history list length is a proxy for the amount of MVCC GC debt.
  • Postgres
    • I don't know if this is possible. Certainly write rates will slow down a lot when vacuum falls too far behind. But that is a bit of a disaster and I am not sure if it is possible to set configuration options that will make writes slower (as done by InnoDB and MyRocks) prior to the disaster. Alas, my MyRocks and InnoDB skills exceed my Postgres skills so I won't rule out that I don't know the answer or forgot it. I also don't know if there are any counters or queries that can be done to estimate MVCC GC debt.

No comments:

Post a Comment

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...