Monday, November 6, 2017

Questions I have about workloads with high write rates

I am sure I borrowed some of these from co-workers. When I read discussions about write-heavy workloads I ask myself some questions. I assume VoltDB and Tarantool have answers, but these are hard problems. I know that my write-heavy MySQL tests ignore many of these problems because I usually run them with replication disabled.
  1. How long do you expect the storage device to last? Is it OK to replace it once per month because the workload needs too many DWPD?
  2. Do you have network capacity to support that QPS load on a master? Assume all servers in the rack get that QPS and the QPS source is not rack local.
  3. Do you have network capacity to replicate that write load to a replica? Assume some replicas are far away.
  4. Can the replica replay those writes fast enough to avoid lag?
  5. Will you be able to keep enough replication log archives to support PITR?
  6. How will you take backups?
  7. Will restore ever finish applying logs during the catch up phase?
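Question 1 comes down to arithmetic: a device's endurance rating (DWPD over its warranty period) is a fixed write budget, and a steady write rate burns through it at a predictable pace. Here is a rough sketch of that calculation; all of the numbers (device size, DWPD rating, write rate, write amplification) are made-up examples, not measurements from any real deployment:

```python
def endurance_days(capacity_gb, rated_dwpd, warranty_days,
                   write_mb_per_sec, write_amp=3.0):
    """Days until the endurance budget is exhausted at a steady write rate.

    The rating means the device tolerates capacity * DWPD bytes of
    writes per day, every day of the warranty period. write_amp covers
    extra device writes from compaction, GC, etc.
    """
    budget_gb = capacity_gb * rated_dwpd * warranty_days
    daily_writes_gb = write_mb_per_sec * write_amp * 86400 / 1024.0
    return budget_gb / daily_writes_gb

# Hypothetical example: a 1.6 TB device rated for 3 DWPD over 5 years,
# taking 200 MB/s of logical writes with 3x write amplification,
# exhausts its budget in under 6 months.
days = endurance_days(1600, 3, 5 * 365, 200, write_amp=3.0)
```

With those inputs the answer is roughly 173 days, which is why the "replace the device once per month?" question is not hypothetical for heavier write rates.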


  1. John Hugg from VoltDB here. I'm curious what you mean by heavy write workloads. Most of our customers are running thousands to tens of thousands of transactions (stored procs) per second, and then using 2-10x that number of actual SQL statements. Some customers are averaging six figures, with some burstiness closer to 1M TPS. I suspect FB could achieve better utilization, which might change some of the answers below.

    Most of these transactions are fairly small on the wire, but not all.

    1. Good question. I don't believe anyone has ever complained, but I'm a bit surprised at this. VoltDB does write full snapshots to truncate logs on a regular basis, but there are a few mitigating factors. First, snapshots are compressed and have no index or materialized data, so they are often much smaller than in-memory data. Meanwhile the between-snapshot logs are logical, which often helps with size compared to binary logs (not always).

    2. This depends a lot on the workload type. If the transaction is small on the wire (e.g. give me a #5 with cheese), then network usually isn't the issue. If the transaction is more KV in nature with blobs, then network quickly becomes the issue. The YCSB benchmark is typically network limited for us, but not our Voter benchmark.
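The small-transaction vs. blob-heavy distinction above can be checked with back-of-envelope NIC math. This sketch estimates the bandwidth a master needs for a given QPS plus logical replication of writes to replicas; the QPS, payload sizes, and replica count are illustrative assumptions, not numbers from VoltDB or anyone's production workload:

```python
def network_gbps(qps, avg_request_bytes, avg_response_bytes, replicas=0):
    """Approximate NIC bandwidth (Gbit/s) for client traffic plus
    replication fan-out of the write payload to each replica."""
    client_bytes = qps * (avg_request_bytes + avg_response_bytes)
    repl_bytes = qps * avg_request_bytes * replicas
    return (client_bytes + repl_bytes) * 8 / 1e9

# Hypothetical example: 500k writes/s with 1 KB values, small
# responses, and two replicas is already past a 10 Gbit NIC.
gbps = network_gbps(500_000, 1024, 128, replicas=2)
```

That example works out to about 12.8 Gbit/s before any protocol overhead, which is why KV-with-blobs workloads hit the network limit long before CPU.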

    3. Same answer as above for local replication. Cross-DC replication is binary (not logical) and does need some capacity planning, but it also uses more compression.

    4. No customer is fully utilizing internal throughput capacity. Most size clusters for memory size; some size for networks; a few size for desired redundancy. People complain that CPUs are too idle. I wouldn't be surprised if someone sized for recovery time, but I don't know of anyone who has.

    5. Our command logs (logical WAL) support this, but they get periodically truncated. There's no technical reason they couldn't be kept longer (besides space).

    6. VoltDB clusters support transactionally consistent, CoW snapshots. It's livin the dream™.

    7. In general, replaying a log is faster than running the transactions for the first time. There are some nice optimizations you can make (for starters, you don't have to log to disk). Loading a snapshot and re-generating index and view data can take a while depending on data size, schema and hardware. Still, as I hinted in question 4, you can size your cluster to achieve the desired time.
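The catch-up question above has a simple shape: a restored replica only converges if its replay rate exceeds the incoming write rate, and the gap between the two determines how long that takes. A minimal sketch, with made-up backlog and throughput numbers:

```python
def catchup_seconds(backlog_txns, incoming_tps, replay_tps):
    """Time for a restored replica to drain its backlog.

    If replay can't outpace incoming writes, catch-up never finishes.
    """
    if replay_tps <= incoming_tps:
        return float("inf")
    return backlog_txns / (replay_tps - incoming_tps)

# Hypothetical example: a 100M-transaction backlog, 50k TPS still
# arriving, replay running at 80k TPS -> roughly an hour to converge.
secs = catchup_seconds(100_000_000, 50_000, 80_000)
```

This is why sizing for recovery time (as mentioned in answer 4) means provisioning replay headroom well above steady-state write throughput, not just matching it.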

    1. Thanks for answers. Given the loads that VoltDB is handling per server I assume you have a lot of experience with these questions.

      I will remain vague about the meaning of "write heavy". How do we measure it? By:
      1) GB/s on network between client & server?
      2) statements/second
      3) stored procedure calls/second (I wish I had performant stored procs like you do)
      4) row rates (fetched/second, read/second, modified/second)