Wednesday, June 26, 2024

A simple test to measure CPU per IO

What should I expect with respect to CPU overhead and latency when using the public cloud? I won't name the vendor here because they might have a DeWitt Clause.

Hardware

My server has 16 real cores with hyperthreading (SMT) disabled, runs Ubuntu 22.04, and uses ext4 in all cases. The two IO setups tested are:

  • local - 2 NVMe devices with SW RAID 0
  • network - 1TB of fast cloud block storage that is backed by SSD and advertised as being targeted for database workloads.
Updates:
  • Fixed a silly mistake in the math for CPU usecs per block read
Benchmark

This uses fio with O_DIRECT to do 4kb block reads. My benchmark script is here. It is run by the following command lines, and I ignore the result of the first run:
for d in 8 16 32 ; do bash run.sh local2_iod${d} /data/m/t.fio io_uring $d 300 512G ; done
for d in 4 8 16 32 ; do bash run.sh network_iod${d} /data2/t.fio io_uring $d 300 900G ; done
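I have not included run.sh inline, but a minimal sketch of the fio job it wraps, assuming standard fio options for O_DIRECT 4kb random reads via io_uring against an existing test file, would look something like this. The job name and the --time_based/--group_reporting choices are my guesses, not necessarily what run.sh does:
fio --name=randread --filename=/data/m/t.fio --size=512G \
    --rw=randread --bs=4k --direct=1 --ioengine=io_uring \
    --iodepth=8 --runtime=300 --time_based --group_reporting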

Results

I compute CPU usecs per read as: (((vmstat.us + vmstat.sy)/100) * 16 * 1M) / IOPs, where the terms are defined below and a worked example follows the list:
  • vmstat.us, vmstat.sy - the average value for the us (user) and sy (system) columns in vmstat
  • 16 - the number of CPU cores
  • 1M - scale from CPU seconds to CPU microseconds
  • IOPs - average reads/s (r/s) reported by fio
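To illustrate the arithmetic, here is a small awk snippet that plugs hypothetical numbers into that formula; us=2.0, sy=1.4 and 54000 r/s are made-up values chosen only to land near the local result below:
awk 'BEGIN {
  us = 2.0; sy = 1.4       # hypothetical average vmstat us and sy (percent busy)
  cores = 16               # CPU cores in the server
  iops = 54000             # hypothetical average r/s from fio
  # (((vmstat.us + vmstat.sy)/100) * cores * 1M) / IOPs
  printf "CPU usecs per read: %.2f\n", ((us + sy) / 100) * cores * 1000000 / iops
}'
# prints: CPU usecs per read: 10.07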
With a queue depth of 8:
  • local: ~54k reads/s at ~150 usecs latency and ~10.16 CPU usecs/read
  • network: ~15k reads/s at ~510 usecs latency and ~12.61 CPU usecs/read
At queue depth=16 I still get ~15k reads/s from network, so the setup is already saturated at queue depth=8, and I ignore the results for queue depth=16.

From these results, and others that I have not shared, the CPU overhead per read from using cloud block storage is ~2.5 CPU usecs in absolute terms (~12.61 - ~10.16) and ~24% in relative terms (2.45 / 10.16). I don't think that is bad.
