Wednesday, June 26, 2024

A simple test to measure CPU per IO

What should I expect with respect to CPU overhead and latency when using the public cloud? I won't name the vendor here because they might have a DeWitt Clause.

Hardware

My server has 16 real cores with hyperthreading (SMT) disabled, runs Ubuntu 22.04, and uses ext4 in all cases. The two IO setups tested are below; a sketch for reproducing the local setup follows the list:

  • local - 2 NVMe devices with SW RAID 0
  • network - 1TB of fast cloud block storage that is backed by SSD and advertised as being targeted for database workloads.
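This is a minimal sketch of how the local setup can be reproduced with mdadm; the device names, RAID defaults, and mount point are assumptions rather than the exact commands used here:

# assumes the two NVMe devices are /dev/nvme0n1 and /dev/nvme1n1
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0
# /data/m matches the local test path used below
sudo mkdir -p /data/m && sudo mount /dev/md0 /data/m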
Updates:
  • Fixed a silly mistake in the math for CPU usecs per block read
Benchmark

This uses fio with O_DIRECT to do 4KB block reads. My benchmark script is here; it is run by the following command lines, and I ignore the result of the first run:
for d in 8 16 32 ; do bash run.sh local2_iod${d} /data/m/t.fio io_uring $d 300 512G ; done
for d in 4 8 16 32 ; do bash run.sh network_iod${d} /data2/t.fio io_uring $d 300 900G ; done
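For reference, an fio invocation like the following matches the workload described above (4KB random reads with O_DIRECT via io_uring); the job name and exact flag set are assumptions since the real flags live in run.sh:

# sketch of the local run at queue depth 8; adjust --filename/--size for the network case
fio --name=randread --filename=/data/m/t.fio --size=512G \
    --rw=randread --bs=4k --direct=1 --ioengine=io_uring \
    --iodepth=8 --runtime=300 --time_based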

Results

I compute CPU usecs per read as: ((vmstat.us + vmstat.sy) / 100) * 16 * 1M / IOPs, where the terms are defined below and a worked example follows the list:
  • vmstat.us, vmstat.sy - the average value for the us (user) and sy (system) columns in vmstat
  • 16 - the number of CPU cores
  • 1M - scale from CPU seconds to CPU microseconds
  • IOPs - the average reads/s (r/s) reported by fio
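A worked example of the formula as a shell sketch; the us/sy split is an assumption for illustration (only their sum matters) and the IOPs value is the local, queue depth 8 result below:

# (us + sy)/100 is the busy fraction across 16 cores; scale to usecs/sec, divide by IOPs
awk -v us=2.00 -v sy=1.43 -v cores=16 -v iops=54000 \
    'BEGIN { printf "%.2f CPU usecs/read\n", ((us + sy) / 100) * cores * 1000000 / iops }'
# prints: 10.16 CPU usecs/read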
With a queue depth of 8:
  • local: ~54k reads/s at ~150 usecs latency and ~10.16 CPU usecs/read
  • network: ~15k reads/s at ~510 usecs latency and ~12.61 CPU usecs/read
At a queue depth of 16 I still get ~15k reads/s from network storage, so the setup is already saturated at a queue depth of 8, and I ignore the results for queue depth 16.

From these results and others that I have not shared, the CPU overhead per read from using cloud block storage is ~2.5 CPU usecs in absolute terms (12.61 - 10.16 ≈ 2.45) and ~24% in relative terms (2.45 / 10.16 ≈ 0.24). I don't think that is bad.

