Wednesday, June 26, 2024

A simple test to measure CPU per IO

What should I expect with respect to CPU overhead and latency when using the public cloud? I won't name the vendor here because they might have a DeWitt Clause.

Hardware

My server has 16 real cores with hyperthreading (SMT) disabled, runs Ubuntu 22.04, and uses ext4 in all cases. The two IO setups tested are below; a sketch for reproducing the local setup follows the list:

  • local - 2 NVMe devices with SW RAID 0
  • network - 1TB of fast cloud block storage that is backed by SSD and advertised as being targeted for database workloads.
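This is a minimal sketch of how the local setup can be reproduced with mdadm; the device names, RAID defaults, and mount point are assumptions rather than the exact commands used here:

# assumes the two NVMe devices are /dev/nvme0n1 and /dev/nvme1n1
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0
# /data/m matches the local test path used below
sudo mkdir -p /data/m && sudo mount /dev/md0 /data/m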
Updates:
  • Fixed a silly mistake in the math for CPU usecs per block read
Benchmark

This uses fio with O_DIRECT to do 4KB block reads. My benchmark script is here; it is run by the following command lines, and I ignore the result of the first run:
for d in 8 16 32 ; do bash run.sh local2_iod${d} /data/m/t.fio io_uring $d 300 512G ; done
for d in 4 8 16 32 ; do bash run.sh network_iod${d} /data2/t.fio io_uring $d 300 900G ; done
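For reference, an fio invocation like the following matches the workload described above (4KB random reads with O_DIRECT via io_uring); the job name and exact flag set are assumptions since the real flags live in run.sh:

# sketch of the local run at queue depth 8; adjust --filename/--size for the network case
fio --name=randread --filename=/data/m/t.fio --size=512G \
    --rw=randread --bs=4k --direct=1 --ioengine=io_uring \
    --iodepth=8 --runtime=300 --time_based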

Results

I compute CPU usecs per read as: ((vmstat.us + vmstat.sy) / 100) * 16 * 1M / IOPs, where the terms are defined below and a worked example follows the list:
  • vmstat.us, vmstat.sy - the average value for the us (user) and sy (system) columns in vmstat
  • 16 - the number of CPU cores
  • 1M - scale from CPU seconds to CPU microseconds
  • IOPs - the average reads/s (r/s) reported by fio
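A worked example of the formula as a shell sketch; the us/sy split is an assumption for illustration (only their sum matters) and the IOPs value is the local, queue depth 8 result below:

# (us + sy)/100 is the busy fraction across 16 cores; scale to usecs/sec, divide by IOPs
awk -v us=2.00 -v sy=1.43 -v cores=16 -v iops=54000 \
    'BEGIN { printf "%.2f CPU usecs/read\n", ((us + sy) / 100) * cores * 1000000 / iops }'
# prints: 10.16 CPU usecs/read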
With a queue depth of 8:
  • local: ~54k reads/s at ~150 usecs latency and ~10.16 CPU usecs/read
  • network: ~15k reads/s at ~510 usecs latency and ~12.61 CPU usecs/read
At a queue depth of 16 I still get ~15k reads/s from network storage, so the setup is already saturated at a queue depth of 8, and I ignore the results for queue depth 16.

From these results and others that I have not shared, the CPU overhead per read from using cloud block storage is ~2.5 CPU usecs in absolute terms (12.61 - 10.16 ≈ 2.45) and ~24% in relative terms (2.45 / 10.16 ≈ 0.24). I don't think that is bad.

