What should I expect with respect to CPU overhead and latency when using the public cloud? I won't name the vendor here because they might have a DeWitt Clause.
Hardware
My server has 16 real cores with HT/SMT disabled, runs Ubuntu 22.04, and uses ext4 in all cases. The two IO setups tested are below (a sketch of the local setup follows the list):
- local - 2 NVMe devices with SW RAID 0
- network - 1TB of fast cloud block storage that is backed by SSD and advertised as being targeted for database workloads.
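For the local setup, a minimal sketch of how the RAID 0 array and filesystem could be created is below. The device names (/dev/nvme0n1, /dev/nvme1n1), array name and mount point are assumptions for illustration, not taken from my scripts:
# Assumed device names, array name and mount point
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.ext4 /dev/md0
mkdir -p /data/m
mount /dev/md0 /data/m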
Updates:
- Fixed a silly mistake in the math for CPU usecs per block read
Benchmark
This uses fio with O_DIRECT to do 4KB block reads. My benchmark script is here; it is run by the following command lines, and I ignore the result of the first run:
for d in 8 16 32 ; do bash run.sh local2_iod${d} /data/m/t.fio io_uring $d 300 512G ; done
for d in 4 8 16 32 ; do bash run.sh network_iod${d} /data2/t.fio io_uring $d 300 900G ; done
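The script is not reproduced here, but an fio invocation consistent with the description (4KB random reads with O_DIRECT via io_uring at a given queue depth) would look roughly like this. The job name and the mapping of the script arguments to fio options are my assumptions:
# Roughly what run.sh does for the local setup at queue depth 8 (assumed option mapping)
fio --name=randread --filename=/data/m/t.fio --size=512G \
  --rw=randread --bs=4k --direct=1 \
  --ioengine=io_uring --iodepth=8 \
  --runtime=300 --time_based=1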
Results
I compute CPU usecs per read as: (((vmstat.us + vmstat.sy)/100) * 16 * 1M) / IOPs where
- vmstat.us, vmstat.sy - the average value for the us (user) and sy (system) columns in vmstat
- 16 - the number of CPU cores
- 1M - scale from CPU seconds to CPU microseconds
- IOPs - the average number of reads/s (r/s) reported by fio
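As a worked example, the formula can be evaluated with a one-line awk script. The vmstat averages here (us=2.1, sy=1.3) and the 54000 reads/s are illustrative values, not my measured numbers:
# ((us + sy)/100 * cores * 1M) / reads per second
echo "2.1 1.3 54000" | awk '{ printf "%.2f CPU usecs per read\n", ((($1 + $2) / 100) * 16 * 1000000) / $3 }'
# prints: 10.07 CPU usecs per read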
With a queue depth of 8
- local: ~54k reads/s at ~150 usecs latency and ~10.16 CPU usecs/read
- network: ~15k reads/s at ~510 usecs latency and ~12.61 CPU usecs/read
At queue depth = 16 I still get ~15k reads/s from network, so the setup is already saturated at queue depth = 8 and I ignore the results for queue depth = 16.
From these results, and others that I have not shared, the CPU overhead per read from using cloud block storage is ~2.5 CPU usecs in absolute terms and ~24% in relative terms. I don't think that is bad.
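The estimate follows from the queue depth 8 numbers above (12.61 vs 10.16 CPU usecs/read):
echo "12.61 10.16" | awk '{ printf "absolute: %.2f usecs/read, relative: %.0f%%\n", $1 - $2, 100 * ($1 - $2) / $2 }'
# prints: absolute: 2.45 usecs/read, relative: 24%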