I am trying to understand why 4kb random reads from an SSD are about 2X slower via a filesystem than via the raw device. This reproduces across different servers, but I am only sharing the results from my home Intel NUCs. The symptom is that the read response time is:
- ~2X larger per iostat's r_await for reads done via a filesystem vs the raw device (0.04 ms for the raw device vs 0.08 ms with a filesystem)
- ~3X larger per blkparse for reads done via a filesystem vs the raw device (~16 microsecs for the raw device vs 50+ microsecs with a filesystem)
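Both numbers come from standard tools. A minimal sketch of the measurement commands, with an assumed device name and trace file names (illustrative, not copied from my scripts):

    # iostat -x reports r_await, the average read response time in milliseconds
    iostat -x 1

    # blktrace records per-IO events; blkparse and btt turn the trace into per-IO latencies
    blktrace -d /dev/nvme0n1 -o nvme_trace -w 60
    blkparse -i nvme_trace -d nvme_trace.bin
    btt -i nvme_trace.bin    # the D2C times are the per-IO device service times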
My test scripts for this are rfio.sh and do_fio.sh.
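The scripts are not reproduced here, but the fio invocations they wrap look roughly like the sketch below. The device path, directory and option values are illustrative rather than copied from rfio.sh or do_fio.sh, and numjobs is varied across runs.

    DEV=/dev/nvme0n1     # raw device under test (illustrative)
    DIR=/data/fiotest    # XFS mount used for the filesystem tests (illustrative)

    # raw device
    fio --name=raw --filename=$DEV --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=1 --numjobs=8 \
        --runtime=60 --time_based --group_reporting

    # filesystem with O_DIRECT: each job reads its own 20G file
    fio --name=dio --directory=$DIR --size=20g --rw=randread --bs=4k --direct=1 \
        --ioengine=libaio --iodepth=1 --numjobs=8 \
        --runtime=60 --time_based --group_reporting

    # filesystem with buffered IO
    fio --name=buf --directory=$DIR --size=20g --rw=randread --bs=4k --direct=0 \
        --ioengine=libaio --iodepth=1 --numjobs=8 \
        --runtime=60 --time_based --group_reporting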
Update - the mystery has been solved thanks to advice from an expert (Andres Freund) in my Twitter circle; engaging with experts I would never get to meet in real life is why I use Twitter. I had been running the raw device tests with the device mostly empty, and the fix is to run the test with it mostly full; otherwise the SSD firmware can do something special (and faster) when reading data that was never written.
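In practice that means preconditioning the raw device with a full sequential write pass before running the random-read test, so every read lands on an LBA that has actually been written. A minimal sketch, again with an assumed device name (and note that this overwrites everything on the device):

    # One sequential write pass over the whole device; without --size fio writes it end to end
    fio --name=precondition --filename=/dev/nvme0n1 --rw=write --bs=1m \
        --direct=1 --ioengine=libaio --iodepth=32 --numjobs=1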
I used fio for 3 configurations: raw device, O_DIRECT and buffered IO. For O_DIRECT and buffered IO the filesystem is XFS and there were 8 20G files on a server with 16G of RAM. The results for O_DIRECT and buffered IO were similar:
- IOPS are similar
- CPU overhead is higher with a filesystem, ~1.2X with O_DIRECT and ~1.5X with buffered IO vs raw for numjobs >= 8. But that only amounts to a few extra microseconds, as the cost is ~8, ~9 and ~12 microseconds per IO for raw, O_DIRECT and buffered (the arithmetic behind a per-IO CPU number is sketched after this list). I don't yet know how to explain the larger CPU/IO numbers for numjobs < 8. Is that a warmup cost, or is some fixed cost amortized at higher concurrency?
- iostat r_await is similar
- iostat rareq-sz is the same
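About the per-IO CPU numbers above: the figure is just CPU time consumed during the run divided by the number of reads completed. A sketch of that arithmetic with placeholder inputs (not my measured values, and not necessarily how do_fio.sh computes it):

    # CPU microseconds per IO ~= (busy fraction * number of CPUs * 1e6) / IOPS
    # Placeholder inputs: 15% busy across 8 CPUs at 150K read IOPS -> 8 usec of CPU per IO
    awk 'BEGIN { busy=0.15; ncpu=8; iops=150000;
                 printf "%.1f usec of CPU per IO\n", busy * ncpu * 1e6 / iops }'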