RocksDB is used for OLTP and supports workloads with high concurrency from user requests and from compaction done in the background. There might be 8 threads doing compaction on a busy server and a typical compaction step done by a thread is to read from ~11 SST files and then write ~11 SST files. While each SST file is read sequentially, all of the files are read at the same time. The SST files are written in sequence. So we have ~12 streams of IO per compaction thread (11 for reads, 1 for write) and with 8 compaction threads then we can have ~96 streams of concurrent IO. The reads are done a block at a time (for example 16kb) and depend on file system readahead to get larger requests and reduce seeks. The write requests are likely to be large because the files are multi-MB and fsync is called when all of the data to the file has been written or optionally when sync_file_range is called after every N MB of writes.
Even if we limit our focus to writes there is a lot of concurrency as there is likely 1 write stream in progress for each compaction thread and another for the WAL. The reads done by compaction are interesting because many of them will get data from the OS filesystem cache. When level style compaction is used, the small levels near the top of the LSM are likely to be in cache while the largest level is not.
Reads vs stripe sizeEnough about RocksDB, this post is supposed to be about the performance impact from RAID stripe sizes. I ran a simple IO performance test on two servers that were identical except for the RAID stripe size. Both had 15 disks with HW RAID 0, one used a 256KB stripe and the other a 1MB stripe. While the disk array is ~55TB I ran tests limited to the first 10TB and used the raw device. Domas provided the test client and it was run for 1 to 128 threads doing 1MB random reads.
The array with a 1MB stripe gets more throughput when there are at least 2 threads doing reads. At high concurrency the array with a 1MB stripe gets almost 2X more throughput.
Read MB/second by concurrency
1 2 4 8 12 16 24 32 64 128 threads
67 125 223 393 528 642 795 892 891 908 1MB stripe
74 115 199 285 332 373 407 435 499 498 256KB stripe