Monday, April 25, 2016

TRIM, iostat and Linux

I use iostat and vmstat to measure how much CPU and storage is used during my performance tests. Many of the database engines have their own counters to report disk IO but it is good to use the same measurement across engines. I use the "-k" option with iostat so it reports KB written per second per device.

The rate of writes to storage can be overstated by a factor of two in one case and I don't think this is widely known. When TRIM is done for an SSD then the Linux kernels that I use report that as bytes written. If I create an 8G file then I will see at least 8G of writes reported by iostat. If I then remove the file I will see an additional 8G of writes reported by iostat assuming TRIM is used. But that second batch of 8G of writes wasn't really writes.

One of the database engines that I evaluate, RocksDB, frequently creates and removes files. When TRIM is counted as bytes written then this overstates the amount of storage writes done by RocksDB. The other engines that I evaluate do not create and remove files as frequently -- InnoDB, WiredTiger, TokuDB, mmapv1.

The best way to figure out whether TRIM is done for your favorite SSD is to test it yourself.
  1. If TRIM is done then iostat reports TRIM as bytes written. 
  2. If iostat reports TRIM as bytes written and your database engine frequently removes files then iostat wKB/second might be overstated.

Testing this:


My test case is:

output=/path/to/many/GB/file/this/will/create

# run this long enough to get a file that is many GB in size

dd if=/dev/zero of=$output bs=1M oflag=direct &
dpid=$!

sleep 30
kill $dpid


iostat -kx 1 >& o.io &
ipid=$!
sleep 3
rm -f $output; sync
sleep 10
kill $ipid
# look at iostat data in o.io

Example iostat output from a 4.0.9 Linux kernel after the rm command:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
md2               0.00     0.00    0.00 65528.00     0.00 8387584.00   256.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00 65538.00     0.00 8387632.00   255.96     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00 65528.00     0.00 8387584.00   256.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00 65528.00     0.00 8387584.00   256.00     0.00    0.00   0.00   0.00

Example iostat output from a 3.10.53 Linux kernel after the rm command:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00 31078.00     0.00 3977984.00   256.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00 283935.00     0.00 36343552.00   256.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00 288343.00     0.00 36907908.00   256.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00 208534.00     0.00 26692352.00   256.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

No comments:

Post a Comment

RocksDB on a big server: LRU vs hyperclock, v2

This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a lar...