Friday, January 12, 2018

XFS, nobarrier and the 4.13 Linux kernel

tl;dr

My day
  • nobarrier isn't supported as a mount option for XFS in kernel 4.13.0-26 with Ubuntu 16.04. I assume this isn't limited to Ubuntu. Read this for more detail on the change.
  • write throughput is much worse on my SSD without nobarrier
  • there is no error on the command line when mounting a device that uses the nobarrier option
  • there is an error message in dmesg output for this
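
One way to look for that message is to filter the kernel log for the option name. This is a minimal sketch: the helper name find_nobarrier_messages is hypothetical, and it matches on the string "nobarrier" rather than on the exact message text, which is not reproduced here.

```shell
# Hypothetical helper: print kernel log lines from stdin that mention nobarrier.
# Matching is case-insensitive because the exact message wording is not assumed.
find_nobarrier_messages() {
    grep -i 'nobarrier'
}

# Usage (reading the kernel log may need root on some systems):
#   dmesg | find_nobarrier_messages
```

No output means no line in the current kernel log buffer mentions the option, which can also happen if the buffer has wrapped since the mount.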

There might be two workarounds:
  • switch from XFS to ext4
  • echo "write through" > /sys/block/$device/queue/write_cache
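
The second workaround can be sketched as two small shell functions that read the current cache mode and then override it. The function names and the SYSFS_BLOCK variable are hypothetical; the override exists only so the functions can be tried against a scratch directory. Writing to the real sysfs file needs root and does not persist across reboots. It also changes only the kernel's view of the device, so on a drive with a volatile cache it can risk data loss on power failure.

```shell
# Assumption: SYSFS_BLOCK defaults to /sys/block; override it only for dry runs.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

# Print the cache mode the kernel currently assumes for a device, e.g. nvme0n1.
get_write_cache() {
    cat "$SYSFS_BLOCK/$1/queue/write_cache"
}

# Tell the block layer to treat the device cache as write through.
# This changes the kernel's view only, not the device's own cache setting.
set_write_through() {
    echo "write through" > "$SYSFS_BLOCK/$1/queue/write_cache"
}

# Usage as root, with a hypothetical device name:
#   get_write_cache nvme0n1
#   set_write_through nvme0n1
```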

The Story

I have a NUC cluster at home for performance tests with 3 NUC5i3ryh and 3 NUC7i5bnh. I recently replaced the SSD devices in all of them because previous testing wore them out. I use Ubuntu 16.04 LTS and recently upgraded the kernel on some of them to get the fix for Meltdown.

The NUC7i5bnh server has a Samsung 960 EVO SSD that uses NVMe. I use the HWE kernel to make wireless work. The old kernel without the Meltdown fix is 4.8.0-36 and the kernel with the Meltdown fix is 4.13.0-26. Note that with the old kernel I used XFS with the nobarrier option. With the new kernel I assumed I was still getting nobarrier, but I was not. I have since switched from XFS to ext4.

The NUC5i3ryh server has a Samsung 850 EVO SSD that uses SATA. The old kernel without the Meltdown fix is 4.4.0-38 and the kernel with the Meltdown fix is 4.4.0-109. I continue to use XFS on these.

Sysbench results for the NUC5i3ryh show little regression from the Meltdown fix. Results for the NUC7i5bnh show a large regression for the write-heavy tests and little for the read-heavy tests.
  • I started to debug the odd 7i5bnh results and noticed that write IO throughput was much lower for servers with the Meltdown fix using 4.13.0-26. 
  • Then I used sysbench fileio to run IO tests without MySQL and noticed that read IO was fine, but write IO throughput was much worse with the 4.13.0-26 kernel.
  • Then I consulted my local experts, Domas Mituzas and Jens Axboe.
  • Then I noticed the error message in dmesg output.

2 comments:

  1. I just looked into this while updating our DB server. XFS now decides whether to use barriers based on the block layer, which picks this up automatically from what the device advertises to the kernel (this depends on whether the RAID has battery backup or the drive has power-loss capacitor protection). I verified that on our cheap SSDs the kernel has "write back" ( cat /sys/block/nvme0n1/queue/write_cache ), which tells XFS to use barriers, and on our Optane 4800X, which has power-loss caps, the kernel has the correct "write through", which tells XFS to skip barriers.

