Friday, August 28, 2015

First day with InnoDB transparent page compression

I ran linkbench overnight for a database that started at 100G using MySQL 5.7.8 and InnoDB transparent page compression. After ~24 hours I have 1 mysqld crash with nothing in the error log. I don't know if that is related to bug 77738. I will attach gdb and hope for another crash. For more about transparent page compression read here, here and here. For concerns about the feature see the post by Domas. I previously wrote about this feature.

On the bright side, this is a great opportunity for MyRocks, the RocksDB storage engine for MySQL.

Then I ran 'dmesg -e' and got 81 complaints from XFS on the host that uses transparent compression. The warnings are from the time when the benchmark ran. My other test host isn't using hole-punch and doesn't get these warnings. I get the same error message below from a CentOS 6.6 host using a 3.10.53 kernel.

[Aug27 05:53] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[  +1.999375] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[  +1.999387] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[  +1.983386] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[  +1.999379] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
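
A command like this gives the count; the pattern is just the message text from the lines above:
    $ dmesg | grep -c 'possible memory allocation deadlock in kmem_alloc'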


The host has Fedora 19 and the kernel is 3.14.27-100.fc19.x86_64. I don't know if Fedora 19 is officially supported. I know that hole punch is available because this is in the error log:
    [Note] InnoDB: PUNCH HOLE support available

And this was used in a create table statement:
    ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;
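
For context, a complete statement looks something like this. The table and column names are made up for illustration; only the options on the line above come from my test:
    CREATE TABLE t1 (
      id BIGINT NOT NULL PRIMARY KEY,
      val VARCHAR(255)
    ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;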

From my host without hole-punch where the files for the table are ~44G.
    $ xfs_bmap /ssd1/scratch/data.mdcallag/data/linkdb/linktable*.ibd | wc -l
    11446


And from the host with hole-punch, where the files for the table are ~104G according to ls but use much less space on disk because of the holes.
    $ xfs_bmap /ssd1/scratch/data.mdcallag/data/linkdb/linktable.ibd  | wc -l
    11865620
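
One way to see the difference between apparent size and allocated size is to compare du with and without --apparent-size (the path is from my setup; ls and du are standard tools):
    $ ls -lh /ssd1/scratch/data.mdcallag/data/linkdb/linktable.ibd
    $ du -sh /ssd1/scratch/data.mdcallag/data/linkdb/linktable.ibd
    $ du -sh --apparent-size /ssd1/scratch/data.mdcallag/data/linkdb/linktable.ibd
ls and --apparent-size report the ~104G logical size while plain du should report the smaller allocated size.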


I don't like the vmstat output either. On the host that uses transparent page compression, swap is being used and that started during the linkbench load. Swap is not being used on the other host. That doesn't look right.

 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
13  6 4095996 8410200   1164 6298324    0    0    65   534    0    0  8  1 89  2
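
One way to confirm that mysqld is the process being swapped is to check VmSwap in /proc. This assumes a single mysqld process so that pidof returns one pid:
    $ grep VmSwap /proc/$(pidof mysqld)/status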


But wait, there is more

This is from a host with CentOS 6.6 and the 3.10.53 kernel. After running for a few hours with transparent page compression mysqld got stuck. I looked in the database error log and there was the standard output from threads getting long waits on InnoDB synchronization objects. I attached gdb to look at thread stacks, and the attach alone took ~15 minutes. Fortunately I got stacks before needing to reboot this host. All threads appear to be blocked on calls into the kernel. This gist shows two of the threads -- one is stuck doing aio and another trying to do fallocate64 called from os_file_punch_hole.
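
Something like this dumps stacks for all threads without an interactive gdb session; the pid lookup assumes a single mysqld process:
    $ gdb --batch -p $(pidof mysqld) -ex 'thread apply all bt'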

Deadlocked mysqld + reboot server == victory? I filed bug 78277 for this. After the reboot, dropping a ~72G ibd file that had been using hole-punch took ~3 minutes. Then I created an 80G file and dropping that took 10 seconds. It is not a good idea to have database files that take minutes to drop, given that InnoDB and filesystem state can get out of sync during a crash, requiring manual repair.
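
Hole punching itself can be reproduced from the command line with the fallocate utility from util-linux. This is just a sketch with a made-up file name and sizes, not what InnoDB does internally:
    $ fallocate -l 80G /ssd1/scratch/testfile
    $ fallocate -p -o 0 -l 1G /ssd1/scratch/testfile
    $ xfs_bmap /ssd1/scratch/testfile | wc -l
    $ time rm /ssd1/scratch/testfile
A file with many punched regions has many extents, and that appears to be the slow case for unlink.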

And more

I have two CentOS 6.6 servers with the 3.10.53 kernel and XFS. The host in the previous section doesn't use partitioning. The host in this section uses partitioning and transparent page compression. They both get InnoDB deadlocks, but this host was healthy enough that the InnoDB background thread was able to kill the server. This was triggered by a DROP DATABASE statement. I don't have thread stacks but will guess that a file unlink operation took too long or got stuck. Because DDL isn't transactional, the data dictionary isn't in a good state at this point. Some of the per-partition ibd files are missing.

Reboot #2 has arrived. I did kill -9 on mysqld and the process is still there in state Z after 40 minutes. I deleted the database directory but the space is still in use according to df. I assume that space won't be released until the zombie process goes away. I give up on trying to use transparent compression on this host.
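
To check whether deleted-but-still-open files are what is holding the space, something like this lists open files with a link count of zero; whether it helps when the process is stuck in state Z is another question:
    $ lsof +L1 | grep mysqld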

3 comments:

  1. Is that punched space even usable in practice?

     Reply: Excellent question. I will test for that in a few days.

     Reply: It's 2016, so ..... is it usable? :) Thank you