Thursday, August 28, 2014

The InnoDB Mutex, part 3

I repeated tests from part 1 and part 2 using a system that has an older/smaller CPU but more recent versions of glibc & Linux. In this case the CPU has dual sockets with 6 cores per socket and 24 vCPUs with HT enabled. The host uses Fedora, glibc 2.17 and Linux kernel 3.11.9-200. Tests were run up to 512 threads. There was an odd segfault at 1024 threads with the TTASFutexMutex that I chose not to debug. And tests were run for 1, 4 and 12 mutexes rather than 1, 4 and 16 because of the reduced vCPU count. The lock hold duration was ~6000 nsecs rather than ~4000 because of the different CPU.

My conclusions from this platform were less strong than from the previous tests.
  • The InnoDB syncarray mutex still has lousy behavior at high concurrency but it is less obvious here because I did not run a test for 1024 threads.
  • pthread default is usually better than pthread adaptive, but in at least one case pthread adaptive was much better.
  • The new custom InnoDB mutex, TTASFutexMutex, was occasionally much worse than the alternatives. From looking at the code in 5.7, the choice to use it appears to be a compile-time decision. If only to help figure out the performance problems, this choice should be a my.cnf option. It isn't clear to me that TTASFutexMutex is ready for prime time.

Graphs for 0 nsec lock hold

Graphs for 1000 nsec lock hold

Graphs for 6000 nsec lock hold


  1. The futex makes a kernel call. If the think time is 0, then the spin and loop must be increased; otherwise the cost shifts to the kernel.

  2. Compare it to the old-style InnoDB mutex, "inno syncarray" in the graph above:
    * they have similar busy-wait loops
    * inno syncarray suffers from broadcast on unlock, inno futex only wakes one waiter

    So inno futex should be strictly better than inno syncarray on the graphs from part 2 and part 3 (here), ignoring some variance that occurs, but it definitely is not, and I don't think variance explains it.
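    To make the comparison above concrete, here is a minimal sketch of a test-and-test-and-set mutex that falls back to a futex and wakes a single waiter on unlock. It is not the TTASFutexMutex code from 5.7. The class name SketchFutexMutex, the constant kSpinRounds, and the helper run_counter_demo are hypothetical, and the three-state scheme (unlocked, locked, contended) is the common textbook futex design that I am assuming here for illustration. It is Linux only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sketch only: a TTAS spin phase followed by a futex sleep (Linux).
class SketchFutexMutex {
 public:
  void lock() {
    // Busy-wait phase: read first, and attempt the atomic write only
    // when the lock looks free (test-and-test-and-set).
    for (int i = 0; i < kSpinRounds; ++i) {
      if (state_.load(std::memory_order_relaxed) == kUnlocked) {
        uint32_t expected = kUnlocked;
        if (state_.compare_exchange_weak(expected, kLocked,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
          return;
        }
      }
    }
    // Spinning failed: mark the lock contended and sleep in the kernel.
    // The kernel only blocks if the word still equals kContended.
    while (state_.exchange(kContended, std::memory_order_acquire) != kUnlocked) {
      futex_call(FUTEX_WAIT_PRIVATE, kContended);
    }
  }

  void unlock() {
    // Enter the kernel only if some thread may be sleeping, and wake one
    // waiter rather than broadcasting to all of them.
    if (state_.exchange(kUnlocked, std::memory_order_release) == kContended) {
      futex_call(FUTEX_WAKE_PRIVATE, 1);
    }
  }

 private:
  static constexpr uint32_t kUnlocked = 0, kLocked = 1, kContended = 2;
  static constexpr int kSpinRounds = 100;  // hypothetical tuning value

  long futex_call(int op, uint32_t val) {
    static_assert(sizeof(std::atomic<uint32_t>) == sizeof(uint32_t),
                  "futex needs a plain 32-bit word");
    return syscall(SYS_futex, reinterpret_cast<uint32_t*>(&state_), op, val,
                   nullptr, nullptr, 0);
  }

  std::atomic<uint32_t> state_{kUnlocked};
};

// Hypothetical helper: nthreads threads each increment a shared counter
// nloops times under the mutex; returns the final count.
long run_counter_demo(int nthreads, int nloops) {
  assert(nthreads > 0 && nloops >= 0);
  SketchFutexMutex mu;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < nloops; ++i) {
        mu.lock();
        ++counter;  // critical section with zero think time
        mu.unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

    With zero think time and a small kSpinRounds, most failed acquisitions in this sketch end in a FUTEX_WAIT call, which is one way the cost can shift to the kernel as described in comment 1.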

  3. I like this little program you've written, extremely useful. I'm still trying to figure out if I'm running it correctly. I want to run the test in the first graph. This is my take.

    ./innotsim 64 1000000 0 4 1 (futex | inno2) 1 0 1

  4. From my shell script the args are:
    ./innotsim $nthr $nloops $thinkd $spinr $spind $y $nmux $retry_after_reserve $max_spinners

    nthr - number of user threads
    nloops - number of loop iterations per user thread
    thinkd - number of iterations to do the work loop (ut_busy, which is ut_delay minus the pause instruction)
    spinr - number of rounds for the busy wait loop
    spind - number of loop iterations for ut_busy
    y - inno, futex, posixadapt, etc
    nmux - number of mutexes, threads evenly distributed across mutexes
    retry_after_reserve - number of times to try to get the lock after reserving sync array slot, 4 is what innodb uses
    max_spinners - max number of concurrent spinners for the new mutex variations I added that can limit max spinning threads -- posixgnspin, posixlnspin

    Also, I compile via: g++ -DFUTEX_ON -Wall -O2 -g -o innotsim innotsim.c -lpthread -lm
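    As a rough illustration of how spinr and spind interact, the sketch below tries a lock up to spinr times and burns spind busy-loop iterations after each failed attempt. This is my reading of the argument descriptions above, not code taken from innotsim.c. The names busy_loop and spin_for_lock are hypothetical, and busy_loop only stands in for ut_busy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for ut_busy: runs `iterations` loop iterations
// with no pause instruction and returns the number of iterations done.
inline uint64_t busy_loop(uint64_t iterations) {
  volatile uint64_t sink = 0;  // volatile so the loop is not optimized away
  for (uint64_t i = 0; i < iterations; ++i) {
    sink = sink + 1;
  }
  return sink;
}

// Try the lock up to `spinr` times, busy-waiting `spind` iterations after
// each failed attempt. Returns true if the lock was acquired while
// spinning; `attempts` counts how many times try_lock was called.
inline bool spin_for_lock(const std::function<bool()>& try_lock,
                          int spinr, uint64_t spind, int* attempts) {
  assert(spinr >= 0 && attempts != nullptr);
  for (int round = 0; round < spinr; ++round) {
    ++*attempts;
    if (try_lock()) {
      return true;
    }
    busy_loop(spind);
  }
  return false;  // caller would now block (sync array slot, futex, etc.)
}
```

    Under this reading, the total time spent spinning before blocking grows with the product of spinr and spind, so raising either one trades more CPU time for fewer trips into the blocking path.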

  5. ok, thanks. Let me try and reproduce the results and attempt to understand what's going on. Should help in improving the mutex code.

  6. I've tried replacing the log_t::mutex with a POSIX mutex and also other hot mutexes in the code. It is trivial to do that with the new infrastructure. While they perform well in synthetic benchmarks, in practice I see higher contention and a bigger drop in QPS.

  7. My request at this point is for performance of futex-mutex to be better than the sync-array mutex all of the time. Today it appears to be better sometimes and worse sometimes. I opened for this.