I repeated tests from part 1 and part 2 using a system that has an older/smaller CPU but more recent versions of glibc & Linux. In this case the CPU has dual sockets with 6 cores per socket and 24 vCPUs with HT enabled. The host uses Fedora, glibc 2.17 and Linux kernel 3.11.9-200. Tests were run up to 512 threads. There was an odd segfault at 1024 threads with the TTASFutexMutex that I chose not to debug. Tests were run for 1, 4 and 12 mutexes rather than 1, 4 and 16 because of the reduced vCPU count. The lock hold duration was ~6000 nsecs rather than ~4000 nsecs because of the different CPU.
My conclusions from this platform were less strong than from the previous tests.
- The InnoDB syncarray mutex still has lousy behavior at high concurrency but it is less obvious here because I did not run a test for 1024 threads.
- pthread default is usually better than pthread adaptive, but in at least one case pthread adaptive was much better.
- The new custom InnoDB mutex, TTASFutexMutex, was occasionally much worse than the alternatives. From looking at the code in 5.7, the choice to use it appears to be a compile-time decision. If only to make it easier to debug the performance problems, this should be a my.cnf option, and it isn't clear to me that TTASFutexMutex is ready for prime time (a rough sketch of the idea follows below).
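Here is that sketch: a minimal TTAS-then-futex lock, assuming x86 (for the pause instruction), Linux futexes and GCC atomic builtins. The names and structure are mine for illustration; this is not the actual TTASFutexMutex code from InnoDB.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_futex(int *uaddr, int op, int val) {
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* lock word: 0 = free, 1 = locked, 2 = locked with possible waiters */
static void ttas_futex_lock(int *lock, int spin_rounds) {
    for (int i = 0; i < spin_rounds; i++) {
        int expected = 0;
        /* test-and-test-and-set: read first, CAS only when the lock looks free */
        if (__atomic_load_n(lock, __ATOMIC_RELAXED) == 0 &&
            __atomic_compare_exchange_n(lock, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;                       /* acquired while spinning */
        __asm__ __volatile__("pause");
    }
    /* give up spinning: advertise a waiter (state 2) and sleep in the kernel */
    while (__atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE) != 0)
        sys_futex(lock, FUTEX_WAIT_PRIVATE, 2);
}

static void ttas_futex_unlock(int *lock) {
    if (__atomic_exchange_n(lock, 0, __ATOMIC_RELEASE) == 2)
        sys_futex(lock, FUTEX_WAKE_PRIVATE, 1);   /* wake at most one waiter */
}

The point of the futex fallback is that an uncontended lock and unlock never enter the kernel, and a contended unlock wakes a single waiter rather than everything parked on the lock.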
Graphs for 0 nsec lock hold
Graphs for 1000 nsec lock hold
Graphs for 6000 nsec lock hold
The futex makes a kernel call; if the think time is 0 then the spin and loop counts must be increased, otherwise the cost shifts to the kernel.
Compare it to the old-style InnoDB mutex, "inno syncarray" in the graph above:
* they have similar busy-wait loops
* inno syncarray suffers from a broadcast on unlock, while inno futex wakes only one waiter (see the sketch below)
So inno futex should be strictly better than inno syncarray on the graphs from part 2 and part 3 (here), ignoring some variance, but it definitely is not and I don't think variance explains it.
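To make the wake-on-unlock difference concrete, here is a tiny sketch. It assumes the sync-array event amounts to a condition-variable broadcast (my reading of the description above), while the futex mutex asks the kernel to wake a single waiter; the function names are illustrative.

#include <pthread.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* sync-array style: broadcast on unlock, so every blocked waiter wakes up
   and all of them re-contend for the mutex (a thundering herd) */
static void wake_all_waiters(pthread_mutex_t *guard, pthread_cond_t *event) {
    pthread_mutex_lock(guard);
    pthread_cond_broadcast(event);
    pthread_mutex_unlock(guard);
}

/* futex style: ask the kernel to wake at most one waiter, so only that
   thread re-contends for the lock */
static void wake_one_waiter(int *lock_word) {
    syscall(SYS_futex, lock_word, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}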
I like this little program you've written; it is extremely useful. I'm still trying to figure out whether I'm running it correctly. I want to run the test in the first graph. This is my take:
./innotsim 64 1000000 0 4 1 (futex | inno2) 1 0 1
From my shell script the args are:
./innotsim $nthr $nloops $thinkd $spinr $spind $y $nmux $retry_after_reserve $max_spinners
nthr - number of user threads
nloops - number of loop iterations per user thread
thinkd - number of iterations of the work loop (ut_busy, which is ut_delay minus the pause instruction; see the sketch below)
spinr - number of rounds for the busy wait loop
spind - number of loop iterations for ut_busy
y - inno, futex, posixadapt, etc
nmux - number of mutexes; threads are evenly distributed across the mutexes
retry_after_reserve - number of times to try to get the lock after reserving a sync array slot; 4 is what InnoDB uses
max_spinners - max number of concurrent spinners for the new mutex variations I added that can limit the number of spinning threads (posixgnspin, posixlnspin)
Also, I compile via: g++ -DFUTEX_ON -Wall -O2 -g -o innotsim innotsim.c -lpthread -lm
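For what it's worth, this is how I read the thinkd/spinr/spind knobs. The loop shapes below are sketches under that reading, not code copied from innotsim.c or InnoDB; ut_sink and try_lock_with_spin are made-up names, and whether the spin rounds use the pause variant is my assumption.

static volatile unsigned long ut_sink;   /* keeps the compiler from removing the loops */

/* work loop: ut_delay without the pause instruction, per the description above */
static void ut_busy(unsigned long iters) {
    for (unsigned long i = 0; i < iters; i++)
        ut_sink += i;
}

/* spin-wait loop: the same body plus the x86 pause instruction */
static void ut_delay(unsigned long iters) {
    for (unsigned long i = 0; i < iters; i++) {
        ut_sink += i;
        __asm__ __volatile__("pause");
    }
}

/* spinr rounds; each failed round burns spind delay iterations before retrying,
   and the caller blocks (sync array, futex wait, ...) when all rounds fail */
static int try_lock_with_spin(int *lock, int spinr, int spind) {
    for (int r = 0; r < spinr; r++) {
        if (__sync_bool_compare_and_swap(lock, 0, 1))
            return 1;    /* acquired */
        ut_delay((unsigned long) spind);
    }
    return 0;            /* not acquired */
}

Under that reading, the lock hold ("think") time is just ut_busy(thinkd) executed while the mutex is held, which is what the 0/1000/6000 nsec lock hold labels on the graphs refer to.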
OK, thanks. Let me try to reproduce the results and attempt to understand what's going on. This should help in improving the mutex code.
I've tried replacing the log_t::mutex with a POSIX mutex, and also other hot mutexes in the code; it is trivial to do that with the new infrastructure. While they perform well in synthetic benchmarks, in practice I see higher contention and a bigger drop in QPS.
My request at this point is for the performance of the futex mutex to be better than the sync-array mutex all of the time. Today it appears to be better sometimes and worse sometimes. I opened http://bugs.mysql.com/bug.php?id=73763 for this.