A rw-lock where reads dominate can suffer from contention because it has at least twice the memory writes per lock/unlock pair compared to a mutex. So when the lock hold time is short a mutex wins even when exclusive access isn't required. This can often be seen in PMP output where there are convoys and the worst-case is when a thread gets stuck trying to get the internal latch during unlock, but the InnoDB custom rw-lock might not have that problem. Lock elision for the rw-lock might be a big deal in this case.
RocksDB might also benefit from this change.
One of the challenges with glibc pthreads is documentation. I previously wrote about the difficulty of finding documentation for PTHREAD_MUTEX_ADAPTIVE_NP. The problem continues. There isn't much about pthreads in a recent version of the glibc manual. From Google searches I wasn't able to find recent docs elsewhere, except for man pages. But man pages don't document PTHREAD_MUTEX_ADAPTIVE_NP. With lock elision we get new options -- PTHREAD_MUTEX_ELISION_NP and PTHREAD_MUTEX_NO_ELISION_MP. Google searches will take you to bits of source code and email list discussions. I hope this can be improved. Given the lack of docs you might need to read the source. I hope that the community (web-scale companies) can sponsor a tech writer to provide the missing docs.
There has been drama because the introduction of this feature failed when it encountered buggy microcode on certain CPUs. Then there was more drama when it broke buggy software that worked despite the bugs, until lock elision made the bugs serious. Google searches find many of the stories.
One of my favorite perks at work is getting answers from experts. In this case the expert is Nathan Bronson (thank you). A summary of the glibc 2.23 implementation per the expert is:
- NPTL lock elision is performed using TSX's RTM (Restricted Transactional Memory) instructions XBEGIN, XEND, and XABORT, rather than TSX's HLE (Hardware Lock Elision) instructions XACQUIRE and XRELEASE
- On x86, elision support is always present when detected by HAS_CPU_FEATURE(RTM)
- pthread_rwlock_t always attempts elision if the hardware has it (both for .._rdlock and .._wrlock)
- pthread_rwlock_t uses an adaptive strategy for falling back to the non-TSX implementation. If the lock is held in a non-TSX mode, there is a transaction conflict, or the transaction exceeds TSX's (undocumented) capacity, then the current lock acquisition and the 3 following use the non-TXN code path. This means that once a lock falls off the elision path it needs several uncontended acquisitions before a transaction it will be attempted again. This seems quite conservative
- pthread_rwlock_rdlock -> pthread_rwlock_unlock with a successful transaction is about twice as fast as the non-TSX implementation under no contention, and massively better under contention