Wednesday, October 30, 2019

USL, universal scalability law, is good to know

The USL is worth understanding. USL is short for universal scalability law and was created by Neil Gunther for use in capacity planning. I don't do much capacity planning but the USL is also valuable for explaining performance. Performance problems for the old InnoDB rw-lock (circa 2007) would have been easy to describe via the USL because "wake all" on unlock is an N2 overhead -- see the 𝛽 parameter in the USL.

A longer version of the USL is here. The USL predicts how throughput will change (increase or decrease) with concurrency. One version of the formula where X(i) is throughput with i concurrent clients is below.

                 X(1) * N
     X(N) =  -------------------
             1 + α(N-1) + ꞵN(N-1)

The denominator determines how throughput changes with concurrency. When it is one then there is linear scaleup. The best way to get linear scaleup is to cheat and choose an easy base case but otherwise the denominator is greater than 1 and the USL explains less than linear scaleup. The components in the denominator represent the overhead from contention and coherency. The alpha term represents contention and the beta term represents coherency.

I write above that "wake all" on unlock has an N2 overhead. By this I mean that when N threads are trying to get through a synchronization object one at a time and all waiters wake on unlock then there are O(N2) wakeups -- (N-1) + (N-2) + (N-3) + ... + 3 + 2 + 1. The beta term in the denominator represents the coherency overhead and 
has an N2 term as 𝛽N(N-1) = 𝛽N2-𝛽N.

Amdahl's Law

The long post states that when B=0 and X(1)=1 then the USL is reduced to Amdahl's Law. I will derive the math for that here. The key point is that α=(1-p) where p is the fraction of the workload that is parallel in Amdahl. 

# Amdahl's law where p is the fraction of the workload
# that is parallel, N is same as for USL
                    1
     Speedup = -----------
               (1-p) + p/N

# Start with USL
                      N
     X(N) =  -------------------
             1 + α(N-1) + ꞵN(N-1)

# Update for X(1)=1, ꞵ=0, α=(1-p)
                 N
     X(N) =  ----------
             1 + (1-p)(N-1)

# Algebra for denominator, expand and then reduce
                 N
     X(N) =  ----------
             N - pN + p

# Multiply numerator and denominator by 1/N
                 1
     X(N) =  ----------
             (1 - p) + p/N

# Now it looks like Amdahl where α=(1-p)

Predicted max throughput

The long post states that the max throughput occurs at sqrt((1-α)/). That is the value of N for which X(N)' = 0. I won't derive it here but X(N)' is:
                  1 - 
α - N2
     X(N)' = ---------------------
             (1+α(N-1) + N(N-1))2

Then X(N)'=0 when the numerator is zero and that happens when N=sqrt((1-α)/). That determines a critical point for X(N) which is either a min or max. In this case it is a max. I assume that X(N) is concave without deriving X(N)''.

No comments:

Post a Comment

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...