Tuesday, January 7, 2020

It is all about the constant factors

If I sort the performance bugs I have worked on into three buckets -- too much big-O, too much constant factor, and other -- then too much big-O has the fewest entries. Maybe we worry too much about big-O and not enough about everything else? This post was inspired by a post and tweet by Dan Luu.

By too much big-O I mean using an algorithm with higher computational complexity when one with lower complexity is available, and the difference causes problems in production. A common case is using an O(n^2) algorithm when O(n log n) or O(n) is possible. By too much constant factor I mean an inefficient implementation of an algorithm whose complexity is otherwise fine. Everything else goes into the other bucket.
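To make the buckets concrete, here is a minimal C++ sketch of too much constant factor (a made-up example, not one of the bugs below): both functions are O(n), but the first copies a string on every iteration while the second does not, so they differ only in the constant factor.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// O(n), but copies a string on every iteration: a large constant factor.
size_t total_length_slow(const std::vector<std::string>& names) {
  size_t total = 0;
  for (std::string name : names)          // copy per iteration
    total += name.size();
  return total;
}

// Still O(n), with no copies: same big-O, much smaller constant factor.
size_t total_length_fast(const std::vector<std::string>& names) {
  size_t total = 0;
  for (const std::string& name : names)   // reference, no copy
    total += name.size();
  return total;
}

int main() {
  std::vector<std::string> names(1000000, std::string(100, 'x'));
  return total_length_slow(names) == total_length_fast(names) ? 0 : 1;
}
```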

"It is all about the constant factors" was my reply to someone who dismissed a performance concern with "it is just a constant factor". That still amuses me. My guess, given this classification scheme, is that too much big-O is the least likely cause of performance problems. This is not a rigorous study, and I wasn't able to find many too-much-big-O bugs in my bug history. The search is complicated because it isn't easy to search for O(n*n) -- what terms do you use: O(n*n), O(n^2), squared?
  1. Pagination is O(n*n) - this happened several times in production, so it counts as many bugs. Be careful with pagination queries (a sketch of the problem follows this list).
  2. O(n*n) parsing in MySQL client - this was MySQL bug 34578
  3. O(n*n) allocation on array resize - Dan Luu explained an example of this. I have also encountered it but my memory is vague (see the growth-factor sketch after this list).
  4. O(n*n) common subexpression elimination - while working on a closed-source DBMS I fixed code that did CSE with an O(n*n) algorithm. It fell over on machine-generated SQL that had thousands of terms in the WHERE clause.
  5. Many years ago the InnoDB rw-lock and mutex had O(n*n) overhead for n threads doing lock/unlock because unlock was implemented as wake-all. See MySQL bug 52806 (a toy sketch of the wake-all pattern follows this list).
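For item 1, here is a minimal C++ sketch of why OFFSET-style pagination is quadratic (a generic illustration with made-up numbers, not the production queries behind the bug): fetching page k skips k*page_size rows before returning page_size rows, so walking the whole table touches about n^2/(2*page_size) rows in total, while keyset (seek) pagination resumes from the last key seen and touches each row once.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// OFFSET-style pagination: page k skips k * page_size rows before returning
// a page, so reading every page touches ~n^2 / (2 * page_size) rows.
long long offset_paginate(const std::vector<int>& rows, size_t page_size) {
  long long touched = 0;
  for (size_t offset = 0; offset < rows.size(); offset += page_size)
    touched += offset + std::min(page_size, rows.size() - offset);  // skipped + fetched
  return touched;
}

// Keyset (seek) pagination: WHERE key > last_key LIMIT page_size.
// Each row is touched once, so the total work is O(n).
long long keyset_paginate(const std::vector<int>& rows, size_t page_size) {
  long long touched = 0;
  size_t pos = 0;                          // position just after last_key
  while (pos < rows.size()) {
    size_t fetched = std::min(page_size, rows.size() - pos);
    touched += fetched;
    pos += fetched;
  }
  return touched;
}

int main() {
  std::vector<int> rows(100000, 0);
  printf("offset pagination: %lld rows touched\n", offset_paginate(rows, 100));
  printf("keyset pagination: %lld rows touched\n", keyset_paginate(rows, 100));
  return 0;
}
```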
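For item 3, a minimal sketch of the growth-factor problem (a generic illustration, not the code Dan Luu wrote about): growing a buffer by a fixed chunk copies O(n^2) elements across n appends, while geometric growth keeps the total copies at O(n).

```cpp
#include <cstdio>

// Count elements copied while appending n items under a growth policy.
// Fixed growth (capacity += 1024) copies O(n^2) elements in total.
// Geometric growth (capacity *= 2) copies O(n) elements in total.
long long copies_for_appends(long long n, bool geometric) {
  long long copies = 0, size = 0, capacity = 1024;
  while (size < n) {
    if (size == capacity) {
      copies += size;   // the resize copies every existing element
      capacity = geometric ? capacity * 2 : capacity + 1024;
    }
    size++;
  }
  return copies;
}

int main() {
  const long long n = 10000000;
  printf("fixed growth:     %lld copies\n", copies_for_appends(n, false));
  printf("geometric growth: %lld copies\n", copies_for_appends(n, true));
  return 0;
}
```

This is why std::vector and similar containers grow capacity geometrically.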
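For item 5, a toy sketch of the wake-all pattern (my own toy lock, not the InnoDB sync code): when unlock notifies every waiter even though only one can win, n contending threads produce O(n) wakeups per release and O(n^2) wakeups overall; notifying a single waiter avoids that.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Toy lock whose unlock wakes every waiter even though only one can win.
// With n contending threads that is O(n) wakeups per release, O(n^2) total.
struct WakeAllLock {
  std::mutex m;
  std::condition_variable cv;
  bool held = false;
  std::atomic<long long> wakeups{0};

  void lock() {
    std::unique_lock<std::mutex> g(m);
    while (held) {
      cv.wait(g);
      wakeups++;          // every waiter wakes, at most one of them wins
    }
    held = true;
  }
  void unlock() {
    { std::lock_guard<std::mutex> g(m); held = false; }
    cv.notify_all();      // the O(n^2) part; notify_one() would wake a single waiter
  }
};

int main() {
  WakeAllLock lk;
  std::vector<std::thread> threads;
  for (int i = 0; i < 64; i++)
    threads.emplace_back([&lk] {
      for (int j = 0; j < 1000; j++) { lk.lock(); lk.unlock(); }
    });
  for (auto& t : threads) t.join();
  printf("wakeups: %lld\n", lk.wakeups.load());
  return 0;
}
```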
Perhaps shared_ptr deserves its own bucket. I wrote about finding shared_ptr perf bugs in MyRocks and MongoDB. I found at least one more in MyRocks, someone else reported one for RocksDB and I have seen a few more problems from this in production.
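The shared_ptr problems I have seen usually come down to copying a shared_ptr on a hot path, which means atomic reference-count updates on every call. A minimal sketch of the pattern (hypothetical Options type and functions, not the MyRocks or RocksDB code):

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical type, standing in for something like an options object.
struct Options {
  std::string comparator_name;
};

// Copies the shared_ptr: two atomic ref-count updates per call.
// Cheap in isolation, costly on a hot path shared by many threads.
size_t name_len_slow(std::shared_ptr<Options> opts) {
  return opts->comparator_name.size();
}

// Borrows the object: no ref-count traffic. Pass a reference (or a raw
// pointer) when the callee does not need to share ownership.
size_t name_len_fast(const Options& opts) {
  return opts.comparator_name.size();
}

int main() {
  auto opts = std::make_shared<Options>();
  opts->comparator_name = "bytewise";
  size_t total = 0;
  for (int i = 0; i < 1000000; i++) {
    total += name_len_slow(opts);    // ref-count ping-pong on every call
    total += name_len_fast(*opts);
  }
  return total > 0 ? 0 : 1;
}
```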

3 comments:

  1. https://bugs.mysql.com/bug.php?id=84958
    (InnoDB's MVCC has O(N^2) behaviors)

    https://bugs.mysql.com/bug.php?id=67044
    (Secondary index updates make consistent reads do O(N^2) undo page lookup)

  2. This reminded me of MySQL bug 49047, reported by Harrison - deadlock detection could be O(n^2), which then got improved.

    Big-O is useful, but I agree, maybe overly relied-on. One algorithm might perform much better on real datasets, but have a worse big-O because of particular, unusual worst-case scenarios. There are so many factors besides worst-case analyses.

    I like to point out my O(1) - for a fixed disk size - lookup algorithm, better than any index, apparently - scan the entire disk for your entry.

    Whether you scan RAM or SSD or HDD is "just a constant factor" for performance, so who needs caches, buffers, pre-fetching, etc?

