By too much big-O I mean using an algorithm with higher computational complexity when one with lower complexity is available, and the difference causes problems in production. A common problem is using an O(n^2) algorithm when O(n lg n) or O(n) is possible. By too much constant factor I mean an inefficient implementation. Everything else goes into the other bucket.
"It is all about the constant factors" was a reply to someone who dismissed performance concerns with "it is just a constant factor". That still amuses me. Given my classification scheme, I assume that too much big-O is the least likely cause of performance problems. This is not a rigorous study, but I wasn't able to find many too-much-big-O bugs in my bug history. The search is complicated because it isn't easy to search for O(n*n). What search terms do you use -- O(n*n), O(n^2), squared?
- Pagination is O(n*n) - this happened several times in production, so it counts as many bugs. Be careful with pagination queries.
- O(n*n) parsing in MySQL client - this was MySQL bug 34578
- O(n*n) allocation on array resize - Dan Luu explained an example of this. I have also encountered this but my memory is vague.
- O(n*n) common subexpression elimination - while working on a closed-source DBMS I fixed code that did CSE with an O(n*n) algorithm. It fell over on machine-generated SQL that had thousands of terms in the WHERE clause.
- Many years ago, the InnoDB rw-lock and mutex had O(n*n) overhead for n threads trying to get through them (lock/unlock) because of the way wake-all on unlock was implemented. See MySQL bug 52806.
Perhaps shared_ptr deserves its own bucket. I wrote about finding shared_ptr perf bugs in MyRocks and MongoDB. I found at least one more in MyRocks, someone else reported one for RocksDB and I have seen a few more problems from this in production.
(InnoDB's MVCC has O(N^2) behaviors)
(Secondary index updates make consistent reads do O(N^2) undo page lookup)
Thank you anon
This reminded me of MySQL bug 49047, reported by Harrison - deadlock detection could be O(n^2), which then got improved.
Big-O is useful but, I agree, maybe over-relied on. One algorithm might perform much better on real datasets yet have a worse big-O because of particular, unusual worst-case scenarios. There are so many factors besides worst-case analysis.
I like to point out my O(1) -- for a fixed disk size -- lookup algorithm, better than any index, apparently: scan the entire disk for your entry.
Whether you scan RAM or SSD or HDD is "just a constant factor" for performance, so who needs caches, buffers, pre-fetching, etc?