Saturday, November 16, 2019

My theory on technical debt and OSS

This is a hypothesis; perhaps it is true. I am biased, given that I spent 15 years acknowledging tech debt on a vendor-based project (MySQL) and not much time on community-based projects.

My theory on tech debt and OSS is that there is more acknowledgement of tech debt in vendor-based projects than in community-based ones. This is an advantage for vendor-based projects, assuming you are more likely to fix acknowledged problems. Of course, community-based projects have other advantages.

I think there is more acknowledgement of tech debt in vendor-based projects because the community is criticizing someone else's effort rather than its own. This is human nature, even if the effect and behavior aren't always kind. I spent many years marketing bugs that needed to be fixed, along with many years leading teams working to fix those bugs.

4 comments:

  1. I agree that vendors acknowledge debt more. I think one reason is that in OSS it's harder to even realize that tech debt is accumulating, and a lot of that can be viewed through the lens of the Universal Scalability Law (USL) applied to information flow and throughput in human networks. The cohesion costs are usually far, far higher in OSS, because the organizational structures are often much more distributed. Our stateful systems are often quite complex, which increases the amount of information needed to resolve issues without creating more tech debt. There are a few stages of tech debt, all of which are more drastic in OSS than in more closely coordinated teams:

    1. birth - often happens when work is performed without a thorough understanding of the overall system architecture or long-term goals
    2. pain - when working in the affected areas, some things might feel hard. It can take a while before it becomes clear whether there is actually a problem, or whether the difficulties are the result of trade-offs made by somebody with more information than you
    3. acknowledgement - at some point, people can actually agree that there is a problem. This happens more quickly when collaborators are in direct communication with each other. Knowing that an issue affects people other than you increases the chance that it gets escalated, hopefully expediting its resolution.
    4. death - resources are allocated to understand and fix the issue

    I think this cycle can be short in some OSS situations. Many open source stateful systems have a dramatic contribution skew toward a single author (minimizing cohesion costs). The cycle can also be shorter when significant coordination efforts happen, but most volunteers are not interested in taking that on.

    1. Thank you for an interesting comment. I have ignored pain and delayed fixing tech debt a few times. After moving from Google to FB, I was slow to port the patch for crash-safe replication state, and my excuse was that the operations team wasn't yelling loudly enough.

  2. A lot of tech debt is about humans' ability to understand and navigate code, locate possible sources of errors, etc. I.e., if there is one person who "owns" the project and works on it regularly, a lot of debt might accrue without them realising it (or realising it only to a much lesser degree).

    "Vendor-based" = "decently financed" = more than one person works on every subsystem, + there may be some turnover => generally requires keeping debt below a certain level, just to be able to support the system effectively.

    There are few truly community-based projects under any active development. I wonder whether you consider the Linux kernel, Spark, and Kubernetes vendor-based or community-based in your hypothesis.

    1. I don't know much about Spark and k8s development. I also don't know much about Linux development but would call it community-based for the tech debt theory.

      Some of this theory was inspired by the practice of web-scale companies giving talks about various Hadoop projects and then becoming quiet when those projects didn't turn out so well.

