Monday, November 18, 2019

Always be upgrading

On the production DBMS side I want to provide a high QoS, and a stable environment helps to achieve that. Upgrades are done slowly and with much testing. Upgrades include moving to a new compiler toolchain or a new library version.

On the security side, people don't want libraries with known security problems to be used in production. This leads to a push to always be upgrading (ABU), which is justified when the expected value of ABU exceeds the expected cost. Alas, this will be hard to figure out. I assume the expected cost can be estimated. I am skeptical that there can be an estimate of the expected value.

The expected value of ABU is:
cost(security-event) X probability(security-event)
I assume it is possible to estimate the cost of a security event, but it will be much harder to estimate the probability. It would help if more vendors were to publish full incident reports. My guess is that old library versions won't be near the top of the list of root causes, but this is not my area of expertise.

The cost of ABU is easier to quantify: count the debugging time, testing time and downtime for all teams that will be upgrading frequently. Examples of things that can go wrong include the Postgres dependency on glibc locales and ext4 performance changes for O_DIRECT. There are many more, but I will be lazy and not list them, and many of the problems I experienced weren't reported in public.
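
The comparison above can be written down as a small sketch. Everything here is an assumption made for illustration: the names (TeamUpgradeCost, expected_value_of_abu, cost_of_abu, abu_is_justified), the choice to price engineer hours and downtime hours at separate rates, and any numbers you plug in. The hard part, a defensible estimate of the probability, is not solved by code.

```python
from dataclasses import dataclass


@dataclass
class TeamUpgradeCost:
    """Hypothetical per-team hours spent because of frequent upgrades."""
    debugging_hours: float
    testing_hours: float
    downtime_hours: float


def expected_value_of_abu(event_cost: float, event_probability: float) -> float:
    """cost(security-event) x probability(security-event)."""
    if not 0.0 <= event_probability <= 1.0:
        raise ValueError("event_probability must be between 0 and 1")
    return event_cost * event_probability


def cost_of_abu(teams, engineer_rate: float, downtime_rate: float) -> float:
    """Sum debugging, testing and downtime cost over all teams.

    Assumes engineer time and downtime are priced at different hourly rates.
    """
    total = 0.0
    for team in teams:
        total += (team.debugging_hours + team.testing_hours) * engineer_rate
        total += team.downtime_hours * downtime_rate
    return total


def abu_is_justified(value: float, cost: float) -> bool:
    """ABU is justified when the expected value exceeds the expected cost."""
    return value > cost
```

The result is only as good as the inputs. The cost side can be filled in from time tracking and incident records, while event_probability remains a guess until more incident reports are public.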
