Vivek Haldar

Smeed's Law for Programming

Smeed’s law is an empirical relationship that predicts the number of traffic deaths in a country, normalized to the number of vehicles in it. There are two astounding things about this law. First, the only input it uses is the number of vehicles per capita. Second, total deaths grow sub-linearly with that input, which means the death rate per vehicle actually falls as a country accumulates more vehicles.

In symbols, D = 0.0003 (N P²)^(1/3), where D = number of annual traffic deaths, N = number of vehicles, P = population.
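Here is a minimal Python sketch of that formula, using the 0.0003 constant from Smeed’s original 1949 fit; the country figures below are made up purely to show the shape of the curve:

```python
# Smeed's law: D = 0.0003 * (N * P^2)^(1/3).
# The constant 0.0003 is from Smeed's original 1949 fit; the country
# figures below are hypothetical, chosen only to show the sub-linear shape.

def smeed_deaths(vehicles: float, population: float) -> float:
    """Predicted annual traffic deaths for a country."""
    return 0.0003 * (vehicles * population ** 2) ** (1 / 3)

n, p = 10e6, 50e6          # hypothetical: 10M vehicles, 50M people
d = smeed_deaths(n, p)
print(f"predicted deaths:        {d:,.0f}")
print(f"deaths per vehicle:      {d / n:.2e}")
# Doubling the vehicle count multiplies deaths by only 2**(1/3), about 1.26x,
# so the per-vehicle death rate falls as vehicles per capita rise.
print(f"after doubling vehicles: {smeed_deaths(2 * n, p):,.0f}")
```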

(Graph taken from this paper [1].)

Note some of the things one might think would affect traffic fatalities but which, according to Smeed’s law, do not: better traffic signage, traffic rules, more stringent enforcement of those rules, mass adoption of safety equipment, and so on. According to Smeed’s law, which has held up with surprising accuracy across a large number of countries and long time spans, none of that matters. The only thing that does matter is the number of vehicles per capita, which can be read as a proxy for how much the population at large is exposed to traffic. It is particularly telling that the 1966 vehicle safety legislation had no statistical effect [2]:

Most of the vehicle safety standards adopted in the US in 1966 have been implemented worldwide… Yet, when (modern) vehicles are driven in Third World countries, they achieve kill rates per vehicle as high or higher than those achieved in Britain and the US in the early years of this century with Model T Fords.

Another way to look at this is that we all have a fixed appetite for risk, and we revert to it. This is called risk homeostasis [3][4], or risk compensation. When we reduce risk in some areas, we tend to take on more risk in others. For example, one study showed that anti-lock braking systems (ABS) did not reduce the number of accidents, because drivers with ABS tended to drive faster and follow other cars more closely. Another study showed that drivers drove more carefully around non-helmeted cyclists than helmeted ones.


And now that I’ve talked about some hard data and rigorous research, I’m going to venture into the territory of pure anecdote and speculation.

Here’s the thing: I’ve noticed something similar to Smeed’s law when it comes to programming. More specifically, large-scale programming, where large teams with lots of flux build large systems, over long periods of time.

Let’s use bugs per line of code as a parallel to deaths per vehicle. As a system grows and matures, its bugs per line of code decrease. In other words, the total number of bugs increases sub-linearly with lines of code.

Of course, I have no hard data to back this up, but think about the alternative for a minute: that the number of bugs increases at least linearly, or possibly super-linearly with lines of code. This would make it practically impossible to build any large system.
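To make the shape of that conjecture concrete, here is a purely illustrative Python sketch that borrows Smeed’s functional form: total bugs modeled as a fractional power of lines of code. The constant and exponent are invented for illustration, not measured from any codebase.

```python
# Purely illustrative: a Smeed-like model where total bugs grow as a
# fractional power of lines of code, B = k * LOC**alpha with alpha < 1.
# Both k and alpha are invented numbers, not measurements.

def total_bugs(loc: float, k: float = 0.05, alpha: float = 0.7) -> float:
    """Hypothetical sub-linear bug count as a function of lines of code."""
    return k * loc ** alpha

for loc in (10_000, 100_000, 1_000_000):
    bugs = total_bugs(loc)
    print(f"{loc:>9,} LOC -> {bugs:7,.0f} bugs, {1000 * bugs / loc:5.2f} bugs/KLOC")
# Every 10x growth in LOC multiplies total bugs by only 10**0.7, about 5x,
# so bugs per line keep dropping as the system grows.
```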

Also, factors that you would think have a big impact on how many bugs are created, such as the choice of programming language, end up having little or no effect. You would think that languages like C or C++, where one has to manage memory manually and it is easy to write code with buffer overruns, would be more prone to bugs than languages like Java, which have strong memory safety and garbage collection. But programmers get used to the “risk profile” of each language and adjust their behavior accordingly. For example, a C programmer is hypervigilant about buffer overflows, and that balances out the language’s lack of safeguards. On the flip side, Java programmers “save” the risk of buffer overruns and other memory errors, but they “spend” that saved risk in other places, such as complicated frameworks and deep class hierarchies.

Does your experience as a programmer support or contradict my conjecture?