Vivek Haldar

Science = Prediction

Computer Science has had an identity crisis since its very inception. What exactly constitutes it? Is it a science?

Here is Denning1:

Science, engineering, and mathematics combine into a unique and potent blend in our field. Some of our activities are primarily science—for example, experimental algorithms, experimental computer science, and computational science. Some are primarily engineering—for example, design, development, software engineering, and computer engineering. Some are primarily mathematics—for example, computational complexity, mathematical software, and numerical analysis. But most are combinations.

But Vint Cerf, the recently elected president of the Association for Computing Machinery, gets the last punch2:

In the physical world, science is largely about models, measurement, predictions, and validation. Our ability to predict likely outcomes based on models is fundamental to the most central notions of the scientific method. The term “computer science” raises expectations, at least to my mind, of an ability to define models and to make predictions about the behavior of computers and computing systems… When we write a piece of software, do we have the ability to predict how many mistakes we have made (that is, bugs)? Do we know how long it will take to find and fix them? Do we know how many new bugs our fixes will create? Can we say anything concrete about vulnerability? What about the probability of exploitation?

In other words, you can’t call your field a science until it gives you predictive power over your world.

At the beginning of a complex software project, the margin of error in predicting completion time and required resources is so large that the prediction is practically useless. This cone of uncertainty narrows as the project progresses. The only way out is through.
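As a rough illustration, here is a minimal Python sketch of what that cone looks like, using the range multipliers commonly attributed to Boehm's cone of uncertainty (roughly 0.25x to 4x at the initial concept, converging toward 1x at completion). The milestone names and multipliers are illustrative, not measured data.

```python
# A minimal sketch of the cone of uncertainty, using commonly cited
# (Boehm-style) range multipliers. The numbers are illustrative only.

# (milestone, low multiplier, high multiplier), applied to the eventual actual effort
CONE = [
    ("initial concept",       0.25, 4.00),
    ("approved definition",   0.50, 2.00),
    ("requirements complete", 0.67, 1.50),
    ("design complete",       0.80, 1.25),
    ("code complete",         0.90, 1.10),
]

def estimate_range(actual_effort_weeks: float) -> None:
    """Print the plausible estimate range at each project milestone."""
    for milestone, low, high in CONE:
        print(f"{milestone:>22}: {actual_effort_weeks * low:5.1f} - "
              f"{actual_effort_weeks * high:5.1f} weeks")

if __name__ == "__main__":
    # A project that will, in the end, actually take 40 weeks.
    estimate_range(40.0)
```

The point of the sketch is only that an early estimate is a wide interval, not a number; the interval tightens as the work itself resolves the uncertainty.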

Estimates (i.e., predictions) are the dirty secret of the computing industry. Everyone wants them. Everyone provides them. Nobody attaches probabilities to them. And absolutely nobody believes them. Reported estimates are often padded by a factor of two (or more) when planning downstream activities, and even that isn’t enough.

I am pessimistic about the endeavor of bringing scientific estimation to software development, at least in an a priori manner. It might be possible to build post-hoc models based on history: for example, by the tenth time you build the same thing, you can provide a good estimate of the time and resources required.
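To make the post-hoc idea concrete, here is a minimal sketch of a history-based, reference-class-style estimator: it calibrates a new raw estimate against the actual-to-estimated ratios observed on similar past projects and returns a range rather than a single number. The historical data and function names are hypothetical, purely for illustration.

```python
# A minimal sketch of a post-hoc estimation model: calibrate a new estimate
# against the actual/estimated ratios observed on similar past projects.
# The history below is made up purely for illustration.

import statistics

# (estimated weeks, actual weeks) for past projects of the same kind
HISTORY = [(10, 18), (6, 9), (12, 30), (8, 14), (20, 33)]

def calibrated_estimate(raw_estimate_weeks: float, history=HISTORY):
    """Return a (likely, pessimistic) range based on historical overrun ratios."""
    ratios = sorted(actual / estimated for estimated, actual in history)
    median_ratio = statistics.median(ratios)
    worst_ratio = ratios[-1]  # worst overrun seen so far
    return raw_estimate_weeks * median_ratio, raw_estimate_weeks * worst_ratio

if __name__ == "__main__":
    likely, pessimistic = calibrated_estimate(10)
    print(f"raw: 10 weeks, likely: {likely:.1f}, pessimistic: {pessimistic:.1f}")
```

Note that this only works after the fact: it needs a track record of comparable projects, which is exactly the a priori knowledge a brand-new kind of project lacks.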

I am pessimistic because the bulk of uncertainty in software development comes from human complexity. If the world and behavior of humans were governed by principled laws, and if everyone used the same language, then software in such a world would be simple and predictable. But the world of humans is complex, messy, and unreasonable, and software that deals with them must be too.