Jan 21, 2012

Repositories of science

The first scientific paper with more than 1000 authors (1055, to be precise) was published recently¹. It documented the first results from the Large Hadron Collider.²

I think a paper in a journal is an antiquated way to disseminate results this complex, created by teams this big.

There is a precedent for a system that can track, manage and publish the work of large teams on a single, though complex, artifact – source control. In particular, distributed source control systems such as Git.

Consider an alternative publishing model based on source control that works as follows:

all artifacts for the project (or at least, those that can be digitized) must be checked in. Documents. Scripts. Data analysis. Raw data from experiments. Lab notes. Diagrams. Everything.
every “commit” is subject to rigorous review by your peers. For day-to-day commits these will usually be your immediate team members. For more substantial “commits” or results, external reviewers may be called in.

Compared to the existing model of writing papers and submitting them to journals, this model has a number of advantages:

the progress of the project can be accurately traced over time, by looking at the history of commits.
questions about results are easily answered. “How did you arrive at this table?” “Let me go look at the commit.” No throwaway scripts are left lying around in people’s home directories to be lost. Every analysis is reproducible.
individual contributions are clear. Just look at a person’s commit history.
the process of producing scientific results can be clearly seen. With journal papers, one can only see the end result of a long series of experiments and analyses, rarely getting a glimpse of the meandering path that brought the investigators there. By looking at not just the commits, but the comments made during their review, one can get a deeper insight into that path.
collaboration across sites and universities is simplified. Ideally you wouldn’t even need permission. Repositories would be world-readable and you could simply “pull” from them. If you spot a problem or have a contribution to make, you don’t even need to be part of that group. Pull, fix, then request a review for your “patch”.
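
To make the first few points concrete, here is a rough sketch in Python of a toy, in-memory commit log. It is not how Git stores anything, and everything in it is an assumption made for illustration: the `Commit` structure, the `history_of` and `contributions` helpers, and the sample authors, file names and messages are all made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One reviewed change to the project (hypothetical structure)."""
    author: str
    message: str
    files: tuple[str, ...]
    reviewers: tuple[str, ...] = ()


def history_of(log: list[Commit], path: str) -> list[Commit]:
    """Return, oldest first, every commit in `log` that touched `path`."""
    return [c for c in log if path in c.files]


def contributions(log: list[Commit]) -> Counter:
    """Count commits per author."""
    return Counter(c.author for c in log)


# Made-up sample log, oldest commit first.
log = [
    Commit("alice", "Add raw detector readings", ("data/run1.csv",), ("bob",)),
    Commit("bob", "Add analysis script", ("scripts/fit.py",), ("alice",)),
    Commit(
        "alice",
        "Generate table 2 from run 1",
        ("scripts/fit.py", "results/table2.csv"),
        ("bob", "carol"),
    ),
]

# "How did you arrive at this table?" "Let me go look at the commit."
for commit in history_of(log, "results/table2.csv"):
    print(commit.author, "-", commit.message, "- reviewed by", ", ".join(commit.reviewers))

# Individual contributions: just look at each person's commit count.
print(contributions(log))
```

A real source control system answers the same questions from its own history, so nothing like this would need to be written by hand; the sketch only shows that the questions reduce to simple queries over a log.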

What would a “paper” look like in this world? It would be a high-level journey explaining the big picture of the repository in question, linking copiously to artifacts in it. Almost like a top-level README. The curious would be free to dig into the commits or series of commits they’re more interested in.

What would a “journal” look like in this world? It would be nothing but a link aggregator, pointing to repositories of projects and experiments the editors deem worthy. You could cut editors out altogether, and move to a voting model like Hacker News or Reddit.
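
The voting variant could be sketched roughly as follows. This is a minimal illustration, not how Hacker News or Reddit actually rank things: the `Submission` structure, the `front_page` helper, the example.org URLs and the vote counts are all hypothetical, and ranking is by plain net votes.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A link to a repository plus reader votes (hypothetical structure)."""
    repo_url: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        """Net votes: upvotes minus downvotes."""
        return self.upvotes - self.downvotes


def front_page(submissions: list[Submission], limit: int = 10) -> list[Submission]:
    """Rank by net votes, highest first; ties keep their submission order."""
    return sorted(submissions, key=lambda s: s.score, reverse=True)[:limit]


# Made-up submissions with placeholder URLs.
submissions = [
    Submission("https://example.org/repos/protein-folding", upvotes=12, downvotes=3),
    Submission("https://example.org/repos/first-collisions", upvotes=40, downvotes=2),
    Submission("https://example.org/repos/soil-survey", upvotes=5, downvotes=5),
]

for s in front_page(submissions):
    print(s.score, s.repo_url)
```

Real link aggregators also take the age of a submission into account; that is left out here to keep the sketch short.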

¹ First proton–proton collisions at the LHC as observed with the ALICE detector
² It would be an interesting back-story to see how the line for authorship was drawn. Certainly there were many people who contributed who didn’t make it into the author list.