In response to Doug Thain’s question:
What currently-available tools do you recommend for enabling reproducible scientific computing? Is there a tool that we ought to have, but do not?
P.S. I am using “reproducibility” as an easy shorthand for re-usability, re-creation, verification, and related tasks that have already seen some discussion. Please interpret the question broadly.
I think the most important thing we need is an electronic lab notebook that really allows us to go back and understand exactly what we did, repeat it, modify it, record it, etc. If you accept this, it leads to a number of points:
- Why don’t more people (including me) do this now?
- What tool(s) should we use?
- How should this integrate into the publication process (for papers, software, data, etc.)?
Tools I’ve seen and liked include VisTrails for data exploration and visualization, and Project Jupyter for a lot of other things. Software version control (e.g., Git) is most likely part of the answer as well. Each of these tools has its strengths and weaknesses, but I don’t think any of them, used on its own, really fits the bill.
If we did have something that really was an automated work-tracking notebook, we could use it to help with publications as well. For example, in my idea of transitive credit, if we are going to decide which products contributed to a new product, a starting point is the list of products we used while creating the new product, and such a notebook could generate that list. Or we could track our reading list in the period leading up to a new paper as a starting point for the papers we should reference.
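To make the idea a bit more concrete, here is a minimal sketch in Python of what the core of such a notebook might look like. It is only an illustration under my own assumptions: the `Notebook` class, its `record` and `products_used` methods, and the product names are all hypothetical, not part of any existing tool, and a real notebook would need to capture this information automatically rather than through manual calls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Notebook:
    """Hypothetical work-tracking notebook: an append-only log of actions."""

    entries: list = field(default_factory=list)

    def record(self, action, products=()):
        # Store what was done, when, and which products
        # (software, data, papers) were involved.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "products": list(products),
        })

    def products_used(self):
        # Unique products in order of first use. This is only a
        # starting point for deciding credit, not a final answer.
        seen = {}
        for entry in self.entries:
            for product in entry["products"]:
                seen.setdefault(product, None)
        return list(seen)


# Placeholder product names, for illustration only.
notebook = Notebook()
notebook.record("read background paper", ["paper-A"])
notebook.record("ran analysis", ["library-X", "dataset-Y"])
notebook.record("re-ran analysis with new parameters", ["library-X", "dataset-Y"])

print(notebook.products_used())
```

The log keeps every action, so the work can be reviewed or repeated later, while `products_used` condenses it into the list of candidates for credit or citation. Deciding how much each product actually contributed is still left to the author.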
(Note: this post is cross-posted from http://reproduciblescience.blogspot.com/2015/07/the-need-for-notebooks.html)