Scalable computational reproducibility

While on a call today for the ACM Emerging Interest Group on Reproducibility and Replicability, I realized that for computational reproducibility to become pervasive, we need to solve a scalability problem. Today, much computational reproducibility is checked by hand, whether as part of artifact evaluation for a conference or journal paper or through post-publication checks.

I was reminded of fault tolerance, and how some algorithms can be checked after they run to determine if they ran correctly, at least with respect to transient errors. The highest-cost way of doing such a check is simply to run the algorithm a second time and see if the same results are returned, which is 100% overhead. In many cases, however, much lower-cost checks are possible, such as result checking, derived from algorithm-based fault tolerance (ABFT). I was involved in some work in this area at JPL around the turn of the century, for example, where we checked an O(n²) matrix multiplication with an O(n) check. In this work, another key element was being able to determine whether a calculation was correct, subject to both numeric and algorithmic noise.
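To make the flavor of result checking concrete, here is a minimal sketch of an ABFT-style checksum test, in Python with NumPy. The function name, the tolerance, and the use of a matrix-vector product are my illustration, not the JPL code; the point is that the check also has to tolerate ordinary floating-point noise:

    import numpy as np

    def checked_matvec(A, x, rtol=1e-10):
        """Compute y = A @ x with an ABFT-style checksum test.

        Mathematically, 1^T (A x) = (1^T A) x, so the column-sum
        checksum c = 1^T A (which can be precomputed once and reused)
        lets us verify the O(n^2) product with an O(n) comparison.
        """
        c = A.sum(axis=0)   # checksum row 1^T A; precompute to amortize
        y = A @ x           # the computation being checked
        expected = c @ x    # O(n): what the checksum of y should be
        actual = y.sum()    # O(n): the checksum of the computed y
        # The tolerance separates ordinary rounding error (numeric
        # noise) from an actual fault that corrupted the result.
        tol = rtol * max(abs(expected), abs(actual), 1.0)
        if abs(expected - actual) > tol:
            raise RuntimeError("result failed checksum test; recompute")
        return y

    # A transient error that flips a bit of y would, with high
    # probability, move y.sum() away from c @ x and trip the check.
    y = checked_matvec(np.random.rand(1000, 1000), np.random.rand(1000))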

The analogy here is that hand-checking of reproducibility is like the redundant calculation: if we did this for all calculations, it would have a huge cost in terms of our time. If, on the other hand, we could build low-overhead methods to reproduce our work, this would be an incentive for increased reproducibility.

This seems to me to call for automated reproducibility checking, similar to how continuous integration works for software development. This is not a new idea, nor is it my idea. The first time I heard about this was, I think, in James Howison’s “Retract bit-rotten publications: Aligning incentives for sustaining scientific software” at WSSSPE2. I think it’s worth revisiting and republicizing this idea.
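As a rough sketch of what such a check might look like in practice, a CI job could re-run a paper's computation on every commit and compare the output against a recorded reference. Everything here is hypothetical: the run_analysis.py entry point, the output path, and the committed reference checksum are stand-ins for whatever a given pipeline actually provides:

    import hashlib
    import subprocess
    import sys
    from pathlib import Path

    # Hypothetical layout: run_analysis.py regenerates results/output.csv,
    # and the authors commit a reference checksum next to it.
    RESULT = Path("results/output.csv")
    REFERENCE = Path("results/output.csv.sha256")

    def main() -> int:
        # Re-run the full computation, as CI would on every commit.
        subprocess.run([sys.executable, "run_analysis.py"], check=True)
        actual = hashlib.sha256(RESULT.read_bytes()).hexdigest()
        expected = REFERENCE.read_text().strip()
        if actual != expected:
            print(f"NOT REPRODUCED: {actual} != {expected}")
            return 1
        print("reproduced: output is bit-identical to the reference")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Bit-identical comparison is the simplest possible criterion; in practice, floating-point and environment variation would push toward tolerance-based comparisons, much like the ABFT check above.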

Open questions

  • What are the pros and cons of automated reproducibility checking versus the hand-performed reproducibility checking we are doing today?
  • What infrastructure would we need to make this pervasive?
  • How would this impact common peer-review practices?
  • Would this help make progress toward the larger goal of better research, or would this instead lead to emphasis on one element that doesn’t really help overall?
