This is an expansion of a comment on Titus Brown’s blog post “Please destroy this software after publication. kthxbye.”, which discussed how much work should go into software used in a submitted paper.
In the past, I’ve thought of software as having one of two different purposes:
- Some software is just written for a single research purpose – this can be quick and dirty, as long as it does the immediate job.
- Other software is intended for reuse, particularly by a community – this has to be of higher quality (documentation, unit tests, etc.) if the community is going to use it.
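As a hypothetical illustration (not from the original post), the contrast between these two kinds of code might look like this in Python; the function name, the GC-content task, and the example inputs are all invented for this sketch:

```python
# Quick and dirty: fine for the immediate job, but it assumes clean
# input and explains nothing to a future reader (including the author).
def gc_quick(s):
    return (s.count("G") + s.count("C")) / len(s)


# Reuse-oriented: documented, input-validated, and easy to unit test --
# the kind of quality a community would expect before adopting it.
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Raises ValueError if the sequence is empty or contains
    characters other than A, C, G, or T (case-insensitive).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains non-ACGT characters")
    return (seq.count("G") + seq.count("C")) / len(seq)


# A minimal unit test of the kind the second bullet calls for.
assert gc_content("ACGT") == 0.5
assert gc_content("gggg") == 1.0
```

The extra lines cost a few minutes up front, but they are exactly what makes the code checkable and reusable later.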
I’ve gradually realized that there’s almost never just a single research purpose for software; almost all of it will be used again in the future. If nothing else, the researcher will likely return to the code at some point, whether to check something, to reuse it, or to build on it, and at that point better-written code will pay off. And if the results are going to be published (even as negative results), others may also want to review, run, or modify the code. (Note that software quality is the important point here, not software sharing.)
So maybe quick and dirty code is never acceptable? If we want to avoid or improve such code, we need to add some time and effort to the research workflow, and in the short term at least, less research will be done. One important question is whether more research will be done in the long run under this approach. I think the overall scientific community needs to decide this as it sets the next generation of standards of practice, but having people push on both sides, so that we can see the results, will certainly sway the community.
Personally, I think we need to go ahead and make the effort to write better code. And by “we”, I mean the original code developer(s).
Having others (for example, university-supported computational scientists in computing or software or data centers, sometimes called coding cores) do this is unlikely to end well for anyone. As a computational scientist at JPL in the mid-1990s, I learned that if I modified a group’s code without them being involved, even when this was the job they and I had agreed I would do, the code then became my code rather than their code, which wasn’t good for me or for them. When it was time for them to further develop the code, they went back to “their” version and ignored “my” version.
Computing is now an integral part of every aspect of science, but most scientists are never taught how to build, use, validate, and share software well. As a result, many spend hours or days doing things badly that could be done well in just a few minutes. Our goal should be to change that, so that scientists can spend less time wrestling with software and more time doing useful research.
If this goal were achieved, and all scientific software were better written with just “a few minutes” of extra effort, the effort-to-reward ratio would likely drop enough that we would demand higher-quality code. We might still (falsely, in my current opinion) imagine that some code was written for a single research purpose, and still think quick and dirty code was ok, but the quick would be slightly less quick than now, and the dirty would be much less dirty than now. The science community would be in a better place than it is today.
Some work by the author was supported by the National Science Foundation (NSF) while working at the Foundation; any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF.