Scientific Software Quality versus Purpose?

This is an expansion of a comment on Titus Brown’s blog post “Please destroy this software after publication. kthxbye.”, which talked about how much work should go into software that was used in a submitted paper.

In the past, I’ve thought of software as having one of two different purposes:

  1. Some software is just written for a single research purpose – this can be quick and dirty, as long as it does the immediate job.
  2. Other software is intended for reuse, particularly by a community – this has to be of higher quality (documentation, unit tests, etc.) if the community is going to use it.

I’ve gradually realized that there’s almost never just a single research purpose for software; almost all software will be used again in the future. If nothing else, it’s likely that the researcher will go back to the code at some future point, either to check something, to reuse it, or to build on it, and at that point, having written better code will help. And if the results are going to be published (even as negative results), others may also want to review, run, or modify the code. (Note that software quality is the important point here, not software sharing.)

So maybe quick and dirty code is never acceptable? If we want to avoid or improve such code, we need to add some time and effort to the research workflow, and in the short term at least, less research will be done. An important question is whether more research will be done in the long run under this approach. I think this needs to be decided by the overall scientific community as it sets the next generation of standards of practice. But having people push on both sides, so that we can see the results, will certainly help sway the community.

Personally, I think we need to go ahead and make the effort to write better code.  And by “we”, I mean the original code developer(s).

Having others (for example, university-supported computational scientists in computing or software or data centers, possibly also called coding cores) do this is unlikely to end well for anyone. As a computational scientist at JPL in the mid-1990s, I learned that if I modified a group’s code without their involvement, even if this was the job they and I had agreed I would do, the code then became my code rather than their code, which wasn’t good for me or for them.  When it was time for them to further develop the code, they went back to “their” version and ignored “my” version.

An important question is how much extra effort is needed.  I think I am partially agreeing with the answer Software Carpentry gives to the question of why it exists:

Because computing is now an integral part of every aspect of science, but most scientists are never taught how to build, use, validate, and share software well. As a result, many spend hours or days doing things badly that could be done well in just a few minutes. Our goal is to change that so that scientists can spend less time wrestling with software and more time doing useful research.

If this goal were achieved, so that all scientific software were better written with only “a few minutes” of extra effort, it would likely lower the effort:reward ratio enough that we would demand higher-quality code.  We might still (falsely, in my current opinion) imagine there was code written for a single research purpose, and still think quick and dirty code was ok, but the quick would be slightly less quick than now, and the dirty would be much less dirty than now.  The science community would be in a better place than it is now.

Disclaimer

Some work by the author was supported by the National Science Foundation (NSF) while the author was working at the Foundation; any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF.

6 thoughts on “Scientific Software Quality versus Purpose?”

  1. Quick-and-dirty is rarely a goal, but usually a stage of the development process. One can consider it as prototyping. If that prototype turns out to be moderately useful, then it is worth investing effort in better code, even if that does not advance what the code actually does. If it does not turn out to be useful, then that is ok, too — we learn and move on. The dead code may survive in a niche for a while, but is ultimately doomed.

    It becomes complicated if code is abandoned but is still used, possibly outside its intended niche. Given the flux of students and contributors in academia, this happens way too often, and causes pain…

    My $0.02 🙂

  2. A few thoughts come to mind:

    1) It’s simply good practice to habitually write code with both sharing and reuse in mind. I think of it as being a little like exercising regularly; if you haven’t been doing it regularly, it’s hard to get in the habit. But once you are doing it regularly, it’s much easier, and doesn’t feel like nearly as much of a burden.

    2) It can be a big problem if not everyone in a research group agrees on how reusable a piece of code is meant to be. Often a researcher will deliver a one-off prototype that works as a proof-of-concept, then suddenly move on to a new problem (or make a career change, or even retire). If their collaborators (or funders, or students) expected to get a more tangible result, in the form of reusable code, those people might feel cheated, and/or they may be tempted to adapt the low-quality prototype without a proper rewrite.

    3) If your code doesn’t meet certain standards of readability and testing, that casts some doubt upon whether or not it works correctly at all. A routine that is complex probably needs to have some detailed testing and review just to be reasonably sure that there are no glaring coding errors. If a routine is simple enough that it is “obvious” that it works without testing, it is probably also generic enough that a reusable version should be written, or more likely already has been. (Also, almost no code belongs in the latter category. Reading code is the best way of catching bugs, but humans are just not very good at catching certain bugs without testing.)

    There is something concerning about writing a “single-use” program, testing it only in a very broad way, and publishing a paper on the prototype, even if the program is never used again. Tests that work on a whole program, without detailed unit testing or at least component testing, are likely to only find easily recognizable problems. However, it does sometimes happen that a coding error actually makes a program seem *better* than it really is. (For instance, if a systematic bias from a coding error partially cancels the bias from a numerical method in a particular test case. Or if an algorithm seems fast, but only because it fails to handle an important edge case.) This is concerning because weak testing can weed out obvious problems that would soon be noticed anyway, while retaining more subtle errors that end up *increasing* the chance that faulty code will be emulated, copied, or reused.
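
    To make that last point concrete, here is a small, purely hypothetical Python sketch (the routine and both checks are invented for illustration, not taken from any real project): a sample_variance() function that divides by n instead of n - 1. A loose whole-program check passes despite the systematic bias, while a tiny unit test with a known exact answer exposes it.

        # Hypothetical example of a small systematic bias that weak, broad
        # testing misses but a simple unit test catches.

        def sample_variance(xs):
            """Meant to return the unbiased sample variance (divide by n - 1)."""
            n = len(xs)
            mean = sum(xs) / n
            return sum((x - mean) ** 2 for x in xs) / n  # bug: should be / (n - 1)

        def broad_check():
            # Whole-program-style check with a loose 1% tolerance: for the values
            # 1..10000 the exact sample variance is N*(N+1)/12, and the bug's
            # bias is only ~0.01%, so this check passes and the bug survives.
            data = [float(i) for i in range(1, 10_001)]
            expected = 10_000 * 10_001 / 12.0
            assert abs(sample_variance(data) - expected) / expected < 0.01

        def unit_test():
            # Unit test on a tiny input with a known exact answer: the sample
            # variance of [1, 2, 3] is exactly 1.0, but the buggy routine
            # returns 2/3, so this assertion fails and exposes the bias.
            assert sample_variance([1.0, 2.0, 3.0]) == 1.0

        broad_check()  # passes despite the bug
        unit_test()    # raises AssertionError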

    1. Maybe a simpler way of putting that last point is that there are two “benefits” to having weak testing. First, you spend less time on testing. Second, the errors that you miss are disproportionately likely to make your program seem more effective, or its output more interesting, which can mean more publishable results. This creates a perverse incentive, similar to how it can be beneficial to one’s career to use sloppy statistical methods if they tend to produce publishable (though false) positive results.

      (Of course, there are obviously downsides, like using a program that is hard to debug, hard to reuse for future work, and, well, broken. But in the short term, failing to catch bugs still has the potential to provide a perverse payoff.)
