As some may know, the first image of a black hole was announced on April 10. This quickly led to many institutions explaining how they were involved (e.g., my own University of Illinois), as well as a number of software projects explaining how their software was used (e.g., Matplotlib).
Those of us concerned with open source research software and its sustainability are trying to raise the profile of such software in research. We can do this through software citation, though this requires that researchers cite the software that they use, which is a cultural change that is taking time to happen, in part because it depends on authors, journal policies, reviewers, etc.
For this particular announcement, the journal article does include and cite the software that was used. However, it discusses only the software concepts/packages, not the specific versions of the software that were used, and the citation for each package points to an old paper about that software (in one case written 16 years ago). This likely means that recent contributors to the software are not being credited for their work and their contributions to this discovery.
This discussion about the software used quickly led to a discussion about funding for that software, or the lack of it. While participating in this discussion, and in part based on an email from Neil Chue Hong, I realized that there is a step we can take that would help both the culture change towards citation of the specific software that is used in research, and its funding.
Funding agencies should require that their grantees’ annual reports list the research software they used.
This would create data that could then be used by the agencies to make sure those software projects were funded. And it would get researchers used to collecting this data and reporting on it.
One might think that only open source software needs to be reported, because closed source (often commercial) software does not need agency funding. This is often true, but reporting both open source and closed source software is needed for the culture change of citing all research software.
Some might ask why agencies should make this data request in the award process rather than in the proposal process. One reason is that the software actually used may change between the proposal and the completion of the work. Another reason is that the culture change we want is for researchers to track the software they use and cite it in reports, in papers, etc.
There are some potential concerns with this:
- It would be more work for grantees to collect and submit this data for their projects, and additional non-research paperwork is never welcome.
- Funding agencies would have to act on this data, which is never easy.
- Software used by other software would be missed.
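The first concern, the collection burden, could be partly automated. As an illustrative sketch only (not a proposed standard, and assuming a Python-based project on Python 3.8+), a grantee could inventory the packages in their working environment with a few lines of code; the function name here is hypothetical:

```python
# Sketch: enumerate the Python packages (name and version) visible in the
# current environment, as a starting point for a software-usage report.
# This is illustrative; a real report would need curation by the researcher.
from importlib.metadata import distributions

def installed_software():
    """Return a sorted list of (package, version) pairs for the
    Python packages installed in the current environment."""
    seen = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip rare entries whose metadata lacks a Name field
            seen[name] = dist.version
    return sorted(seen.items())

if __name__ == "__main__":
    for name, version in installed_software():
        print(f"{name}=={version}")
```

Note that this captures specific versions, which addresses one of the gaps in the journal-article citations discussed above, though it would still miss software outside the environment it inspects.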
But this collection of data would be evidence that could be used to push for analysis and funding. For example, at NSF, the Office of Advanced Cyberinfrastructure (OAC) could use this data to help other NSF divisions understand the software that their researchers use, which would help those other divisions understand the overall impact of research software, and could lead to increased funds for such software, either collectively led by OAC, or by each division individually.
And multiple funding agencies could work together and combine their data, perhaps publicly or perhaps behind some privacy/access agreements as happens with other types of sensitive data today, so that both they and individual researchers could mine it to understand more about the role of software in research, both generically and specifically.
While software dependencies would be missed by this reporting, and these underlying libraries and packages are already at a disadvantage in terms of funding, I think that the cultural change brought about by software reporting to funding agencies would be a tremendous step forward overall. We could then move toward a better understanding of dependencies, either in parallel or later, via implementing transitive credit or through some other method.