Evaluating research software proposals

Research software is an essential part of most research today, as I’ve written about before. One of the important models that allows people to develop and maintain it is grant funding: someone proposes building and/or maintaining software to a funder as an effort that aligns with the funder’s mission and programs. The funder, which could be a public agency, a philanthropic organization, a company, etc., then needs to evaluate the proposal, but how should this be done? What are current good/best practices?

This topic has come up in discussion in the Research Software Funders Forum over the past several months, where funders of research software come together and talk to each other. I’ve been a facilitator of this discussion, which has given me the opportunity to think about some of the points that have been made, and I also spent four years co-running a research software funding program at the US National Science Foundation, so I have some of my own opinions as well. It has been a pleasure to hear about the efforts of various global funders in this area, including the Chan Zuckerberg Initiative’s Essential Open Source Software for Science (EOSS) program and FAPESP’s Research Program on eScience and Data Science. My thinking has also been influenced by a November workshop in Amsterdam hosted by the Research Software Alliance (ReSA) and the Netherlands eScience Center to set the future agenda for national and international funders to support sustainable research software, and a preceding workshop organized by Tom Honeyman of the Australian Research Data Commons, as well as many talks and discussions in those two meetings, in particular talks by Carole Goble, Fabio Kon, and Rob van Nieuwpoort.

This post is not intended to solve this challenge, as I’m not a funder today (except in a limited way through some internal opportunities at my own institution, the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign), but rather to propose a set of aspects that funders might choose to collectively address in the future.

Note: a large issue I’m mostly going to ignore is that software is an element of many proposals. While some proposals are explicitly aimed at software, a large fraction of other proposals also include some software development, and this may or may not be made explicit and/or highlighted in the proposal, depending on whether the proposers think it is important enough (given typical space limits in proposals) and on how they think software work will be viewed in the evaluation process.

Evaluation issues

As stated previously, there are a variety of practices for evaluating proposals that involve research software, and these practices have multiple aspects that we can examine.

One aspect is simply how they compare with research review practices outside of software. For example, peer review is a useful input into making funding decisions, though some funders may rely more on internal expertise. There are a variety of ways in which such peer reviews can be performed, including asynchronously or synchronously, with synchronous reviews being virtual, hybrid, or in-person. Proposals can be reviewed independently or in groups (panels). Other variables include the number of reviewers per proposal and per group and their expertise, and, for opportunities that continue over multiple years, whether reviewers have any continuity across such opportunities or specific panels. Each of these options may have different advantages and disadvantages for software proposals than for other types of proposals.

In terms of reviewer expertise, in some sense most research software projects can be viewed as multidisciplinary, involving at least one subject area and a set of software skills. Of course, many projects include multiple subject areas as well: they may include a subject area in which the software is developed plus additional areas in which the software can be used; for example, computer science, mathematical, or statistical software can have wide applicability. Review processes would benefit from reviewers who collectively have all relevant sets of expertise: software development, the subject area(s) in which the software is being developed, and the subject area(s) in which the software will be used.

Another aspect is review criteria, which may need to be reconsidered, or at least thought of differently, for software than for research, in part because software is not purely research but is also infrastructure. As infrastructure, while it may have scholarly novelty, it may also have more inherent utility than novelty, and this is not typically something that research funders and peer reviewers select for. At NSF, I encouraged reviewers to think about the intellectual merit review criterion in terms of how the software would lead others to generate new knowledge, in addition to considering how the software itself contained new knowledge, and this was incorporated into guidance for the SI2 program, though reviewers and program officers could choose how to take this.

The model of much research, in which the research is done and the results are captured in an artifact such as a publication or in human knowledge, does not really fit software, which can be both the research and the knowledge artifact. Additionally, unlike publications, software as an executable object is subject to collapse: it will stop working if it is not actively maintained. For a software knowledge artifact to be most useful after a research project ends, there must be some plan for how it will be sustained. This concept has been considered by some funders, who now require data management plans (DMPs), but software is not just data, and software proposals probably need either additional software management plans (SMPs), as are now being discussed, or more general output/artifact management plans (which could also be a way of thinking about open access publications).

In terms of the technical review of software itself, rOpenSci has defined a set of software peer review practices, which have been adapted by a set of software journals, including the Journal of Open Source Software, the Journal of Open Research Software, and SoftwareX, as well as other communities such as pyOpenSci. This is leading to a community consensus on how to review software. However, which of these practices and criteria make sense to embed in a software proposal review process is currently unclear.

Next steps

My goal is that some of the research funders working in this space will start meeting to discuss these issues, ideally working towards a set of practices for the evaluation of research software proposals, with guidance for understanding when and why different practices fit a particular situation. I also think that members of the research software community (developers, maintainers, and users) would be good contributors to such a process. One option for this type of community gathering would be a ReSA working group. There might also be value in a more intensive meeting on this topic, whether in person or virtual, and of course both a working group and a meeting could occur and benefit each other. If you’re interested in being part of such a group, please let me know.

Acknowledgments

Thanks to Michelle Barker, Tom Honeyman, and Rajiv Ramnath for useful comments on an earlier draft of this post.

This post has also been published as https://doi.org/10.59350/vvhxf-m3319 on the Rogue Scholar.

Published by:

Daniel S. Katz

Chief Scientist at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering
