Software is a process, not a series of products: the failure of traditional metrics

(by Daniel S. Katz and Tom Honeyman)

There are a number of challenges in rewarding scholars for their work on software, including the fact that both software itself and the idea that it has scholarly value are relatively new, particularly when set against the centuries of experience we have with journals and the Humboldtian university model.

Traditional scholarly research and metrics

Much scholarly research can be considered either hypothesis-driven or exploratory. These types of research projects typically have some starting point (e.g., is x true? how does y work?) and some ending point (e.g., x is true; z is a key mechanism for how y works). This research is often done at least semi-privately, with the results being shared publicly in the form of a paper, perhaps with data or software also being shared in cases where this is possible.

Our scholarly metrics and recognition system is built to support this model. It is based on the idea that research is a cycle of periods of effort marked by outputs, with grants being given to support these periods of effort, and citations of these outputs being used to measure their value and reward the scholars for their efforts.

Each of these efforts typically has an identifiable set of scholars involved, and these scholars are usually named as the authors of the outputs. These outputs are stable, rarely revised, and don’t tend to add authors over time.

Imagine a new project that includes a subset of the scholars who worked on the previous effort on which it builds. This subset will likely be authors of the output (paper) about this new project, and the other scholars from the previous effort will be recognized by citing the paper from the previous project. Even if the new work could not have been done without their effort, if they haven’t been directly involved in it, they will not expect to be considered as authors.

Scholars, including students, may be expected to produce outputs on an annual cycle, in some cases driven by annual conferences that serve, in some sense, as clubs of scholars in subdisciplines. This drives the idea of a minimal scholarly unit of work, i.e., the minimum amount of progress that must be made to justify a new publication. The paper system discourages this type of publication, sometimes referred to as salami slicing, partially because it generally involves a single set of authors “gaming” the system to get multiple units of credit for one unit of work.

It is common practice in publications for the authors to fully identify themselves and to associate the work with their affiliations and with the funding that supported it. This level of detail has allowed metrics to be built on top of it.

This idea is implicit in metrics such as the h-index, which counts the units of effort whose outputs others have found sufficiently valuable to cite: an h-index of h means the scholar has h outputs that have each been cited at least h times.

Software is different

However, software, particularly open source software, is different. It is created through a more continuous process of development, and it is also more open: scholars who are not the main developers can use it or contribute to it at any point, not just when the main developers declare it finished.

Of course, most software that is in use is never “finished.” To avoid software collapse as underlying dependencies (both hardware and software) change, to fix bugs, and to add new features, software must be changed. These changes can occur multiple times per day, and every change could potentially be made by a new contributor. This can lead to many versions that differ only slightly from each other, though unlike versions of a paper, each version can have a different set of authors.

Users need to cite the specific version they used to indicate precisely what they used, and credit its developers. To do so, amongst other things, full and robust identification of authors, affiliations, and funding sources is needed, though this information is rarely provided by developers.
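As a concrete illustration, one existing convention for providing this information is a citation metadata file shipped alongside the code, such as a CITATION.cff file, from which a version-specific citation can be assembled. The sketch below is a minimal, hypothetical example: the project, people, and affiliations are invented, funding information is omitted, and the third-party pyyaml package is assumed for parsing.

```python
# A minimal sketch of how version-specific citation metadata could be
# provided and consumed. The CITATION.cff format is one existing convention
# for this; the project, people, and dates shown here are hypothetical.
# Requires the third-party "pyyaml" package (pip install pyyaml).
import yaml

CITATION_CFF = """
cff-version: 1.2.0
title: Example Analysis Toolkit
version: 2.3.1
date-released: "2024-05-17"
authors:
  - family-names: Doe
    given-names: Jane
    affiliation: Example University
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: Roe
    given-names: Richard
    affiliation: Example National Lab
"""

def format_citation(cff_text: str) -> str:
    """Build a human-readable, version-specific citation string."""
    meta = yaml.safe_load(cff_text)
    names = ", ".join(
        f"{a['given-names']} {a['family-names']} ({a.get('affiliation', 'no affiliation listed')})"
        for a in meta["authors"]
    )
    return f"{names}. {meta['title']}, version {meta['version']}, released {meta['date-released']}."

if __name__ == "__main__":
    print(format_citation(CITATION_CFF))
```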

In cases of open development, software can be associated with an open-ended community that comes together in both planned and ad hoc ways to develop, maintain, use, and improve it. Here, social aspects relate to both creation and usage, and they can lead to new versions with new authors. This is unlike the typical paper, where social aspects relate only to usage (reading) and the “response” is usually another paper.

Additionally, software is often used in an automated way, brought into an assemblage of software via package management. These “imports” can mostly be tracked, but they are not part of the scholarly metrics system today, and they are not tied to authors, since authorship is not tracked by package management systems or by revision control systems. (While repository contributions are tracked, they are not the same as authorship contributions: some repository contributions, e.g., typo fixes, may not be sufficient for authorship, and some authorship contributions, e.g., design, may not appear in the repository.)
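To make the contrast concrete, the following minimal sketch (Python standard library only, 3.8+) lists what packaging metadata in one environment actually records for each imported distribution: a name, a version, declared requirements, and at best a free-text author field, with no record of who contributed what.

```python
# A minimal sketch of what Python's packaging metadata exposes about the
# software "assemblage" an environment imports: distribution names, versions,
# and declared requirements are tracked, but there is no record of authorship
# contributions. Uses only the standard library (Python 3.8+).
from importlib import metadata

def describe_environment(limit: int = 10) -> None:
    """List installed distributions, their versions, declared requirements,
    and whatever (often empty) author field the packaging metadata carries."""
    for dist in list(metadata.distributions())[:limit]:
        name = dist.metadata["Name"]
        author = dist.metadata["Author"] or "not recorded"
        requires = dist.requires or []
        print(f"{name} {dist.version}")
        print(f"  author field: {author}")
        print(f"  declared requirements: {len(requires)}")

if __name__ == "__main__":
    describe_environment()
```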

Need for new metrics and practices

Because of these differences, existing metrics that work for a small set of products (papers, many datasets) do not work for software. The overall credit and recognition system that has been built for papers, and that seems able to work for datasets, therefore does not work for software, and changes are needed. Specifically, we need to capture authorship manually or semi-manually at the time of development, within the code environment (perhaps via something like All Contributors, as well as via citation metadata), and to capture software usage in other software through some automated means (perhaps via something like Libraries.io or deps.dev), as well as manually in papers, which could be supported by tracking and recording the provenance of the computational environment in which research is performed.
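As a rough sketch of the first of these ideas, the snippet below summarizes a contributor record of the kind kept by tools such as All Contributors in an .all-contributorsrc file. The field names and people are illustrative assumptions rather than a definitive schema, but they show how contributions such as design, which may never appear as commits, can still be recorded and counted alongside the code.

```python
# A sketch of recording contribution types alongside the code and summarizing
# them. The JSON below mimics the kind of record kept by tools such as
# All Contributors (.all-contributorsrc); the field names and people here are
# illustrative assumptions, not a definitive schema.
import json
from collections import Counter

CONTRIBUTORS_JSON = """
{
  "contributors": [
    {"login": "jdoe", "name": "Jane Doe", "contributions": ["code", "design", "doc"]},
    {"login": "rroe", "name": "Richard Roe", "contributions": ["design", "ideas"]},
    {"login": "asmith", "name": "Alex Smith", "contributions": ["bug", "code"]}
  ]
}
"""

def summarize(contributors_text: str) -> None:
    """Print each contributor's recorded contribution types and totals per type.
    Note that types like 'design' and 'ideas' need never appear in commit history."""
    record = json.loads(contributors_text)
    totals = Counter()
    for person in record["contributors"]:
        kinds = person["contributions"]
        totals.update(kinds)
        print(f"{person['name']} ({person['login']}): {', '.join(kinds)}")
    print("totals by contribution type:", dict(totals))

if __name__ == "__main__":
    summarize(CONTRIBUTORS_JSON)
```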

New metrics present an opportunity for the research software community to think about which behaviors should be incentivized, in contrast to those that have been incentivized for papers. For instance, the negative “salami slicing” of papers potentially corresponds to a positive behavior in software development: greater modularity. Similarly, the push toward new and independent efforts in creating papers should not carry over to software: metrics that measure contributions to existing software on equal terms with new, independent software outputs remove the disincentive to contribute to an existing effort rather than create a new one. Developing such metrics is the first step towards gaining credit for this important contribution to science.

This post has also been published as https://doi.org/10.59350/xp7x6-05m24 on the Rogue Scholar.

Published by:

Daniel S. Katz

Chief Scientist at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering
