The 2018 European Commission report, Turning FAIR into Reality, concludes that FAIR digital objects (including software) need to be supported by metrics, incentives, skills and FAIR services that provide persistent identifiers, metadata specifications, stewardship and repositories, actionable policies and Output Management Plans.
This has made me think about the role of software repositories in encouraging FAIR software.
For data (the subject of the original FAIR principles), the role of a repository is to store data and associated metadata, and in some sense, to add an external enforcement mechanism that requires the metadata associated with the data to be recorded. This helps enable parts of FAIR: Findability, as the metadata makes the data properties searchable, and the repositories provide a set of places that can be searched; Accessibility, as the repositories provide archival storage of the data and metadata, as well as a means to retrieve them; and Reusability, because once you have the data, you can read and use it. Note that for data, reading and using are closely related. While algorithms or software may be needed to do either, what's important is that the same algorithms or software generally do both.
However, when we consider software as the research object rather than data, reading it and using it are quite different. Well-written source code can be preserved and read, which allows the reader to see its intent and algorithms. Using such source code, however, can be more challenging, as doing so often depends on the hardware and software environment, as well as on software dependencies. The problem is software collapse: the fact that software eventually stops working if it is not actively maintained, due to changes in its environment and dependencies. On the other hand, there is also software that can be executed but cannot be read, such as executables and services.
I'm going to treat containers as a packaging technology rather than a solution. They can freeze the environment and the dependencies, but if bugs in either are later discovered and fixed, reusing the software (as opposed to simply re-executing it) still faces the same software collapse problem.
In other words, we can preserve source code for reading, and we can preserve executable software for re-execution, but we cannot simply preserve software for reuse. For reuse, we need to sustain the software: to supply ongoing human effort that adapts it as needed to changes in its environment and dependencies.
This implies that preservation/archival repositories are not a complete solution for software reuse, even though they can be for data reuse, at least for static data sets. For software reuse, we also need an up-to-date working version of the software, which for open source software likely comes from a well-maintained project on a software development platform (such as GitHub or GitLab), and for closed source software likely comes from a vendor.
I believe this means that FAIR for software cannot simply depend on the infrastructure and environment being built to support FAIR data, specifically the existing repositories. This is one of the reasons we need a separate FAIR for research software (FAIR4RS) activity, and I hope it will be able to find a solution to this problem, likely working with the repository community.
Acknowledgements: Thanks to Martin Fenner and Neil Chue Hong for helpful comments on this blog post.