Software and repositories in the context of FAIR

The latest Research Software Alliance (ReSA) newsletter includes the fact that:

The 2018 European Commission report, Turning FAIR into Reality, concludes that FAIR digital objects (including software) need to be supported by metrics, incentives, skills and FAIR services that provide persistent identifiers, metadata specifications, stewardship and repositories, actionable policies and Output Management Plans.

This has made me think about the role of software repositories in encouraging FAIR software.

For data (the subject of the original FAIR principles), the role of a repository is to store data and associated metadata, and in some sense, to add an external enforcement mechanism that requires the metadata associated with the data to be recorded. This helps enable parts of FAIR: Findability, as the metadata makes the data properties searchable, and the repositories provide a set of places that can be searched; Accessibility, as the repositories provide archival storage of the data and metadata, as well as a means to retrieve them; and Reusability, because once you have the data, you can read and use it. Note that for data, read and use are closely related properties. While there may be algorithms or software needed to do both, what’s important is that those algorithms or software do both.

However, when we consider software as the research object rather than data, reading it and using it are quite different. Well-written source code can be preserved and read, which allows the reader to see its intent and algorithms. However, using such source code can be more challenging, as doing so often depends on the hardware and software environment, as well as software dependencies. The problem is software collapse: the fact that software eventually stops working if it is not actively maintained, due to changes in environment and dependencies. On the other hand, there is also software that can be executed but cannot be read, such as executables and services.

I’m going to consider containers a packaging technology rather than a solution, since they can fix the environment and the dependencies, but if there are bugs in either that have been discovered and fixed, reuse of the software (as opposed to simply re-executing it) still has the same software collapse problem.

In other words, we can preserve source code for reading, and we can preserve executable software for re-execution, but we cannot simply preserve software for reuse. For reuse, we need to sustain the software: to supply human effort that continually works on the software to adapt it as needed for changes in the environment and dependencies.

This implies that preservation/archival repositories are not a complete solution to software reuse, although they can be for data reuse, at least for static data sets. For software reuse, we also need an up-to-date working version of the software, which for open source likely comes from well-maintained software on a software development platform (such as GitHub or GitLab), and for closed source software likely comes from a vendor.

I believe this means that FAIR for software cannot simply depend on the infrastructure and environment being built to support FAIR data, specifically the existing repositories. This is one of the reasons we need a separate FAIR for research software (FAIR4RS) activity, and I hope it will be able to determine a solution to this problem, likely working with the repository community.

Acknowledgements: Thanks to Martin Fenner and Neil Chue Hong for helpful comments on this blog post.

Published by:

danielskatz

Assistant Director for Scientific Software and Applications at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering


2 thoughts on “Software and repositories in the context of FAIR”

  1. For me, your blog post is highlighting the difference between preservation and sustainability. Typically, preservation is entrusted to specialists (“curators”) and the aim is for this to be protecting against specific and catastrophic situations (such as the demise of a major part of the ecosystem). Whereas sustainability is about ensuring that the general population is able to continue to use a resource, normally through the passing on of practices and skills. The example I often use when I talk about this subject is the difference between seed banks (“preservation”) and community gardens (“sustainability”) – both have the overall goal of ensuring that we are able to engage in farming and horticulture in the future but take very different approaches. We need both, but we don’t always place value on either.


  2. The software collapse problem lurks everywhere. Your suggestion that a well-maintained software development platform is required for “community sustainability” implies just that (thanks NPCH for the garden analogy). Repositories that also provide development capabilities are inherently complex and subject to changing fashions like any other aspect of IT. If the community maintaining a project isn’t willing to deal with changes in platform, the project is lost.

    I also wonder frequently about the F in FAIR. Standardised repository platforms make finding projects and processing metrics from their metadata feasible (i.e., without need of a web crawler), but those searches ignore repositories not on those platforms, and metadata structured differently from the expected or recommended norms for whatever reason. This diversity is both a threat to evaluation with FAIR and an indicator that, should widely used platforms disappear or approach collapse (as they have in the past), software will continue to be AIR (if you can find it!).

