Generalizing FAIR

by Daniel S. Katz and Michelle Barker

Most researchers and policymakers support the idea of making research, and specifically research outputs, findable, accessible, interoperable, and reusable (FAIR). The concept of FAIR has been well-developed for research data, but this is not the case for all research products. This blog post considers how applying FAIR to a range of research products (beyond data) could result in different sets of principles for different research objects, and asks about the implications of this.

Work on making research outputs FAIR started with a meeting in 2014, and led to the publication of “The FAIR Guiding Principles for scientific data management and stewardship” (Wilkinson et al. 2016). However, this work always had a bit of a split focus. On one hand, the principles as written clearly focus on both data and the metadata associated with that data, while on the other hand, the work is intended to be much more general, as stated by Wilkinson et al. in multiple places in the paper (bolding is our emphasis):

“All scholarly digital research objects—from data to analytical pipelines—benefit from application of these principles, since all components of the research process must be available to ensure transparency, reproducibility, and reusability.”

“The meeting concluded with a draft formulation of a set of foundational principles that were subsequently elaborated in greater detail—namely, that all research objects should be Findable, Accessible, Interoperable and Reusable (FAIR) both for machines and for people. These are now referred to as the FAIR Guiding Principles.”

“Analytical workflows, for example, are a critical component of the scholarly ecosystem, and their formal publication is necessary to achieve both transparency and scientific reproducibility. The FAIR principles can equally be applied to these non-data assets, which need to be identified, described, discovered, and reused in much the same manner as data.”

The goal of this blog post is to discuss the tension between these two foci (FAIR data and FAIR research products) and how we might resolve it. Regarding terminology, Wilkinson et al. 2016 generally uses the term “research objects” to implicitly mean research outputs, those that are intended to be stored and shared, not those objects that are used internally within a research process and are not stored. Therefore, in the rest of this post, we will use the term “research products” to be more clear. Research products can include data, software, workflows, training materials, executable notebooks, machine learning models, etc. Additionally, we will use the term “FAIR Principles” (see the GO FAIR listing at https://www.go-fair.org/fair-principles/ for a very slightly updated version from the original paper), rather than “FAIR Guiding Principles.”

Two of the FAIR concepts, findable and accessible, are quite general at a high level and are applicable at that level to all research products, though some of the details of how they are applied may differ according to the type of research product. For example, to make a product findable, metadata about that product must be stored in a way that makes it searchable, but the specific metadata for products such as data and software might differ, with the programming language being appropriate for software but not for data.
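As a rough illustration of this point, the sketch below stores metadata for two products in one searchable collection, with a few shared fields plus type-specific ones. The field names (such as programming_language and file_format), the identifiers, and the search helper are all hypothetical and chosen only for this example; they are not drawn from the FAIR Principles or from any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProductMetadata:
    """Metadata for one research product (illustrative fields only)."""
    identifier: str          # placeholder identifier, not a real DOI
    title: str
    product_type: str        # e.g. "data" or "software"
    # Fields that only make sense for some product types go here.
    extra: Dict[str, str] = field(default_factory=dict)


def search(records: List[ProductMetadata], **criteria: str) -> List[ProductMetadata]:
    """Return the records whose shared or type-specific fields match every criterion."""
    matches = []
    for record in records:
        # Merge shared and type-specific fields into one searchable view.
        searchable = {
            "identifier": record.identifier,
            "title": record.title,
            "product_type": record.product_type,
            **record.extra,
        }
        if all(searchable.get(key) == value for key, value in criteria.items()):
            matches.append(record)
    return matches


records = [
    ProductMetadata("example-id-1", "Ocean temperature readings", "data",
                    extra={"file_format": "CSV"}),
    ProductMetadata("example-id-2", "Temperature analysis tool", "software",
                    extra={"programming_language": "Python"}),
]

# Searching on a software-only field finds only the software record.
python_products = search(records, programming_language="Python")
print([r.identifier for r in python_products])
```

Both products are findable through the same search, but only the software record carries a programming language, which is the kind of detail that may differ between product types.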

The two remaining FAIR concepts, interoperable and reusable, are less easily understood and applied to research objects other than data. While the meaning of these two concepts when applied to data is clearly defined, there are multiple possible meanings when they are applied to software, none of which is the same as the meaning for data. Part of the reason is that different research products, such as data and software, are fundamentally different (Katz et al. 2016).

One part of the tension between applying the FAIR principles both to data and to other types of research products comes from the fact that the original FAIR Principles are really a combination of two things: principles regarding data, as a particular research product, and principles regarding metadata about research outputs (specifically about data). It appears these two things were combined in an effort to make the principles concise, without fully considering how this would affect the development and application of these principles in the future to other research products.

Potentially, this means that there could be complementary sets of specific FAIR principles developed, one for metadata, one for data, and one for each additional type of research product. Or alternatively, there could be a set of FAIR principles for each type of research object and its associated metadata. Together, all these sets of FAIR principles could be called the “FAIR principles for research products.” Work has already been initiated on applying FAIR to diverse research objects. For example, Deniz Beyan et al. (2020) report on current FAIR implementation practices for software, services, workflows and executable notebooks to identify commonalities and gaps; and Garcia-Silva et al. (2019) consider adoption of FAIR research objects within earth sciences.

With regard to specific research objects, the FAIR for Research Software Working Group was convened in 2020 by the Research Software Alliance, FORCE11, and the Research Data Alliance to create a community-endorsed application of the FAIR principles to research software by mid-2021, building on a range of work in this area (FAIR4Software reading materials 2020). There are also initiatives considering how to apply FAIR to services (Koers et al. 2020), computational workflows (such as Goble et al. 2020), training materials (Garcia et al. 2020), and machine learning (Katz et al. 2020), along with related events and projects (including Eguinoa et al. 2020, WorkflowHub project n.d.).

(Note: in 2020 the FAIR for Research Software Working Group undertook an exercise to identify work applying the FAIR principles to research products beyond data and software, which identified these resources and more.)

With FAIR for research software principles now being actively developed, the stage is set for the existence of a number of different sets of FAIR principles for different research products. But what are the implications of this? Does this pose challenges that need resolving by generalizing these separate principles into a single set of FAIR principles for all research products? And if so, how would this occur?

References

Deniz Beyan O, Chue Hong N, Cozzini S, Hoffman-Somer M, Hooft R, Liisi L, Juuso M, Teperek M. 2020, July 8. Seven Recommendations for Implementation of FAIR Practice. https://doi.org/10.5281/zenodo.3931993

Eguinoa I, Grüning B, Coppens F, Goble C, Soiland-Reyes S, Capella-Gutierrez S. 2020, Sept 2. ELIXIR | Workshop on FAIR Computational Workflows, in European Conference on Computational Biology. https://eccb2020.info/ntbew01-workshop-on-fair-computational-workflows/

FAIR4Software reading materials. 2020. https://www.rd-alliance.org/group/software-source-code-ig/wiki/fair4software-reading-materials

Garcia L, Batut B, Burke ML, Kuzak M, Psomopoulos F, Arcila R, et al. 2020. Ten simple rules for making training materials FAIR. PLoS Comput Biol 16(5): e1007854. https://doi.org/10.1371/journal.pcbi.1007854

Garcia-Silva A, Gomez-Perez JM, Palma R, Krystek M, Mantovani S, Foglini F, et al. 2019. Enabling FAIR research in Earth Science through research objects. https://doi.org/10.1016/j.future.2019.03.046

Goble C, Cohen-Boulakia S, Garijo D, Gil Y, Crusoe MR, Peters K, Schober D. 2020. FAIR computational workflows. Data Intelligence (2020) 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033

Katz DS, Niemeyer KE, Smith AM, Anderson WL, Boettiger C, Hinsen K, Hooft R, Hucka M, Lee A, Löffler F, Pollard T, Rios F. 2016. Software vs. data in the context of citation. PeerJ Preprints 4:e2630v1. https://doi.org/10.7287/peerj.preprints.2630v1

Katz DS, Pollard T, Psomopoulos F, Huerta, Erdmann C, Blaiszik B. 2020. FAIR principles for Machine Learning models. Presented at the Research Data Alliance Virtual Plenary 16. Zenodo. http://doi.org/10.5281/zenodo.4271996

Koers H, Gruenpeter M, Herterich P, Hooft R, Jones S, Parland-von Essen J, Staiger C. 2020. Assessment report on ‘FAIRness of services’. https://doi.org/10.5281/ZENODO.3688762

Wilkinson M, Dumontier M, Aalbersberg I, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18

WorkflowHub project. n.d. https://about.workflowhub.eu/

Published by:

Daniel S. Katz

Chief Scientist at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering
