I’ve been watching and participating in FAIR work for a while, first in terms of data, and more recently in research software, workflows, and machine learning models. Some of this has been based on my general interest in how research works and how we improve it, and some has been based on more specific work, such as raising the profile of research software and its developers and maintainers, and even some collaborative work in specific domains, such as high energy physics.
Nowhere in these contexts and communities did anyone get up one morning and say, “I think we should create a new goal, FAIR, and then define metrics to measure how this goal is met.” This concept wasn’t created by funders or bureaucrats to give researchers something else to do and be measured upon. Instead, people who had been thinking for a long time about how research is performed considered the overall process of research and, over a number of discussions and meetings, decided to identify and name some of the elements of the research ecosystem and process that needed to be improved, in an effort to reinvigorate the agenda.
At least in part because they came up with a clever name (FAIR) and did a really good job of disseminating it, the concept has caught on in the research management, research administration, and research funding communities, and to a lesser extent, in the researcher community as well. Having something identifiable that people can pursue and try to measure has led to a lot of focus on being FAIR.
But FAIR isn’t the end goal; it’s just one part of the solution.
In my opinion (focusing on research software, since this is my area of expertise), we want to move towards a world where:
- All research software is open
- All research software is high-quality and robust
- All research software is findable, accessible, and both usable and used by others (for their own research), and is cited when it is used
- All contributors to research software are recognized for their work, with good careers
- All research software is sustained as long as it is useful
- All research is reproducible
There are groups or communities working on most of these areas, including the open science and open source communities, the FAIR research software community, the software citation community, the RSE community, the software sustainability institutes, and the reproducibility community.
Why isn’t FAIR a good goal by itself? Again thinking of software, I can create FAIR software that is brittle (only works on my laptop), has bugs (doesn’t always give the right answers), is closed source so that it can’t easily be understood or extended, doesn’t say who created it so that the authors don’t get credit when it is used, isn’t sustained so it stops working when the next operating system comes out, etc.
Alternatively, I could write good-quality, robust, open source software with clear metadata that can be used for citation, and actively maintain it over time, but not deposit it in a public repository, making it findable only via conference talks in my community and sharing it only via a GitLab instance at my university. This could lead to it becoming widely used, to a community forming around it, and to my being recognized for developing it, maintaining it, and leading the community around it.
Which of these is better for me and for the research community? I think the latter, though of course, the latter + FAIR would be even better. But the key is that FAIR is positive in this larger context, as an element of and a means towards a larger vision, not as an isolated goal.
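As a concrete illustration of the citation metadata mentioned in the second scenario, many projects include a CITATION.cff file (the Citation File Format) at the top level of their repository, which tools such as GitHub and Zenodo can read to generate citations. A minimal sketch, with all names, versions, and URLs as placeholder assumptions:

```yaml
# CITATION.cff — a hypothetical, minimal example; every value below is a placeholder
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example Analysis Toolkit"
version: "1.0.0"
date-released: "2024-01-15"
authors:
  - family-names: "Doe"
    given-names: "Jane"
repository-code: "https://gitlab.example.edu/group/example-toolkit"
```

Even software that is only shared within an institution can carry metadata like this, so that credit flows to its contributors whenever it is used.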
Some open questions
This post leads to a number of questions, some of which I hope to explore further, and which might also be interesting for others to explore.
- Is FAIR + FAT + open + citable a good way to describe the end goal for ML models?
- If not, what’s missing?
- If so, what’s the equivalent for software?
- Perhaps open + high-quality + robust + FAIR + citable + sustained?
- What’s the equivalent for data? Or is FAIR alone enough for data?
- Could all of these be generalized into one set?
Once we answer some of these questions, we will also need to consider marketing and branding, and how we influence the research community. Is there a good acronym for this? Or is this the point where the idea of an acronym breaks down, and we should use multiple acronyms to cover different aspects (e.g., FAIR++, CARE, TRUST, FAT, DOME, TREE, OC [open, citable], QRR [Quality, Robust, Reproducible])?
Thanks to Michelle Barker, Carole Goble, and Fotis Psomopoulos for comments on this post.