FAIR is not the end goal

I’ve been watching and participating in FAIR work for a while, first for data, and more recently for research software, workflows, and machine learning models. Some of this has been based on my general interest in how research works and how we improve it, and some has been based on more specific work, such as raising the profile of research software and its developers and maintainers, and even some collaborative work in specific domains, such as high energy physics.

Nowhere in these contexts and communities did anyone get up one morning and say, “I think we should create a new goal, FAIR, and then define metrics to measure how well it is met.” This concept wasn’t created by funders or bureaucrats to give researchers something else to do and be measured on. Instead, people who had been thinking for a long time about how research is performed considered the overall research process and, over a number of discussions and meetings, decided to identify and name some of the elements of the research ecosystem and process that needed to be improved, in an effort to reinvigorate that agenda.

At least in part because they came up with a clever name (FAIR) and did a really good job of disseminating it, the concept has caught on in the research management, research administration, and research funding communities, and to a lesser extent in the researcher community as well. Having something identifiable that people can pursue and try to measure has led to a lot of focus on being FAIR.

But FAIR isn’t the end goal, it’s just one part of the solution.

In my opinion (focusing on research software, since this is my area of expertise), we want to move towards:

  • All research software is open
  • All research software is high-quality and robust
  • All research software is findable, accessible, and usable & used by others (for their own research), and is cited when it is used
  • All contributors to research software are recognized for their work, with good careers
  • All research software is sustained as long as it is useful
  • All research is reproducible

There are groups or communities working on most of these areas, including the open science and open source communities, the FAIR research software community, the software citation community, the RSE community, the software sustainability institutes, and the reproducibility community.

Based on a discussion at the FAIR Festival last week, perhaps what we need for machine learning models is FAIR + FAT + open + citable, at least as a starting point.

Or should high-quality and robust also be added? Or DOME? Or TREE?

Why isn’t FAIR a good goal by itself? Again thinking of software, I can create FAIR software that is brittle (only works on my laptop), has bugs (doesn’t always give the right answers), is closed source so that it can’t easily be understood or extended, doesn’t say who created it so that the authors don’t get credit when it is used, isn’t sustained so it stops working when the next operating system comes out, etc.

Alternatively, I could write high-quality, robust, open source software, with clear metadata that can be used for citation, and actively maintain it over time, but not deposit it in a public repository or make it findable except via conference talks in my community, and only share it via a GitLab instance at my university. This could lead to it becoming widely used, a community forming around it, and me becoming recognized for my work developing it, maintaining it, and leading that community.

Which of these is better for me and for the research community? I think the latter, though of course, the latter + FAIR would be even better. But the key is that FAIR is positive in this larger context, as an element of and a means towards a larger vision, not as an isolated goal.

Some open questions

This post leads to a number of questions, some of which I hope to explore further, and which might also be interesting for others to explore.

  • Is FAIR + FAT + open + citable a good way to describe the end goal for ML models?
  • If not, what’s missing?
  • If so, what’s the equivalent for software?
  • Perhaps open + high-quality + robust + FAIR + citable + sustained?
  • What’s the equivalent for data?  Or is FAIR alone enough for data?
  • Could all of these be generalized into one set?

Once we answer some of these questions, we will also need to consider marketing and branding, and how we influence the research community. Is there a good acronym for this? Or is this the point where the idea of an acronym breaks down, and we should use multiple acronyms to cover different aspects (e.g., FAIR++, CARE, TRUST, FAT, DOME, TREE, OC [open, citable], QRR [quality, robust, reproducible])?

Acknowledgements

Thanks to Michelle Barker, Carole Goble, and Fotis Psomopoulos for comments on this post.

One thought on “FAIR is not the end goal”

  1. Enjoyed reading the post.

    I think the idea of an acronym breaks down if we try, at best, to include everything or, at worst, to generalise. For instance, FAIR data can be conflated with open data. The ethical implications of data publishing and sharing get inserted into FAIR, context gets lost (and then you need two more terms: ELIS + CARE). Where do you stop? Each domain already has specialised acronyms, and now we are adding a few more on top of that.

    However, I agree that creating a clever and catchy acronym has a purpose.

    Some random comments, thinking aloud here.

    >> Is FAIR + FAT + open + citable a good way to describe the end goal for ML models?

    (note: I think FAT is slowly being replaced by FAccT https://facctconference.org/faq.html)

    The key phrase here is “end goal”, which is difficult to generalise. Maybe there are common elements within scientific practices that can be highlighted, but there will always be context-specific gaps.

    >> If not, what’s missing?

    Societal well-being, software ethics? Thinking about AI applications that are used in warfare and surveillance (lots of these applications are open source).

    Environmental costs of data extraction? (See Kate Crawford’s Atlas of AI https://yalebooks.yale.edu/book/9780300209570/atlas-ai)

    Maybe instead of an acronym we think about slogans, phrases, manifestos?

    A few examples: 

    1. Protocols not platforms (https://knightcolumbia.org/content/protocols-not-platforms-a-technological-approach-to-free-speech)

    2. https://investinopen.org/ (Infrastructure-level thinking shows the big picture: the connectedness of software, data, and other factors).

    3. https://archiveofourown.org (a community-driven archival project)

    4. Or use names signifying something, like Hathi (https://www.hathitrust.org/) or Wiki?

    5. Maybe a manifesto or oath, like https://github.com/Widdershin/programmers-oath

