How does research software fit with the open-source software community?

While at the Chan Zuckerberg Initiative’s Open Science 2022 Annual Meeting a couple of weeks ago, I was struck by a comment from Demetris Cheatham about how she hadn’t known about the scientific open-source community until she was introduced to it fairly recently, even though she has a huge amount of experience with the larger open-source community. This was especially confounding because, as she shared, she realized upon learning about us that the voice of our community had been missing from her work to create a more inclusive environment within open source.

One tricky aspect here is that I prefer to talk about “the research software community” instead of the “scientific open-source community”, because “research” is a more general term than “scientific”, and because while open source is the dominant type of license, and the one I generally prefer, there is also important research software, particularly at the disciplinary level, that is not open source.

Demetris’s comment, and some others that came up in the meeting, made me think about how the open-source community can be understood, particularly in relation to the research software community.

One question is how the overall open-source community is defined. Perhaps it’s purely by license: software that uses licenses that have been approved by the Open Source Initiative.

A more challenging question is what the parts of the community are. To me, this includes where the research software community fits into the open-source community, as one part of it. But then what are the other parts? If we were to create a Venn diagram of the community, with research software as one of the circles, what would the other circles be, and how would they be laid out in relationship to each other?

I think such a discussion and diagram would be quite useful, as there are important characteristics of the research software community that are different from those of other communities, such as its developers’ frequent desire for academic credit (in the form of citations), which is not the case for most other software. Because of the dependency web of open-source projects, many non-research projects may depend on research software, and thus understanding the differences is important to the overall community.

As far as I know, this type of understanding of open-source software and the role of research software within it does not exist, or if it does, it’s not widespread.

While at the CZI meeting, I talked briefly with Abigail Cabunoc Mayes, who works at GitHub, and asked her how GitHub classifies open-source communities. She initially responded that this was done by language (Python, R, etc.) and then added that it is also done by geography.

The research software community could also be classified by language and geography, and this might make sense for some fundamental research software, but at higher levels of the research software stack, discipline (astronomy, chemistry, bioinformatics, linguistics, etc.) is also quite important.

I don’t have any answers here: this is an open question, where I hope to find others who have thought about this and come up with ideas of how to represent these software communities. And if not, does anyone want to work together on this?

This post has also been published as https://doi.org/10.59350/wbe26-8f285 on the Rogue Scholar.

Published by:

Daniel S. Katz

Chief Scientist at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering

Categories: RSE
