To better understand research communication, we need a GROUPID (group object identifier)

After a number of general discussions in the research communication community, mostly focused on software citation, and then a few separate discussions with Anita Bandrowski and Martin Fenner, it’s become clear to me that we need something like a group (perhaps hierarchical) object identifier (GROUPID), which is somewhat different from a DOI, or at least different from how DOIs are commonly used today. I’m thus trying to document some use cases and their associated requirements, in the hope that this will interest others who might want to discuss how we can address them.

Some use cases are:

  • A researcher wants to cite a software project, meaning a collection of releases of that software, not one specific release. The researcher can cite the GROUPID itself.
  • A funder wants to collect citations to a project that they funded. The GROUPID can be queried to find the individual DOIs that it contains.
  • A contributor to a software project wants to collect citations to the versions of the software that she contributed to. The GROUPID can be queried with a date range to find the individual DOIs that it contains and that have registration dates that match that date range.

The requirements that come out of these use cases are:

  • A GROUPID can be cited, and like a DOI, has metadata, and points to a landing page.
  • A GROUPID is a container, which by default defines a parent-child relationship to the contents of the container, though other relationships might also be possible.
  • A GROUPID can be queried to return the contents of the container, with optional parameters to return a subset of the contents that match those parameters.
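As a rough illustration of these requirements, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the class and field names (`GroupID`, `Member`, `query`), the `groupid:` prefix, the example DOIs, and the registration dates are all hypothetical, and no existing registry exposes this interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Member:
    """One child object in the container, e.g. a single software release."""
    doi: str
    registered: date  # registration date of the DOI


@dataclass
class GroupID:
    """Hypothetical citable container; all names here are assumptions."""
    identifier: str
    landing_page: str
    metadata: dict = field(default_factory=dict)
    members: list = field(default_factory=list)  # parent-child by default

    def add(self, member: Member) -> None:
        self.members.append(member)

    def query(self, registered_from: Optional[date] = None,
              registered_to: Optional[date] = None) -> list:
        """Return member DOIs, optionally limited to an inclusive date range."""
        result = []
        for m in self.members:
            if registered_from is not None and m.registered < registered_from:
                continue
            if registered_to is not None and m.registered > registered_to:
                continue
            result.append(m.doi)
        return result


# Made-up example data: one project with three releases.
project = GroupID(
    identifier="groupid:example/project",
    landing_page="https://example.org/project",
    metadata={"title": "Example software project"},
)
project.add(Member("10.1234/example.v1", date(2015, 3, 1)))
project.add(Member("10.1234/example.v2", date(2016, 6, 15)))
project.add(Member("10.1234/example.v3", date(2017, 1, 10)))

# Funder use case: everything in the container.
all_dois = project.query()

# Contributor use case: only releases registered during 2016.
contributed = project.query(registered_from=date(2016, 1, 1),
                            registered_to=date(2016, 12, 31))
```

The date-range parameters stand in for the "optional parameters" in the last requirement; a real service would presumably support other filters and other relationship types as well.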

Discussion:

Anita believes that RRIDs (see https://www.force11.org/group/resource-identification-initiative and https://scicrunch.org/resources) can be used to solve this need. I’m less sure, because this is only partly a naming issue; it is also a querying/relationship issue, and RRIDs solve just the naming issue. In addition, I would prefer that we use the same type of identifier for both the individual objects and the groups, which implies to me that DOIs are the right thing to use.

(Anita also points out that identifiers of any kind, whether GROUPIDs, DOIs, etc., need an organization standing behind them that makes sure that they are accurate and unique and that they resolve. Specifically, this means that researchers should not mint their own DOIs, but should make sure that this is done by an institution that has considered these issues, including persistence.)

Martin Fenner suggests “DOIs support this functionality already, and there are many examples of DataCite and Crossref DOIs referring to multiple objects. One of many examples is what Dryad is doing with ‘DataPackages’ and ‘DataFiles’.” See the ‘What is a data package?’ and ‘What is a Dryad DOI?’ entries in http://datadryad.org/pages/faq. In terms of naming and relationships, this is probably sufficient. However, the real problem that remains is how indexing and counting are done.
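To make the counting problem concrete, here is a small Python sketch of one possible roll-up rule: a container's count is its own direct citations plus those of everything beneath it. The identifiers and citation numbers are invented, the function names are hypothetical, and the "is part of" pairs only loosely mirror the kind of part/whole relation that DOI metadata can record; this is not how any existing index counts.

```python
from collections import defaultdict

# (child, parent) pairs, read as "child is part of parent". All made up.
is_part_of = [
    ("doi:file-1", "doi:package-A"),
    ("doi:file-2", "doi:package-A"),
    ("doi:package-A", "doi:project-X"),
]

# Citations made directly to each identifier (invented numbers).
direct_citations = {
    "doi:project-X": 2,
    "doi:package-A": 5,
    "doi:file-1": 1,
    "doi:file-2": 3,
}


def build_children(pairs):
    """Map each parent identifier to the list of its direct children."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    return children


def rolled_up_citations(identifier, children, direct):
    """Sum direct citations for an identifier and all of its descendants."""
    total = 0
    stack = [identifier]
    seen = set()  # guards against cycles or repeated relations
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        total += direct.get(current, 0)
        stack.extend(children.get(current, []))
    return total


children = build_children(is_part_of)
project_total = rolled_up_citations("doi:project-X", children, direct_citations)
package_total = rolled_up_citations("doi:package-A", children, direct_citations)
```

Note that this rule sums citation events, so a paper that cites both files is counted twice at the package level. Whether an index should count that way, or count distinct citing works instead, is exactly the kind of question that naming and relationships alone do not settle.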

In fact, Martin says, “I see this more as a feature that should be supported by the persistent identifiers in general (and several support this), rather than a need for a new specific identifier,” and I’m fairly sympathetic to his point of view.

I think that there are needs for understanding scholarly communication that are unmet by our current system, though how best to meet these needs is much less clear. The concept of a GROUPID is one idea, but the needs might also be met through expanded use of existing identifiers.

We probably need a fair amount of community discussion about this; I hope this gets a few people interested enough that they start this discussion.

Published by:

Daniel S. Katz

Chief Scientist at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering

4 thoughts on “To better understand research communication, we need a GROUPID (group object identifier)”

  1. Good idea! Here are some thoughts on the topic. Firstly, I would call that ID a Project Object Identifier (PROID or ProID or similar). The rationale is that, terminology-wise, group refers to… well, group, whereas we want to establish relationships with projects (by the way, not only software projects – any scientific research projects). Secondly, following the thought above, I would suggest constructing PROIDs similarly to DOIs, but based on a subset of the X.500 standards. The rationale is that an address in the X.500 scheme reflects an organizational hierarchy, which fits the current scientific model well, where all or most research projects and their artifacts are affiliated (directly or via participants) with certain organizations and their departments, units and research groups. An additional benefit might be the existing integration between X.500 directory elements and X.509 digital certificates (https://en.wikipedia.org/wiki/X.500#The_relationship_of_the_X.500_Directory_and_X.509v3_digital_certificates), thus eliminating the problem of validating credentials and such. Thus, a global (full) PROID might be in the following format: / (a corresponding URL would be formed similarly to DOI URLs).

  2. Hi,
    I was pointed towards this blog post by a colleague who attended the CW17 workshops.
    I’ve been pondering how I could represent the organisational structure of a University (faculties, departments, schools, research groups, …) in a persistent manner. As a research group’s funding ends, their presence in internal systems disappears – but any outputs of the group should still (in my opinion) be linked to the group that created them, and also to other outputs produced by the same group.
    A DOI could point to the group’s web-presence whilst the group is active.
    When this web presence disappears, the DOI tombstone page could provide a set of metadata, including the dates when the group was active.

    The same would be useful for digitised copies of old theses. Being able to ‘properly’ associate a thesis with its long-defunct department or school would be nice (and we’d be able to show the evolution of the University over time – as disciplines come into, and fall out of, fashion).

    I’m not sure how the X.500 suggestion above would interact with this idea (more reading for me to do!).
