Group PIDs – a needed abstraction?

This is the second of a set of short blogs based on thoughts at the RDA Plenary 10.  (The first was on agency and dynamicity as key properties of software.)

In the Persistent ID (PID) interest group session at the RDA meeting, there was a talk by Patricia Cruse about an organization ID project, which aims to create a new set of PIDs for organizations, followed by some discussion.  As Martin Fenner pointed out in an earlier talk in the session, creating a new set of PIDs that is actually used requires a lot of effort, including careful thinking about uniqueness, persistence, descriptiveness, interoperability, and governance.  It’s clear that the organization ID working group has been thinking about all these things, and has done good work in providing potential answers, but after listening to the Org ID talk and discussion, I wonder whether this is a case where we should be thinking of abstractions first.

We currently seem to have one basic abstraction for most PIDs, based in large part on the Handle system, on which DOIs are built. This system uses the namespace abstraction, making it parallel in one dimension: each handle has a prefix (naming authority) and a suffix (local name).
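To make the prefix/suffix structure concrete, here is a minimal sketch in Python. The function name `parse_handle` and the identifier `10.1234/example.dataset.v1` are made up for illustration; the only rule assumed is that a handle splits at its first `/` into a naming-authority prefix and a local-name suffix.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # naming authority
    suffix: str  # local name, unique within that prefix


def parse_handle(handle: str) -> Handle:
    """Split a handle at its first '/' into prefix and suffix."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return Handle(prefix, suffix)


# Hypothetical identifier, used only to show the split.
example = parse_handle("10.1234/example.dataset.v1")
print(example.prefix)
print(example.suffix)
```

Nothing in this structure says that one identifier stands for a set of others, which is the gap the group abstraction would fill.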

I propose that we consider a group as a new abstraction, to support another dimension of parallelism.

Grouping PIDs is not a new idea, but it is currently buried in the metadata, rather than being exposed as a first-class concept.  In DOIs, for example, it can be accomplished by using the recommended Relators, which are implemented, for example, in DataCite’s metadata schema through the relationType property.
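As a rough illustration of what "buried in the metadata" means, the sketch below builds a DataCite-style record as a plain Python dict and pulls the group members back out of it. The DOIs are hypothetical, the record is trimmed to the fields needed here, and `group_members` is a helper invented for this example; `relatedIdentifiers`, `relatedIdentifierType`, and the `relationType` values `HasPart` and `IsCitedBy` follow the DataCite schema.

```python
# A trimmed, hypothetical record for a collection; not a complete DataCite record.
collection_record = {
    "doi": "10.1234/collection",
    "relatedIdentifiers": [
        {"relatedIdentifier": "10.1234/item-1",
         "relatedIdentifierType": "DOI", "relationType": "HasPart"},
        {"relatedIdentifier": "10.1234/item-2",
         "relatedIdentifierType": "DOI", "relationType": "HasPart"},
        {"relatedIdentifier": "10.1234/paper",
         "relatedIdentifierType": "DOI", "relationType": "IsCitedBy"},
    ],
}


def group_members(record: dict, relation: str = "HasPart") -> list:
    """Return the related identifiers linked to the record by the given relationType."""
    return [
        related["relatedIdentifier"]
        for related in record.get("relatedIdentifiers", [])
        if related.get("relationType") == relation
    ]


print(group_members(collection_record))
```

The grouping is recoverable, but only by someone who knows to filter the metadata on a particular relation; the identifier itself gives no hint that it names a group.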

If we accept the idea of groups of PIDs as a valid high-level abstraction, there would be no need for a new PID for organizations, as this could be handled by a group of ORCID IDs. And it would enable the same group PIDs to support both formal organizations (for example, universities, non-profits, etc.) and informal organizations (such as projects).

Group PIDs would also help software, where a software project could be a group of software version PIDs, as I’ve previously discussed.  And it would likely also help in data, where there are issues about PIDs for datasets, PIDs for data collections, and PIDs for the individual data items within them.
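One way to picture the abstraction, purely as a sketch and not a proposal for how it should be built: a group is a PID of its own that stands for a set of member PIDs. The `Group` class and every identifier string below are hypothetical placeholders, not real ORCID iDs or DOIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """A hypothetical group PID: one identifier that stands for a set of member PIDs."""

    pid: str
    members: set[str] = field(default_factory=set)

    def add(self, member_pid: str) -> None:
        self.members.add(member_pid)

    def contains(self, member_pid: str) -> bool:
        return member_pid in self.members


# The same structure covers a formal organization, an informal project,
# and a software project made up of version PIDs.
university = Group("group:example-university", {"orcid:person-a", "orcid:person-b"})
project = Group("group:example-project", {"orcid:person-b", "orcid:person-c"})
software = Group("group:example-software", {"doi:example-software-v1"})
software.add("doi:example-software-v2")

print(university.members & project.members)  # people who belong to both groups
print(sorted(software.members))
```

The point of the sketch is only that one shape serves all three cases; where such records would live and who would govern them is left open.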

I don’t know how this group abstraction should be implemented, but some PID enthusiasts might have ideas.  Maybe we can talk about this at PIDapalooza?  If anyone wants to work with me to propose something about this topic for that meeting, please let me know.  I’m told the deadline has been extended to today (Friday, 22 Sept.)

Published by:

danielskatz

Assistant Director for Scientific Software and Applications at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering

8 thoughts on “Group PIDs – a needed abstraction?”

  1. If the IDs given are unique prime numbers, then one can multiply the prime numbers for individuals to get a unique group ID, against which it is easy to test whether any one ID belongs to the group. The numbers could get very big for large numbers of IDs and for groups made of many individuals, but the numbers might not be too big to handle. Given a group ID it would also be possible (if a little long winded) to factorise the group ID to get the IDs of all individuals.

    This concept can be extended to include say organisation affiliations of researchers and individual researchers that may change affiliations or be affiliated with multiple institutions.

    So an individual researcher can multiply their ID with their affiliated organisations’ IDs to provide an ID which is unique to them for that particular affiliation. All researchers then that are in a group can have a grouped ID which maps on to their affiliations.

    Affiliations can change over time quite a lot, but mostly organisations and individual researchers will be unchanging. Of course it can be possible to have multiple IDs for any individual or organisation if that is important.

    Of course, this concept would need some analysis and testing to see if it would be practicable, but in theory it should work.

    In practical terms it would probably be sensible to give organisations the smaller prime numbers and individuals large ones. It might be better to use another encoding for the numbers than base 10 to reduce the length of the IDs. As the IDs on the entire number line will be sparse, there might be some good ways to index this for speed. HTH

    1. I am surprised that this is even a discussion – group/aggregate identifiers are absolutely required. Organisations, groups, pseudonymous entities, collections, anthologies, special editions, and archives are all creations of active intellectual effort (agency) and therefore worth identifying (and adding metadata to). Not only that, but most have temporal variance in their membership, which needs to be captured as part of their provenance and taken into account when referencing (see previous discussion on software). These groups are also some of the most important framing entities as far as individual dataset provenance is concerned.

      1. I agree that this is needed, as I think a lot of people do. My main point is that it needs to be raised up as a concern, rather than being buried inside the existing implementations.

  2. An organization is more than the sum of the individuals involved. For one thing, individuals have roles within an organization (employee, student, manager, board member, etc). One could easily imagine (and this probably happens) two or more distinct organizations whose “members” are (at least at some point in time) the same collection of individuals. And as Neil Jefferies noted above, organizations have changing membership over time. So while groups of PIDs may be a useful concept in themselves, they cannot replace having distinct identifiers for organizations.

    (Disclaimer: I’m a member of the Org ID working group that Trish Cruse represented).

    1. I’m certainly willing to be wrong about Org IDs, but I do think that if we need a different type of ID and a different governance group for every type of thing that exists, we’re thinking about the problem incorrectly.

      On the other hand, I don’t see any issue with having groups that overlap, where each group has a unique group ID. Groups can also be dynamic, and can have maintainers (who keep track of who is in the group).

      1. Hmm, maybe I’m not following what you’re suggesting here. Do you consider ORCID to be part of the Handle system? I don’t believe there’s a prefix/suffix involved there – other than a linked data URI prefix of https://orcid.org/ – but the same mechanism gives you a pretty general solution for any persistent identifier as long as there’s a stable organization able to provide such a prefix via DNS name. I think maybe what you’re suggesting is that a group / organization identifier could resolve at least in part to some sort of container that contains the (current?) collection of individual IDs that are its “members”. But that member list can’t be an intrinsic part of the identifier itself, or it would not be a persistent identifier. Perhaps we are lacking a standard for expressing lists neatly in the “PID” world?
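The prime-number scheme suggested in the first comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tested design: the function names are invented here, the primes 101, 103, and 107 are arbitrary stand-ins for individual IDs, and factorisation is done by plain trial division, which would be far too slow for products of many large primes.

```python
def group_id(member_primes):
    """Multiply the members' distinct prime IDs into a single group ID."""
    product = 1
    for prime in member_primes:
        product *= prime
    return product


def is_member(group, prime):
    """A prime ID belongs to the group exactly when it divides the group ID."""
    return group % prime == 0


def members(group):
    """Recover the member primes by trial division (fine only for small numbers)."""
    found = []
    divisor = 2
    while divisor * divisor <= group:
        if group % divisor == 0:
            found.append(divisor)
            while group % divisor == 0:
                group //= divisor
        divisor += 1
    if group > 1:
        found.append(group)  # whatever remains is itself prime
    return found


team = group_id([101, 103, 107])
print(team)                  # 1113121
print(is_member(team, 103))  # True
print(is_member(team, 109))  # False
print(members(team))         # [101, 103, 107]
```

The sketch also makes the objection in the last comment visible: the member list is an intrinsic part of the number, so adding or removing one member yields a different group ID.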
