This post is written in response to a recent article: “Beyond authorship: attribution, contribution, collaboration, and credit,” Learned Publishing 28(2), April 2015 (DOI: 10.1087/20150211). As a member of the CASRAI working group who “provided critical review of the [contributorship] taxonomy,” I am generally supportive of this idea of better identifying contributions to scientific works.
However, as I have previously proposed, there is additional value in not just identifying such contributions but also quantifying them, as this would help us gain the ability to understand credit networks, which I’ve been calling transitive credit.
The Project CRediT taxonomy header says, “when there are multiple people serving in the same role, a degree of contribution may optionally be specified as ‘lead’, ‘equal’, or ‘supporting’.” This led me to realize that this information could be turned into numeric values, perhaps as follows, though there are many other options.
Set z to the number of roles that have listed contributors. For a given role with x contributors, if no degrees are specified or all are “equal,” assume each contributor has a contribution of 1/(zx). If x “lead” and y “supporting” contributors are specified, assume the lead contributors together have contributions of 3x/(3zx+zy), or 3/(3zx+zy) each, and the supporting contributors together have contributions of y/(3zx+zy), or 1/(3zx+zy) each. After this is done for all roles, add up the values for each contributor. (Note that each role contributes a total of 1/z, so the sum of all values for all contributors is 1.)
There are two assumptions here, both of which are debatable:
- All roles are equally important
- Lead contributors make contributions that are uniformly 3 times as significant as supporting contributors
The idea is that this provides an easy way to turn these roles and possible contribution levels into numeric values, not that this is a unique way, or the best way to do so.
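The weighting scheme described above can be sketched in a few lines of Python. This is only an illustration under the two assumptions already listed (equally important roles, a 3:1 lead-to-supporting ratio). The function name `credit_shares`, the input layout, and the people in the example are hypothetical, and the sketch does not attempt to handle a role that mixes “equal” with “lead” or “supporting,” since the scheme does not define that case.

```python
from collections import defaultdict
from fractions import Fraction

LEAD_WEIGHT = 3  # assumption from the post: a lead counts 3 times a supporting contributor


def credit_shares(roles):
    """Turn role listings into per-person credit fractions.

    `roles` maps a role name to a list of (person, degree) pairs, where
    degree is "lead", "supporting", "equal", or None (not specified).
    This input layout is hypothetical, chosen only for illustration.
    """
    # z: number of roles that actually list contributors
    active = {role: people for role, people in roles.items() if people}
    z = len(active)
    totals = defaultdict(Fraction)
    for role, people in active.items():
        degrees = {degree for _, degree in people}
        if degrees <= {None, "equal"}:
            # No degrees or "equal": every contributor gets 1/(z*x)
            weights = [1] * len(people)
        elif degrees <= {"lead", "supporting"}:
            # Leads weigh 3, supporting contributors weigh 1
            weights = [LEAD_WEIGHT if d == "lead" else 1 for _, d in people]
        else:
            raise ValueError(f"unsupported mix of degrees in role {role!r}")
        role_total = sum(weights)  # x in the first case, 3x + y in the second
        for (person, _), weight in zip(people, weights):
            totals[person] += Fraction(weight, role_total * z)
    return dict(totals)


# Made-up example data: three roles, three people
example = {
    "Conceptualization": [("Ada", "lead"), ("Grace", "supporting")],
    "Software": [("Grace", None), ("Linus", None)],
    "Writing": [("Ada", None)],
}
shares = credit_shares(example)
for person, share in sorted(shares.items()):
    print(person, share)
```

With this made-up input, z is 3, so each role carries 1/3 of the total. Ada ends up with 1/4 + 1/3 = 7/12, Grace with 1/12 + 1/6 = 1/4, and Linus with 1/6, which together sum to 1. Exact fractions are used here rather than floating point so that the sum-to-one property is easy to check.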
Doing this gives me half of the information I’ve claimed is needed for transitive credit to work: weighted contributions from people. The other half is weighted contributions from products. Products are those things that the authors of the new work believe made a contribution to that work, including a subset of the papers that were cited, perhaps software that was cited, perhaps datasets that were cited, perhaps computing systems and instruments that were cited, etc. In addition to the citations, there could also be products that are mentioned in footnotes or inline, often parenthetically.
Could we find this information automatically, similar to how we propose calculating people information, again with the idea that we might not get the numbers exactly right using such an automated method?
If all products were cited in other products, then the answer would be yes. There are methods to mine papers to find the products that are used; see this work in understanding data usage as an example. But this is based on the fact that text mining can search text publications and extract knowledge.
Can we also mine software? If we have source code, in theory yes, but in practice, it’s not clear. Can we mine data? Maybe in a future with pervasive provenance systems. Can we mine computing systems and other instruments for this information? Probably not.
To conclude, if we believe that the idea of transitive credit would be useful but that it’s too much work to get the product developers to provide all the numeric data we need to implement it, the contributorship taxonomy leaves us better off than we were before, but not where we want to be.
Some work by the author was supported by the National Science Foundation (NSF) while the author was working at the Foundation; any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF.