How should we add citations inside software?

In a series of tweets (starting with one from @dimazest and including some from @turingfan, @Iza_Romanowska, and @CIRCA_StAndrews), it’s become apparent that we don’t have a standard way of providing attribution (e.g., citations, acknowledgements, dependencies) within software as we do in papers. Papers have standard acknowledgement and reference sections and, at least for the reference section, a standard practice of indexers scanning it to build up citation networks. While my transitive credit idea would handle this, perhaps there are simpler things that we can do. Depsy does some of this via dependency analysis but, as far as I know, doesn’t consider the equivalent of citations or acknowledgements.

@CIRCA_StAndrews suggests REFERENCES or DEPENDENCIES files, in addition to CITATION and LICENSE files. This might be a good idea, but it raises questions about how such files would be formatted (perhaps JSON-LD, as for transitive credit?), generated, edited, and read, by both humans and machines. But I don’t think these things would be hard to figure out or program for someone with a bit of extra time on their hands who is not transitioning from one job to another 🙂
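
To make the formatting question concrete, here is a minimal sketch of what a machine-readable REFERENCES file and a reader for it might look like. Everything in it is an assumption for illustration: the three section names, the field names, the placeholder identifier, and the use of plain JSON rather than full JSON-LD. It is not an existing standard or anyone's agreed format.

```python
import json

# Hypothetical contents of a REFERENCES file. The layout, the field names,
# and the placeholder identifier below are illustrative assumptions only.
REFERENCES_TEXT = """
{
  "citations": [
    {"id": "doi:10.0000/example-paper", "reason": "algorithm implemented in solver.py"}
  ],
  "acknowledgements": ["Example Funding Agency"],
  "dependencies": [
    {"name": "example-lib", "version": "1.2.0"}
  ]
}
"""

# Sections this sketch expects every REFERENCES file to contain.
REQUIRED_KEYS = ("citations", "acknowledgements", "dependencies")


def parse_references(text):
    """Parse the file text and check that all expected sections are present."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"REFERENCES file is missing sections: {missing}")
    return data


def citation_ids(data):
    """Return the identifiers an indexer could use to build a citation network."""
    return [entry["id"] for entry in data["citations"]]


refs = parse_references(REFERENCES_TEXT)
print(citation_ids(refs))
```

A file like this stays editable by hand while still giving an indexer something to scan, much like the reference section of a paper.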

So, this short blog post is a call for thinking and community action, and maybe a bit of hacking, on this idea.  If I had managed to go to the SSI Collaborations Workshop, I might have suggested this for discussion and the hack day.  It could wait for WSSSPE4, or for the 2016 Collaborations Workshop, but I hope someone will want to further investigate this sooner.

Disclaimer

Some work by the author was supported by the National Science Foundation (NSF) while the author was working at the Foundation; any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF.

 

 

Published by:

danielskatz

Assistant Director for Scientific Software and Applications at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering

9 thoughts on “How should we add citations inside software?”

  1. I think this is a great topic for WSSSPE4, but maybe if some ideas can be discussed between now and then, we can spend the time at the workshop in a hacking session to try and implement something.

    Without a better place to continue the conversation at the moment, I’ll just start to add some possible ideas here.

    One option is to adopt or modify the PETSc approach (http://www.mcs.anl.gov/papers/P5010-0913_1.pdf), which keeps track of the libraries or algorithms used in the source code by adding reference items at runtime. However, a downside to this approach is that it might not be easily machine-readable, since there is no single, fixed list of citations.

  2. Note that software is not merely a static archive of source code, but a program that is executed by different people for different purposes. Similarly, the correct citation graph (and weighting) is not static; it depends (non-locally) on the run-time configuration/input. This is an issue that we attempted to address in https://figshare.com/articles/Accurately_Citing_Software_and_Algorithms_Used_in_Publications/785731 . I would argue that any system for transitive credit would need to dynamically generate the credit graph as part of execution. Since a scientific result often depends on many executions of different configurations, that credit graph would need a combiner function. I would also hope that the weighting (necessarily according to some algorithm, per above) be subject to peer review.

  3. Hello Daniel,

    My proposal for the 2016 SSI Collaborations Hackday was in line with, and inspired by, your transitive credit proposal: https://docs.google.com/document/d/1phwLtJATzh1PVTrri3izaDt8UlRkVAJWufr09XWgCB4/edit

    In short: The syntax developed for the Common Workflow Language standards (mostly coded by Peter Amstutz) seems to be a useful, easy-to-read way of using linked data in a lightweight manner. One simple application is communicating software citations in machine-readable form at a variety of granularities: for the software tool as a whole, for a particular subcommand or parameter, or based upon the actual code paths followed. As a bonus, the first two granularities don’t require modification of the tool’s source code, and thus can be applied to existing projects easily.

    Interestingly there wasn’t any interest from the SSI crowd in working on this, so I joined Iza’s project.

    A quick clarification:

    The initial tweet by Dmitrijs Milajevs (@dimazest) was from the SSI Collaborations Workshop 2016 that was held last week in Edinburgh. The other team members are Iza Romanowska (@Iza_Romanowska), Louise Brown, and myself.
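
The runtime ideas raised in the comments above (registering reference items as code paths are executed, then combining the results of many executions) can be sketched roughly as follows. This is only an illustration under stated assumptions: `CitationRegistry`, `solve`, `combine`, and the citation keys are hypothetical names, not the PETSc interface, and weighting each work by the fraction of runs that used it is just one possible combiner, not a proposed standard.

```python
from collections import Counter


class CitationRegistry:
    """Collects citation keys as code paths execute (hypothetical API)."""

    def __init__(self):
        self._keys = []

    def register(self, key):
        # Record each key once, in the order it was first encountered.
        if key not in self._keys:
            self._keys.append(key)

    def keys(self):
        return list(self._keys)


def solve(method, registry):
    """Stand-in for a library routine; only the registration calls matter here."""
    registry.register("core-library")  # always used
    if method == "multigrid":
        registry.register("multigrid-paper")  # placeholder citation key
    elif method == "krylov":
        registry.register("krylov-paper")  # placeholder citation key
    else:
        raise ValueError(f"unknown method: {method}")


def combine(runs):
    """Assumed combiner: weight each work by the fraction of runs that used it."""
    if not runs:
        return {}
    counts = Counter()
    for keys in runs:
        counts.update(set(keys))
    return {key: count / len(runs) for key, count in counts.items()}


# One scientific result built from four executions with different configurations.
runs = []
for method in ["multigrid", "krylov", "krylov", "krylov"]:
    registry = CitationRegistry()
    solve(method, registry)
    runs.append(registry.keys())

weights = combine(runs)
print(weights)
```

Because the list of citations only exists after execution, writing out `weights` (for example as JSON) at the end of a run is one way to recover a machine-readable record; the weighting rule itself is the part that would need community agreement or review.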
