Redefining Cyberinfrastructure

Based on some recent experiences, I want to try to redefine cyberinfrastructure. While I think Craig Stewart’s 2010 definition is still useful, new thinking might also have benefits.

Let’s start with the elements of cyberinfrastructure.

At the most basic level, these elements are simply people and things.

But there are different kinds of things: ideas and facts (data). And some ideas can be instantiated in a physical object (hardware), while others cannot (software and processes). This brings us to five fundamental cyberinfrastructure elements:

  1. People
  2. Hardware
  3. Software
  4. Processes
  5. Data

Cyberinfrastructure is then the integration of some of these elements for the purpose of scholarship, which I’ll define as the development and sharing of knowledge.

Putting this all together:

Cyberinfrastructure is the integration of elements of people, hardware, software, processes, and data, for the purpose of scholarship: developing and sharing knowledge.

(I don’t have any strong feelings about the word cyberinfrastructure itself, and would be fine with it being replaced by another word, but I do think we need a word for this concept, and in the absence of an alternative, I’ll stick with cyberinfrastructure for now.)

Discussion

Craig’s original definition was:

Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.

https://doi.org/10.1145/1878335.1878347

Comparing this to my definition, Craig’s computing systems are my hardware and software, as are his data storage systems, his advanced instruments, his visualization environments, and his high-performance networks. His data repositories are my hardware, software, and data. His people are my people and my processes, and his software is my software.

One difference, which is probably more a difference of the times than any profound disagreement, is that I’m exposing data itself as fundamental.

A second difference is that I’m also calling out processes, which I think Craig implicitly includes in people. I think this explicit addition of processes allows me to bring cybersecurity into cyberinfrastructure more cleanly.

Another difference is that I’m not grouping and naming a number of the things that Craig did specify, preferring to work at a lower level. Perhaps my fundamental cyberinfrastructure elements could be seen as atoms, and Craig’s as a mix of atoms and molecules?

A final difference is that Craig’s definition requires these elements to be used to “improve research productivity and enable breakthroughs not otherwise possible,” while for me, this desire to do things better and learn more is part of our scholarly culture, not tied to the cyberinfrastructure itself. I also add sharing knowledge, as I think cyberinfrastructure has a role in education.

I also want to compare this with work done by the Campus Research Computing Consortium (CaRCC), and in particular, their five facings model. I am partially writing this in response to their use of the term “Research Computing and Data (RCD)” in their vision and mission, which I initially thought was limiting because it only named two elements, missing software. In the process of writing this, I realized that this term also missed people and processes, though to be fair, CaRCC is focused on them in its vision and mission; it just looks at them as orthogonal to RCD, and it does include software implicitly in RCD.

Acknowledgements

As I hope is clear, I have a great deal of respect for Craig and his colleagues who came up with this original definition, in the context of many others who were thinking about these terms and ideas around the same time. I also have a lot of respect for the CaRCC community, and the huge amount of effort they have put into our overall community over the past 5 years. Finally, my thinking about this, and my feeling that there was a need for a new definition, came most directly from discussions at the recent CI Workforce Development Workshop 2020 over the past few weeks.

Published by:

danielskatz

Assistant Director for Scientific Software and Applications at NCSA, Research Associate Professor in CS, ECE, and the iSchool at the University of Illinois Urbana-Champaign; works on systems and tools (aka cyberinfrastructure) and policy related to computational and data-enabled research, primarily in science and engineering

One thought on “Redefining Cyberinfrastructure”

  1. CaRCC has not included network / data movement or software or security or … in RCD, as they are implicit (and the acronym gets too unwieldy), but it does include people, just not in the RCD acronym, much as CI does not name them either. The RCD definition is: “Research computing and data” involves people, scholarship, and resources supporting the needs of researchers and research leveraging compute, data, networking, and software, broadly defined, including the professionals who execute and support these efforts. Whereas entities supporting research computing and data historically emerged from operating and supporting high performance computing, the needs, capabilities and technologies have sufficiently broadened the scope of research information technology to include virtualization, support for the cloud, containers, middleware, workflows, data management, data movement, compliance and security, user training, support of instruction using advanced research computing and data, on-boarding into new technologies, and deep engagement (“facilitation”) to help guide researchers.

    Various CaRCC working groups had extensive discussions about what to call the professionals; the top three candidates were CI professionals, RCD professionals, and Research IT professionals. Each has drawbacks. CI is only known in the NSF community and not recognized by CIOs, etc. RCD is not fully inclusive, as noted. Research IT works and is descriptive, but has negative connotations or perceptions because of its association with enterprise or central IT. Rather than focusing on scholarly activities, I also focused on supporting research and researchers, which brings researchers and RCD professionals together, with the RCD professionals bridging the researchers and their research to the technology.

    Note, finally, that we haven’t explicitly set up the software-facing track yet, but we intend to do this cooperatively with the RSEs and/or other organizations in this space. If these groups want to do it on their own or independently, that is fine too, but we are happy to help facilitate. I will also note that I personally have two views of software-facing: software developers (often in the domains) vs. software supporters (modules, containers, compiling, testing/verifying, benchmarking, and optimizing), who are often RCD/CI center staff. At our ecosystem’s workshop there was considerable push to roll out the data-facing and emerging-centers-facing tracks, which was done. Security-facing, software-facing, etc. may come, noting that CaRCC does not want to own or control anything and does not necessarily have to lead efforts. From our almost complete Charter: CaRCC does not control or own the Groups and Tracks. For example, the Researcher-Facing Track community owns the Researcher-Facing Track, and they decide their activities and fate, with guidance from the Track Coordinators, the People Network Coordinators, their own steering committee, and potentially others from the Chairs Leadership Team, if they seek counsel. Also, CaRCC welcomes collaborators and partnerships, with governance by consensus and shared credit across participating organizations. For example, the capabilities model was a shared effort of Internet2, EDUCAUSE, and CaRCC, and an emerging stakeholder- and leadership-facing track will be led by the EDUCAUSE RC communities group in collaboration with CaRCC and CASC. Note that we are still trying to evolve and learn, and as a relatively small community, we all need to work together openly, transparently, and collaboratively, recognizing different identities and ownership, in order to sustain the ever-expanding, transforming, and diversifying landscape that is CI or RCD.
