
Science friction: data, metadata, and collaboration


UC Irvine
UC Irvine Previously Published Works
Title
Science friction: data, metadata, and collaboration.
Permalink
https://escholarship.org/uc/item/2tj4g7sc
Journal
Social studies of science, 41(5)
ISSN
0306-3127
Authors
Edwards, Paul N
Mayernik, Matthew S
Batcheller, Archer L
et al.
Publication Date
2011-10-01
DOI
10.1177/0306312711413314
Copyright Information
This work is made available under the terms of a Creative Commons
Attribution License, available at
https://creativecommons.org/licenses/by/4.0/
Peer reviewed
eScholarship.org Powered by the California Digital Library
University of California

Social Studies of Science
41(5) 667–690
© The Author(s) 2011
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0306312711413314
sss.sagepub.com
Science friction: Data,
metadata, and collaboration
Paul N. Edwards
School of Information, University of Michigan, Ann Arbor, MI, USA
Matthew S. Mayernik
Graduate School of Education and Information Studies, UCLA, CA, USA
Archer L. Batcheller
School of Information, University of Michigan, Ann Arbor, MI, USA
Geoffrey C. Bowker
School of Information Sciences, University of Pittsburgh, PA, USA
Christine L. Borgman
Graduate School of Education and Information Studies, UCLA, CA, USA
Corresponding author:
Paul N. Edwards, School of Information, University of Michigan, 3439 North Quad, 105 S. State St., Ann Arbor,
MI 48109-1285, USA.
Email: pne@umich.edu
Abstract
When scientists from two or more disciplines work together on related problems, they often
face what we call ‘science friction’. As science becomes more data-driven, collaborative,
and interdisciplinary, demand increases for interoperability among data, tools, and services.
Metadata – usually viewed simply as ‘data about data’, describing objects such as books, journal
articles, or datasets – serve key roles in interoperability. Yet we find that metadata may be
a source of friction between scientific collaborators, impeding data sharing. We propose
an alternative view of metadata, focusing on its role in an ephemeral process of scientific
communication, rather than as an enduring outcome or product. We report examples of highly
useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in
our ethnographic studies of several large projects in the environmental sciences. Based on this
evidence, we argue that while metadata products can be powerful resources, usually they must
be supplemented with metadata processes. Metadata-as-process suggests the very large role of
the ad hoc, the incomplete, and the unfinished in everyday scientific work.
Keywords
collaboration, communication, data, metadata
Humanity is now in the business of managing the planet (Elichirigoity, 1999; Serres,
1995, 2007). As the world population has soared over the last 150 years, people have
commandeered an ever larger percentage of the incoming solar energy, whether directly
by converting it to electricity, or indirectly by harnessing it through biofuels, agriculture,
forestry, and use of ecosystem services. According to recent estimates, human beings
appropriate about 24 percent of Earth’s potential net primary productivity (a measure of
the biomass available in terrestrial ecosystems) each year, and approximately 83 percent
of the world’s land surface is directly influenced by human activity (Haberl et al., 2007;
Sanderson et al., 2002). Meanwhile, humanity is provoking very rapid climatic change
as well as one of the largest extinction events in the history of life on Earth, even while
seeking ways to mitigate the most dramatic of these effects.
Monitoring and managing all this – to the extent that we can – requires vast amounts
of observational data, coordinated across a bewildering multitude of so-called scientific
disciplines. Meanwhile, the explosion of computer processing power and storage capacity
has transformed the sciences’ ability to find, use, coordinate, and re-use these data.
This paper explores issues arising from this new environment, which some go so far as
to call a ‘fourth paradigm’ of scientific work driven by the availability of large datasets,
wherein patterns may be sought directly rather than through more traditional
hypothetico-deductive methods (Hey et al., 2009).
Science studies has probed many kinds of data problems within particular scientific
disciplines, such as contested interpretations of data, relations between database structures
and data collecting practices, questions about when and why certain instrument
readings count as data, the ‘experimenters’ regress’, and boundaries between documents
and data (Bowker, 2000, 2005; Bowker and Star, 1999; Buckland, 1991, 1997;
Collins, 1985; Collins and Pinch, 1993; Zimmerman, 2007). Yet our field has rarely
considered how data travel among diverse disciplines; as sociologists of science, we
have tended to look under the lamppost of whatever field we happen to know. It’s
interesting (and hard) enough to explicate memory practices within one discipline –
why learn five?
Science studies has developed useful ideas about how theories, concepts, specimens,
maps, instruments, and other elements of scientific practice travel across various
divides: from theoretical to experimental subfields, from professionals to amateurs,
from scientists to engineers, and so on. Keystone STS phrases such as ‘boundary
objects’, ‘immutable mobiles’, ‘virtual witnessing’, and ‘trading zones’ help make
sense of these transitions (Galison, 1996; Latour, 1987; Shapin and Schaffer, 1985;
Star and Griesemer, 1989; Strathern, 2004). There is also, of course, a large literature
on the unpacking of data during episodes of scientific or technical controversy (Collins,
1985; Collins and Pinch, 1993; Kevles, 1998; Vaughan, 1996). Yet most of this work
has focused either on higher-level results, products and artifacts, or on mutable
interpretations of evidence, rather than on the travels of data per se: data function as an
actors’ category, as in the cases of collections of instrument readings, field observations,
model outputs, and so on, which represent the daily work of science. As datasets
become increasingly commoditized, ‘mined’, and exchanged among distant disciplines,
this area needs much closer scrutiny.
Our traditional STS approach to data in science resembles the traditional approach of
historians to history. They write national histories because the archives (data) are
national; no matter that many real historical processes stubbornly exceed national boundaries
(Braudel, 1975; Michelet, 1930; Wallerstein, 1976). And no matter, in our own
case, that much of today’s most interesting and important science operates between
domains. The Comtean hierarchy of physics, chemistry, and biology as driving disciplines
is long gone, replaced by a massive proliferation of interdisciplines. Nowhere is
this more true than in the Earth and environmental sciences – sciences upon which
humanity relies for its overweening yet unavoidable goal of planetary management.
Unlike previous macro-paradigms of scientific work, in which data were treated as the
private (and closely held) property of individuals or laboratories, in these interdisciplinary
domains data need to travel far and wide. It is time for science studies to investigate
how data traverse personal, institutional, and disciplinary divides.
Science friction
Friction resists and impedes. At every interface between two surfaces, friction consumes
energy, produces heat, and wears down moving parts. Edwards’ metaphor of
data friction describes what happens at the interfaces between data ‘surfaces’: the
points where data move between people, substrates, organizations, or machines – from
one lab to another, from one discipline to another, from a sensor to a computer, or from
one data format (such as Excel spreadsheets) to another (such as a custom-designed
scientific database) (Edwards, 2010). Every movement of data across an interface
comes at some cost in time, energy, and human attention. Every interface between
groups and organizations, as well as between machines, represents a point of resistance
where data can be garbled, misinterpreted, or lost. In social systems, data friction consumes
energy and produces turbulence and heat – that is, conflicts, disagreements, and
inexact, unruly processes.
Data friction leads inevitably to what we call ‘science friction’: the difficulties
encountered when two scientific disciplines working on related problems try to interoperate.
To take a prominent example, consider the tension between weather forecasting
and climatology, separate fields within the disciplinary landscape of meteorology.
Weather forecasters have been collecting daily observations since the 1850s. In service
of their chief goal – accurate near-term forecasting – their priority is swift communication
and constant improvement of observing and forecasting systems. Even week-old
data have little value for tomorrow’s forecast, so until recent decades forecasters placed
a low priority on storing, cataloguing, and accessing historical weather data.
Meanwhile, climatologists average daily weather data to create long-term climate
statistics. To do this, they need data from the whole world over periods of many decades.
Some climate data come from instruments and observing stations specifically designed
for climate studies. But the majority of data used by climatologists come from the
weather forecast system. Weather stations frequently change instruments, locations, and
observing techniques; over time, they may operate intermittently, change their procedures,
or even change nationality after political upheavals. Throughout the history of
meteorology, weather data from different parts of the Earth encountered friction at political
borders, institutional boundaries, and technical interfaces between national observing
systems. Many data either never reached central collectors, or reached them only in processed
forms that turned out to be riddled with errors. Therefore, climatologists regard
data from the forecast system as unstable. To incorporate these sources in ‘climate quality’
datasets, climate scientists recover their histories and adjust, analyze, and reanalyze
the observations, often down to the level of individual instrument readings. Similarly,
data from satellite instruments designed specifically for weather observation have been
commandeered to measure the temperature of the lower troposphere (through complex
data modeling), creating intense controversy over how such data should be processed
and understood (Edwards, 2010).
This data friction results in enormous expenditures of time, energy, and attention,
which can lead to other kinds of science friction as well. Take the so-called
‘Climategate’ controversy over emails and data stolen from the University of East
Anglia’s Climatic Research Unit (CRU) in November 2009. The uproar revolved
largely around how the CRU adjusted and corrected historical weather and climate
records in order to assemble a comprehensive global climate dataset.¹ The controversy
reflected divergent understandings of language and methodology between professional
climate scientists and the public. Or consider a recent poll showing that
virtually all US climate scientists regard global warming as an established fact, while
a large minority of weather forecasters remain skeptical – attitudes based largely in
the two groups’ differential experiences of data and data models (Maibach et al.,
2010; Oreskes, 2004).
Throughout the sciences, as computer power and computational methods improve,
a rapidly emerging ‘fourth paradigm’ of data-driven, interdisciplinary research is augmenting
the existing paradigms of experimental, theoretical, and computational science
(Atkins and National Science Foundation Blue-Ribbon Advisory Panel on
Cyberinfrastructure, 2003; Bell et al., 2009; Hey et al., 2009). The ‘fourth paradigm’
brings science friction to the foreground. In principle, data collected by widely varying
fields can now be assembled and brought to bear upon each other, leading to entirely
new perspectives on ecology, Earth system science, medicine, epidemiology, and almost
any other area (National Research Council, 1997; O’Brien et al., 2004; Zimmerman,
2003). Many scientists would like to have this ability. Many science funders, supercomputer
centers, and institutions such as national science academies would like it even
more. They believe that more data sharing will reduce redundancy, improve problem
solving, increase research velocity, and cut costs at the same time. And indeed, many
important examples of successful data sharing do exist. Yet in practice, science friction
can make interdisciplinary data sharing maddeningly difficult.
Science friction is in some respects a generic problem of human communication, known
both colloquially and formally as ‘common ground’ or ‘grounding’; of establishing mutual

References
Journal ArticleDOI

A simplest systematics for the organization of turn-taking for conversation

TL;DR: Turn-taking is used for the ordering of moves in games, for allocating political office, for regulating traffic at intersections, for the servicing of customers at business establishments, and for talking in interviews, meetings, debates, ceremonies, conversations.
Book

Science in action : how to follow scientists and engineers through society

Bruno Latour
TL;DR: In this article, the quandary of the fact-builder is explored in the context of science and technology in a laboratory setting, and the model of diffusion versus translation is discussed.

Journal ArticleDOI

Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39

TL;DR: In this article, the authors present a model of how one group of actors managed the tension between divergent viewpoints and the need for generalizable findings in scientific work, and distinguish four types of boundary objects: repositories, ideal types, coincident boundaries and standardized forms.