
Showing papers in "Big Data & Society in 2014"


Journal ArticleDOI
Rob Kitchin1
TL;DR: The author examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines.
Abstract: This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.

1,463 citations


Journal ArticleDOI
David Lyon1
TL;DR: Big Data intensifies certain surveillance trends associated with information technology and networks, and is thus implicated in fresh but fluid configurations, and the ethical turn becomes more urgent as a mode of critique.
Abstract: The Snowden revelations about National Security Agency surveillance, starting in 2013, along with the ambiguous complicity of internet companies and the international controversies that followed pr...

513 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss the ways Big Data impacts ethical conceptions, and how rethinking ethical choices (and the lack thereof) will guide scientists, governments, and corporate agencies in handling Big Data.
Abstract: The speed of development in Big Data and associated phenomena, such as social media, has surpassed the capacity of the average consumer to understand his or her actions and their knock-on effects. We are moving towards changes in how ethics has to be perceived: away from individual decisions with specific and knowable outcomes, towards actions by many unaware that they may have taken actions with unintended consequences for anyone. Responses will require a rethinking of ethical choices, the lack thereof and how this will guide scientists, governments, and corporate agencies in handling Big Data. This essay elaborates on the ways Big Data impacts on ethical conceptions.

409 citations


Journal ArticleDOI
TL;DR: This article reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of work in experimental biology and focuses on online databases as prominent infrastructures set up to organise and interpret such data.
Abstract: Is Big Data science a whole new way of doing research? And what difference does data quantity make to knowledge production strategies and their outputs? I argue that the novelty of Big Data science does not lie in the sheer quantity of data involved, but rather in (1) the prominence and status acquired by data as commodity and recognised output, both within and outside of the scientific community and (2) the methods, infrastructures, technologies, skills and knowledge developed to handle data. These developments generate the impression that data-intensive research is a new mode of doing science, with its own epistemology and norms. To assess this claim, one needs to consider the ways in which data are actually disseminated and used to generate knowledge. Accordingly, this article reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of work in experimental biology. I focus on online databases as prominent infrastructures set up to organise and interpret such data and examine the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon. This illuminates some of the conditions under which Big Data needs to be curated to support processes of discovery across biological subfields, which in turn highlights the difficulties caused by the lack of adequate curation for the vast majority of data in the life sciences. In closing, I reflect on the difference that data quantity is making to contemporary biology, the methodological and epistemic challenges of identifying and analysing data given these developments, and the opportunities and worries associated with Big Data discourse and methods.

213 citations


Journal ArticleDOI
TL;DR: The authors test an original approach to text analysis that combines automatic extraction and manual selection of key issue-terms, and offer a substantial contribution to the understanding of UN-framed climate diplomacy.
Abstract: This article proposes an original analysis of the international debate on climate change through the use of digital methods. Its originality is twofold. First, it examines a corpus of reports covering 18 years of international climate negotiations, a dataset never explored before through digital techniques. This corpus is particularly interesting because it provides the most consistent and detailed reporting of the negotiations of the United Nations Framework Convention on Climate Change. Second, in this paper we test an original approach to text analysis that combines automatic extractions and manual selection of the key issue-terms. Through this mixed approach, we tried to obtain relevant findings without imposing them on our corpus. The originality of our corpus and of our approach encouraged us to question some of the habits of digital research and confront three common misunderstandings about digital methods that we discuss in the first part of the article (section ‘Three misunderstandings on digital methods in social sciences’). In addition to reflecting on methodology, however, we also wanted to offer some substantial contribution to the understanding of UN-framed climate diplomacy. In the second part of the article (section ‘Three maps on climate negotiations’) we will therefore introduce some of the preliminary results of our analysis. By discussing three visualizations, we will analyze the thematic articulation of the climatic negotiations, the rise and fall of these themes over time and the visibility of different countries in the debate.

178 citations


Journal ArticleDOI
TL;DR: This brief paper offers a reflexive and critical reflection on what has become – much to the surprise of its authors – one of the most cited papers in the discipline of sociology in the last decade.
Abstract: Roger Burrows (Department of Sociology, Goldsmiths, University of London, UK) and Mike Savage (Department of Sociology, London School of Economics, UK). Google Trends reveals that at the time we were writing our article on ‘The Coming Crisis of Empirical Sociology’ in 2007 almost nobody was searching the internet for ‘Big Data’. It was only towards the very end of 2010 that the term began to register, just ahead of an explosion of interest from 2011 onwards. In this commentary we take the opportunity to reflect back on the claims we made in that original paper in light of more recent discussions about the social scientific implications of the inundation of digital data. Did our paper, with its emphasis on the emergence of, what we termed, ‘social transactional data’ and ‘digital byproduct data’ prefigure contemporary debates that now form the basis and rationale for this excellent new journal? Or was the paper more concerned with broader methodological, theoretical and political debates that have somehow been lost in all of the loud babble that has come to surround Big Data? Using recent work on the BBC Great British Class Survey as an example, this brief paper offers a reflexive and critical reflection on what has become – much to the surprise of its authors – one of the most cited papers in the discipline of sociology in the last decade.

164 citations


Journal ArticleDOI
TL;DR: The authors argue that an adequate response to the implications for governance raised by "Big Data" requires much more attention to agency and reflexivity than theories of "algorithmic power" have so far allowed.
Abstract: This short article argues that an adequate response to the implications for governance raised by ‘Big Data’ requires much more attention to agency and reflexivity than theories of ‘algorithmic power’ have so far allowed. It develops this through two contrasting examples: the sociological study of social actors’ use of analytics to meet their own social ends (for example, by community organisations) and the study of actors’ attempts to build an economy of information more open to civic intervention than the existing one (for example, in the environmental sphere). The article concludes with a consideration of the broader norms that might contextualise these empirical studies, and proposes that they can be understood in terms of the notion of voice, although the practical implementation of voice as a norm means that voice must sometimes be considered via the notion of transparency.

134 citations


Journal ArticleDOI
TL;DR: The recent Facebook study about emotional contagion has generated a high-profile debate about the ethical and social issues in Big Data research; the authors note that these issues are not unprecedented.
Abstract: The recent Facebook study about emotional contagion has generated a high-profile debate about the ethical and social issues in Big Data research. These issues are not unprecedented, but the debate ...

123 citations


Journal ArticleDOI
TL;DR: In this paper, the notion of the social as a spatial complex of outstincts is re-interpreted as a kind of spatial awareness, and the authors consider how cities might become aware as different kinds of sprites, channelling outstincts in spatially variable ways.
Abstract: The claim is frequently made that, as cities become loaded up with information and communications technology and a resultant profusion of data, so they are becoming sentient. But what might this mean? This paper offers some insights into this claim by, first of all, reworking the notion of the social as a spatial complex of ‘outstincts’. That makes it possible, secondly, to reconsider what a city which is aware of itself might look like, both by examining what kinds of technological practices are becoming commonplace and by considering the particular case of spatial awareness. In turn, this leads to a third rumination on how cities might become aware as different kinds of sprite, channelling outstincts in spatially variable ways. Whatever the case, it is clear that new technical-artistic interventions are required if these sprites are not to become simply servants of the security–entertainment complex. Some of these interventions are examined in the fourth part of the paper.

120 citations


Journal ArticleDOI
TL;DR: This paper presents a review of academic literature, policy documents from government organizations and international agencies, and reports from industries and popular media on the trends in Big Data utilization in key development issues and its worthwhileness, usefulness, and relevance and reviews the uses of Big Data in agriculture and farming activities in developing countries.
Abstract: This paper presents a review of academic literature, policy documents from government organizations and international agencies, and reports from industries and popular media on the trends in Big Data utilization in key development issues and its worthwhileness, usefulness, and relevance. By looking at Big Data deployment in a number of key economic sectors, it seeks to provide a better understanding of the opportunities and challenges of using it for addressing key issues facing the developing world. It reviews the uses of Big Data in agriculture and farming activities in developing countries to assess the capabilities required at various levels to benefit from Big Data. It also provides insights into how the current digital divide is associated with and facilitated by the pattern of Big Data diffusion and its effective use in key development areas. It also discusses the lessons that developing countries can learn from the utilization of Big Data in big corporations as well as in other activities in industrialized countries.

117 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examine perspectives on Big Data across the discipline, the new types of data being used by researchers on economic issues, and the range of responses to this opportunity amongst economists.
Abstract: Although the terminology of Big Data has so far gained little traction in economics, the availability of unprecedentedly rich datasets and the need for new approaches – both epistemological and computational – to deal with them is an emerging issue for the discipline. Using interviews conducted with a cross-section of economists, this paper examines perspectives on Big Data across the discipline, the new types of data being used by researchers on economic issues, and the range of responses to this opportunity amongst economists. First, we outline the areas in which it is being used, including the prediction and ‘nowcasting’ of economic trends; mapping and predicting influence in the context of marketing; and acting as a cheaper or more accurate substitute for existing types of data such as censuses or labour market data. We then analyse the broader current and potential contributions of Big Data to economics, such as the ways in which econometric methodology is being used to shed light on questions beyond economics, how Big Data is improving or changing economic models, and the kinds of collaborations arising around Big Data between economists and other disciplines.

Journal ArticleDOI
TL;DR: Social physics is marked by the belief that large-scale statistical measurement of social variables reveals underlying relational patterns that can be explained by theories and laws found in natural science, and physics in particular.
Abstract: This paper examines one of the historical antecedents of Big Data, the social physics movement. Its origins are in the scientific revolution of the 17th century in Western Europe. But it is not named as such until the middle of the 19th century, and not formally institutionalized until another hundred years later when it is associated with work by George Zipf and John Stewart. Social physics is marked by the belief that large-scale statistical measurement of social variables reveals underlying relational patterns that can be explained by theories and laws found in natural science, and physics in particular. This larger epistemological position is known as monism, the idea that there is only one set of principles that applies to the explanation of both natural and social worlds. Social physics entered geography through the work of the mid-20th-century geographer William Warntz, who developed his own spatial version called ‘macrogeography’. It involved the computation of large data sets, made ever easier with the contemporaneous development of the computer, joined with the gravitational potential model. Our argument is that Warntz’s concerns with numeracy, large data sets, machine-based computing power, relatively simple mathematical formulas drawn from natural science, and an isomorphism between natural and social worlds became grounds on which Big Data later staked its claim to knowledge; it is a past that has not yet passed.

Journal ArticleDOI
TL;DR: In this paper, the authors reflect on the disciplinary contours of contemporary sociology, and social science more generally, in the age of big and broad social data and suggest how sociology and social sciences may respond to the challenges and opportunities presented by this "data deluge" in ways that are innovative yet sensitive to the social and ethical life of data and methods.
Abstract: In this paper, we reflect on the disciplinary contours of contemporary sociology, and social science more generally, in the age of ‘big and broad’ social data. Our aim is to suggest how sociology and social sciences may respond to the challenges and opportunities presented by this ‘data deluge’ in ways that are innovative yet sensitive to the social and ethical life of data and methods. We begin by reviewing relevant contemporary methodological debates and consider how they relate to the emergence of big and broad social data as a product, reflexive artefact and organizational feature of emerging global digital society. We then explore the challenges and opportunities afforded to social science through the widespread adoption of a new generation of distributed, digital technologies and the gathering momentum of the open data movement, grounding our observations in the work of the Collaborative Online Social Media ObServatory (COSMOS) project. In conclusion, we argue that these challenges and opportunities motivate a renewed interest in the programme for a ‘public sociology’, characterized by the co-production of social scientific knowledge involving a broad range of actors and publics.

Journal ArticleDOI
TL;DR: The rise of Big Data changes the context in which organisations producing official statistics operate, and the role of statistical institutes in the provision of high-quality and impartial statistical information to society may change.
Abstract: The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.

Journal ArticleDOI
TL;DR: It is argued that a key factor that distinguishes “Big Data” from “lots of data” lies in changes to the traditional, well-established “control zones” that facilitated clear provenance of scientific data, thereby ensuring data integrity and providing the foundation for credible science.
Abstract: Despite all the attention to Big Data and the claims that it represents a ‘paradigm shift’ in science, we lack understanding about what are the qualities of Big Data that may contribute to this revolutionary impact. In this paper, we look beyond the quantitative aspects of Big Data (i.e. lots of data) and examine it from a sociotechnical perspective. We argue that a key factor that distinguishes ‘Big Data’ from ‘lots of data’ lies in changes to the traditional, well-established ‘control zones’ that facilitated clear provenance of scientific data, thereby ensuring data integrity and providing the foundation for credible science. The breakdown of these control zones is a consequence of the manner in which our network technology and culture enable and encourage open, anonymous sharing of information, participation regardless of expertise, and collaboration across geographic, disciplinary, and institutional barriers. We are left with the conundrum—how to reap the benefits of Big Data while re-creating a trust fabric and an accountable chain of responsibility that make credible science possible.

Journal ArticleDOI
TL;DR: It is hypothesize that social relations, as objects of knowledge, depend crucially on the type of measurement device deployed, and expects new interferences and polyphonies to arise at the intersection of Big and Small Data, provided that these are mixed with care.
Abstract: The rise of Big Data in the social realm poses significant questions at the intersection of science, technology, and society, including in terms of how new large-scale social databases are currently changing the methods, epistemologies, and politics of social science. In this commentary, we address such epochal (‘large-scale’) questions by way of a (situated) experiment: at the Danish Technical University in Copenhagen, an interdisciplinary group of computer scientists, physicists, economists, sociologists, and anthropologists (including the authors) is setting up a large-scale data infrastructure, meant to continually record the digital traces of social relations among an entire freshman class of students (N >1000). At the same time, fieldwork is carried out on friendship (and other) relations amongst the same group of students. On this basis, the question we pose is the following: what kind of knowledge is obtained on this social micro-cosmos via the Big (computational, quantitative) and Small (embodied, qualitative) Data, respectively? How do the two relate? Invoking Bohr’s principle of complementarity as analogy, we hypothesize that social relations, as objects of knowledge, depend crucially on the type of measurement device deployed. At the same time, however, we also expect new interferences and polyphonies to arise at the intersection of Big and Small Data, provided that these are, so to speak, mixed with care. These questions, we stress, are important not only for the future of social science methods but also for the type of societal (self-)knowledge that may be expected from new large-scale social databases.

Journal ArticleDOI
TL;DR: This essay makes the case for choosing to examine small subsets of Big Data datasets (making big data small) and encourages researchers to embrace an ethical, empirical and epistemological stance on Big Data that includes minorities and outliers as reference categories, rather than the exceptions to statistical norms.
Abstract: In this essay, I make the case for choosing to examine small subsets of Big Data datasets—making big data small. Big Data allows us to produce summaries of human behavior at a scale never before possible. But in the push to produce these summaries, we risk losing sight of a secondary but equally important advantage of Big Data—the plentiful representation of minorities. Women, minorities and statistical outliers have historically been omitted from the scientific record, with problematic consequences. Big Data affords the opportunity to remedy those omissions. However, to do so, Big Data researchers must choose to examine very small subsets of otherwise large datasets. I encourage researchers to embrace an ethical, empirical and epistemological stance on Big Data that includes minorities and outliers as reference categories, rather than the exceptions to statistical norms.

Journal ArticleDOI
TL;DR: Drawing on the political economy of information, the author explains why the online industry fails to self-regulate, resulting in increasingly insidious web-tracking technologies; online users are left with few alternatives but to enter into unconscionable contracts about the extraction of their personal data when using the Internet for private purposes.
Abstract: Big Data enhances the possibilities for storing personal data extracted from social media and web search on an unprecedented scale. This paper draws on the political economy of information which explains why the online industry fails to self-regulate, resulting in increasingly insidious web-tracking technologies. Content analysis of historical blogs and request for comments on HTTP cookies published by the Internet Engineering Task Force illustrates how cookie technology was introduced in the mid-1990s, amid stark warnings about increased system vulnerabilities and deceptive personal data extractions. In conclusion, online users today are left with few alternatives but to enter into unconscionable contracts about the extraction of their personal data when using the Internet for private purposes.

Journal ArticleDOI
TL;DR: This article relates how visual materials created within social media platforms manifest distinct modes of knowledge production and acquisition and illuminates some of the conditions, challenges, and tensions between former visual structures and current ones, and unfolds the cultural significations of contemporary big visual data.
Abstract: How do the organization and presentation of large-scale social media images recondition the process by which visual knowledge, value, and meaning are made in contemporary conditions? Analyzing fundamental elements in the changing syntax of existing visual software ontology—the ways current social media platforms and aggregators organize and categorize social media images—this article relates how visual materials created within social media platforms manifest distinct modes of knowledge production and acquisition. First, I analyze the structure of social media images within data streams as opposed to previous information organization in a structured database. While the database has no predefined notions of time and thus challenges traditional linear forms, the data stream re-emphasizes the linearity of a particular data sequence and activates a set of new relations to contemporary temporalities. Next, I show how these visual arrangements and temporal principles are manifested and discussed in three artworks: ‘Untitled’ (Perfect Lovers) by

Journal ArticleDOI
TL;DR: In the past three years, Heather Ford, an ethnographer and now a PhD student, has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen.
Abstract: In the past three years, Heather Ford—an ethnographer and now a PhD student—has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen. In this essay, she talks about how the three met, how they worked together, and what they gained from the experience. Three themes became apparent through their collaboration: that data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalizing research activities was key to their success.

Journal ArticleDOI
TL;DR: In this article, the authors present an overview of a year-long project to examine what the abundance of data and proliferation of data-making methods mean for the ordinary person, the person on the street, and what they could come to mean.
Abstract: What does the abundance of data and proliferation of data-making methods mean for the ordinary person, the person on the street? And, what could they come to mean? In this paper, we present an overview of a year-long project to examine just such questions and complicate, in some ways, what it is to ask them. The project is a collective exercise in which we – a mixture of social scientists, designers and makers – and those living and working on one street in Cambridge (UK), Tenison Road, are working to think through how data might be materialised and come to matter. The project aims to better understand the specificities and contingencies that arise when data is produced and used in place. Mid-way through the project, we use this commentary to give some background to the work and detail one or two of the troubles we have encountered in putting locally relevant data to work. We also touch on a methodological standpoint we are working our way into and through, one that we hope complicates the separations between subject and object in datamaking and opens up possibilities for a generative refiguring of the manifold relations.

Journal ArticleDOI
TL;DR: Through the investigation and experimental case study in the growing field of social Twitter analytics, it is found that not only are solutions like Cloudera’s Hadoop feasible, but that they can also enable robust, deep, and fruitful research outcomes in a variety of use-case scenarios across the disciplines.
Abstract: Though full of promise, Big Data research success is often contingent on access to the newest, most advanced, and often expensive hardware systems and the expertise needed to build and implement such systems. As a result, the accessibility of the growing number of Big Data-capable technology solutions has often been the preserve of business analytics. Pay as you store/process services like Amazon Web Services have opened up possibilities for smaller scale Big Data projects. There is high demand for this type of research in the digital humanities and digital sociology, for example. However, scholars are increasingly finding themselves at a disadvantage as available data sets of interest continue to grow in size and complexity. Without a large amount of funding or the ability to form interdisciplinary partnerships, only a select few find themselves in the position to successfully engage Big Data. This article identifies several notable and popular Big Data technologies typically implemented using large and ...

Journal ArticleDOI
Prabhakar Raghavan1
TL;DR: The social sciences are at a remarkable confluence of events: advances in computing have made it feasible to analyze data at the scale of the population of the world, raising the question of how the depth of inquiry in the social sciences can be combined with the scale and robustness of statistics and computer science.
Abstract: The social sciences are at a remarkable confluence of events. Advances in computing have made it feasible to analyze data at the scale of the population of the world. How can we combine the depth of inquiry in the social sciences with the scale and robustness of statistics and computer science? Can we decompose complex questions in the social sciences into simpler, more robustly testable hypotheses? We discuss these questions and the role of machine learning in the social sciences.

Journal ArticleDOI
TL;DR: This paper examines how large numbers were adjudicated in social decision-making in the Académie des sciences, Paris, when a dispute arose among French urologic surgeons about the importance of large numbers in surgical science.
Abstract: Dennis J Mazur (Center for Ethics in Health Care, Oregon Health and Science University, Portland, OR, USA). “Big Data” in health and medicine in the 21st century differs from “Big Data” used in health and medicine in the 1700s and 1800s. However, the old data sets share one key component: large numbers. The term “Big Data” is not synonymous with large numbers. Large numbers are a key component of Big Data in health and medicine, both for understanding the full range of how a disease presents in a human for diagnosis, and for understanding if one treatment of a disease is better than another treatment or better than just leaving the patient on his or her own without therapy. In this paper, we examine the first considerations of Big Data in medicine in Paris in the early 1800s when urologic surgeon Jean Civiale collected the first large numbers. Civiale collected the large numbers to defend the efficacy of his urologic instrument, the lithotrite, and the surgical procedure he developed, lithotrity, for the removal of bladder stones compared with earlier, more invasive surgical approaches. We examine how large numbers were adjudicated in social decision-making in the Académie des sciences, Paris, when a dispute arose among French urologic surgeons about the importance of large numbers in surgical science. After Civiale’s successful defense of his instrument and procedure in Paris, we examine how his approach to Big Data (large numbers) impacted data collection by George Buchanan in his use of the procedure at the Royal Hospital Infirmary in Glasgow.