
Showing papers in "Big Data & Society in 2016"


Journal ArticleDOI
TL;DR: This paper makes three contributions to clarify the ethical importance of algorithmic mediation: it provides a prescriptive map to organise the debate, reviews the current discussion of ethical aspects of algorithms, and assesses the available literature to identify areas requiring further work to develop the ethics of algorithms.
Abstract: In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

990 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the issue of opacity as a problem for socially consequential mechanisms of classification and ranking, such as spam filters, credit card fraud detection, search engines, news, etc.
Abstract: This article considers the issue of opacity as a problem for socially consequential mechanisms of classification and ranking, such as spam filters, credit card fraud detection, search engines, news...

915 citations


Journal ArticleDOI
TL;DR: This paper considers the question ‘what makes Big Data, Big Data?’, applying Kitchin’s taxonomy of seven Big Data traits to 26 datasets drawn from seven domains, each of which is considered in the literature to constitute Big Data.
Abstract: Big Data has been variously defined in the literature. In the main, definitions suggest that Big Data possess a suite of key traits: volume, velocity and variety (the 3Vs), but also exhaustivity, resolution, indexicality, relationality, extensionality and scalability. However, these definitions lack ontological clarity, with the term acting as an amorphous, catch-all label for a wide selection of data. In this paper, we consider the question ‘what makes Big Data, Big Data?’, applying Kitchin’s taxonomy of seven Big Data traits to 26 datasets drawn from seven domains, each of which is considered in the literature to constitute Big Data. The results demonstrate that only a handful of datasets possess all seven traits, and that some lack volume and/or variety. Instead, there are multiple forms of Big Data. Our analysis reveals that the key definitional boundary markers are the traits of velocity and exhaustivity. We contend that Big Data as an analytical category needs to be unpacked, with the genus of Big Data further delineated and its various species identified. It is only through such ontological work that we will gain conceptual clarity about what constitutes Big Data, formulate how best to make sense of it, and identify how it might be best used to make sense of the world.

364 citations
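
As a concrete illustration of the kind of trait audit the paper performs, the sketch below scores dataset profiles against seven traits. The trait groupings and the boolean profiles are illustrative assumptions, not the paper's data.

```python
# Minimal sketch (not from the paper) of checking boolean trait
# profiles against a seven-trait taxonomy in the spirit of Kitchin.
TRAITS = ["volume", "velocity", "variety", "exhaustivity",
          "resolution/indexicality", "relationality",
          "extensionality/scalability"]

# Hypothetical profiles for two example datasets.
datasets = {
    "mobile CDRs": {"volume": True, "velocity": True, "variety": False,
                    "exhaustivity": True, "resolution/indexicality": True,
                    "relationality": True, "extensionality/scalability": True},
    "national census": {"volume": True, "velocity": False, "variety": True,
                        "exhaustivity": True, "resolution/indexicality": True,
                        "relationality": True, "extensionality/scalability": False},
}

for name, profile in datasets.items():
    held = [t for t in TRAITS if profile.get(t)]
    print(f"{name}: {len(held)}/7 traits -> {', '.join(held)}")
```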


Journal ArticleDOI
TL;DR: A review of several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy, argues that, with respect to research ethics, data science should be understood as continuous with the social sciences.
Abstract: There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regula...

276 citations


Journal ArticleDOI
TL;DR: Using Niklaus Wirth's 1975 formulation that “algorithms + data structures = programs” as a launching-off point, this paper examines how an algorithmic lens shapes the way in which the authors might inquire into contemporary digital culture.
Abstract: Algorithms, once obscure objects of technical art, have lately been subject to considerable popular and scholarly scrutiny. What does it mean to adopt the algorithm as an object of analytic attention? What is in view, and out of view, when we focus on the algorithm? Using Niklaus Wirth's 1975 formulation that “algorithms + data structures = programs” as a launching-off point, this paper examines how an algorithmic lens shapes the way in which we might inquire into contemporary digital culture.

241 citations


Journal ArticleDOI
TL;DR: Systematically tracing the digital revolution in agriculture, and charting the affordances as well as the limitations of Big Data applied to food and agriculture, should be a broad research goal for Big Data scholarship.
Abstract: Farming is undergoing a digital revolution. Our review of current Big Data applications in the agri-food sector has revealed several collection and analytics tools that may have implications for relationships of power between players in the food system (e.g. between farmers and large corporations). For example, who retains ownership of the data generated by applications like Monsanto Corporation's Weed I.D. “app”? Are there privacy implications with the data gathered by John Deere's precision agricultural equipment? Systematically tracing the digital revolution in agriculture, and charting the affordances as well as the limitations of Big Data applied to food and agriculture, should be a broad research goal for Big Data scholarship. Such a goal brings data scholarship into conversation with food studies and it allows for a focus on the material consequences of Big Data in society.

240 citations


Journal ArticleDOI
TL;DR: In this introduction to the Big Data & Society CDS special theme, the authors describe the concept that Big Data should be seen as always-already constituted within wider data assemblages.
Abstract: Critical Data Studies (CDS) explore the unique cultural, ethical, and critical challenges posed by Big Data. Rather than treat Big Data as only scientifically empirical and therefore largely neutral phenomena, CDS advocates the view that Big Data should be seen as always-already constituted within wider data assemblages. Assemblages is a concept that helps capture the multitude of ways that already-composed data structures inflect and interact with society, its organization and functioning, and the resulting impact on individuals’ daily lives. CDS questions the many assumptions about Big Data that permeate contemporary literature on information and society by locating instances where Big Data may be naively taken to denote objective and transparent informational entities. In this introduction to the Big Data & Society CDS special theme, we briefly describe CDS work, its orientations, and principles.

238 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the claims and complexities involved in the platform-based economics of health and fitness apps and examine a double-edged logic inscribed in these platforms, promising to offer personal solutions to medical problems while also contributing to the public good.
Abstract: This article investigates the claims and complexities involved in the platform-based economics of health and fitness apps. We examine a double-edged logic inscribed in these platforms, promising to offer personal solutions to medical problems while also contributing to the public good. On the one hand, online platforms serve as personalized data-driven services to their customers. On the other hand, they allegedly serve public interests, such as medical research or health education. In doing so, many apps employ a diffuse discourse, hinging on terms like “sharing,” “open,” and “reuse” when they talk about data extraction and distribution. The analytical approach we adopt in this article is situated at the nexus of science and technology studies, political economy, and the sociology of health and illness. The analysis concentrates on two aspects: datafication (the use and reuse of data) and commodification (a platform’s deployment of governance and business models). We apply these analytical categories to three specific platforms: 23andMe, PatientsLikeMe, and Parkinson mPower. The last section will connect these individual examples to the wider implications of health apps’ data flows, governance policies, and business models. Regulatory bodies commonly focus on the (medical) safety and security of apps, but pay scarce attention to health apps’ techno-economic governance. Who owns user-generated health data and who gets to benefit? We argue that it is important to reflect on the societal implications of health data markets. Governments have the duty to provide conceptual clarity in the grand narrative of transforming health care and health research.

175 citations


Journal ArticleDOI
TL;DR: In this paper, the authors argue that there is significant ambiguity around this kind of anti-surveillance resistance in relation to broader activist practices, and that critical responses to the Snowden leaks have been confined within particular expert communities.
Abstract: The Snowden leaks, first published in June 2013, provided unprecedented insights into the operations of state-corporate surveillance, highlighting the extent to which everyday communication is integrated into an extensive regime of control that relies on the ‘datafication’ of social life. Whilst such data-driven forms of governance have significant implications for citizenship and society, resistance to surveillance in the wake of the Snowden leaks has predominantly centred on techno-legal responses relating to the development and use of encryption and policy advocacy around privacy and data protection. Based on in-depth interviews with a range of social justice activists, we argue that there is a significant level of ambiguity around this kind of anti-surveillance resistance in relation to broader activist practices, and critical responses to the Snowden leaks have been confined within particular expert communities. Introducing the notion of ‘data justice’, we therefore go on to make the case that resistance to surveillance needs to be (re)conceptualized on terms that can address the implications of this data-driven form of governance in relation to broader social justice agendas. Such an approach is needed, we suggest, in light of a shift to surveillance capitalism in which the collection, use and analysis of our data increasingly comes to shape the opportunities and possibilities available to us and the kind of society we live in.

170 citations


Journal ArticleDOI
TL;DR: In this mutual interview, three scholars build on the critical agenda of boyd and Crawford (2012), discussing the stakes, ideas, responsibilities, and possibilities of critical data studies and exploring what kinds of critical approaches, in theory and practice, could make such work available to a broader audience.
Abstract: In light of recent technological innovations and discourses around data and algorithmic analytics, scholars of many stripes are attempting to develop critical agendas and responses to these developments (boyd and Crawford 2012). In this mutual interview, three scholars discuss the stakes, ideas, responsibilities, and possibilities of critical data studies. The resulting dialog seeks to explore what kinds of critical approaches to these topics, in theory and practice, could open and make available such approaches to a broader audience.

146 citations


Journal ArticleDOI
TL;DR: It is shown that the process of creating Big Data from local and global sources of knowledge entails the transformation of information as it moves from one distinct group of contributors to the next, and that locally based, affected people, often the original ‘crowd’, are excluded from the information flow.
Abstract: The aim of this paper is to critically explore whether crowdsourced Big Data enables an inclusive humanitarian response at times of crisis. We argue that all data, including Big Data, are socially constructed artefacts that reflect the contexts and processes of their creation. To support our argument, we qualitatively analysed the process of ‘Big Data making’ that occurred by way of crowdsourcing through open data platforms, in the context of two specific humanitarian crises, namely the 2010 earthquake in Haiti and the 2015 earthquake in Nepal. We show that the process of creating Big Data from local and global sources of knowledge entails the transformation of information as it moves from one distinct group of contributors to the next. The implication of this transformation is that locally based, affected people and often the original ‘crowd’ are excluded from the information flow, and from the interpretation process of crowdsourced crisis knowledge, as used by formal responding organizations, and are marginalized in their ability to benefit from Big Data in support of their own means. Our paper contributes a critical perspective to the debate on participatory Big Data, by explaining the process of in and exclusion during data making, towards more responsive humanitarian relief.

Journal ArticleDOI
TL;DR: This work considers how environmental data raises different concerns and possibilities in relation to Big Data, and suggests ways in which citizen datasets could produce practices and interpretive insights that go beyond the usual uses of environmental data for regulation, compliance and modelling, generating expanded data citizenships.
Abstract: Citizen sensing, or the use of low-cost and accessible digital technologies to monitor environments, has contributed to new types of environmental data and data practices. Through a discussion of participatory research into air pollution sensing with residents of northeastern Pennsylvania concerned about the effects of hydraulic fracturing, we examine how new technologies for generating environmental data also give rise to new problems for analysing and making sense of citizen-gathered data. After first outlining the citizen data practices we collaboratively developed with residents for monitoring air quality, we then describe the data stories that we created along with citizens as a method and technique for composing data. We further mobilise the concept of ‘just good enough data’ to discuss the ways in which citizen data gives rise to alternative ways of creating, valuing and interpreting datasets. We specifically consider how environmental data raises different concerns and possibilities in relation to Big Data, which can be distinct from security or social media studies. We then suggest ways in which citizen datasets could generate different practices and interpretive insights that go beyond the usual uses of environmental data for regulation, compliance and modelling to generate expanded data citizenships.

Journal ArticleDOI
TL;DR: This article compares three free-use Twitter application programming interfaces for capturing tweets and enabling analysis and calls for critical social media data analytics combined with traditional, qualitative methods to address the developing ‘data gold rush.’
Abstract: Social media posts are full of potential for data mining and analysis. Recognizing this potential, platform providers increasingly restrict free access to such data. This shift provides new challen...
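
To make the data-capture setting concrete, here is a hedged sketch of collecting tweets through the free Search and Streaming endpoints, assuming tweepy 3.x against Twitter's now-retired v1.1 API; the credentials are placeholders, the paper's own tooling may differ, and today's X API has changed substantially.

```python
# Hedged sketch: tweepy 3.x against the retired v1.1 endpoints.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")  # placeholders
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Search endpoint: retrospective, reaching back only about a week.
for tweet in api.search(q="#bigdata", count=100):
    print(tweet.id, tweet.text[:60])

# Streaming endpoint: prospective, a filtered live sample.
class Listener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.id, status.text[:60])

tweepy.Stream(auth, Listener()).filter(track=["bigdata"])
```

The contrast between the retrospective and prospective endpoints is one of the axes on which such API comparisons typically turn.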

Journal ArticleDOI
TL;DR: The theoretical development of the data journeys methodology and the application of the approach on a project examining meteorological data on their journey from initial production through to being re-used in climate science and financial markets are discussed.
Abstract: In this paper, we discuss the development and piloting of a new methodology for illuminating the socio-material constitution of data objects and flows as data move between different sites of practice. The data journeys approach contributes to the development of critical, qualitative methodologies that can address the geographic and temporal scale of emerging knowledge infrastructures, and capture the ‘life of data’ from their initial generation through to re-use in different contexts. We discuss the theoretical development of the data journeys methodology and the application of the approach on a project examining meteorological data on their journey from initial production through to being re-used in climate science and financial markets. We then discuss three key conceptual findings from this project about: (1) the socio-material constitution of digital data objects, (2) ‘friction’ in the movement of data through space and time and (3) the mutability of digital data as a material property that contributes to driving the movement of data between different sites of practice.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a two-by-two methodological model of social media analytics, combining two data collection strategies with two analytic modes, and demonstrate each of these four approaches in action to explain how and why they might be used to address various research questions.
Abstract: In the few years since the advent of ‘Big Data’ research, social media analytics has begun to accumulate studies drawing on social media as a resource and tool for research work. Yet, there has been relatively little attention paid to the development of methodologies for handling this kind of data. The few works that exist in this area often reflect upon the implications of ‘grand’ social science methodological concepts for new social media research (i.e. they focus on general issues such as sampling, data validity, ethics, etc.). By contrast, we advance an abductively oriented methodological suite designed to explore the construction of phenomena played out through social media. To do this, we use a software tool – Chorus – to illustrate a visual analytic approach to data. Informed by visual analytic principles, we posit a two-by-two methodological model of social media analytics, combining two data collection strategies with two analytic modes. We go on to demonstrate each of these four approaches ‘in action’, to help clarify how and why they might be used to address various research questions.
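
The shape of such a two-by-two suite can be sketched as a simple dispatch over collection strategies and analytic modes; the labels below are placeholders, not the paper's terms.

```python
# Placeholder sketch: two collection strategies crossed with two
# analytic modes yield the four cells of a two-by-two suite.
from itertools import product

strategies = ("keyword_query", "user_timeline")  # assumed labels
modes = ("semantic", "temporal")                 # assumed labels

def approach(strategy: str, mode: str) -> str:
    return f"collect via {strategy}, analyse in {mode} mode"

for strategy, mode in product(strategies, modes):
    print(approach(strategy, mode))              # four distinct approaches
```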

Journal ArticleDOI
TL;DR: The authors argue that regulators' faith in numbers can be attributed to a distinct political culture, a representative democracy undermined by pervasive public distrust and uncertainty, and demonstrate how the epistemological claims of Big Data science intersect with specific forms of trust, truth, and objectivity.
Abstract: Recently, there has been renewed interest in so-called evidence-based policy making. Enticed by the grand promises of Big Data, public officials seem increasingly inclined to experiment with more data-driven forms of governance. But while the rise of Big Data and related consequences has been a major issue of concern across different disciplines, attempts to develop a better understanding of the phenomenon's historical foundations have been rare. This short commentary addresses this gap by situating the current push for numerical evidence within a broader socio-political context, demonstrating how the epistemological claims of Big Data science intersect with specific forms of trust, truth, and objectivity. We conclude by arguing that regulators' faith in numbers can be attributed to a distinct political culture, a representative democracy undermined by pervasive public distrust and uncertainty.

Journal ArticleDOI
TL;DR: In this article, the authors argue that sociocultural theory may contribute to understandings of the relationship between humans and digital data, and employ the tropes of companion species, drawn from Haraway, and eating data, from Mol, demonstrating how these may be employed to theorise digital data.
Abstract: This commentary is an attempt to begin to identify and think through some of the ways in which sociocultural theory may contribute to understandings of the relationship between humans and digital data. I develop an argument that rests largely on the work of two scholars in the field of science and technology studies: Donna Haraway and Annemarie Mol. Both authors emphasised materiality and multiple ontologies in their writing. I argue that these concepts have much to offer critical data studies. I employ the tropes of companion species, drawn from Haraway, and eating data, from Mol, and demonstrate how these may be employed to theorise digital data–human assemblages.

Journal ArticleDOI
TL;DR: The authors draw from critical data studies and related fields to investigate police officer-involved homicide data for Los Angeles County, framing police officer-involved homicide data as a rheto...
Abstract: This paper draws from critical data studies and related fields to investigate police officer-involved homicide data for Los Angeles County. We frame police officer-involved homicide data as a rheto...

Journal ArticleDOI
TL;DR: Openness and transparency are becoming hallmarks of responsible data practice in science and governance, despite concerns about data falsification, erroneous analysis, and misleading presentation of research.
Abstract: Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of resear...

Journal ArticleDOI
TL;DR: It is suggested that a profile-based approach can be used for identifying a core set of fake online social network users in a time-efficient manner.
Abstract: In online social networks, the audience size commanded by an organization or an individual is a critical measure of that entity’s popularity and this measure has important economic and/or political...
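
A minimal sketch of what a profile-based screen could look like, assuming illustrative profile features and toy labels; the paper's actual features and method are not reproduced here.

```python
# Sketch (assumption, not the paper's method): classify accounts as
# fake or genuine from profile features alone, which avoids crawling
# full activity histories and keeps the screen time-efficient.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns (all hypothetical): followers/friends ratio, default profile
# image flag, account age in days, posts per day.
X = np.array([
    [0.01, 1,   12, 210.0],   # fake-looking profile
    [0.90, 0, 1500,   3.2],   # genuine-looking profile
    [0.02, 1,   30, 180.0],
    [1.40, 0,  900,   1.1],
])
y = np.array([1, 0, 1, 0])    # 1 = fake, 0 = genuine (toy labels)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.03, 1, 20, 150.0]]))  # likely flagged as fake
```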

Journal ArticleDOI
Bart Custers
TL;DR: This paper suggests expiry dates for consent, not to settle questions, but to put them on the table as a start for further discussion on this topic.
Abstract: The legal basis for processing personal data and some other types of Big Data is often the informed consent of the data subject involved. Many data controllers, such as social network sites, offer terms and conditions, privacy policies or similar documents to which a user can consent when registering as a user. There are many issues with such informed consent: people get too many consent requests to read everything, policy documents are often very long and difficult to understand and users feel they do not have a real choice anyway. Furthermore, in the context of Big Data refusing consent may not prevent predicting missing data. Finally, consent is usually asked for when registering, but rarely is consent renewed. As a result, consenting once often implies consent forever. At the same time, given the rapid changes in Big Data and data analysis, consent may easily get outdated (when earlier consent no longer reflects a user’s preferences). This paper suggests expiry dates for consent, not to settle questio...
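
The proposal translates naturally into a data structure; the sketch below is one possible reading, with the default validity term as an assumption.

```python
# One possible reading of expiry dates for consent (details assumed):
# a consent record carries a validity term and must be renewed rather
# than lasting forever once given.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ConsentRecord:
    purpose: str
    granted_at: datetime
    validity: timedelta = timedelta(days=365)  # assumed default term

    def is_valid(self, now: datetime) -> bool:
        return now < self.granted_at + self.validity

consent = ConsentRecord("profiling for recommendations", datetime(2016, 1, 1))
print(consent.is_valid(datetime(2016, 6, 1)))   # True: within the term
print(consent.is_valid(datetime(2017, 6, 1)))   # False: renewal required
```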

Journal ArticleDOI
David Beer
TL;DR: In this article, a sociologically informed history of Big Data is presented, where the authors argue that the term Big Data has the effect of making-up data and, as such, is powerful in framing our understanding of those data and the possibilities that they afford.
Abstract: Taking its lead from Ian Hacking’s article ‘How should we do the history of statistics?’, this article reflects on how we might develop a sociologically informed history of Big Data. It argues that within the history of social statistics we have a relatively well developed history of the material phenomenon of Big Data. Yet this article argues that we now need to take the concept of ‘Big Data’ seriously: there is a pressing need to explore the type of work that is being done by that concept. The article suggests a programme for work that explores the emergence of the concept of Big Data so as to track the institutional, organisational, political and everyday adoption of this term. It argues that the term Big Data has the effect of making-up data and, as such, is powerful in framing our understanding of those data and the possibilities that they afford.

Journal ArticleDOI
TL;DR: The paper examines some of the processes of the closely knit relationship between Google’s ideologies of neutrality and objectivity and its global market dominance, analysing reports from media specialising in the search engine optimisation business.
Abstract: The paper examines some of the processes of the closely knit relationship between Google’s ideologies of neutrality and objectivity and global market dominance. Neutrality construction comprises an...

Journal ArticleDOI
TL;DR: The authors argue that debates in Critical Data Studies and philosophy of science have neglected the problem of error management and error detection, an especially important feature of the epistemology of Big Data.
Abstract: We address some of the epistemological challenges highlighted by the Critical Data Studies literature by reference to some of the key debates in the philosophy of science concerning computational modeling and simulation. We provide a brief overview of these debates focusing particularly on what Paul Humphreys calls epistemic opacity. We argue that debates in Critical Data Studies and philosophy of science have neglected the problem of error management and error detection. This is an especially important feature of the epistemology of Big Data. In the “Error” section we explain the main characteristics of error detection and correction along with the relationship between error and path complexity in software. In this section we provide an overview of conventional statistical methods for error detection and review their limitations when faced with the high degree of conditionality inherent to modern software systems.
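
As a pointer to what conventional statistical error detection and its limitations look like in practice, here is a minimal z-score check on made-up sensor readings; the data and thresholds are illustrative, not the authors'.

```python
# Sketch of a conventional statistical error check: a z-score rule for
# flagging corrupted values, plus its classic weakness -- one extreme
# value inflates the standard deviation and can mask itself.
import statistics

def zscore_outliers(values, threshold):
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [9.8, 10.1, 10.0, 9.9, 54.2, 10.2]   # one corrupted reading
print(zscore_outliers(readings, 3.0))  # []: the outlier masks itself
print(zscore_outliers(readings, 2.0))  # [54.2]: only a looser rule catches it
```

Such rules also assume a single well-behaved distribution, which is exactly what the high conditionality of branchy software undermines: each code path can produce its own error profile.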

Journal ArticleDOI
TL;DR: This article considers how Spatial Big Data is situated and produced through embodied spatial experiences as data processes appear and act in small moments on mobile phone applications and other digital spatial technologies, and argues that while spatial calculability has expanded from cartographic reason into data logics, the epistemological universality of Spatial Big Data is constantly being resisted.
Abstract: This article considers how Spatial Big Data is situated and produced through embodied spatial experiences as data processes appear and act in small moments on mobile phone applications and other di...

Journal ArticleDOI
TL;DR: Spatial Big Data, be this natively geocoded content, geographical metadata, or data that itself refers to spaces and places, has become a pervasive presence in the spaces and practices of everyday life.
Abstract: Spatial Big Data—be this natively geocoded content, geographical metadata, or data that itself refers to spaces and places—has become a pervasive presence in the spaces and practices of everyday li...

Journal ArticleDOI
TL;DR: In this article, the authors discuss how environmental Big Data is emerging as a parallel area of investigation within studies of Big Data, and how new practices, technologies, actors and issues are concretising that are distinct and specific to the operations of environmental data.
Abstract: While there are now an increasing number of studies that critically and rigorously engage with Big Data discourses and practices, these analyses often focus on social media and other forms of online data typically generated about users. This introduction discusses how environmental Big Data is emerging as a parallel area of investigation within studies of Big Data. New practices, technologies, actors and issues are concretising that are distinct and specific to the operations of environmental data. Situating these developments in relation to the seven contributions to this special collection, the introduction outlines significant characteristics of environmental data practices, data materialisations and data contestations. In these contributions, it becomes evident that processes for validating, distributing and acting on environmental data become key sites of materialisation and contestation, where new engagements with environmental politics and citizenship are worked through and realised.

Journal ArticleDOI
TL;DR: It is argued that it is essential for researchers to share research code: code sharing underpins reproducible research, enabling results to be duplicated and therefore allowing the accuracy and validity of analyses to be evaluated.
Abstract: Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain info...
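
A minimal sketch of the practice being advocated (an illustration, not the paper's code): share the analysis script with a fixed random seed and a record of the software environment, so a reader can re-run it and duplicate the results.

```python
# Reproducibility sketch: fix the random seed and print the interpreter
# version alongside the output, so the shared script yields identical
# results on re-runs and environment differences are visible.
import random
import sys

random.seed(42)                       # fixed seed: same draw every run
sample = [random.gauss(0, 1) for _ in range(5)]
print("python", sys.version.split()[0])
print("sample:", [round(x, 3) for x in sample])
```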

Journal ArticleDOI
TL;DR: In this paper, the authors explore the ways in which data centre operators are currently reconfiguring the systems of energy and heat supply in European capitals, replacing conventional forms of heating with da...
Abstract: This article explores the ways in which data centre operators are currently reconfiguring the systems of energy and heat supply in European capitals, replacing conventional forms of heating with da...

Journal ArticleDOI
TL;DR: The authors examine the “aesthetic” and “prescient” turn in the surveillant assemblage and the various ways in which risk technologies in local law enforcement are reshaping the post hoc traditions of the criminal justice system.
Abstract: This article examines the “aesthetic” and “prescient” turn in the surveillant assemblage and the various ways in which risk technologies in local law enforcement are reshaping the post hoc traditions of the criminal justice system. The rise of predictive policing and crime prevention software illustrate not only how the world of risk management solutions for public security is shifting from sovereign borders to inner-city streets but also how the practices of authorization are allowing software systems to become proxy forms of sovereign power. The article also examines how corporate strategies and law enforcement initiatives align themselves through media, connectivity, and consumer-oriented opt-in strategies that endeavor to “mold” and “deputize” ordinary individuals into obedient and patriotic citizens.