Posted Content

Big Data for All: Privacy and User Control in the Age of Analytics

TL;DR: Providing individuals with access to their data in a usable format will let them share the wealth created by their information and incentivize developers to offer user-side features and applications harnessing the value of big data.
Abstract: We live in an age of “big data.” Data have become the raw material of production, a new source of immense economic and social value. Advances in data mining and analytics and the massive increase in computing power and data storage capacity have expanded by orders of magnitude the scope of information available to businesses and government. Data are now available for analysis in raw form, escaping the confines of structured databases and enhancing researchers’ abilities to identify correlations and conceive of new, unanticipated uses for existing information. In addition, the increasing number of people, devices, and sensors that are now connected by digital networks has revolutionized the ability to generate, communicate, share, and access data. Data create enormous value for the world economy, driving innovation, productivity, efficiency, and growth. At the same time, the “data deluge” presents privacy concerns which could stir a regulatory backlash dampening the data economy and stifling innovation. In order to strike a balance between beneficial uses of data and individual privacy, policymakers must address some of the most fundamental concepts of privacy law, including the definition of “personally identifiable information”, the role of individual control, and the principles of data minimization and purpose limitation. This article emphasizes the importance of providing individuals with access to their data in a usable format. This will let individuals share the wealth created by their information and incentivize developers to offer user-side features and applications harnessing the value of big data. Where individual access to data is impracticable, data are likely to be de-identified to an extent sufficient to diminish privacy concerns. In addition, organizations should be required to disclose their decisional criteria, since in a big data world it is often not the data but rather the inferences drawn from them that give cause for concern.
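The abstract's point about de-identifying data where individual access is impracticable can be illustrated with a minimal sketch: suppress direct identifiers outright and generalize quasi-identifiers, in the spirit of k-anonymity-style techniques. The field names and generalization rules below are illustrative assumptions, not taken from the article.

```python
# De-identification sketch (illustrative): drop direct identifiers,
# coarsen quasi-identifiers so records are harder to re-identify.

def deidentify(record):
    out = dict(record)
    # Suppress direct identifiers outright.
    for field in ("name", "email", "ssn"):
        out.pop(field, None)
    # Generalize quasi-identifiers: coarsen age to a decade band,
    # truncate ZIP code to its 3-digit prefix.
    if "age" in out:
        decade = (out["age"] // 10) * 10
        out["age"] = f"{decade}-{decade + 9}"
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"
    return out

record = {"name": "Jane Doe", "email": "jane@example.com",
          "age": 34, "zip": "90210", "diagnosis": "flu"}
print(deidentify(record))
# → {'age': '30-39', 'zip': '902**', 'diagnosis': 'flu'}
```

Note that, as the abstract implies, this only diminishes rather than eliminates re-identification risk; how much generalization is "sufficient" is exactly the policy question the article raises.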
Citations
Journal ArticleDOI
TL;DR: This paper makes three contributions to clarifying the ethical importance of algorithmic mediation: a prescriptive map to organise the debate, a review of the current discussion of ethical aspects of algorithms, and an assessment of the available literature to identify areas requiring further work to develop the ethics of algorithms.
Abstract: In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

990 citations

Journal ArticleDOI
TL;DR: The review reveals that several opportunities are available for utilizing big data in smart cities; however, there are still many issues and challenges to be addressed to achieve better utilization of this technology.
Abstract: Many governments are considering adopting the smart city concept in their cities and implementing big data applications that support smart city components to reach the required level of sustainability and improve the living standards. Smart cities utilize multiple technologies to improve the performance of health, transportation, energy, education, and water services leading to higher levels of comfort of their citizens. This involves reducing costs and resource consumption in addition to more effectively and actively engaging with their citizens. One of the recent technologies that has a huge potential to enhance smart city services is big data analytics. As digitization has become an integral part of everyday life, data collection has resulted in the accumulation of huge amounts of data that can be used in various beneficial application domains. Effective analysis and utilization of big data is a key factor for success in many business and service domains, including the smart city domain. This paper reviews the applications of big data to support smart cities. It discusses and compares different definitions of the smart city and big data and explores the opportunities, challenges and benefits of incorporating big data applications for smart cities. In addition it attempts to identify the requirements that support the implementation of big data applications for smart city services. The review reveals that several opportunities are available for utilizing big data in smart cities; however, there are still many issues and challenges to be addressed to achieve better utilization of this technology.

682 citations


Cites background from "Big Data for All: Privacy and User ..."

  • ...This includes defining “personally identifiable information”, and the role of individual control [34]....


Journal ArticleDOI
TL;DR: Several case studies of big data analytics applications in intelligent transportation systems, including road traffic accidents analysis, road traffic flow prediction, public transportation service plan, personal travel route plan, rail transportation management and control, and assets maintenance are introduced.
Abstract: Big data is becoming a research focus in intelligent transportation systems (ITS), which can be seen in many projects around the world. Intelligent transportation systems will produce a large amount of data. The produced big data will have profound impacts on the design and application of intelligent transportation systems, which makes ITS safer, more efficient, and profitable. Studying big data analytics in ITS is a flourishing field. This paper first reviews the history and characteristics of big data and intelligent transportation systems. The framework of conducting big data analytics in ITS is discussed next, where the data source and collection methods, data analytics methods and platforms, and big data analytics application categories are summarized. Several case studies of big data analytics applications in intelligent transportation systems, including road traffic accidents analysis, road traffic flow prediction, public transportation service plan, personal travel route plan, rail transportation management and control, and assets maintenance are introduced. Finally, this paper discusses some open challenges of using big data analytics in ITS.

627 citations


Cites background from "Big Data for All: Privacy and User ..."

  • ...To prevent unauthorized disclosure of the personal private information, governments should develop complete data privacy laws which include what data can be published, the scope of the data publishing and using, the basic principles of data distribution, data availability and other areas [157]....


Proceedings ArticleDOI
21 Apr 2018
TL;DR: This work investigates how HCI researchers can help to develop accountable systems by performing a literature analysis of 289 core papers on explanations and explainable systems, as well as 12,412 citing papers.
Abstract: Advances in artificial intelligence, sensors and big data management have far-reaching societal impacts. As these systems augment our everyday lives, it becomes increasingly important for people to understand them and remain in control. We investigate how HCI researchers can help to develop accountable systems by performing a literature analysis of 289 core papers on explanations and explainable systems, as well as 12,412 citing papers. Using topic modeling, co-occurrence and network analysis, we mapped the research space from diverse domains, such as algorithmic accountability, interpretable machine learning, context-awareness, cognitive psychology, and software learnability. We reveal fading and burgeoning trends in explainable systems, and identify domains that are closely connected or mostly isolated. The time is ripe for the HCI community to ensure that the powerful new autonomous systems have intelligible interfaces built-in. From our results, we propose several implications and directions for future research towards this goal.
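The co-occurrence step of a literature mapping like the one described above can be sketched in a few lines: count how often pairs of index terms appear together across papers, with the most frequent pairs forming the strongest edges of a co-occurrence network. The keyword sets below are invented for illustration, not drawn from the paper's corpus.

```python
from collections import Counter
from itertools import combinations

# Keyword co-occurrence sketch: each paper contributes one count
# to every unordered pair of its keywords. (Illustrative data.)
papers = [
    {"explanation", "machine learning", "accountability"},
    {"explanation", "context-awareness"},
    {"machine learning", "accountability", "transparency"},
    {"explanation", "accountability"},
]

cooccur = Counter()
for keywords in papers:
    # Sorting makes the pair key order-independent.
    for pair in combinations(sorted(keywords), 2):
        cooccur[pair] += 1

# The heaviest pairs are the strongest candidate network edges.
for pair, count in cooccur.most_common(3):
    print(pair, count)
```

A real pipeline would first extract terms via topic modeling rather than rely on author keywords, but the pair-counting core is the same.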

539 citations


Cites background from "Big Data for All: Privacy and User ..."

  • ...Privacy issues due to the advent of big data analytics [147,166] and the contention between transparency and privacy is also an active area of investigation [38, 111]....


Journal ArticleDOI
TL;DR: This study describes the value proposition of BDA by delineating its components, then illustrates the framework through BDA applications in practice, and presents a problem-oriented view of the framework—where problems in BDA components can give rise to targeted research questions and areas for future study.
Abstract: Despite the publicity regarding big data and analytics (BDA), the success rate of these projects and strategic value created from them are unclear. Most literature on BDA focuses on how it can be u...

449 citations


Cites background from "Big Data for All: Privacy and User ..."

  • ...While creating value, data deluge presents privacy concerns that may stir a regulatory backlash dampening the data economy and stifling innovation [56]....


  • ...tion,” the role of individual control, and the principles of data minimization and purpose limitation [56]....


References
Proceedings ArticleDOI
20 May 2012
TL;DR: In over 20% of cases, the classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses.
Abstract: We study techniques for identifying an anonymous author via linguistic stylometry, i.e., comparing the writing style against a corpus of texts of known authorship. We experimentally demonstrate the effectiveness of our techniques with as many as 100,000 candidate authors. Given the increasing availability of writing samples online, our result has serious implications for anonymity and free speech - an anonymous blogger or whistleblower may be unmasked unless they take steps to obfuscate their writing style. While there is a huge body of literature on authorship recognition based on writing style, almost none of it has studied corpora of more than a few hundred authors. The problem becomes qualitatively different at a large scale, as we show, and techniques from prior work fail to scale, both in terms of accuracy and performance. We study a variety of classifiers, both "lazy" and "eager," and show how to handle the huge number of classes. We also develop novel techniques for confidence estimation of classifier outputs. Finally, we demonstrate stylometric authorship recognition on texts written in different contexts. In over 20% of cases, our classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses. If we allow the classifier the option of not making a guess, via confidence estimation we are able to increase the precision of the top guess from 20% to over 80% with only a halving of recall.
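The core attribution loop, together with the abstract's precision/recall trade via confidence estimation, can be illustrated with a toy sketch: represent each text by character trigram frequencies, attribute an anonymous text to the known author with the highest cosine similarity, and abstain when the margin between the top two candidates is small. The margin heuristic is a simple stand-in for the paper's confidence-estimation techniques, and the texts and threshold are illustrative, not from the study.

```python
from collections import Counter
import math

def trigrams(text):
    # Character trigram frequency profile of a text.
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(corpus, anon_text, margin=0.02):
    anon = trigrams(anon_text)
    scores = sorted(((cosine(trigrams(t), anon), author)
                     for author, t in corpus.items()), reverse=True)
    # Abstain unless the best match clearly beats the runner-up;
    # this trades recall for precision, as in the paper's results.
    if len(scores) > 1 and scores[0][0] - scores[1][0] < margin:
        return None
    return scores[0][1]

corpus = {
    "alice": "the quick brown fox jumps over the lazy dog again and again",
    "bob": "colourless green ideas sleep furiously in the autumn twilight",
}
print(attribute(corpus, "the lazy dog jumps over the quick brown fox"))
# → alice
```

At the paper's scale of 100,000 candidate authors, such lazy pairwise comparison fails on both accuracy and performance, which is precisely the scaling problem the authors address.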

290 citations

Posted Content
TL;DR: In this article, the authors argue that the debate about data privacy protection should be grounded in an appreciation of the conditions necessary for individuals to develop and exercise autonomy in fact, and that meaningful autonomy requires a degree of freedom from monitoring, scrutiny, and categorization by others.
Abstract: In the United States, proposals for informational privacy have proved enormously controversial. On a political level, such proposals threaten powerful data processing interests. On a theoretical level, data processors and other data privacy opponents argue that imposing restrictions on the collection, use, and exchange of personal data would ignore established understandings of property, limit individual freedom of choice, violate principles of rational information use, and infringe data processors' freedom of speech. In this article, Professor Julie Cohen explores these theoretical challenges to informational privacy protection. She concludes that categorical arguments from property, choice, truth, and speech lack weight, and mask fundamentally political choices about the allocation of power over information, cost, and opportunity. Each debate, although couched in a rhetoric of individual liberty, effectively reduces individuals to objects of choices and trades made by others. Professor Cohen argues, instead, that the debate about data privacy protection should be grounded in an appreciation of the conditions necessary for individuals to develop and exercise autonomy in fact, and that meaningful autonomy requires a degree of freedom from monitoring, scrutiny, and categorization by others. The article concludes by calling for the design of both legal and technological tools for strong data privacy protection.

228 citations

Journal ArticleDOI
TL;DR: This article examines the concepts of public and private in various societies, including 4th-century B.C. Athens, ancient Hebrew society as reflected in the Old Testament, and ancient China in the age of the “hundred philosophers.”
Abstract: of Pennsylvania Law School on October 28, 1985. The University of Pennsylvania Law Review would like to thank Hermann Knott and Franz Tepper, 1987 LL.M. candidates at the University of Pennsylvania Law School, for reviewing most of the German language material cited in this article. † Professor of Civil and Labor Law, Johann Wolfgang Goethe-Universität, Frankfurt am Main; Data Protection Commissioner, State of Hesse, Federal Republic of Germany. 1 See, e.g., Griswold v. Connecticut, 381 U.S. 479 (1965) (identifying zones of individual privacy guaranteed by the United States Constitution); Millar v. Taylor, 98 Eng. Rep. 201, 242 (K.B. 1769) (“It is certain every man has a right to keep his own sentiments, if he pleases: he has certainly a right to judge whether he will make them public, or commit them only to the sight of his friends.”); B. MOORE, PRIVACY: STUDIES IN SOCIAL AND CULTURAL HISTORY (1984) (examining the concepts of public and private in various societies including 4th century B.C. Athens, ancient Hebrew society as reflected in the Old Testament, and ancient China at the age of the “hundred philosophers,” 551 B.C. to 233 B.C.). See generally Warren & Brandeis, The Right to Privacy, 4 HARV. L. REV. 193 (1890) (tracing the development of protection for the individual's person and property and advocating recognition of the right to privacy and of remedies for its violation). In American law, recent discussion of the individual's right to privacy has arisen in cases involving sexual and reproductive matters, see, e.g., Bowers v. Hardwick, 106 S. Ct. 2841 (1986); Roe v. Wade, 410 U.S. 113 (1973), and in cases concerning collection or disclosure of information, see, e.g., Department of State v. Washington Post Co., 456 U.S. 595 (1982); Nixon v. Administrator of Gen. Servs., 433 U.S. 425 (1977); Whalen v. Roe, 429 U.S. 589 (1977); United States v. Miller, 425 U.S. 435 (1976); Department of the Air Force v. Rose, 425 U.S. 352 (1976).
Other cases have involved the individual's right as to personal appearance, Kelley v. Johnson, 425 U.S. 238 (1976), as well as the right of privacy with regard to publicity upon arrest, Paul v. Davis, 424 U.S. 693 (1976), with both cases illustrating the tendency to restrict privacy to family, sexual, and reproductive matters. As to the individual's right of privacy in the context of criminal investigation, “[c]ases are legion that condemn violent searches and invasions of an individual's right to the privacy of his dwelling.” Miller, 425 U.S. at 451 (Brennan, J., dissenting) (quoting Burrows v. Superior Court, 13 Cal. 3d 238, 247, 529 P.2d 590, 596, 118 Cal. Rptr. 166, 172 (1974)). As to statutes, see, e.g., Freedom of Information Act, 5 U.S.C. § 552 (1982), amended by 4 U.S.C. § 402(2) (Supp. III 1985 & West Supp. 1986); Privacy Act of 1974, 5 U.S.C. § 552a (1982), amended by 1 U.S.C. § 107(9) (Supp. III 1985 & West Supp. 1986); Privacy Act, B.C. Stat. ch. 39 (1968) (British Columbia); CODE CIVIL [C. Civ.] art. 9 (Fr.); Loi relative à l'informatique, aux fichiers et aux libertés, Loi No. 78-17 of 6 January 1978, 1978 Journal Officiel de la République Française [J.O.] 227,

107 citations

Journal ArticleDOI
TL;DR: This work considers the definition and application of the EU 'personal data' concept in the context of anonymisation/pseudonymisation, encryption and data fragmentation in cloud computing, arguing that the definition should be based on the realistic risk of identification, and that the applicability of data protection rules should bebased on the risk of harm and its likely severity.
Abstract: Cloud computing service providers, even those based outside Europe, may become subject to the EU Data Protection Directive's extensive and complex regime purely through their customers' choices, of which they may have no knowledge or control. We consider the definition and application of the EU 'personal data' concept in the context of anonymisation/pseudonymisation, encryption and data fragmentation in cloud computing, arguing that the definition should be based on the realistic risk of identification, and that the applicability of data protection rules should be based on the risk of harm and its likely severity. In particular, the status of encryption and anonymisation/pseudonymisation procedures should be clarified to promote their use as privacy-enhancing techniques; data encrypted and secured to recognised standards should not be considered 'personal data' in the hands of those without access to the decryption key, such as many cloud computing providers; and finally, unlike, for example, social networking sites, Infrastructure as a Service and Platform as a Service providers (and certain Software as a Service providers) offer no more than utility infrastructure services, and may not even know if information processed using their services is 'personal data' (hence, the 'cloud of unknowing'), so it seems inappropriate for such cloud infrastructure providers to become arbitrarily subject to EU data protection regulation due to their customers' choices.

46 citations

01 Jan 2006

22 citations