scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

15 Apr 2007-pp 106-115
TL;DR: T-closeness as mentioned in this paper requires that the distribution of a sensitive attribute in any equivalence class is close to the distributions of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t).
Abstract: The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (ie, a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute In this paper we show that l-diversity has a number of limitations In particular, it is neither necessary nor sufficient to prevent attribute disclosure We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (ie, the distance between the two distributions should be no more than a threshold t) We choose to use the earth mover distance measure for our t-closeness requirement We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
21 May 2015
TL;DR: A decentralized personal data management system that ensures users own and control their data is described, and a protocol that turns a block chain into an automated access-control manager that does not require trust in a third party is implemented.
Abstract: The recent increase in reported incidents of surveillance and security breaches compromising users' privacy call into question the current model, in which third-parties collect and control massive amounts of personal data. Bit coin has demonstrated in the financial space that trusted, auditable computing is possible using a decentralized network of peers accompanied by a public ledger. In this paper, we describe a decentralized personal data management system that ensures users own and control their data. We implement a protocol that turns a block chain into an automated access-control manager that does not require trust in a third party. Unlike Bit coin, transactions in our system are not strictly financial -- they are used to carry instructions, such as storing, querying and sharing data. Finally, we discuss possible future extensions to block chains that could harness them into a well-rounded solution for trusted computing problems in society.

1,953 citations


Cites background from "t-Closeness: Privacy Beyond k-Anony..."

  • ...Related extensions to k-anonymity include l-diversity, which ensures the sensitive data is represented by a diverse enough set of possible values [15]; and t-closeness, which looks at the distribution of sensitive data [14]....

    [...]

Journal ArticleDOI
TL;DR: This survey will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish P PDP from other related problems, and propose future research directions.
Abstract: The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.

1,669 citations


Cites methods from "t-Closeness: Privacy Beyond k-Anony..."

  • ...Machanavajjhala et al. [2006, 2007] modi.ed the bottom-up Incognito [LeFevre et al. 2005] to identify an optimal i-diverse table....

    [...]

  • ...The i-Diversity Incognito operates based on the generalization property, similar to Observation 5.2, that i-diversity is nondecreasing with respect to generalization....

    [...]

  • ...…[Machanava­jjhala et al. 2007]; (a, k)-anonymity [Wong et al. 2006]; (k, e)-anonymity [Zhang et al. 2007]; personalized privacy [Xiao and Tao 2006b]; anatomy [Xiao and Tao 2006a]; t­closeness [Li et al. 2007]; m-invariance [Xiao and Tao 2007]; and (X, Y )-privacy [Wang and Fung 2006]....

    [...]

  • ...Although Incognito signi.cantly out­performs the binary search in ef.ciency [Samarati 2001], the complexity of all three al­gorithms, namely MinGen, binary search, and Incognito, increases exponentially with thesizeof QID....

    [...]

  • ...Incognito: Ef.cient full-domain k-anonymity....

    [...]

Proceedings ArticleDOI
21 Oct 2011
TL;DR: In this article, the authors discuss an emerging field of study: adversarial machine learning (AML), the study of effective machine learning techniques against an adversarial opponent, and give a taxonomy for classifying attacks against online machine learning algorithms.
Abstract: In this paper (expanded from an invited talk at AISEC 2010), we discuss an emerging field of study: adversarial machine learning---the study of effective machine learning techniques against an adversarial opponent. In this paper, we: give a taxonomy for classifying attacks against online machine learning algorithms; discuss application-specific factors that limit an adversary's capabilities; introduce two models for modeling an adversary's capabilities; explore the limits of an adversary's knowledge about the algorithm, feature space, training, and input data; explore vulnerabilities in machine learning algorithms; discuss countermeasures against attacks; introduce the evasion challenge; and discuss privacy-preserving learning techniques.

947 citations

Proceedings ArticleDOI
29 Jun 2009
TL;DR: A new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product and is shown to resist practical attacks of a different background knowledge level, at a different overhead cost.
Abstract: Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing "unbreakable" protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use APSE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.

801 citations

Book
27 Apr 2015
TL;DR: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues.
Abstract: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues. It goes beyond the traditional focus on data mining problems to introduce advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. Until now, no single book has addressed all these topics in a comprehensive and integrated way. The chapters of this book fall into one of three categories: Fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems. Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data. Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor. Appropriate for both introductory and advanced data mining courses, Data Mining: The Textbook balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners (including those with a limited mathematical background). Numerous illustrations, examples, and exercises are included, with an emphasis on semantically interpretable examples. Praise for Data Mining: The Textbook - As I read through this book, I have already decided to use it in my classes. This is a book written by an outstanding researcher who has made fundamental contributions to data mining, in a way that is both accessible and up to date. The book is complete with theory and practical use cases. Its a must-have for students and professors alike!" -- Qiang Yang, Chair of Computer Science and Engineering at Hong Kong University of Science and Technology"This is the most amazing and comprehensive text book on data mining. It covers not only the fundamental problems, such as clustering, classification, outliers and frequent patterns, and different data types, including text, time series, sequences, spatial data and graphs, but also various applications, such as recommenders, Web, social network and privacy. It is a great book for graduate students and researchers as well as practitioners." -- Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology at University of Illinois at Chicago

716 citations

References
More filters
Book
01 Jan 1993
TL;DR: In-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models are presented.
Abstract: A comprehensive introduction to network flows that brings together the classic and the contemporary aspects of the field, and provides an integrative view of theory, algorithms, and applications. presents in-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models. emphasizes powerful algorithmic strategies and analysis tools such as data scaling, geometric improvement arguments, and potential function arguments. provides an easy-to-understand descriptions of several important data structures, including d-heaps, Fibonacci heaps, and dynamic trees. devotes a special chapter to conducting empirical testing of algorithms. features over 150 applications of network flows to a variety of engineering, management, and scientific domains. contains extensive reference notes and illustrations.

8,496 citations


"t-Closeness: Privacy Beyond k-Anony..." refers methods in this paper

  • ...One can calculate EMD using solutions to the transportation problem, such as a min-cost flow[1]; however, these algorithms do not provide an explicit formula....

    [...]

Journal ArticleDOI
TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment and examines re-identification attacks that can be realized on releases that adhere to k- anonymity unless accompanying policies are respected.
Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k- anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.

7,925 citations


"t-Closeness: Privacy Beyond k-Anony..." refers background or methods in this paper

  • ...Samarati and Sweeney [15, 16, 18] introduced the k-anonymity approach and used generalization and suppression techniques to preserve information truthfulness....

    [...]

  • ...To this end, Samarati and Sweeney [15, 16, 18] introduced k-anonymity as the property that each record is indistinguishable with at 1-4244-0803-2/07/$20.00 ©2007 IEEE....

    [...]

  • ...To this end, Samarati and Sweeney [15, 16, 18] introduced k-anonymity as the property that each record is indistinguishable with at...

    [...]

Posted Content
TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms) and is invarient if and only if the morphism is sufficient for these two measures as mentioned in this paper.
Abstract: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invarient if and only if the morphism is sufficient for these two measures

5,228 citations


"t-Closeness: Privacy Beyond k-Anony..." refers methods in this paper

  • ...And the Kullback-Leibler (KL) distance [8] is defined as:...

    [...]

Journal ArticleDOI
TL;DR: This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.
Abstract: We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.

4,593 citations


"t-Closeness: Privacy Beyond k-Anony..." refers background or methods in this paper

  • ...This requirement leads us to the the Earth Mover’s distance (EMD) [14], which is actually a Monge-Kantorovich transportation distance [5] in disguise....

    [...]

  • ...Further, in order to incorporate distances between values of sensitive attributes, we use the Earth Mover Distance metric [14] to measure the distance between the two distributions....

    [...]