t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

doi:10.1109/ICDE.2007.367856


Proceedings Article

Ninghui Li¹, Tiancheng Li¹, Suresh Venkatasubramanian²

Purdue University¹, AT&T Labs²

15 Apr 2007, pp. 106-115

TL;DR: t-Closeness requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table; that is, the distance between the two distributions should be no more than a threshold t.

Abstract: The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute. In this paper we show that l-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
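
The t-closeness requirement in the abstract can be illustrated with a minimal Python sketch. It assumes the simplest ground distance, where any two distinct sensitive values are at distance 1; in that special case the earth mover distance reduces to half the L1 distance between the two distributions. The function names and the toy table are hypothetical and are not taken from the paper.

```python
from collections import Counter

def distribution(values, domain):
    """Return the empirical probability of each domain value, in domain order."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in domain]

def emd_equal_distance(p, q):
    """EMD under the assumption that all distinct values are at ground distance 1.

    In that special case the EMD equals half the L1 distance
    (the total variation distance) between the two distributions.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def satisfies_t_closeness(classes, t):
    """Check every equivalence class against the whole-table distribution.

    `classes` is a list of lists of sensitive values, one list per class.
    """
    all_values = [v for cls in classes for v in cls]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return all(
        emd_equal_distance(distribution(cls, domain), overall) <= t
        for cls in classes
    )

# Hypothetical toy table: two equivalence classes of a disease attribute.
classes = [
    ["flu", "flu", "cancer", "cancer"],
    ["flu", "cancer", "cancer", "cancer"],
]
print(satisfies_t_closeness(classes, t=0.2))  # True
print(satisfies_t_closeness(classes, t=0.1))  # False
```

Both classes here sit at distance 0.125 from the overall distribution, so the table passes for t = 0.2 and fails for t = 0.1. A smaller t forces class distributions closer to the table-wide one, which limits what an observer learns from knowing someone's class.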


Citations


Proceedings Article

Decentralizing Privacy: Using Blockchain to Protect Personal Data

Guy Zyskind¹, Oz Nathan², Alex Pentland¹

Massachusetts Institute of Technology¹, Tel Aviv University²

21 May 2015

TL;DR: A decentralized personal data management system that ensures users own and control their data is described, and a protocol that turns a blockchain into an automated access-control manager that does not require trust in a third party is implemented.

Abstract: The recent increase in reported incidents of surveillance and security breaches compromising users' privacy calls into question the current model, in which third parties collect and control massive amounts of personal data. Bitcoin has demonstrated in the financial space that trusted, auditable computing is possible using a decentralized network of peers accompanied by a public ledger. In this paper, we describe a decentralized personal data management system that ensures users own and control their data. We implement a protocol that turns a blockchain into an automated access-control manager that does not require trust in a third party. Unlike Bitcoin, transactions in our system are not strictly financial; they are used to carry instructions, such as storing, querying, and sharing data. Finally, we discuss possible future extensions to blockchains that could harness them into a well-rounded solution for trusted computing problems in society.

1,953 citations

Cites background from "t-Closeness: Privacy Beyond k-Anony..."

...Related extensions to k-anonymity include l-diversity, which ensures the sensitive data is represented by a diverse enough set of possible values [15]; and t-closeness, which looks at the distribution of sensitive data [14]....
[...]

Journal Article

Privacy-preserving data publishing: A survey of recent developments

Benjamin C. M. Fung¹, Ke Wang², Rui Chen¹, Philip S. Yu³

Concordia University¹, Simon Fraser University², University of Illinois at Chicago³

23 Jun 2010-ACM Computing Surveys

TL;DR: This survey systematically summarizes and evaluates different approaches to PPDP, studies the challenges in practical data publishing, clarifies the differences and requirements that distinguish PPDP from other related problems, and proposes future research directions.

Abstract: The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.


1,669 citations

Cites methods from "t-Closeness: Privacy Beyond k-Anony..."

...Machanavajjhala et al. [2006, 2007] modified the bottom-up Incognito [LeFevre et al. 2005] to identify an optimal l-diverse table....
[...]
...The l-Diversity Incognito operates based on the generalization property, similar to Observation 5.2, that l-diversity is nondecreasing with respect to generalization....
[...]
...[Machanavajjhala et al. 2007]; (a, k)-anonymity [Wong et al. 2006]; (k, e)-anonymity [Zhang et al. 2007]; personalized privacy [Xiao and Tao 2006b]; anatomy [Xiao and Tao 2006a]; t-closeness [Li et al. 2007]; m-invariance [Xiao and Tao 2007]; and (X, Y)-privacy [Wang and Fung 2006]....
[...]
...Although Incognito significantly outperforms the binary search in efficiency [Samarati 2001], the complexity of all three algorithms, namely MinGen, binary search, and Incognito, increases exponentially with the size of QID....
[...]
...Incognito: Efficient full-domain k-anonymity....
[...]

Proceedings Article

Adversarial machine learning

Ling Huang¹, Anthony D. Joseph², Blaine Nelson³, Benjamin I. P. Rubinstein⁴, J. D. Tygar²

Intel¹, University of California, Berkeley², University of Tübingen³, Microsoft⁴

21 Oct 2011

TL;DR: In this article, the authors discuss an emerging field of study: adversarial machine learning (AML), the study of effective machine learning techniques against an adversarial opponent, and give a taxonomy for classifying attacks against online machine learning algorithms.

Abstract: In this paper (expanded from an invited talk at AISEC 2010), we discuss an emerging field of study: adversarial machine learning, the study of effective machine learning techniques against an adversarial opponent. In this paper, we: give a taxonomy for classifying attacks against online machine learning algorithms; discuss application-specific factors that limit an adversary's capabilities; introduce two models for modeling an adversary's capabilities; explore the limits of an adversary's knowledge about the algorithm, feature space, training, and input data; explore vulnerabilities in machine learning algorithms; discuss countermeasures against attacks; introduce the evasion challenge; and discuss privacy-preserving learning techniques.

947 citations

Proceedings Article

Secure kNN computation on encrypted databases

Wai Kit Wong¹, David W. Cheung¹, Ben Kao¹, Nikos Mamoulis¹

University of Hong Kong¹

29 Jun 2009

TL;DR: A new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product is developed and used to construct two secure kNN schemes, each shown to resist practical attacks of a different background knowledge level, at a different overhead cost.

Abstract: Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing "unbreakable" protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.

801 citations

Book

Data Mining: The Textbook

Charu C. Aggarwal

27 Apr 2015

TL;DR: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues.

Abstract: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues. It goes beyond the traditional focus on data mining problems to introduce advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. Until now, no single book has addressed all these topics in a comprehensive and integrated way. The chapters of this book fall into one of three categories: Fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems. Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data. Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor. Appropriate for both introductory and advanced data mining courses, Data Mining: The Textbook balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners (including those with a limited mathematical background). Numerous illustrations, examples, and exercises are included, with an emphasis on semantically interpretable examples.

Praise for Data Mining: The Textbook:

"As I read through this book, I have already decided to use it in my classes. This is a book written by an outstanding researcher who has made fundamental contributions to data mining, in a way that is both accessible and up to date. The book is complete with theory and practical use cases. It's a must-have for students and professors alike!" -- Qiang Yang, Chair of Computer Science and Engineering at Hong Kong University of Science and Technology

"This is the most amazing and comprehensive text book on data mining. It covers not only the fundamental problems, such as clustering, classification, outliers and frequent patterns, and different data types, including text, time series, sequences, spatial data and graphs, but also various applications, such as recommenders, Web, social network and privacy. It is a great book for graduate students and researchers as well as practitioners." -- Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology at University of Illinois at Chicago

716 citations


References


Journal Article

On Information and Sufficiency

Solomon Kullback, R. A. Leibler

01 Mar 1951-Annals of Mathematical Statistics

16,176 citations

Book

Network Flows: Theory, Algorithms, and Applications

Ravindra K. Ahuja¹, Thomas L. Magnanti², James B. Orlin²

Indian Institute of Technology Kanpur¹, Massachusetts Institute of Technology²

01 Jan 1993

TL;DR: In-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models are presented.

Abstract: A comprehensive introduction to network flows that brings together the classic and the contemporary aspects of the field, and provides an integrative view of theory, algorithms, and applications. The book presents in-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models. It emphasizes powerful algorithmic strategies and analysis tools such as data scaling, geometric improvement arguments, and potential function arguments. It provides easy-to-understand descriptions of several important data structures, including d-heaps, Fibonacci heaps, and dynamic trees. It devotes a special chapter to conducting empirical testing of algorithms, features over 150 applications of network flows to a variety of engineering, management, and scientific domains, and contains extensive reference notes and illustrations.

8,496 citations

"t-Closeness: Privacy Beyond k-Anony..." refers methods in this paper

...One can calculate EMD using solutions to the transportation problem, such as a min-cost flow[1]; however, these algorithms do not provide an explicit formula....
[...]
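
The excerpt above notes that general transportation solvers give no explicit formula for EMD. For one special case a closed form is well known: when the attribute values are ordered and the ground distance between the i-th and j-th values is assumed to be |i - j| / (m - 1), the EMD is the sum of the absolute prefix sums of the probability differences, divided by m - 1. The Python sketch below implements that case only; the function name and the toy salary-style distributions are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def emd_ordered(p, q):
    """EMD for ordered values, assuming ground distance |i - j| / (m - 1).

    Each prefix sum of (p_i - q_i) is the mass that must cross the boundary
    between position i and position i + 1, and every crossing costs 1 / (m - 1).
    """
    m = len(p)
    if m != len(q):
        raise ValueError("distributions must share the same ordered domain")
    if m < 2:
        return 0.0
    prefix = list(accumulate(pi - qi for pi, qi in zip(p, q)))
    # The last prefix sum is zero for proper distributions, so it is skipped.
    return sum(abs(c) for c in prefix[:-1]) / (m - 1)

# Hypothetical example: nine ordered values, uniform in the overall table.
third, zero = Fraction(1, 3), Fraction(0)
overall = [Fraction(1, 9)] * 9
low_three = [third, third, third, zero, zero, zero, zero, zero, zero]
spread = [zero, zero, zero, third, zero, third, zero, zero, third]

print(emd_ordered(low_three, overall))  # 3/8
print(emd_ordered(spread, overall))     # 1/6
```

The class concentrated on the three lowest values is farther from the overall distribution than the class spread across the range, which is the behavior a distance that respects value ordering should show.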

Journal Article

k-anonymity: a model for protecting privacy

Latanya Sweeney¹

Carnegie Mellon University¹

01 Oct 2002-International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment, and the paper examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.

Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field-structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
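
The k-anonymity condition stated in this abstract can be checked mechanically: group the released records by their quasi-identifier values and confirm that every group holds at least k records. The following Python sketch assumes records are dictionaries and that the quasi-identifier attributes are already known; the attribute names and values are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(size >= k for size in groups.values())

# Hypothetical release with generalized ZIP codes and age ranges.
release = [
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "476**", "age": "20-29", "disease": "cancer"},
    {"zip": "479**", "age": "40-49", "disease": "flu"},
    {"zip": "479**", "age": "40-49", "disease": "flu"},
]

print(is_k_anonymous(release, ["zip", "age"], k=2))  # True
print(is_k_anonymous(release, ["zip", "age"], k=3))  # False
```

The second group in this toy release is 2-anonymous yet every record in it shares the same disease, so the sensitive value is still disclosed. That is the kind of attribute disclosure that l-diversity and t-closeness were proposed to address.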

7,925 citations

"t-Closeness: Privacy Beyond k-Anony..." refers background or methods in this paper

...Samarati and Sweeney [15, 16, 18] introduced the k-anonymity approach and used generalization and suppression techniques to preserve information truthfulness....
[...]
...To this end, Samarati and Sweeney [15, 16, 18] introduced k-anonymity as the property that each record is indistinguishable with at...
[...]

Posted Content

On Information and Sufficiency

Huaiyu Zhu

01 Feb 1997-Research Papers in Economics

TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms), and it is invariant if and only if the morphism is sufficient for these two measures.

Abstract: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invariant if and only if the morphism is sufficient for these two measures.

5,228 citations

"t-Closeness: Privacy Beyond k-Anony..." refers methods in this paper

...And the Kullback-Leibler (KL) distance [8] is defined as:...
[...]
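
The excerpt above cuts off before the formula. As a point of reference only, the textbook definition of the Kullback-Leibler divergence between discrete distributions P and Q is the sum over i of p_i * log(p_i / q_i); the Python sketch below uses that standard form and does not claim to reproduce the paper's own notation. The function name and the toy distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Textbook KL divergence: sum of p_i * log(p_i / q_i), natural logarithm."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing by convention
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example: nine ordered values, uniform in the overall table.
overall = [1 / 9] * 9
low_three = [1 / 3, 1 / 3, 1 / 3, 0, 0, 0, 0, 0, 0]
spread = [0, 0, 0, 1 / 3, 0, 1 / 3, 0, 0, 1 / 3]

# Both classes get the same score, because KL ignores how far apart values are.
print(kl_divergence(low_three, overall))
print(kl_divergence(spread, overall))
```

A class concentrated on the three lowest values and a class spread across the whole range are indistinguishable under this measure. A distance that takes the ground distance between attribute values into account, such as the Earth Mover's Distance, can tell them apart.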

Journal Article

The Earth Mover's Distance as a Metric for Image Retrieval

Yossi Rubner¹, Carlo Tomasi¹, Leonidas J. Guibas¹

Stanford University¹

01 Nov 2000-International Journal of Computer Vision

TL;DR: This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.


Abstract: We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.


4,593 citations

"t-Closeness: Privacy Beyond k-Anony..." refers background or methods in this paper

...This requirement leads us to the Earth Mover's distance (EMD) [14], which is actually a Monge-Kantorovich transportation distance [5] in disguise....
[...]
...Further, in order to incorporate distances between values of sensitive attributes, we use the Earth Mover Distance metric [14] to measure the distance between the two distributions....
[...]