Topic

Data anonymization

About: Data anonymization is a research topic. Over its lifetime, 735 publications have been published within this topic, receiving 39,463 citations. The topic is also known as: anonymization & anonymisation.


Papers

Journal ArticleDOI: 10.1142/S0218488502001648
Latanya Sweeney
Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field-structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.


Topics: k-anonymity (67%), Data anonymization (58%), Information privacy (58%)

7,135 Citations
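
The definition suggests a direct way to measure the k of a generalized release: group the records by their quasi-identifier values and take the size of the smallest group. Below is a minimal Python sketch of that check; it is not code from the paper, and the attribute names and generalized values are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    # Size of the smallest group of records sharing the same quasi-identifier
    # values; the release provides k-anonymity for exactly this k.
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Hypothetical generalized release: zip and age are the quasi-identifiers.
release = [
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "130**", "age": "30-39", "diagnosis": "cancer"},
]
print(k_anonymity(release, ["zip", "age"]))  # prints 2: the release is 2-anonymous
```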


Journal ArticleDOI: 10.1145/1217299.1217302
Abstract: Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.


Topics: Information privacy (66%), Privacy software (62%), t-closeness (59%)

3,549 Citations
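
In its simplest ("distinct") form, ℓ-diversity asks that every equivalence class contain at least ℓ different sensitive values, which directly addresses the lack-of-diversity attack described above. A minimal Python sketch of that check, with hypothetical attribute names and data:

```python
from collections import defaultdict

def distinct_l_diversity(records, quasi_identifiers, sensitive):
    # Smallest number of distinct sensitive values found in any equivalence
    # class (records that agree on all quasi-identifier attributes).
    classes = defaultdict(set)
    for r in records:
        classes[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return min(len(vals) for vals in classes.values()) if classes else 0

# Hypothetical 2-anonymous release whose first equivalence class is homogeneous:
release = [
    {"zip": "476**", "age": "2*", "disease": "heart disease"},
    {"zip": "476**", "age": "2*", "disease": "heart disease"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
]
print(distinct_l_diversity(release, ["zip", "age"], "disease"))  # prints 1: not 2-diverse
```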


Book ChapterDOI: 10.1007/11787006_1
Cynthia Dwork
10 Jul 2006
Abstract: In 1977 Dalenius articulated a desideratum for statistical databases: nothing about an individual should be learnable from the database that cannot be learned without access to the database. We give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved. Contrary to intuition, a variant of the result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database. The techniques developed in a sequence of papers [8, 13, 3], culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.


Topics: Differential privacy (58%), t-closeness (57%), Data anonymization (55%)

3,356 Citations
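
The abstract itself does not spell out a mechanism, but the line of work it cites realizes differential privacy for numeric queries with the Laplace mechanism: the true answer is perturbed with Laplace noise whose scale is the query's sensitivity divided by ε. A minimal illustrative Python sketch, using a hypothetical counting query:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    # Perturb the true answer with Laplace noise of scale sensitivity/epsilon;
    # a smaller epsilon gives stronger privacy and noisier answers.
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical counting query ("how many records have the flu?").
# Adding or removing one person changes a count by at most 1, so sensitivity = 1.
true_count = 42
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```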


Open access Proceedings ArticleDOI: 10.1109/ICDE.2007.367856
15 Apr 2007
Abstract: The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this paper we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover's distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.


Topics: t-closeness (63%), Information privacy (54%), Equivalence class (52%)

2,897 Citations
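
For a categorical sensitive attribute in which all pairs of values are equally distant, the earth mover's distance between an equivalence class's distribution and the overall distribution reduces to the total variation distance. The Python sketch below uses that simplification to flag classes that exceed a threshold t; it is only an illustration, and the attribute names are hypothetical.

```python
from collections import Counter, defaultdict

def t_closeness_violations(records, quasi_identifiers, sensitive, t):
    # Flag equivalence classes whose sensitive-value distribution is farther
    # than t from the overall distribution, measuring distance as total
    # variation (the equal-ground-distance special case of earth mover's distance).
    n = len(records)
    overall = Counter(r[sensitive] for r in records)
    overall_dist = {v: c / n for v, c in overall.items()}

    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in quasi_identifiers)].append(r[sensitive])

    violating = []
    for key, values in classes.items():
        class_counts = Counter(values)
        m = len(values)
        distance = 0.5 * sum(
            abs(class_counts.get(v, 0) / m - overall_dist.get(v, 0.0))
            for v in set(overall_dist) | set(class_counts)
        )
        if distance > t:
            violating.append((key, round(distance, 3)))
    return violating

# Example (hypothetical): t_closeness_violations(release, ["zip", "age"], "disease", t=0.2)
```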


Open access Proceedings ArticleDOI: 10.1109/ICDE.2006.1
03 Apr 2006
Abstract: Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, we show that an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy definition called ℓ-diversity. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.


  • Figure 1. Inpatient Microdata
  • Figure 3. Notation used in the Proof of Theorem 3.1
  • Figure 4. 3-Diverse Inpatient Microdata
  • Figure 5. Table T
  • Figure 14. Adults Database. Q = {age, gender, race, marital status, education}
  • + 8 more figures

Topics: Information privacy (64%), Privacy software (60%), t-closeness (58%)

2,339 Citations
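
Besides the distinct-values form, the paper also defines entropy ℓ-diversity: every equivalence class must have sensitive-value entropy of at least log ℓ. The rough Python sketch below (with hypothetical attribute names) computes the largest ℓ a release satisfies under that instantiation.

```python
import math
from collections import Counter, defaultdict

def entropy_l_diversity(records, quasi_identifiers, sensitive):
    # Largest l such that every equivalence class has sensitive-value
    # entropy >= log(l); equivalently, l = exp(minimum class entropy).
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in quasi_identifiers)].append(r[sensitive])

    min_entropy = float("inf")
    for values in classes.values():
        counts = Counter(values)
        total = len(values)
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        min_entropy = min(min_entropy, entropy)
    return math.exp(min_entropy) if classes else 0.0
```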


Performance Metrics

No. of papers in the topic in previous years:

Year  Papers
2021  55
2020  74
2019  88
2018  59
2017  62
2016  63

Top Attributes


Topic's top 5 most impactful authors:

Fabian Prasser: 11 papers, 262 citations
Jordi Soria-Comas: 10 papers, 317 citations
Klaus A. Kuhn: 10 papers, 203 citations
Josep Domingo-Ferrer: 8 papers, 302 citations
Florian Kohlmayer: 7 papers, 212 citations

Network Information

Related Topics (5):

Knowledge extraction: 20.2K papers, 413.4K citations (80% related)
Authentication: 74.7K papers, 867.1K citations (80% related)
Hash function: 31.5K papers, 538.5K citations (80% related)
Encryption: 98.3K papers, 1.4M citations (79% related)
Query optimization: 17.6K papers, 474.4K citations (79% related)