Topic

k-anonymity

About: k-anonymity is a research topic. Over its lifetime, 649 publications have been published within this topic, receiving 39,151 citations.

Papers

Journal Article. DOI: 10.1142/S0218488502001648
Latanya Sweeney
Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
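
The definition in this abstract can be illustrated with a minimal sketch. It assumes the release is a list of dictionaries and that the caller already knows which columns act as quasi-identifiers; the function name, column names, and toy rows below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in `records` is shared by at least k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical toy release with already-generalized ZIP codes and ages.
release = [
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "asthma"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
]

print(is_k_anonymous(release, ["zip", "age"], 2))  # each group has 2 rows
print(is_k_anonymous(release, ["zip", "age"], 3))  # no group has 3 rows
```

The check only counts rows per quasi-identifier combination; it says nothing about the accompanying deployment policies that the abstract describes as necessary.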

Topics: k-anonymity (67%), Data anonymization (58%), Information privacy (58%)

7,135 Citations


Journal Article. DOI: 10.1145/1217299.1217302
Abstract: Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
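
The first attack described here (little diversity in the sensitive attribute) can be made concrete with a small sketch. It uses the simplest possible reading of diversity, namely counting distinct sensitive values per group; that reading is an assumption, since the abstract does not spell out the formal criterion. All names and rows are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every quasi-identifier group contains at least
    l distinct values of the sensitive attribute (assumed criterion)."""
    values_by_group = defaultdict(set)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        values_by_group[key].add(record[sensitive])
    return all(len(values) >= l for values in values_by_group.values())


# Hypothetical table: both groups have 2 rows (2-anonymous), but the
# first group is homogeneous, so its sensitive value is fully exposed.
table = [
    {"zip": "130**", "age": "<30", "condition": "cancer"},
    {"zip": "130**", "age": "<30", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "asthma"},
]

print(is_distinct_l_diverse(table, ["zip", "age"], "condition", 2))
```

Anyone who knows a target falls in the first group learns the condition without singling out a row, which is why group size alone is not enough.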

Topics: Information privacy (66%), Privacy software (62%), t-closeness (59%)

3,549 Citations


Open access. Proceedings Article. DOI: 10.1109/ICDE.2007.367856
15 Apr 2007
Abstract: The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this paper we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
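
A rough sketch of the t-closeness check follows. It assumes a categorical sensitive attribute in which every pair of distinct values is at the same unit ground distance; under that assumption the earth mover distance reduces to half the sum of absolute differences between the two distributions. Numerical or hierarchical attributes would need a different ground distance, which is not covered here. Function names and rows are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a non-empty list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def emd_equal_distance(p, q):
    """Earth mover distance when all distinct values are equally far
    apart (assumption): half the L1 distance between p and q."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any equivalence class and the whole table."""
    overall = distribution([r[sensitive] for r in records]) if records else {}
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return max(
        (emd_equal_distance(distribution(v), overall) for v in classes.values()),
        default=0.0,
    )


def is_t_close(records, quasi_identifiers, sensitive, t):
    return max_class_distance(records, quasi_identifiers, sensitive) <= t


# Hypothetical table: overall distribution is flu 0.75, cancer 0.25.
table = [
    {"zip": "130**", "condition": "flu"},
    {"zip": "130**", "condition": "flu"},
    {"zip": "148**", "condition": "cancer"},
    {"zip": "148**", "condition": "flu"},
]

print(max_class_distance(table, ["zip"], "condition"))
```

In this toy table both classes sit at distance 0.25 from the overall distribution, so the table would satisfy a threshold of t = 0.3 but not t = 0.2.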

Topics: t-closeness (63%), Information privacy (54%), Equivalence class (52%)

2,897 Citations


Open access. Proceedings Article. DOI: 10.1109/ICDE.2006.1
03 Apr 2006
Abstract: Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, we show that an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy definition called ℓ-diversity. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.

  • Figure 5. Table T
  • Figure 3. Notation used in the Proof of Theorem 3.1
  • Figure 4. 3-Diverse Inpatient Microdata
  • Figure 1. Inpatient Microdata
  • Figure 14. Adults Database. Q = {age, gender, race, marital status, education}

Topics: Information privacy (64%), Privacy software (60%), t-closeness (58%)

2,339 Citations


Journal Article. DOI: 10.1109/69.971193
Pierangela Samarati
Abstract: Today's globally networked society places great demands on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. Deidentifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to reidentify respondents and infer information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization that captures the property of the release process not distorting the data more than needed to achieve k-anonymity, and present an algorithm for the computation of such a generalization. We also discuss possible preference policies to choose among different minimal generalizations.
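
Generalization and suppression can be sketched as follows. This is not the paper's minimal-generalization algorithm: the generalization levels are fixed by the caller rather than searched for, and the column names, helper functions, and rows are hypothetical. ZIP codes are generalized by masking trailing digits, ages by bucketing into ranges, and any record whose generalized group is still smaller than k is suppressed.

```python
from collections import Counter


def generalize_zip(zip_code, level):
    """Mask the last `level` characters of a ZIP code with '*'."""
    if level <= 0:
        return zip_code
    keep = max(len(zip_code) - level, 0)
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age, width):
    """Replace an exact age with the range of the given width containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records, k, zip_level, age_width):
    """Generalize ZIP and age, then suppress records in groups smaller than k."""
    generalized = [
        {
            **r,
            "zip": generalize_zip(r["zip"], zip_level),
            "age": generalize_age(r["age"], age_width),
        }
        for r in records
    ]
    counts = Counter((r["zip"], r["age"]) for r in generalized)
    return [r for r in generalized if counts[(r["zip"], r["age"])] >= k]


# Hypothetical microdata with explicit identifiers already removed.
microdata = [
    {"zip": "13053", "age": 28, "condition": "flu"},
    {"zip": "13068", "age": 29, "condition": "asthma"},
    {"zip": "14850", "age": 47, "condition": "cancer"},
    {"zip": "14853", "age": 49, "condition": "flu"},
    {"zip": "02139", "age": 33, "condition": "flu"},
]

for row in anonymize(microdata, k=2, zip_level=2, age_width=10):
    print(row)
```

The released values stay truthful (a 28-year-old really is in the range 20-29), only less specific; the lone record from ZIP 02139 is dropped because no other record shares its generalized group.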

Topics: Data anonymization (57%), k-anonymity (56%), Information protection policy (55%)

2,135 Citations


Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  27
2020  38
2019  40
2018  33
2017  58
2016  51

Top Attributes

Topic's top 5 most impactful authors

Traian Marius Truta

10 papers, 745 citations

Jordi Soria-Comas

9 papers, 449 citations

Jiuyong Li

9 papers, 789 citations

Shinsaku Kiyomoto

6 papers, 18 citations

Hua Wang

6 papers, 186 citations

Network Information
Related Topics (5)
Query optimization

17.6K papers, 474.4K citations

81% related
Association rule learning

15.1K papers, 362K citations

80% related
Query language

17.2K papers, 496.2K citations

80% related
Web query classification

11.9K papers, 339.3K citations

79% related
Semantic Web Stack

13.9K papers, 307.2K citations

78% related