
Showing papers by "Vitaly Shmatikov published in 2008"


Proceedings ArticleDOI
18 May 2008
TL;DR: This work applies the de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service, and demonstrates that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset.
Abstract: We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
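The core idea can be illustrated with a toy similarity-scoring matcher. This is a hypothetical sketch, not the paper's actual algorithm or parameters: records are dicts of item-to-rating, the adversary holds a few noisy (item, rating) pairs, and a record is identified when its similarity score stands above a threshold.

```python
# Hypothetical sketch of similarity-scoring de-anonymization.
# Function names, the tolerance, and the threshold are illustrative
# assumptions, not taken from the paper.

def similarity(aux, record, tolerance=1):
    """Fraction of auxiliary (item, rating) pairs matched by a record,
    allowing each rating to differ by up to `tolerance`."""
    matches = sum(
        1 for item, rating in aux.items()
        if item in record and abs(record[item] - rating) <= tolerance
    )
    return matches / len(aux)

def best_match(aux, dataset, threshold=0.9):
    """Return the record id best matching the auxiliary information,
    or None if no record scores above the threshold."""
    scores = {rid: similarity(aux, rec) for rid, rec in dataset.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

dataset = {
    "user1": {"A": 5, "B": 1, "C": 4},
    "user2": {"A": 2, "D": 5},
}
aux = {"A": 5, "C": 3}           # imprecise background knowledge
print(best_match(aux, dataset))  # → user1
```

Even with an imperfect rating for item "C", the sparse overlap is enough to single out "user1", which mirrors the abstract's point that a little background knowledge suffices in high-dimensional data.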

2,241 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: The results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility, and suggest that in most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
Abstract: Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi-identifier" tuple appear in at least k records, while l-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as accuracy of data-mining algorithms executed on the same sanitized records. For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
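The syntactic definition of k-anonymity the abstract refers to is easy to state in code. A minimal sketch, assuming made-up column names (the generalized values like "130**" are illustrative, not from the paper's datasets): every combination of quasi-identifier values must occur in at least k rows.

```python
from collections import Counter

# Minimal k-anonymity check over quasi-identifier tuples.
# Column names and generalized values are illustrative assumptions.

def is_k_anonymous(rows, quasi_ids, k):
    """True iff every quasi-identifier combination occurs
    in at least k of the given rows."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

rows = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cold"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
]
# The ("148**", "30-39") group has only one row, so k=2 fails:
print(is_k_anonymous(rows, ["zip", "age"], 2))  # → False
```

Note the check says nothing about the sensitive "disease" column: both rows in a k-anonymous group may share the same sensitive value, which is exactly the kind of syntactic guarantee without real privacy that the paper criticizes.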

383 citations


Proceedings ArticleDOI
18 May 2008
TL;DR: This work presents a relatively efficient, privacy-preserving implementation of fundamental genomic computations such as calculating the edit distance and Smith-Waterman similarity scores between two sequences, and evaluates the prototype implementation on sequences from the Pfam database of protein families.
Abstract: Many basic tasks in computational biology involve operations on individual DNA and protein sequences. These sequences, even when anonymized, are vulnerable to re-identification attacks and may reveal highly sensitive information about individuals. We present a relatively efficient, privacy-preserving implementation of fundamental genomic computations such as calculating the edit distance and Smith-Waterman similarity scores between two sequences. Our techniques are cryptographically secure and significantly more practical than previous solutions. We evaluate our prototype implementation on sequences from the Pfam database of protein families, and demonstrate that its performance is adequate for solving real-world sequence-alignment and related problems in a privacy-preserving manner. Furthermore, our techniques have applications beyond computational biology. They can be used to obtain efficient, privacy-preserving implementations for many dynamic programming algorithms over distributed datasets.
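For reference, the underlying (non-private) computation is the standard Wagner-Fischer dynamic program for edit distance; the paper's contribution is evaluating this kind of recurrence securely across two parties. A plain-text sketch of the recurrence itself:

```python
# Standard (non-private) Wagner-Fischer edit-distance DP -- the
# computation that the privacy-preserving protocol evaluates on
# secret inputs. This sketch shows only the cleartext recurrence.

def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = cost of transforming a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # → 3
```

Each cell depends only on three neighbors via additions and a minimum, which is why such dynamic programs lend themselves to secure two-party evaluation, as the abstract's closing remark about distributed datasets suggests.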

228 citations


Proceedings Article
27 Oct 2008
TL;DR: The papers presented at the FMSE cover a wide variety of topics, including application of formal methods to the analysis of cryptographic protocols, wireless security, secure information sharing, and security policies for virtual-machine monitors.
Abstract: It is my great pleasure to welcome you to the 6th ACM Workshop on Formal Methods in Security Engineering (FMSE), held in conjunction with the 15th ACM Conference on Computer and Communications Security (CCS 2008). The purpose of FMSE is to bring together researchers and practitioners from both the security and the software engineering communities, from academia and industry, who are working on applying formal methods to the design and validation of large-scale systems. The scope of the workshop includes security requirements and risk analysis, access control, information flow, and trust models, specification and analysis of security properties, computationally sound abstraction, program logics and type systems for security, techniques for verification and static analysis, tool support for the development and analysis of security-critical systems, design and analysis of security protocols, security aspects of operating systems and middleware, and case studies. The program of the workshop was selected by the program committee via a rigorous peer-review process, with fewer than 1/3 of the submitted papers accepted for presentation. The papers presented at the workshop cover a wide variety of topics, including application of formal methods to the analysis of cryptographic protocols, wireless security, secure information sharing, and security policies for virtual-machine monitors. In addition to the peer-reviewed papers, the workshop includes invited talks by Cedric Fournet of Microsoft Research (UK) and the Microsoft Research-INRIA Joint Centre (France) and by Carl Gunter of the University of Illinois at Urbana-Champaign (USA).

2 citations