Proceedings ArticleDOI

Non-cryptographic security to data: Distortion based anonymization techniques

02 May 2014, pp. 1-5
TL;DR: The paper discusses the strengths and drawbacks of a particular type of distortion approach, Data Anonymization, which is used to provide security to health data, financial data, and the like.
Abstract: Ensuring the privacy of sensitive data is becoming an important criterion in the data access policies of governments and corporations. Non-cryptographic techniques have long been used to provide security to data such as health data and financial data. Data are distorted using various approaches to hide sensitive information and preserve privacy. A whole line of methods, from statistical disclosure control to distortion-based techniques, is in use. This paper discusses the strengths and drawbacks of a particular type of distortion approach, Data Anonymization.
Citations
Journal Article
TL;DR: The Health Insurance Portability and Accountability Act, also known as HIPAA, was designed to protect health insurance coverage for workers and their families while between jobs and establishes standards for electronic health care transactions.
Abstract: The Health Insurance Portability and Accountability Act, also known as HIPAA, was first delivered to Congress in 1996 and consisted of just two Titles. It was designed to protect health insurance coverage for workers and their families while between jobs. It establishes standards for electronic health care transactions and addresses the issues of privacy and security when dealing with Protected Health Information (PHI). HIPAA is applicable only in the United States of America.

561 citations

Book
01 Jan 2008
TL;DR: This thesis examines privacy concerns in online social networks where the private information to be protected is the profile information of a user or the set of individuals in the network that a user interacts with and identifies limits on the amount of lookahead that a social network should provide each user to protect the privacy of its users from hijacking attacks.
Abstract: As computing technologies continue to advance there has been a rapid growth in the amount of digitally available personal information, bringing the privacy concerns of individuals to a forefront. This thesis explores modeling and algorithmic problems that arise from the need to protect the privacy of individuals in a variety of different settings. Specifically, we study three different problems. The first problem we consider is that of online query auditing . The focus here is on an interactive scenario wherein users pose aggregate queries over a statistical database containing private data. The privacy task is to deny queries when answers to the current and past queries may be stitched together by a malicious user to infer private information. We demonstrate an efficient scheme for auditing bags of max and min queries to protect against a certain kind of privacy breach. Additionally, we study, for the first time, the utility of auditing algorithms and provide initial results for the utility of an existing algorithm for auditing sum queries. The second problem we study is that of anonymizing unstructured data. We consider datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide a constant factor approximation algorithm for the same. We experimentally evaluate our algorithms on the America Online query log dataset. In the last problem, we examine privacy concerns in online social networks where the private information to be protected is the profile information of a user or the set of individuals in the network that a user interacts with. We identify limits on the amount of lookahead that a social network should provide each user to protect the privacy of its users from hijacking attacks. The lookahead of a network is essentially the amount of neighborhood visibility a network provides each user. And a hijacking attack is one in which an attacker strategically subverts (hijacks) user accounts in the network to gain access to different local neighborhoods of the network. The goal of the attacker is to piece together these local neighborhoods to build a complete picture of the social network. By analyzing both experimentally and theoretically the feasibility of such attacks as a function of the lookahead of the social network, we make recommendations for what the default lookahead settings of a privacy-conscious social network should be.
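
The lookahead and hijacking-attack notions summarized above can be made concrete with a small graph sketch. The snippet below is only an illustration of the idea under an assumed adjacency-list representation, not the thesis's experimental code; the example network and node names are hypothetical.

```python
from collections import deque

def visible_subgraph(adj, start, lookahead):
    """Nodes within `lookahead` hops of `start`: roughly the neighborhood
    a network with lookahead-l visibility exposes to that user's account."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == lookahead:
            continue
        for neighbor in adj.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def coverage_after_hijacking(adj, hijacked, lookahead):
    """Fraction of the network an attacker can piece together after
    hijacking the given accounts and uniting their visible neighborhoods."""
    visible = set()
    for account in hijacked:
        visible |= visible_subgraph(adj, account, lookahead)
    return len(visible) / len(adj)

# Hypothetical 5-node network: with lookahead 1, hijacking node "a" alone
# exposes only {"a", "b", "c"}, i.e. 60% of the network.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
print(coverage_after_hijacking(adj, ["a"], lookahead=1))  # 0.6
```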

3 citations

References
BookDOI
01 Jan 1990
TL;DR: In this article, an electrical signal transmission system for railway locomotives and rolling stock is proposed, where a basic pulse train is transmitted whereof the pulses are of a selected first amplitude and represent a train axle count, and a spike pulse of greater selected amplitude is transmitted, occurring immediately after the axle count pulse to which it relates, whenever an overheated axle box is detected.
Abstract: An electrical signal transmission system, applicable to the transmission of signals from trackside hot box detector equipment for railroad locomotives and rolling stock, wherein a basic pulse train is transmitted whereof the pulses are of a selected first amplitude and represent a train axle count, and a spike pulse of greater selected amplitude is transmitted, occurring immediately after the axle count pulse to which it relates, whenever an overheated axle box is detected. To enable the signal receiving equipment to determine on which side of a train the overheated box is located, the spike pulses are of two different amplitudes corresponding, respectively, to opposite sides of the train.

9,011 citations

Journal ArticleDOI
TL;DR: The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment and examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected.
Abstract: Consider a data holder, such as a hospital or a bank, that has a privately held collection of person-specific, field structured data. Suppose the data holder wants to share a version of the data with researchers. How can a data holder release a version of its private data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful? The solution provided in this paper includes a formal protection model named k-anonymity and a set of accompanying policies for deployment. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus and k-Similar provide guarantees of privacy protection.
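
As an illustration of the definition above, the following minimal Python sketch checks whether a table satisfies k-anonymity over a chosen set of quasi-identifiers. It is not the Datafly, µ-Argus, or k-Similar implementation; the attribute names and sample rows are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """records: list of dicts; quasi_identifiers: list of attribute names.
    Returns True if every combination of quasi-identifier values is shared
    by at least k records."""
    combos = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

# Hypothetical example: ZIP code and age act as quasi-identifiers.
rows = [
    {"zip": "47677", "age": 29, "disease": "Heart disease"},
    {"zip": "47677", "age": 29, "disease": "Flu"},
    {"zip": "47602", "age": 22, "disease": "Cancer"},
    {"zip": "47602", "age": 22, "disease": "Flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], k=2))  # True: each combination occurs twice
```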

7,925 citations


"Non-cryptographic security to data:..." refers background in this paper

  • ...A dataset is said to possess k-Anonymity if, for each combination of unique attributes, there exist at least ‘k’ (k > 1) records with the same values [1]....


Book ChapterDOI
Cynthia Dwork1
10 Jul 2006
TL;DR: In this article, the authors give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved, and suggest a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database.
Abstract: In 1977 Dalenius articulated a desideratum for statistical databases: nothing about an individual should be learnable from the database that cannot be learned without access to the database. We give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved. Contrary to intuition, a variant of the result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database. The techniques developed in a sequence of papers [8, 13, 3], culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
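
The abstract above defines the differential privacy measure but does not spell out a mechanism. As a standard illustration from the follow-up literature (not necessarily the exact construction in [8, 13, 3, 12]), the Laplace mechanism adds noise calibrated to a query's sensitivity and the privacy parameter epsilon; the sketch below uses only the Python standard library, and the example query is hypothetical.

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return a noisy query answer satisfying epsilon-differential privacy,
    assuming `sensitivity` bounds how much one individual's record can
    change the true answer."""
    scale = sensitivity / epsilon
    # The difference of two unit-rate exponentials, scaled by `scale`,
    # is Laplace(0, scale) noise.
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_answer + noise

# Example: a counting query ("how many records show the flu?") has
# sensitivity 1, so Laplace noise with scale 1/epsilon suffices.
print(laplace_mechanism(true_answer=42, sensitivity=1.0, epsilon=0.5))
```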

4,134 citations

Journal ArticleDOI
TL;DR: This paper shows with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems, and proposes a novel and powerful privacy definition called ℓ-diversity, which is practical and can be implemented efficiently.
Abstract: Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This is a known problem. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks, and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
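
As a sketch of the weakest ("distinct") form of the idea, the check below requires every equivalence class to contain at least ℓ distinct sensitive values. The paper's stronger variants (entropy and recursive (c, ℓ)-diversity) are not captured here, and the attribute names are hypothetical.

```python
from collections import defaultdict

def is_distinct_l_diverse(records, quasi_identifiers, sensitive_attr, l):
    """Check that every equivalence class (records sharing the same
    quasi-identifier values) contains at least l distinct sensitive values."""
    classes = defaultdict(set)
    for record in records:
        key = tuple(record[a] for a in quasi_identifiers)
        classes[key].add(record[sensitive_attr])
    return all(len(values) >= l for values in classes.values())

# A k-anonymous equivalence class whose records all share one disease value
# would pass a k-anonymity check but fail this test for any l >= 2.
```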

3,780 citations

Proceedings ArticleDOI
15 Apr 2007
TL;DR: T-closeness, as proposed in this paper, requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t).
Abstract: The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this paper we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover's distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
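
To make the requirement concrete, the sketch below checks t-closeness for a categorical sensitive attribute. Under an equal-distance ground metric the earth mover's distance reduces to the variational (total variation) distance, which is what is computed here; the attribute names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter

def variational_distance(p, q):
    """Total variation distance between two distributions given as dicts
    mapping sensitive values to probabilities; equals the earth mover's
    distance when all pairs of distinct values are ground distance 1."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(records, quasi_identifiers, sensitive_attr, t):
    """Check that each equivalence class's sensitive-value distribution is
    within distance t of the distribution over the whole table."""
    def distribution(rows):
        counts = Counter(r[sensitive_attr] for r in rows)
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    overall = distribution(records)
    classes = {}
    for r in records:
        classes.setdefault(tuple(r[a] for a in quasi_identifiers), []).append(r)
    return all(variational_distance(distribution(rows), overall) <= t
               for rows in classes.values())
```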

3,281 citations


"Non-cryptographic security to data:..." refers background in this paper

  • ...The t-Closeness anonymity algorithm upholds the following feature: the distance between the distribution of a sensitive attribute in a combination and the distribution of the sensitive attribute in the entire dataset is no more than a threshold value ‘t’ [4]....
