Journal ArticleDOI
Inference Controls for Statistical Databases
TL;DR: This article surveys controls for the inference problem in on-line, general-purpose database systems that allow both statistical and nonstatistical access, dividing them into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract:
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem.

Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher-order statistics.

Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Up until the late 1970s, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement.
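The subtraction attack described in the abstract can be made concrete in a few lines; the department table and salary figures below are invented purely for illustration:

```python
# Hypothetical salary table for an EE department (made-up figures).
professors = [
    {"name": "A", "sex": "M", "salary": 90_000},
    {"name": "B", "sex": "M", "salary": 100_000},
    {"name": "C", "sex": "M", "salary": 110_000},
    {"name": "D", "sex": "F", "salary": 120_000},  # the only female professor
]

def query_sum(records, predicate):
    """A statistical query: releases only an aggregate, never a record."""
    return sum(r["salary"] for r in records if predicate(r))

# Two individually innocuous statistics...
total_all = query_sum(professors, lambda r: True)
total_male = query_sum(professors, lambda r: r["sex"] == "M")

# ...combine to disclose one individual's sensitive value.
inferred_female_salary = total_all - total_male
print(inferred_female_salary)  # 120000 -- exactly professor D's salary
```

Neither released statistic names an individual, yet together they pin down one person's record exactly; this is the inference problem in miniature.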
Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
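As an illustration of the first category, a classic restriction is the query-set-size control: refuse any query whose query set contains fewer than K records or more than N - K. The threshold, record layout, and figures below are assumed for illustration:

```python
K = 2  # assumed minimum query-set size; N - K is the maximum

def restricted_sum(records, predicate, field):
    """Query-set-size control: answer only if K <= |query set| <= N - K."""
    n = len(records)
    query_set = [r for r in records if predicate(r)]
    if not (K <= len(query_set) <= n - K):
        return None  # query refused
    return sum(r[field] for r in query_set)

professors = [
    {"sex": "M", "salary": 90_000},
    {"sex": "M", "salary": 100_000},
    {"sex": "M", "salary": 110_000},
    {"sex": "F", "salary": 120_000},
]

# The query set {female professors} has size 1 < K, so it is refused...
print(restricted_sum(professors, lambda r: r["sex"] == "F", "salary"))  # None
# ...and so is the complement query over all professors (size 4 > N - K).
print(restricted_sum(professors, lambda r: True, "salary"))  # None
```

Size restriction alone is known to be circumventable with trackers (padding a small query set with known records and subtracting them back out), which is why it is usually treated as a building block rather than a complete defense.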
Citations
Book
Preserving Privacy in On-Line Analytical Processing Data Cubes
TL;DR: Ongoing efforts such as the Platform for Privacy Preferences (P3P) help enterprises make promises about keeping private data secret, but they do not provide mechanisms for keeping those promises.
Book ChapterDOI
Auditing and inference control for privacy preservation in uncertain environments
TL;DR: This paper presents a Bayesian network-based inference control method to prevent privacy-sensitive contexts from being derived from those released in ubiquitous environments.
Journal ArticleDOI
Optimal Augmentation for Bipartite Componentwise Biconnectivity in Linear Time
Tsan-sheng Hsu, Ming-Yang Kao +1 more
TL;DR: In this article, a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness is presented.
Proceedings Article
An evaluation of two new inference control methods
Y. H. Chin, Weng-Ling Peng +1 more
TL;DR: In this article, an evaluation method is developed to measure the cost/effectiveness of two new inference control methods, which combine the merits of several popular concepts: the first method is based on restriction, and the second on perturbation.
Proceedings ArticleDOI
Reasoning about obfuscated private information: who have lied and how to lie
TL;DR: This paper presents a Bayesian network-based method for reasoning about obfuscation: it can be used to determine whether received information has been obfuscated and, if so, what the true information could be; conversely, it can help obfuscators reasonably obfuscate their private information.
References
Book
Cryptography and data security
TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Journal ArticleDOI
Data-swapping: A technique for disclosure control
Tore Dalenius, Steven P. Reiss +1 more
TL;DR: Data-swapping is a data transformation technique where the underlying statistics of the data are preserved and can be used as a basis for microdata release or to justify the release of tabulations.
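The swap itself can be sketched in a few lines. This toy version (made-up records, a single sensitive column) preserves only the univariate distribution of the swapped field, whereas the full data-swapping technique is designed to preserve higher-order frequency counts as well:

```python
import random

def swap_column(records, field, rng):
    """Randomly permute one sensitive field across records.
    The multiset of values -- and so every univariate statistic on the
    field -- is preserved exactly; the record-to-value link is not."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]

rng = random.Random(7)  # fixed seed, purely for reproducibility
original = [
    {"name": "A", "salary": 90_000},
    {"name": "B", "salary": 100_000},
    {"name": "C", "salary": 110_000},
    {"name": "D", "salary": 120_000},
]
swapped = swap_column(original, "salary", rng)

# Univariate statistics survive the swap unchanged.
assert sorted(r["salary"] for r in swapped) == sorted(r["salary"] for r in original)
```

Because the released microdata no longer ties a salary to the record it came from, an inference like the subtraction attack recovers a value, but not reliably the right individual's value.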
Journal ArticleDOI
Suppression Methodology and Statistical Disclosure Control
TL;DR: In this paper, the authors discuss theory and method of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
Journal ArticleDOI
Secure databases: protection against user influence
TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Journal ArticleDOI
Secure statistical databases with random sample queries
TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases that deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.
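One possible realization of this idea, assumed here for illustration, decides each record's inclusion with a keyed hash over the record identifier and the query description: the released statistic is perturbed, yet repeating the identical query cannot average the noise away, and the questioner cannot steer which records land in the sample:

```python
import hashlib

SECRET_KEY = b"server-side secret"  # assumed; never revealed to questioners
P_NUM, P_DEN = 7, 8                 # assumed sampling probability p = 7/8

def sampled(record_id: int, query_desc: str) -> bool:
    """Deterministic pseudo-random inclusion test: the same (record, query)
    pair always yields the same decision, so a repeated query returns the
    same sampled answer instead of fresh noise."""
    h = hashlib.sha256(SECRET_KEY + query_desc.encode() + str(record_id).encode())
    return int.from_bytes(h.digest()[:4], "big") % P_DEN < P_NUM

def rsq_count(records, predicate, query_desc):
    """Count over a random sample of the query set, scaled back up."""
    hits = [r for r in records if predicate(r)]
    sample = [r for r in hits if sampled(r["id"], query_desc)]
    return round(len(sample) * P_DEN / P_NUM)

db = [{"id": i, "age": 20 + i % 50} for i in range(1000)]
estimate = rsq_count(db, lambda r: r["age"] >= 40, "age >= 40")
exact = sum(1 for r in db if r["age"] >= 40)
# The estimate is close to, but generally not exactly, the true count.
```

Note that this sketch keys the sample on the query text; keying on the resulting query set instead would also stop a questioner from drawing a fresh sample simply by rephrasing a logically equivalent query.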