Journal ArticleDOI

Inference Controls for Statistical Databases

Denning, +1 more
01 Jul 1983 · Vol. 16, Iss. 7, pp. 69-82
TLDR
Surveys controls for the inference problem in on-line, general-purpose database systems that allow both statistical and nonstatistical access. The controls fall into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem.

Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher order statistics.

Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Up until the late 1970's, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement.
Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
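The subtraction attack from the abstract's example can be sketched in a few lines. This is a minimal illustration, not code from the article; the names and salary figures are invented for the sake of the example.

```python
# Hypothetical salary records for an EE department with exactly one
# female professor ("alice"); all names and figures are invented.
salaries = {
    "alice": ("F", 95_000),
    "bob":   ("M", 88_000),
    "carl":  ("M", 91_000),
}

# Two seemingly innocuous statistics released by the database:
total_all = sum(s for _, s in salaries.values())            # 274000
total_male = sum(s for g, s in salaries.values() if g == "M")  # 179000

# The attacker subtracts one released statistic from the other to
# recover the single female professor's salary exactly.
inferred = total_all - total_male
print(inferred)  # 95000
```

No individual record is ever queried directly, yet the sensitive value is disclosed exactly; this is the kind of indirect disclosure that query restriction and noise-addition controls are designed to prevent.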


Citations
Book

Preserving Privacy in On-Line Analytical Processing Data Cubes

TL;DR: Ongoing efforts such as the platform for privacy preferences (P3P) help enterprises make promises about keeping private data secret, but they do not provide mechanisms for them to keep the promises.
Book ChapterDOI

Auditing and inference control for privacy preservation in uncertain environments

TL;DR: This paper presents a Bayesian network-based inference control method to prevent privacy-sensitive contexts from being derived from those released in ubiquitous environments.
Journal ArticleDOI

Optimal Augmentation for Bipartite Componentwise Biconnectivity in Linear Time

TL;DR: In this article, a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness is presented.
Proceedings Article

An evaluation of two new inference control methods

TL;DR: In this article, an evaluation method is developed to measure the cost-effectiveness of two new inference control methods, which combine the merits of several popular concepts: the first method is based on restriction, the second on perturbation.
Proceedings ArticleDOI

Reasoning about obfuscated private information: who have lied and how to lie

TL;DR: This paper presents a Bayesian network-based method to reason about obfuscation. On one hand, it can be used to determine whether received information has been obfuscated and, if so, what the true information could be; on the other hand, it can be used to help obfuscators reasonably obfuscate their private information.
References
Book

Cryptography and data security

TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Journal ArticleDOI

Data-swapping: A technique for disclosure control

TL;DR: Data-swapping is a data transformation technique in which the underlying statistics of the data are preserved; it can be used as a basis for microdata release or to justify the release of tabulations.
Journal ArticleDOI

Suppression Methodology and Statistical Disclosure Control

TL;DR: In this paper, the authors discuss theory and method of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
Journal ArticleDOI

Secure databases: protection against user influence

TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Journal ArticleDOI

Secure statistical databases with random sample queries

TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases that deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.