Journal ArticleDOI

Inference Controls for Statistical Databases

Denning, +1 more
- 01 Jul 1983, Vol. 16, Iss. 7, pp. 69-82
TLDR
Some of the controls proposed for the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access are surveyed, divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem. Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher-order statistics. Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Until the late 1970s, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement.
Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
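The subtraction attack from the abstract's example can be sketched in a few lines. The records, names, and salary figures below are invented for illustration; only the mechanism (two released sums combining into an exact disclosure) comes from the text:

```python
# Toy statistical database: a hypothetical EE department with exactly
# one female professor. All values are invented for illustration.
records = [
    {"name": "A", "sex": "M", "salary": 90_000},
    {"name": "B", "sex": "M", "salary": 85_000},
    {"name": "C", "sex": "F", "salary": 95_000},  # the only female professor
]

def query_sum(predicate):
    """A statistics-only interface: releases sums, never individual rows."""
    return sum(r["salary"] for r in records if predicate(r))

# Two seemingly innocuous statistics...
total_all = query_sum(lambda r: True)               # total salary, all professors
total_male = query_sum(lambda r: r["sex"] == "M")   # total salary, male professors

# ...combine into an exact disclosure of the female professor's salary:
inferred = total_all - total_male
print(inferred)  # 95000
```

Neither query on its own names an individual, which is why query-restriction controls must reason about combinations of queries, not queries in isolation.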


Citations
Journal ArticleDOI

On Inference-Proof View Processing of XML Documents

TL;DR: This work presents an algorithm for generating an inference-proof view by weakening the actual XML document, i.e., eliminating confidential information and other information that could be used to infer confidential information.
Proceedings ArticleDOI

Censoring statistical tables to protect sensitive information: easy and hard problems

TL;DR: If sensitive information refers not only to single cells but also to cell sets, it is proved that the problem of minimizing the number of suppressions is NP-hard.
Proceedings ArticleDOI

Preventing Disclosure of Personal Data in IoT Networks

TL;DR: An Adaptive Inference Discovery Service (AID-S) is conceived as a service that may support users in preventing this kind of information leakage and that can be integrated into personal data managers.
Journal ArticleDOI

Absolute bounds on set intersection and union sizes from distribution information

TL;DR: A catalog of quick closed-form bounds on set intersection and union sizes is presented; the bounds can be expressed as rules and managed by a rule-based system architecture.
Proceedings ArticleDOI

Classification of technological privacy techniques for LTE-based public safety networks

TL;DR: A classification of technological privacy techniques is proposed in order to protect and enhance privacy in LTE-based PSNs and highlights further requirements and open problems for which available privacy techniques are not sufficient.
References
Book

Cryptography and data security

TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Journal ArticleDOI

Data-swapping: A technique for disclosure control

TL;DR: Data-swapping is a data transformation technique where the underlying statistics of the data are preserved and can be used as a basis for microdata release or to justify the release of tabulations.
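The idea behind data-swapping can be sketched as follows. This is a minimal illustration, not the paper's exact method: it permutes a sensitive attribute among records that agree on a key attribute, so any statistic grouped by that key is preserved while the link between an individual record and its value is broken. All data below are invented.

```python
import random

def swap_within_groups(records, key, sensitive, rng):
    """Shuffle the sensitive attribute within each group of records
    sharing the same key attribute. Group-level statistics over the
    key are unchanged; record-to-value linkage is not."""
    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(r[key], []).append(i)
    swapped = [dict(r) for r in records]
    for idxs in groups.values():
        values = [records[i][sensitive] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            swapped[i][sensitive] = v
    return swapped

data = [
    {"dept": "EE", "salary": 90},
    {"dept": "EE", "salary": 95},
    {"dept": "CS", "salary": 80},
    {"dept": "CS", "salary": 70},
]
released = swap_within_groups(data, "dept", "salary", random.Random(7))

def dept_total(rows, dept):
    return sum(r["salary"] for r in rows if r["dept"] == dept)

# Per-department totals are identical before and after swapping.
assert dept_total(released, "EE") == dept_total(data, "EE")
```

Because the released file contains real values in real records (just reattached), tabulations over the preserved keys remain exact, which is the sense in which swapping can "justify the release of tabulations."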
Journal ArticleDOI

Suppression Methodology and Statistical Disclosure Control

TL;DR: In this paper, the authors discuss theory and method of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
Journal ArticleDOI

Secure databases: protection against user influence

TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Journal ArticleDOI

Secure statistical databases with random sample queries

TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases that deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.
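The random-sample-query idea can be sketched as follows. This is a simplification, and the keyed-hash selection rule and parameters are assumptions for illustration, not the paper's exact scheme: each record matching a query is included in the released statistic only with probability p, decided pseudorandomly on the server side, so the questioner cannot control precisely which records form the sampled query set.

```python
import hashlib

SECRET = b"server-side secret"  # hypothetical key, unknown to questioners
P = 0.5                         # sampling probability (illustrative choice)

def included(record_id):
    """Deterministic pseudorandom inclusion test keyed on a secret,
    so the questioner cannot predict or control which records are sampled."""
    h = hashlib.sha256(SECRET + str(record_id).encode()).digest()
    return h[0] / 256 < P

def sampled_count(records, predicate):
    """Release a count computed over the sampled subset of the query set,
    scaled by 1/P to give an unbiased estimate of the true count."""
    n = sum(1 for r in records if predicate(r) and included(r["id"]))
    return n / P

records = [{"id": i, "age": 20 + i % 40} for i in range(1000)]
true_count = sum(1 for r in records if r["age"] >= 40)
estimate = sampled_count(records, lambda r: r["age"] >= 40)
# The estimate approximates the true count but rarely equals it exactly,
# which frustrates the subtraction-style attacks that rely on exact sums.
```

Statistics stay useful in aggregate (the estimator is unbiased), while the uncertainty in any single released value undermines precise inference about individuals.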