Journal ArticleDOI

Inference Controls for Statistical Databases

Denning, +1 more
- 01 Jul 1983, Vol. 16, Iss. 7, pp. 69-82
TLDR
Some of the controls proposed for the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access are surveyed, divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem. Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher-order statistics. Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Until the late 1970s, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement.
Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
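The subtraction attack from the abstract's example can be sketched in a few lines. The records, names, and salary figures below are invented for illustration; only the mechanism (two released sums combining into an exact disclosure) comes from the text:

```python
# Toy statistical database: a hypothetical EE department with exactly
# one female professor. All values are invented for illustration.
records = [
    {"name": "A", "sex": "M", "salary": 90_000},
    {"name": "B", "sex": "M", "salary": 85_000},
    {"name": "C", "sex": "F", "salary": 95_000},  # the only female professor
]

def query_sum(predicate):
    """A statistics-only interface: releases sums, never individual rows."""
    return sum(r["salary"] for r in records if predicate(r))

# Two seemingly innocuous statistics...
total_all = query_sum(lambda r: True)               # total salary, all professors
total_male = query_sum(lambda r: r["sex"] == "M")   # total salary, male professors

# ...combine into an exact disclosure of the female professor's salary:
inferred = total_all - total_male
print(inferred)  # 95000
```

Neither query on its own names an individual, which is why query-restriction controls must reason about combinations of queries, not queries in isolation.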


Citations
Journal ArticleDOI

On Inference-Proof View Processing of XML Documents

TL;DR: This work presents an algorithm for generating an inference-proof view by weakening the actual XML document, i.e., eliminating confidential information and other information that could be used to infer confidential information.
Proceedings ArticleDOI

Censoring statistical tables to protect sensitive information: easy and hard problems

TL;DR: If sensitive information refers not only to single cells but also to cell sets, it is proved that the problem of minimizing the number of suppressions is NP-hard.
Proceedings ArticleDOI

Preventing Disclosure of Personal Data in IoT Networks

TL;DR: An Adaptive Inference Discovery Service (AID-S) is conceived as a service that may support users in preventing this kind of information leakage and that can be integrated into personal data managers.
Journal ArticleDOI

Absolute bounds on set intersection and union sizes from distribution information

TL;DR: A catalog of quick closed-form bounds on set intersection and union sizes is presented; the bounds can be expressed as rules and managed by a rule-based system architecture.
Proceedings ArticleDOI

Classification of technological privacy techniques for LTE-based public safety networks

TL;DR: A classification of technological privacy techniques is proposed in order to protect and enhance privacy in LTE-based PSNs and highlights further requirements and open problems for which available privacy techniques are not sufficient.
References
Book

Cryptography and data security

TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Journal ArticleDOI

Data-swapping: A technique for disclosure control

TL;DR: Data-swapping is a data transformation technique where the underlying statistics of the data are preserved and can be used as a basis for microdata release or to justify the release of tabulations.
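The idea behind data-swapping can be sketched as follows. This is a minimal illustration, not the paper's exact method: it permutes a sensitive attribute among records that agree on a key attribute, so any statistic grouped by that key is preserved while the link between an individual record and its value is broken. All data below are invented.

```python
import random

def swap_within_groups(records, key, sensitive, rng):
    """Shuffle the sensitive attribute within each group of records
    sharing the same key attribute. Group-level statistics over the
    key are unchanged; record-to-value linkage is not."""
    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(r[key], []).append(i)
    swapped = [dict(r) for r in records]
    for idxs in groups.values():
        values = [records[i][sensitive] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            swapped[i][sensitive] = v
    return swapped

data = [
    {"dept": "EE", "salary": 90},
    {"dept": "EE", "salary": 95},
    {"dept": "CS", "salary": 80},
    {"dept": "CS", "salary": 70},
]
released = swap_within_groups(data, "dept", "salary", random.Random(7))

def dept_total(rows, dept):
    return sum(r["salary"] for r in rows if r["dept"] == dept)

# Per-department totals are identical before and after swapping.
assert dept_total(released, "EE") == dept_total(data, "EE")
```

Because the released file contains real values in real records (just reattached), tabulations over the preserved keys remain exact, which is the sense in which swapping can "justify the release of tabulations."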
Journal ArticleDOI

Suppression Methodology and Statistical Disclosure Control

TL;DR: In this paper, the authors discuss theory and method of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
Journal ArticleDOI

Secure databases: protection against user influence

TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Journal ArticleDOI

Secure statistical databases with random sample queries

TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases that deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.
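The random-sample-query idea can be sketched as follows. This is a simplification, and the keyed-hash selection rule and parameters are assumptions for illustration, not the paper's exact scheme: each record matching a query is included in the released statistic only with probability p, decided pseudorandomly on the server side, so the questioner cannot control precisely which records form the sampled query set.

```python
import hashlib

SECRET = b"server-side secret"  # hypothetical key, unknown to questioners
P = 0.5                         # sampling probability (illustrative choice)

def included(record_id):
    """Deterministic pseudorandom inclusion test keyed on a secret,
    so the questioner cannot predict or control which records are sampled."""
    h = hashlib.sha256(SECRET + str(record_id).encode()).digest()
    return h[0] / 256 < P

def sampled_count(records, predicate):
    """Release a count computed over the sampled subset of the query set,
    scaled by 1/P to give an unbiased estimate of the true count."""
    n = sum(1 for r in records if predicate(r) and included(r["id"]))
    return n / P

records = [{"id": i, "age": 20 + i % 40} for i in range(1000)]
true_count = sum(1 for r in records if r["age"] >= 40)
estimate = sampled_count(records, lambda r: r["age"] >= 40)
# The estimate approximates the true count but rarely equals it exactly,
# which frustrates the subtraction-style attacks that rely on exact sums.
```

Statistics stay useful in aggregate (the estimator is unbiased), while the uncertainty in any single released value undermines precise inference about individuals.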