Journal Article

Inference Controls for Statistical Databases

Denning, +1 more
01 Jul 1983 - Vol. 16, Iss. 7, pp. 69-82
TL;DR
This article surveys some of the controls for the inference problem in on-line, general-purpose database systems that allow both statistical and nonstatistical access. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem. Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher order statistics. Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Up until the late 1970's, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement.
Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
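
The subtraction example in the abstract, and the two categories of controls it names, can be illustrated with a small sketch. Everything below is hypothetical and chosen only for illustration: the toy records, the function names (total_salary, restricted_total, noisy_total), the size threshold, and the noise scale are assumptions, not the article's own definitions or recommended parameters.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Professor:
    name: str
    sex: str
    salary: int


# Hypothetical toy data: a department with exactly one female professor.
DEPARTMENT = [
    Professor("A", "M", 90_000),
    Professor("B", "M", 85_000),
    Professor("C", "M", 100_000),
    Professor("D", "F", 95_000),
]


def total_salary(records, predicate=lambda r: True):
    """Unrestricted SUM statistic over the records matching predicate."""
    return sum(r.salary for r in records if predicate(r))


def restricted_total(records, predicate, min_size=2):
    """Query restriction (assumed rule): refuse the query when the query set,
    or its complement, has fewer than min_size records."""
    size = sum(1 for r in records if predicate(r))
    if size < min_size or len(records) - size < min_size:
        return None  # query refused
    return total_salary(records, predicate)


def noisy_total(records, predicate, scale=5_000, rng=random):
    """Output perturbation (assumed form): add zero-mean Gaussian noise
    to the released statistic."""
    return total_salary(records, predicate) + rng.gauss(0, scale)


def is_male(record):
    return record.sex == "M"


# The inference from the abstract: two harmless-looking totals, one subtraction.
all_total = total_salary(DEPARTMENT)
male_total = total_salary(DEPARTMENT, is_male)
inferred_salary = all_total - male_total  # equals the one female professor's salary

# With the size restriction, the male-only total is refused, because its
# complement contains a single record.
refused = restricted_total(DEPARTMENT, is_male)

# With noise added, the same subtraction yields only an approximation.
rng = random.Random(0)
approx_salary = (
    noisy_total(DEPARTMENT, lambda r: True, rng=rng)
    - noisy_total(DEPARTMENT, is_male, rng=rng)
)
```

The sketch shows only the mechanics of each category. It makes no claim about how secure either control is in practice, which is the question the article examines in terms of security, information loss, and cost.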

Citations
Book

Security Engineering: A Guide to Building Dependable Distributed Systems

TL;DR: In almost 600 pages of riveting detail, Ross Anderson warns us not to be seduced by the latest defensive technologies, never to underestimate human ingenuity, and always use common sense in defending valuables.
Journal Article

Security-control methods for statistical databases: a comparative study

TL;DR: This paper recommends directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while at the same time do not suffer from the bias problem and the 0/1 query-set-size problem.
Journal Article

State-of-the-art in privacy preserving data mining

TL;DR: An overview of the new and rapidly emerging research area of privacy preserving data mining is provided, and a classification hierarchy that sets the basis for analyzing the work which has been performed in this context is proposed.
Journal Article

Privacy Preserving Clustering by Data Transformation

TL;DR: In this article, a family of geometric data transformation methods (GDTMs) is introduced to ensure that the mining process will not violate privacy up to a certain degree of security.
Journal Article

Stalking the wily hacker

TL;DR: An astronomer-turned-sleuth traces a German trespasser on military networks, who slipped through operating system security holes and browsed through sensitive databases.
References
Journal Article

The Effectiveness Of Output Modification By Rounding For Protection Of Statistical Data Bases

TL;DR: Analysis of the effects of both rounding and random rounding on data base output indicates that, with suitable choice of base, rounding may be a more secure technique than random rounding.
Proceedings Article

Security in partitioned dynamic statistical databases

TL;DR: A practical partitioning model for dynamic statistical databases is proposed, the information revealed to users during insertions, deletions, and updates is characterized, and the model is shown to be secure under certain conditions.
Proceedings Article

Memoryless Inference Controls for Statistical Databases

TL;DR: This paper focuses on restriction techniques, which place restrictions on the set of allowable statistics, and controls that add noise to the data or to the released statistics.
Journal Article

Database Security

TL;DR: This work presents methods of constructing the set of queries to compromise a database where all the queries in the set return the maximum of a set of k elements, or all queries return the mean of a set of k elements.
Proceedings Article

Insuring individual's privacy from statistical data base users

TL;DR: The problem considered is to determine the conditions which guarantee that a user who is allowed to ask only statistical queries cannot be successful in obtaining any more information about any individual than he already has.