Journal ArticleDOI

Inference Controls for Statistical Databases

Dorothy E. Denning, Jan Schlörer
01 Jul 1983 - Vol. 16, Iss. 7, pp. 69-82
TLDR
This article surveys controls for the inference problem in on-line, general-purpose database systems that allow both statistical and nonstatistical access, dividing them into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract
The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and for the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem.

Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released, so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher-order statistics.

Starting with a study by Hoffman and Miller,¹ computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Until the late 1970s, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement. Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
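The subtraction attack in the abstract's example is easy to make concrete. The following minimal Python sketch (all names and salary figures are hypothetical) shows how two individually innocuous sums disclose one individual's value exactly:

    records = [
        {"name": "A", "sex": "M", "salary": 98000},
        {"name": "B", "sex": "M", "salary": 104000},
        {"name": "C", "sex": "F", "salary": 111000},  # the only female professor
    ]

    # Two seemingly innocuous released statistics:
    total_all = sum(r["salary"] for r in records)
    total_male = sum(r["salary"] for r in records if r["sex"] == "M")

    # Their difference discloses the lone female professor's salary exactly.
    print(total_all - total_male)  # 111000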


Citations
Book

Security and Privacy in User Modeling

Jörg Schreck
TL;DR: A book-length treatment of security and privacy in user modeling, covering requirements and solutions for anonymity and pseudonymity and their application to user modeling components.
Book ChapterDOI

A Robust Sampling-Based Framework for Privacy Preserving OLAP

TL;DR: Introduces and experimentally assesses a robust sampling-based framework for privacy-preserving OLAP that protects the privacy of OLAP aggregations rather than individual data-cube cells, yielding greater theoretical soundness and lower computational overhead when processing massive data cubes.
ReportDOI

Enhancing Privacy through Negative Representations of Data

TL;DR: It is shown that a database consisting of n l-bit records can be represented negatively using only O(l·n) records, and that reconstructing the database DB represented by a negative database NDB given as input is an NP-hard problem when time complexity is measured as a function of the size of NDB.
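To make the negative-representation idea concrete, here is a minimal Python sketch of a prefix-style construction in the spirit of that result: it produces O(l·n) patterns over {0, 1, *} that match exactly the l-bit strings not in the database. (This illustrative construction is deterministic and easy to invert; the hardness result quoted above concerns suitably generated negative databases, not this simple variant.)

    def negative_database(db, l):
        # Build patterns over {0,1,*} matching exactly the l-bit strings
        # NOT in db. Each level adds at most one pattern per distinct
        # prefix of a db string, so the output has at most l*n patterns.
        ndb = []
        prefixes = {""}
        for i in range(l):
            next_prefixes = {s[: i + 1] for s in db}
            for p in prefixes:
                for b in "01":
                    if p + b not in next_prefixes:
                        ndb.append(p + b + "*" * (l - i - 1))
            prefixes = next_prefixes
        return ndb

    def matches(pattern, s):
        return all(pc in ("*", sc) for pc, sc in zip(pattern, s))

    db = {"000", "011", "110"}
    ndb = negative_database(db, 3)  # e.g. ['10*', '001', '010', '111']
    for v in range(8):
        s = format(v, "03b")
        assert (s in db) == (not any(matches(p, s) for p in ndb))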
Journal ArticleDOI

Privacy Preserving OLAP over Distributed XML Data: A Theoretically-Sound Secure-Multiparty-Computation Approach

TL;DR: Proposes a novel Secure Multiparty Computation (SMC)-based privacy-preserving OLAP framework for distributed collections of XML documents, offering both sound theoretical properties and an effective, efficient protocol, called the Secure Distributed OLAP aggregation protocol (SDO).
References
Book

Cryptography and data security

TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Journal ArticleDOI

Data-swapping: A technique for disclosure control

TL;DR: Data-swapping is a data transformation technique that preserves the underlying statistics of the data; it can serve as a basis for microdata release or to justify the release of tabulations.
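As a rough illustration of the flavor of this technique, the Python sketch below (a simplified variant, with hypothetical parameter names) permutes a sensitive attribute among records that agree on a grouping key: counts, sums, and means per group are unchanged, but the link between any individual record and its sensitive value is broken. The published method is more careful, swapping values so that statistics up to a chosen order are preserved.

    import random

    def swap_within_groups(records, group_key, sensitive, rng=random.Random(0)):
        # Randomly permute `sensitive` values among records sharing
        # `group_key`; per-group statistics are preserved exactly.
        groups = {}
        for r in records:
            groups.setdefault(r[group_key], []).append(r)
        for members in groups.values():
            values = [r[sensitive] for r in members]
            rng.shuffle(values)
            for r, v in zip(members, values):
                r[sensitive] = v
        return records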
Journal ArticleDOI

Suppression Methodology and Statistical Disclosure Control

TL;DR: In this paper, the authors discuss the theory and methods of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
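A minimal Python sketch of the two stages involved, with a naive complementary pass; the threshold value and the greedy choice of complementary cells are illustrative only, whereas the cited work treats complementary suppression as an optimization problem that guarantees protection at minimum information loss.

    THRESHOLD = 3  # hypothetical minimum publishable cell count

    def suppress(table):
        rows, cols = len(table), len(table[0])
        # Primary suppression: hide small nonzero counts.
        hidden = {(i, j) for i in range(rows) for j in range(cols)
                  if 0 < table[i][j] < THRESHOLD}
        # Complementary suppression: if a row or column has exactly one
        # hidden cell, its value could be recovered from the published
        # marginal total, so greedily hide one more (smallest) cell.
        changed = True
        while changed:
            changed = False
            for i in range(rows):
                if len([j for j in range(cols) if (i, j) in hidden]) == 1:
                    j = min((c for c in range(cols) if (i, c) not in hidden),
                            key=lambda c: table[i][c])
                    hidden.add((i, j)); changed = True
            for j in range(cols):
                if len([i for i in range(rows) if (i, j) in hidden]) == 1:
                    i = min((r for r in range(rows) if (r, j) not in hidden),
                            key=lambda r: table[r][j])
                    hidden.add((i, j)); changed = True
        return [["D" if (i, j) in hidden else table[i][j]
                 for j in range(cols)] for i in range(rows)]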
Journal ArticleDOI

Secure databases: protection against user influence

TL;DR: Users may be able to compromise a database by asking a series of queries and then inferring new information from the answers; this paper analyzes the complexity of protecting a database against such attacks.
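The simplest restriction in this setting is a query-set-size control, sketched below in Python (the threshold k is illustrative); as this line of work shows, size controls alone can be defeated by "tracker" attacks built from overlapping queries, which is what makes the protection problem hard.

    def answer_sum(records, predicate, field, k=2):
        # Refuse statistics over very small or very large query sets;
        # k is a hypothetical lower bound on query-set size.
        qset = [r for r in records if predicate(r)]
        if len(qset) < k or len(qset) > len(records) - k:
            raise PermissionError("query set size outside allowed range")
        return sum(r[field] for r in qset)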
Journal ArticleDOI

Secure statistical databases with random sample queries

TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases that deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.
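A minimal Python sketch of the idea (the key, sampling rate, and canonicalization below are illustrative assumptions, not Denning's exact construction): membership of each record in the sampled query set is a deterministic, keyed function of the record and the query set itself, so a questioner cannot steer individual records into or out of the sample, and repeating or rephrasing a query yields the same sample rather than fresh noise to average away.

    import hmac, hashlib

    SECRET = b"server-side-secret"  # hypothetical key held by the database
    P = 0.875                       # sampling probability

    def sampled_ids(qset_ids, secret=SECRET, p=P):
        # Deterministic per (query set, record): canonicalize the query
        # set so equivalent queries select the same sample.
        canon = ",".join(sorted(qset_ids)).encode()
        keep = []
        for rid in qset_ids:
            d = hmac.new(secret, canon + b"|" + rid.encode(),
                         hashlib.sha256).digest()
            if int.from_bytes(d[:8], "big") < p * 2**64:
                keep.append(rid)
        return keep

    def sampled_count(qset_ids):
        # Released count: an unbiased estimate of the true query-set size.
        return len(sampled_ids(qset_ids)) / P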