Journal ArticleDOI

Inference Controls for Statistical Databases

01 Jul 1983 - IEEE Computer (IEEE) - Vol. 16, Iss. 7, pp. 69-82
TL;DR: This article surveys controls for the inference problem in on-line, general-purpose database systems that allow both statistical and nonstatistical access, dividing them into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics.
Abstract: The goal of statistical databases is to provide frequencies, averages, and other statistics about groups of persons (or organizations), while protecting the privacy of the individuals represented in the database. This objective is difficult to achieve, since seemingly innocuous statistics contain small vestiges of the data used to compute them. By correlating enough statistics, sensitive data about an individual can be inferred. As a simple example, suppose there is only one female professor in an electrical engineering department. If statistics are released for the total salary of all professors in the department and the total salary of all male professors, the female professor's salary is easily obtained by subtraction. The problem of protecting against such indirect disclosures of sensitive data is called the inference problem. Over the last several decades, census agencies have developed many techniques for controlling inferences in population surveys. These techniques are applied before data are released so that the distributed data are free from disclosure problems. The data are typically released either in the form of microstatistics, which are files of "sanitized" records, or in the form of macrostatistics, which are tables of counts, sums, and higher order statistics. Starting with a study by Hoffman and Miller, computer scientists began to look at the inference problem in on-line, general-purpose database systems allowing both statistical and nonstatistical access. A hospital database, for example, can give doctors direct access to a patient's medical records, while hospital administrators are permitted access only to statistical summaries of the records. Up until the late 1970's, most studies of the inference problem in these systems led to negative results; every conceivable control seemed to be easy to circumvent, to severely restrict the free flow of information, or to be intractable to implement. Recently, the results have become more positive, since we are now discovering controls that can potentially keep security and information loss at acceptable levels for a reasonable cost. This article surveys some of the controls that have been studied, comparing them with respect to their security, information loss, and cost. The controls are divided into two categories: those that place restrictions on the set of allowable queries and those that add "noise" to the data or to the released statistics. The controls are described and further classified within the context of a lattice model.
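The subtraction attack in the abstract is easy to make concrete. Below is a minimal sketch in Python; the roster and salary figures are invented for illustration:

```python
# Inference by subtraction: two released totals reveal one salary.
# The roster and salary figures are invented for this illustration.
records = [
    {"name": "P1", "sex": "M", "salary": 90_000},
    {"name": "P2", "sex": "M", "salary": 85_000},
    {"name": "P3", "sex": "F", "salary": 95_000},  # the only female professor
]

# Two seemingly innocuous statistics released by the database:
total_all = sum(r["salary"] for r in records)                      # 270,000
total_male = sum(r["salary"] for r in records if r["sex"] == "M")  # 175,000

# Their difference is exactly the female professor's salary.
print(total_all - total_male)  # 95000
```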
Citations
Book
01 Jan 2001
TL;DR: In almost 600 pages of riveting detail, Ross Anderson warns us not to be seduced by the latest defensive technologies, never to underestimate human ingenuity, and always to use common sense in defending valuables.
Abstract: Gigantically comprehensive and carefully researched, Security Engineering makes it clear just how difficult it is to protect information systems from corruption, eavesdropping, unauthorized use, and general malice. Better, Ross Anderson offers a lot of thoughts on how information can be made more secure (though probably not absolutely secure, at least not forever) with the help of both technologies and management strategies. His work makes fascinating reading and will no doubt inspire considerable doubt--fear is probably a better choice of words--in anyone with information to gather, protect, or make decisions about. Be aware: This is absolutely not a book solely about computers, with yet another explanation of Alice and Bob and how they exchange public keys in order to exchange messages in secret. Anderson explores, for example, the ingenious ways in which European truck drivers defeat their vehicles' speed-logging equipment. In another section, he shows how the end of the cold war brought on a decline in defenses against radio-frequency monitoring (radio frequencies can be used to determine, at a distance, what's going on in systems--bank teller machines, say), and how similar technology can be used to reverse-engineer the calculations that go on inside smart cards. In almost 600 pages of riveting detail, Anderson warns us not to be seduced by the latest defensive technologies, never to underestimate human ingenuity, and always to use common sense in defending valuables. A terrific read for security professionals and general readers alike. --David Wall Topics covered: How some people go about protecting valuable things (particularly, but not exclusively, information) and how other people go about getting it anyway. Mostly, this takes the form of essays (about, for example, how the U.S. Air Force keeps its nukes out of the wrong hands) and stories (one of which tells of an art thief who defeated the latest technology by hiding in a closet). Sections deal with technologies, policies, psychology, and legal matters.

1,852 citations

Journal ArticleDOI
TL;DR: This paper recommends directing future research efforts toward developing new methods that prevent exact disclosure, provide statistical-disclosure control, and at the same time do not suffer from the bias problem or the 0/1 query-set-size problem.
Abstract: This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented. To date no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1). We recommend directing future research efforts toward developing new methods that prevent exact disclosure, provide statistical-disclosure control, and at the same time do not suffer from the bias problem or the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set sizes would help salvage a few of the current perturbation-based methods.
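Query restriction is the easiest of the four approaches to sketch. The fragment below shows a minimal query-set-size restriction and why the 0/1 problem matters; the toy database, predicates, and threshold k are invented for illustration:

```python
# A minimal query-set-size restriction (query-restriction approach).
# The database, predicates, and threshold k are invented.
DB = [
    {"dept": "EE", "sex": "F", "salary": 95_000},
    {"dept": "EE", "sex": "M", "salary": 90_000},
    {"dept": "CS", "sex": "M", "salary": 85_000},
    {"dept": "CS", "sex": "F", "salary": 88_000},
]
N, k = len(DB), 2  # answer only queries whose set size is in [k, N - k]

def restricted_sum(field, predicate):
    query_set = [r for r in DB if predicate(r)]
    # A query set of size 0 or 1 (the 0/1 query-set-size problem) pins
    # down an individual directly; one of size N or N - 1 does so by
    # complement. Both extremes are refused.
    if not (k <= len(query_set) <= N - k):
        return None  # query refused
    return sum(r[field] for r in query_set)

# Refused: only one female EE professor matches.
print(restricted_sum("salary", lambda r: r["dept"] == "EE" and r["sex"] == "F"))
# Answered: two professors match.
print(restricted_sum("salary", lambda r: r["dept"] == "EE"))  # 185000
```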

1,082 citations

Journal ArticleDOI
01 Mar 2004
TL;DR: An overview of the new and rapidly emerging research area of privacy preserving data mining is provided, and a classification hierarchy that sets the basis for analyzing the work which has been performed in this context is proposed.
Abstract: We provide here an overview of the new and rapidly emerging research area of privacy preserving data mining. We also propose a classification hierarchy that sets the basis for analyzing the work which has been performed in this context. A detailed review of the work accomplished in this area is also given, along with the coordinates of each work to the classification hierarchy. A brief evaluation is performed, and some initial conclusions are made.

884 citations

Journal ArticleDOI
TL;DR: In this article, a family of geometric data transformation methods (GDTMs) is introduced to ensure that the mining process will not violate privacy up to a certain degree of security.
Abstract: Despite their benefits in a wide range of applications, data mining techniques have also raised a number of ethical issues, including privacy, data security, and intellectual property rights, among others. In this paper, we address the privacy problem posed by unauthorized secondary use of information. To do so, we introduce a family of geometric data transformation methods (GDTMs) which ensure that the mining process will not violate privacy up to a certain degree of security. We focus primarily on privacy-preserving data clustering, notably on partition-based and hierarchical methods. Our proposed methods distort only confidential numerical attributes to meet privacy requirements, while preserving the general features needed for clustering analysis. Our experiments demonstrate that our methods are effective and achieve an acceptable balance between privacy and accuracy in practice. We report the main results of our performance evaluation and discuss some open research issues.
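A rotation is one geometric transformation with the property these methods rely on: it preserves pairwise Euclidean distances, so partition-based and hierarchical clustering see the same structure while the published coordinates no longer match the confidential originals. A minimal sketch with invented data and angle (an illustration of the general idea, not the paper's specific GDTM family):

```python
# A distance-preserving geometric perturbation: rotating confidential
# numeric attributes changes every published value but leaves all
# pairwise Euclidean distances (and hence cluster structure) intact.
# Data and rotation angle are invented for this sketch.
import math

def rotate2d(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

confidential = [(60.0, 170.0), (62.0, 168.0), (95.0, 190.0)]
published = rotate2d(confidential, theta=0.7)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Same pairwise distances before and after, so clustering is unaffected.
print(round(dist(confidential[0], confidential[1]), 9))
print(round(dist(published[0], published[1]), 9))
```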

265 citations

Journal ArticleDOI
TL;DR: An astronomer-turned-sleuth traces a German trespasser on military networks, who slipped through operating system security holes and browsed through sensitive databases.
Abstract: An astronomer-turned-sleuth traces a German trespasser on our military networks, who slipped through operating system security holes and browsed through sensitive databases. Was it espionage?

189 citations

References
Book
01 Jan 1982
TL;DR: The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks.
Abstract: From the Preface (see Front Matter for the full Preface): Electronic computers have evolved from exiguous experimental enterprises in the 1940s to prolific practical data processing systems in the 1980s. As we have come to rely on these systems to process and store data, we have also come to wonder about their ability to protect valuable data. Data security is the science and study of methods of protecting data in computer and communication systems from unauthorized disclosure and modification. The goal of this book is to introduce the mathematical principles of data security and to show how these principles apply to operating systems, database systems, and computer networks. The book is for students and professionals seeking an introduction to these principles. There are many references for those who would like to study specific topics further.

Data security has evolved rapidly since 1975. We have seen exciting developments in cryptography: public-key encryption, digital signatures, the Data Encryption Standard (DES), key safeguarding schemes, and key distribution protocols. We have developed techniques for verifying that programs do not leak confidential data, or transmit classified data to users with lower security clearances. We have found new controls for protecting data in statistical databases--and new methods of attacking these databases. We have come to a better understanding of the theoretical and practical limitations to security.

1,937 citations

Journal ArticleDOI
TL;DR: Data swapping is a data transformation technique that preserves the underlying statistics of the data; it can be used as a basis for microdata release or to justify the release of tabulations.
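The core mechanism is easy to see in miniature: exchanging a sensitive attribute between records breaks the record-to-value linkage while leaving the column's marginal statistics untouched. A minimal sketch with invented records (the published technique constrains which swaps are allowed so that higher-order statistics are preserved as well):

```python
# Data swapping in miniature: exchanging the sensitive attribute between
# two records leaves every univariate statistic of that column unchanged
# (counts, sums, frequency tables) while unlinking values from records.
# The records are invented for this sketch.
import random

records = [
    {"zip": "94301", "disease": "flu"},
    {"zip": "94301", "disease": "measles"},
    {"zip": "10027", "disease": "flu"},
    {"zip": "10027", "disease": "cold"},
]

def swap_once(rows, field, rng=random):
    i, j = rng.sample(range(len(rows)), 2)
    rows[i][field], rows[j][field] = rows[j][field], rows[i][field]

before = sorted(r["disease"] for r in records)
swap_once(records, "disease")
after = sorted(r["disease"] for r in records)
assert before == after  # the column's marginal distribution is intact
```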

370 citations

Journal ArticleDOI
TL;DR: In this paper, the authors discuss theory and method of complementary cell suppression and related topics in statistical disclosure control, focusing on the development of methods that are theoretically broad but also practical to implement.
Abstract: This article discusses theory and method of complementary cell suppression and related topics in statistical disclosure control. Emphasis is placed on the development of methods that are theoretically broad but also practical to implement. The approach draws from areas of discrete mathematics and linear optimization theory.
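A two-by-two table is enough to see why suppression must be complementary: with row and column totals published, a single suppressed cell can be recovered exactly, so additional cells must be suppressed until the margins merely bound, rather than determine, the sensitive value. A sketch with invented figures:

```python
# Why suppression must be complementary: in a 2x2 table with published
# margins, one suppressed cell is recoverable exactly. All figures are
# invented for this sketch.
table = [[5, 12],
         [8,  3]]
row_tot = [sum(row) for row in table]        # [17, 11]
col_tot = [sum(col) for col in zip(*table)]  # [13, 15]

# Suppress only cell (0, 0): its row total gives it back exactly.
print(row_tot[0] - table[0][1])  # 17 - 12 = 5

# Suppress the other three cells as complementary suppressions, and the
# margins only bound the sensitive cell x = table[0][0]:
lo = max(0, row_tot[0] - col_tot[1], col_tot[0] - row_tot[1])
hi = min(row_tot[0], col_tot[0])
print(lo, hi)  # 2 13 -- an intruder learns a range, not the value
```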

297 citations

Journal ArticleDOI
TL;DR: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers, and the complexity of protecting a database against this technique is discussed here.
Abstract: Users may be able to compromise databases by asking a series of questions and then inferring new information from the answers. The complexity of protecting a database against this technique is discussed here.
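A classic instance of such a question series is the general tracker: even when the system refuses queries whose query sets are too small or too large, the identity count(C) = count(C or T) + count(C or not T) - N recovers a refused count from permitted ones. A minimal sketch; the toy records, threshold k, and predicates are invented:

```python
# A general tracker: recovering a refused count from permitted queries
# using count(C) = count(C or T) + count(C or not T) - N.
# Records, threshold k, and predicates are invented for this sketch.
DB = [
    {"sex": "F", "dept": "EE"},
    {"sex": "M", "dept": "EE"},
    {"sex": "M", "dept": "CS"},
    {"sex": "F", "dept": "CS"},
    {"sex": "M", "dept": "CS"},
    {"sex": "M", "dept": "EE"},
]
N, k = len(DB), 2  # only query sets of size in [k, N - k] are answered

def count(pred):
    n = sum(1 for r in DB if pred(r))
    if not (k <= n <= N - k):
        raise ValueError("query refused")
    return n

C = lambda r: r["sex"] == "F" and r["dept"] == "EE"  # set size 1: refused
T = lambda r: r["dept"] == "CS"                      # tracker, size 3: answered

# N itself is obtainable as count(T) + count(not T); both are permitted.
n_total = count(T) + count(lambda r: not T(r))

# The two padded queries have sizes 4 and 3, so both are answered.
inferred = (count(lambda r: C(r) or T(r))
            + count(lambda r: C(r) or not T(r)) - n_total)
print(inferred)  # 1 -- the refused count, inferred from permitted queries
```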

234 citations

Journal ArticleDOI
TL;DR: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases; it deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets.
Abstract: A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases. The random sample queries control deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets. Queries for relative frequencies and averages are computed using random samples drawn from the query sets. The sampling strategy permits the release of accurate and timely statistics and can be implemented at very low cost. Analysis shows the relative error in the statistics decreases as the query set size increases; in contrast, the effort required to compromise increases with the query set size due to large absolute errors. Experiments performed on a simulated database support the analysis.
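The idea can be sketched in a few lines: each record in the query set is kept or dropped by a pseudorandom function of the record and the query set, so logically equivalent queries get the same sample but the questioner cannot steer which records enter it. The sampling rate, hash-based selection, and data below are simplifications invented for illustration, not the paper's exact selection function:

```python
# Sketch of random sample queries: each record's inclusion in the sample
# is a pseudorandom function of the record and the query set, so the
# questioner cannot steer the sampling and logically equivalent queries
# return the same released statistic. Sampling rate P, the hash-based
# selection, and the data are simplifications invented for this sketch.
import hashlib

P = 0.8  # sampling probability (invented)

def keep(record_id, qset_key):
    h = hashlib.sha256(f"{qset_key}|{record_id}".encode()).digest()
    return int.from_bytes(h[:8], "big") < P * 2**64

def sampled_mean(db, field, predicate):
    qset = [i for i, r in enumerate(db) if predicate(r)]
    key = ",".join(map(str, qset))  # equivalent queries share one sample
    sample = [db[i][field] for i in qset if keep(i, key)]
    return sum(sample) / len(sample) if sample else None

db = [{"salary": 80_000 + 1_000 * i} for i in range(100)]
true_mean = sum(r["salary"] for r in db) / len(db)
released = sampled_mean(db, "salary", lambda r: True)
# For large query sets the released statistic is close to the true mean,
# and the relative error shrinks as the query set grows.
print(true_mean, released)
```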

212 citations