scispace - formally typeset
Search or ask a question
Author

Cynthia Dwork

Other affiliations: University of California, Berkeley, IBM, Microsoft  ...read more
Bio: Cynthia Dwork is an academic researcher from Harvard University. The author has contributed to research in topics: Differential privacy & Information privacy. The author has an hindex of 80, co-authored 229 publications receiving 48654 citations. Previous affiliations of Cynthia Dwork include University of California, Berkeley & IBM.


Papers
More filters
Book ChapterDOI
04 Mar 2006
TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also show the separation results showing the increased value of interactive sanitization mechanisms over non-interactive.
Abstract: We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = ∑ig(xi), where xi denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

6,211 citations

Book
11 Aug 2014
TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.
Abstract: The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition.After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations — not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed.We then turn from fundamentals to applications other than queryrelease, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static, database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams is discussed.Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey — there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.

5,190 citations

Book ChapterDOI
Cynthia Dwork1
10 Jul 2006
TL;DR: In this article, the authors give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved, and suggest a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database.
Abstract: In 1977 Dalenius articulated a desideratum for statistical databases: nothing about an individual should be learnable from the database that cannot be learned without access to the database. We give a general impossibility result showing that a formalization of Dalenius' goal along the lines of semantic security cannot be achieved. Contrary to intuition, a variant of the result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one's privacy incurred by participating in a database. The techniques developed in a sequence of papers [8, 13, 3], culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy

4,134 citations

Journal Article
TL;DR: The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.
Abstract: We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = Σ i g(x i ), where x i denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

3,629 citations

Book ChapterDOI
Cynthia Dwork1
25 Apr 2008
TL;DR: This survey recalls the definition of differential privacy and two basic techniques for achieving it, and shows some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.
Abstract: Over the past five years a new approach to privacy-preserving data analysis has born fruit [13, 18, 7, 19, 5, 37, 35, 8, 32]. This approach differs from much (but not all!) of the related literature in the statistics, databases, theory, and cryptography communities, in that a formal and ad omnia privacy guarantee is defined, and the data analysis techniques presented are rigorously proved to satisfy the guarantee. The key privacy guarantee that has emerged is differential privacy. Roughly speaking, this ensures that (almost, and quantifiably) no risk is incurred by joining a statistical database. In this survey, we recall the definition of differential privacy and two basic techniques for achieving it. We then show some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.

3,314 citations


Cited by
More filters
Book
01 Jan 1996
TL;DR: A valuable reference for the novice as well as for the expert who needs a wider scope of coverage within the area of cryptography, this book provides easy and rapid access of information and includes more than 200 algorithms and protocols.
Abstract: From the Publisher: A valuable reference for the novice as well as for the expert who needs a wider scope of coverage within the area of cryptography, this book provides easy and rapid access of information and includes more than 200 algorithms and protocols; more than 200 tables and figures; more than 1,000 numbered definitions, facts, examples, notes, and remarks; and over 1,250 significant references, including brief comments on each paper.

13,597 citations

Patent
30 Sep 2010
TL;DR: In this article, the authors proposed a secure content distribution method for a configurable general-purpose electronic commercial transaction/distribution control system, which includes a process for encapsulating digital information in one or more digital containers, a process of encrypting at least a portion of digital information, a protocol for associating at least partially secure control information for managing interactions with encrypted digital information and/or digital container, and a process that delivering one or multiple digital containers to a digital information user.
Abstract: PROBLEM TO BE SOLVED: To solve the problem, wherein it is impossible for an electronic content information provider to provide commercially secure and effective method, for a configurable general-purpose electronic commercial transaction/distribution control system. SOLUTION: In this system, having at least one protected processing environment for safely controlling at least one portion of decoding of digital information, a secure content distribution method comprises a process for encapsulating digital information in one or more digital containers; a process for encrypting at least a portion of digital information; a process for associating at least partially secure control information for managing interactions with encrypted digital information and/or digital container; a process for delivering one or more digital containers to a digital information user; and a process for using a protected processing environment, for safely controlling at least a portion of the decoding of the digital information. COPYRIGHT: (C)2006,JPO&NCIPI

7,643 citations

Book ChapterDOI
19 Aug 2001
TL;DR: This work proposes a fully functional identity-based encryption scheme (IBE) based on the Weil pairing that has chosen ciphertext security in the random oracle model assuming an elliptic curve variant of the computational Diffie-Hellman problem.
Abstract: We propose a fully functional identity-based encryption scheme (IBE). The scheme has chosen ciphertext security in the random oracle model assuming an elliptic curve variant of the computational Diffie-Hellman problem. Our system is based on the Weil pairing. We give precise definitions for secure identity based encryption schemes and give several applications for such systems.

7,083 citations

Book ChapterDOI
04 Mar 2006
TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also show the separation results showing the increased value of interactive sanitization mechanisms over non-interactive.
Abstract: We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = ∑ig(xi), where xi denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

6,211 citations

Posted Content
H. Brendan McMahan1, Eider Moore1, Daniel Ramage1, Seth Hampson, Blaise Aguera y Arcas1 
TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Abstract: Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.

5,936 citations