Home
/
Authors
/
Dawn Song

Author

Dawn Song

Other affiliations: FireEye, Inc., University of California, Tsinghua University ...read more

Bio: Dawn Song is an academic researcher from University of California, Berkeley. The author has contributed to research in topics: Computer science & Malware. The author has an hindex of 117, co-authored 460 publications receiving 61572 citations. Previous affiliations of Dawn Song include FireEye, Inc. & University of California.

Topics: Computer science, Malware, Deep learning, Overhead (computing), Web application ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Practical techniques for searches on encrypted data

[...]

Dawn Song¹, David Wagner¹, Adrian Perrig¹•Institutions (1)

University of California, Berkeley¹

14 May 2000

TL;DR: This work describes the cryptographic schemes for the problem of searching on encrypted data and provides proofs of security for the resulting crypto systems, and presents simple, fast, and practical algorithms that are practical to use today.

...read moreread less

Abstract: It is desirable to store data on data storage servers such as mail servers and file servers in encrypted form to reduce security and privacy risks. But this usually implies that one has to sacrifice functionality for security. For example, if a client wishes to retrieve only documents containing certain words, it was not previously known how to let the data storage server perform the search and answer the query, without loss of data confidentiality. We describe our cryptographic schemes for the problem of searching on encrypted data and provide proofs of security for the resulting crypto systems. Our techniques have a number of crucial advantages. They are provably secure: they provide provable secrecy for encryption, in the sense that the untrusted server cannot learn anything about the plaintext when only given the ciphertext; they provide query isolation for searches, meaning that the untrusted server cannot learn anything more about the plaintext than the search result; they provide controlled searching, so that the untrusted server cannot search for an arbitrary word without the user's authorization; they also support hidden queries, so that the user may ask the untrusted server to search for a secret word without revealing the word to the server. The algorithms presented are simple, fast (for a document of length n, the encryption and search algorithms only need O(n) stream cipher and block cipher operations), and introduce almost no space and communication overhead, and hence are practical to use today.

...read moreread less

3,300 citations

Proceedings Article•DOI•

Random key predistribution schemes for sensor networks

[...]

Haowen Chan¹, Adrian Perrig¹, Dawn Song¹•Institutions (1)

Carnegie Mellon University¹

11 May 2003

TL;DR: The random-pairwise keys scheme is presented, which perfectly preserves the secrecy of the rest of the network when any node is captured, and also enables node-to-node authentication and quorum-based revocation.

...read moreread less

Abstract: Key establishment in sensor networks is a challenging problem because asymmetric key cryptosystems are unsuitable for use in resource constrained sensor nodes, and also because the nodes could be physically compromised by an adversary. We present three new mechanisms for key establishment using the framework of pre-distributing a random set of keys to each node. First, in the q-composite keys scheme, we trade off the unlikeliness of a large-scale network attack in order to significantly strengthen random key predistribution's strength against smaller-scale attacks. Second, in the multipath-reinforcement scheme, we show how to strengthen the security between any two nodes by leveraging the security of other links. Finally, we present the random-pairwise keys scheme, which perfectly preserves the secrecy of the rest of the network when any node is captured, and also enables node-to-node authentication and quorum-based revocation.

...read moreread less

3,125 citations

Proceedings Article•DOI•

Provable data possession at untrusted stores

[...]

Giuseppe Ateniese¹, Randal Burns¹, Reza Curtmola¹, Joseph Herring¹, Lea Kissner², Zachary N. J. Peterson¹, Dawn Song³ - Show less +3 more•Institutions (3)

Johns Hopkins University¹, Google², University of California, Berkeley³

28 Oct 2007

TL;DR: The provable data possession (PDP) model as discussed by the authors allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it.

...read moreread less

Abstract: We introduce a model for provable data possession (PDP) that allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking supports large data sets in widely-distributed storage system.We present two provably-secure PDP schemes that are more efficient than previous solutions, even when compared with schemes that achieve weaker guarantees. In particular, the overhead at the server is low (or even constant), as opposed to linear in the size of the data. Experiments using our implementation verify the practicality of PDP and reveal that the performance of PDP is bounded by disk I/O and not by cryptographic computation.

...read moreread less

2,238 citations

Journal Article•DOI•

Advances and open problems in federated learning

[...]

Peter Kairouz, H. Brendan McMahan, Brendan Avent¹, Aurélien Bellet², Mehdi Bennis³, Arjun Nitin Bhagoji⁴, Kallista Bonawitz, Zachary Charles, Graham Cormode⁵, Rachel Cummings⁶, Rafael G. L. D'Oliveira⁷, Hubert Eichner, Salim El Rouayheb⁷, David Evans⁸, Josh Gardner⁹, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons¹⁰, Marco Gruteser⁷, Zaid Harchaoui⁹, Chaoyang He¹, Lie He¹¹, Zhouyuan Huo¹², Ben Hutchinson, Justin Hsu¹³, Martin Jaggi¹¹, Tara Javidi¹⁴, Gauri Joshi¹⁰, Mikhail Khodak¹⁰, Jakub Konecní, Aleksandra Korolova¹, Farinaz Koushanfar¹⁴, Sanmi Koyejo¹⁵, Tancrède Lepoint, Yang Liu¹⁶, Prateek Mittal⁴, Mehryar Mohri, Richard Nock¹⁷, Ayfer Ozgur¹⁸, Rasmus Pagh¹⁹, Hang Qi, Daniel Ramage, Ramesh Raskar²⁰, Mariana Raykova, Dawn Song²¹, Weikang Song, Sebastian U. Stich¹¹, Ziteng Sun²², Ananda Theertha Suresh, Florian Tramèr¹⁸, Praneeth Vepakomma²⁰, Jianyu Wang¹⁰, Li Xiong²³, Zheng Xu, Qiang Yang²⁴, Felix X. Yu, Han Yu¹⁶, Sen Zhao - Show less +55 more•Institutions (24)

University of Southern California¹, French Institute for Research in Computer Science and Automation², University of Oulu³, Princeton University⁴, University of Warwick⁵, Georgia Institute of Technology⁶, Rutgers University⁷, University of Virginia⁸, University of Washington⁹, Carnegie Mellon University¹⁰, École Polytechnique Fédérale de Lausanne¹¹, University of Pittsburgh¹², University of Wisconsin-Madison¹³, University of California, San Diego¹⁴, University of Illinois at Urbana–Champaign¹⁵, Nanyang Technological University¹⁶, Australian National University¹⁷, Stanford University¹⁸, IT University of Copenhagen¹⁹, Massachusetts Institute of Technology²⁰, University of California, Berkeley²¹, Cornell University²², Emory University²³, Hong Kong University of Science and Technology²⁴

23 Jun 2021

TL;DR: In this article, the authors describe the state-of-the-art in the field of federated learning from the perspective of distributed optimization, cryptography, security, differential privacy, fairness, compressed sensing, systems, information theory, and statistics.

...read moreread less

Abstract: The term Federated Learning was coined as recently as 2016 to describe a machine learning setting where multiple entities collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client’s raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective. Since then, the topic has gathered much interest across many different disciplines and the realization that solving many of these interdisciplinary problems likely requires not just machine learning but techniques from distributed optimization, cryptography, security, differential privacy, fairness, compressed sensing, systems, information theory, statistics, and more. This monograph has contributions from leading experts across the disciplines, who describe the latest state-of-the art from their perspective. These contributions have been carefully curated into a comprehensive treatment that enables the reader to understand the work that has been done and get pointers to where effort is required to solve many of the problems before Federated Learning can become a reality in practical systems. Researchers working in the area of distributed systems will find this monograph an enlightening read that may inspire them to work on the many challenging issues that are outlined. This monograph will get the reader up to speed quickly and easily on what is likely to become an increasingly important topic: Federated Learning.

...read moreread less

2,144 citations

Posted Content•

Provable Data Possession at Untrusted Stores.

[...]

Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary N. J. Peterson, Dawn Song - Show less +3 more

01 Jan 2007-IACR Cryptology ePrint Archive

TL;DR: Ateniese et al. as discussed by the authors introduced the provable data possession (PDP) model, which allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it.

...read moreread less

Abstract: We introduce a model for provable data possession (PDP) that allows a client that has stored data at an untrusted server to verify that the server possesses the original data without retrieving it. The model generates probabilistic proofs of possession by sampling random sets of blocks from the server, which drastically reduces I/O costs. The client maintains a constant amount of metadata to verify the proof. The challenge/response protocol transmits a small, constant amount of data, which minimizes network communication. Thus, the PDP model for remote data checking supports large data sets in widely-distributed storage systems. We present two provably-secure PDP schemes that are more efficient than previous solutions, even when compared with schemes that achieve weaker guarantees. In particular, the overhead at the server is low (or even constant), as opposed to linear in the size of the data. Experiments using our implementation verify the practicality of PDP and reveal that the performance of PDP is bounded by disk I/O and not by cryptographic computation.

...read moreread less

2,127 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Language Models are Few-Shot Learners

[...]

Tom B. Brown¹, Benjamin Mann, Nick Ryder², Melanie Subbiah, Jared Kaplan³, Prafulla Dhariwal¹, Arvind Neelakantan⁴, Pranav Shyam, Girish Sastry¹, Amanda Askell¹, Sandhini Agarwal¹, Ariel Herbert-Voss¹, Gretchen Krueger¹, Thomas Henighan¹, Rewon Child¹, Aditya Ramesh¹, Daniel M. Ziegler⁵, Jeffrey Wu¹, Clemens Winter, Christopher Hesse¹, Mark Chen¹, Eric Sigler, Mateusz Litwin, Scott Gray¹, Benjamin Chess¹, Jack Clark¹, Christopher Berner, Samuel McCandlish¹, Alec Radford¹, Ilya Sutskever¹, Dario Amodei¹ - Show less +27 more•Institutions (5)

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020

TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

...read moreread less

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

...read moreread less

10,132 citations

Proceedings Article•DOI•

Random graphs

[...]

Alan Frieze¹•Institutions (1)

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

...read moreread less

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

...read moreread less

7,116 citations

Posted Content•

Towards Deep Learning Models Resistant to Adversarial Attacks

[...]

Aleksander Madry¹, Aleksandar Makelov¹, Ludwig Schmidt¹, Dimitris Tsipras¹, Adrian Vladu¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

19 Jun 2017-arXiv: Machine Learning

TL;DR: This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.

...read moreread less

Abstract: Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at this https URL and this https URL.

...read moreread less

5,789 citations

Journal Article•DOI•

Wireless sensor network survey

[...]

Jennifer Yick¹, Biswanath Mukherjee¹, Dipak Ghosal¹•Institutions (1)

University of California, Davis¹

01 Aug 2008-Computer Networks

TL;DR: This survey presents a comprehensive review of the recent literature since the publication of a survey on sensor networks, and gives an overview of several new applications and then reviews the literature on various aspects of WSNs.

...read moreread less

5,626 citations

Book•

The Algorithmic Foundations of Differential Privacy

[...]

Cynthia Dwork¹, Aaron Roth²•Institutions (2)

Microsoft¹, University of Pennsylvania²

11 Aug 2014

TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.

...read moreread less

Abstract: The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition.After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations — not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed.We then turn from fundamentals to applications other than queryrelease, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static, database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams is discussed.Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey — there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.

...read moreread less

5,190 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse