Home
/
Authors
/
Vitaly Shmatikov

Author

Vitaly Shmatikov

Other affiliations: University of Texas at Austin, French Institute for Research in Computer Science and Automation, SRI International ...read more

Bio: Vitaly Shmatikov is an academic researcher from Cornell University. The author has contributed to research in topics: Anonymity & Information privacy. The author has an hindex of 64, co-authored 148 publications receiving 17801 citations. Previous affiliations of Vitaly Shmatikov include University of Texas at Austin & French Institute for Research in Computer Science and Automation.

Topics: Anonymity, Information privacy, Encryption, The Internet, Cryptographic protocol ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998

Papers

PDF

Open Access

More filters

Posted Content•

Membership Inference Attacks against Machine Learning Models

[...]

Reza Shokri¹, Marco Stronati², Congzheng Song¹, Vitaly Shmatikov¹•Institutions (2)

Cornell University¹, French Institute for Research in Computer Science and Automation²

18 Oct 2016-arXiv: Cryptography and Security

TL;DR: In this paper, a membership inference attack is proposed to determine if a record was in the training dataset of a black-box machine learning model using a black box access to the model.

...read moreread less

Abstract: We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

...read moreread less

1,030 citations

Proceedings Article•

How To Backdoor Federated Learning.

[...]

Eugene Bagdasaryan¹, Andreas Veit¹, Yiqing Hua¹, Deborah Estrin¹, Vitaly Shmatikov¹ - Show less +1 more•Institutions (1)

Cornell University¹

02 Jul 2018

TL;DR: In this article, a new model-poisoning methodology based on model replacement is proposed to poison a global model in federated learning, which can reach 100% accuracy on the backdoor task.

...read moreread less

Abstract: Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, eg, to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word We design and evaluate a new model-poisoning methodology based on model replacement An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training

...read moreread less

849 citations

Proceedings Article•DOI•

Airavat: security and privacy for MapReduce

[...]

Indrajit Roy¹, Srinath Setty¹, Ann Kilzer¹, Vitaly Shmatikov¹, Emmett Witchel¹ - Show less +1 more•Institutions (1)

University of Texas at Austin¹

28 Apr 2010

TL;DR: Airavat is a novel integration of mandatory access control and differential privacy, a MapReduce-based system which provides strong security and privacy guarantees for distributed computations on sensitive data.

...read moreread less

Abstract: We present Airavat, a MapReduce-based system which provides strong security and privacy guarantees for distributed computations on sensitive data Airavat is a novel integration of mandatory access control and differential privacy Data providers control the security policy for their sensitive data, including a mathematical bound on potential privacy violations Users without security expertise can perform computations on the data, but Airavat confines these computations, preventing information leakage beyond the data provider's policyOur prototype implementation demonstrates the flexibility of Airavat on several case studies The prototype is efficient, with run times on Amazon's cloud computing infrastructure within 32% of a MapReduce system with no security

...read moreread less

498 citations

Proceedings Article•DOI•

The most dangerous code in the world: validating SSL certificates in non-browser software

[...]

Martin Georgiev¹, Subodh Iyengar², Suman Jana¹, Rishita Anubhai², Dan Boneh², Vitaly Shmatikov¹ - Show less +2 more•Institutions (2)

University of Texas at Austin¹, Stanford University²

16 Oct 2012

TL;DR: It is demonstrated that SSL certificate validation is completely broken in many security-critical applications and libraries and badly designed APIs of SSL implementations and data-transport libraries which present developers with a confusing array of settings and options are analyzed.

...read moreread less

Abstract: SSL (Secure Sockets Layer) is the de facto standard for secure Internet communications. Security of SSL connections against an active network attacker depends on correctly validating public-key certificates presented when the connection is established.We demonstrate that SSL certificate validation is completely broken in many security-critical applications and libraries. Vulnerable software includes Amazon's EC2 Java library and all cloud clients based on it; Amazon's and PayPal's merchant SDKs responsible for transmitting payment details from e-commerce sites to payment gateways; integrated shopping carts such as osCommerce, ZenCart, Ubercart, and PrestaShop; AdMob code used by mobile websites; Chase mobile banking and several other Android apps and libraries; Java Web-services middleware including Apache Axis, Axis 2, Codehaus XFire, and Pusher library for Android and all applications employing this middleware. Any SSL connection from any of these programs is insecure against a man-in-the-middle attack.The root causes of these vulnerabilities are badly designed APIs of SSL implementations (such as JSSE, OpenSSL, and GnuTLS) and data-transport libraries (such as cURL) which present developers with a confusing array of settings and options. We analyze perils and pitfalls of SSL certificate validation in software based on these APIs and present our recommendations.

...read moreread less

492 citations

Proceedings Article•DOI•

Fast dictionary attacks on passwords using time-space tradeoff

[...]

Arvind Narayanan¹, Vitaly Shmatikov¹•Institutions (1)

University of Texas at Austin¹

07 Nov 2005

TL;DR: It is demonstrated that as long as passwords remain human-memorable, they are vulnerable to "smart-dictionary" attacks even when the space of potential passwords is large, calling into question viability of human- Memorable character-sequence passwords as an authentication mechanism.

...read moreread less

Abstract: Human-memorable passwords are a mainstay of computer security. To decrease vulnerability of passwords to brute-force dictionary attacks, many organizations enforce complicated password-creation rules and require that passwords include numerals and special characters. We demonstrate that as long as passwords remain human-memorable, they are vulnerable to "smart-dictionary" attacks even when the space of potential passwords is large.Our first insight is that the distribution of letters in easy-to-remember passwords is likely to be similar to the distribution of letters in the users' native language. Using standard Markov modeling techniques from natural language processing, this can be used to dramatically reduce the size of the password space to be searched. Our second contribution is an algorithm for efficient enumeration of the remaining password space. This allows application of time-space tradeoff techniques, limiting memory accesses to a relatively small table of "partial dictionary" sizes and enabling a very fast dictionary attack.We evaluated our method on a database of real-world user password hashes. Our algorithm successfully recovered 67.6% of the passwords using a 2 x 109 search space. This is a much higher percentage than Oechslin's "rainbow" attack, which is the fastest currently known technique for searching large keyspaces. These results call into question viability of human-memorable character-sequence passwords as an authentication mechanism.

...read moreread less

419 citations

1
2
3
4
5
…
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Posted Content•

Communication-Efficient Learning of Deep Networks from Decentralized Data

[...]

H. Brendan McMahan¹, Eider Moore¹, Daniel Ramage¹, Seth Hampson, Blaise Aguera y Arcas¹ - Show less +1 more•Institutions (1)

Google¹

17 Feb 2016-arXiv: Learning

TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.

...read moreread less

Abstract: Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.

...read moreread less

5,936 citations

Book•

The Algorithmic Foundations of Differential Privacy

[...]

Cynthia Dwork¹, Aaron Roth²•Institutions (2)

Microsoft¹, University of Pennsylvania²

11 Aug 2014

TL;DR: The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.

...read moreread less

Abstract: The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition.After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations — not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed.We then turn from fundamentals to applications other than queryrelease, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static, database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams is discussed.Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey — there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.

...read moreread less

5,190 citations

Book Chapter•DOI•

Differential privacy: a survey of results

[...]

Cynthia Dwork¹•Institutions (1)

Microsoft¹

25 Apr 2008

TL;DR: This survey recalls the definition of differential privacy and two basic techniques for achieving it, and shows some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.

...read moreread less

Abstract: Over the past five years a new approach to privacy-preserving data analysis has born fruit [13, 18, 7, 19, 5, 37, 35, 8, 32]. This approach differs from much (but not all!) of the related literature in the statistics, databases, theory, and cryptography communities, in that a formal and ad omnia privacy guarantee is defined, and the data analysis techniques presented are rigorously proved to satisfy the guarantee. The key privacy guarantee that has emerged is differential privacy. Roughly speaking, this ensures that (almost, and quantifiably) no risk is incurred by joining a statistical database. In this survey, we recall the definition of differential privacy and two basic techniques for achieving it. We then show some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.

...read moreread less

3,314 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse