
Showing papers by Nicholas Carlini published in 2017


Proceedings ArticleDOI
22 May 2017
TL;DR: In this paper, the authors demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability.
Abstract: Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test that we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

6,528 citations
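The attack in the entry above is formulated as an optimization problem: find the smallest perturbation, under a chosen distance metric, that drives the classifier to the target class t. Below is a minimal sketch of that structure for the L2 metric. Everything concrete here is an illustrative assumption: a random linear model stands in for the neural network, the loop is plain gradient descent rather than Adam, and the constants c and kappa are arbitrary; the paper additionally uses a change of variables to keep pixels in range and a binary search over c.

```python
import numpy as np

# Toy linear classifier standing in for a neural network's logits
# (illustrative stand-in; the paper attacks real image classifiers).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))   # 10 classes, 32 input features
b = rng.normal(size=10)

def logits(x):
    return W @ x + b

def l2_targeted_attack(x, target, c=1.0, kappa=0.5, lr=0.01, steps=500):
    """Minimize ||delta||_2^2 + c * max(max_{i != t} Z(x+delta)_i - Z(x+delta)_t, -kappa)
    by plain gradient descent on delta."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = logits(x + delta)
        # Highest-scoring class other than the target.
        other = int(np.argmax(np.where(np.arange(len(z)) == target, -np.inf, z)))
        margin = z[other] - z[target]
        grad = 2.0 * delta                       # gradient of the distance term
        if margin > -kappa:                      # hinge term still active
            grad += c * (W[other] - W[target])   # its gradient for a linear model
        delta -= lr * grad
    return x + delta

x = rng.normal(size=32)
x_adv = l2_targeted_attack(x, target=3)
print("original class:", int(np.argmax(logits(x))),
      "adversarial class:", int(np.argmax(logits(x_adv))),
      "L2 distortion:", round(float(np.linalg.norm(x_adv - x)), 3))
```

The margin term pushes the target logit above every other logit by kappa, while the distance term keeps the perturbation small; trading the two off against each other is what makes the resulting examples low-distortion.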


Proceedings ArticleDOI
03 Nov 2017
TL;DR: In this paper, the authors survey ten recent proposals for detecting adversarial examples and compare their efficacy, conclude that all of them can be defeated by constructing new loss functions, and propose several simple guidelines for evaluating future defenses.
Abstract: Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.

1,703 citations
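The common thread in the attacks above is that each detector is folded into the objective the attacker optimizes, rather than being attacked as an afterthought. A minimal sketch of one such adaptive loss follows. The linear classifier, the linear detector, its zero threshold, and the weights c and d are all illustrative assumptions rather than the constructions used in the paper; the point is only the shape of the combined objective.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 32)); b = rng.normal(size=10)   # toy classifier (stand-in)
v = rng.normal(size=32)                                  # toy detector (stand-in)

def classifier_logits(x):
    return W @ x + b

def detector_score(x):
    # Positive score => the input is flagged as adversarial (hypothetical detector).
    return v @ x

def combined_loss(x_adv, x, target, c=1.0, d=1.0, kappa=0.5):
    """Stay close to x, reach the target class with margin kappa, and keep the
    detector score below its flagging threshold (0 here)."""
    z = classifier_logits(x_adv)
    other = np.max(np.delete(z, target))
    misclassify = max(other - z[target] + kappa, 0.0)
    evade_detector = max(detector_score(x_adv), 0.0)
    return np.sum((x_adv - x) ** 2) + c * misclassify + d * evade_detector

def numeric_grad(f, x, eps=1e-4):
    # Finite-difference gradient, good enough for a low-dimensional toy example.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def adaptive_attack(x, target, steps=300, lr=0.05):
    x_adv = x.copy()
    for _ in range(steps):
        x_adv -= lr * numeric_grad(lambda xa: combined_loss(xa, x, target), x_adv)
    return x_adv

x = rng.normal(size=32)
x_adv = adaptive_attack(x, target=7)
print("class:", int(np.argmax(classifier_logits(x_adv))),
      "detector score:", round(float(detector_score(x_adv)), 3),
      "L2:", round(float(np.linalg.norm(x_adv - x)), 3))
```

If the detector term were dropped, the perturbed input would likely be flagged; adding such a term is roughly what "constructing a new loss function" refers to in the abstract above.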


Posted Content
TL;DR: It is shown that an adaptive adversary can create adversarial examples with low distortion, implying that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.
Abstract: Ongoing research has proposed several methods to defend neural networks against adversarial examples, many of which researchers have shown to be ineffective. We ask whether a strong defense can be created by combining multiple (possibly weak) defenses. To answer this question, we study three defenses that follow this approach. Two of these are recently proposed defenses that intentionally combine components designed to work well together. A third defense combines three independent defenses. For all the components of these defenses and the combined defenses themselves, we show that an adaptive adversary can create adversarial examples successfully with low distortion. Thus, our work implies that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.

268 citations
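When the defense is a combination of components, the adaptive adversary described above optimizes against all of the components at once rather than one at a time. The sketch below shows that joint objective in its simplest form: three random linear classifiers stand in for the combined defense components (a loose illustration, not the defenses studied in the paper), and the attacker minimizes the sum of their per-component losses plus the distortion.

```python
import numpy as np

rng = np.random.default_rng(2)

# Three toy "defended models" standing in for the components of a combined
# defense (illustrative stand-ins; each is just a random linear classifier).
defenses = [(rng.normal(size=(10, 32)), rng.normal(size=10)) for _ in range(3)]

def margin_loss(W, b, x, target, kappa=0.5):
    # Hinge on how far the target class is from being this component's top prediction.
    z = W @ x + b
    return max(np.max(np.delete(z, target)) - z[target] + kappa, 0.0)

def ensemble_loss(x_adv, x, target, c=1.0):
    # Jointly penalize every component, so a single perturbation fools all of them.
    return (np.sum((x_adv - x) ** 2)
            + c * sum(margin_loss(W, b, x_adv, target) for W, b in defenses))

def numeric_grad(f, x, eps=1e-4):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def attack(x, target, steps=400, lr=0.05):
    x_adv = x.copy()
    for _ in range(steps):
        x_adv -= lr * numeric_grad(lambda xa: ensemble_loss(xa, x, target), x_adv)
    return x_adv

x = rng.normal(size=32)
x_adv = attack(x, target=2)
print("per-component predictions:", [int(np.argmax(W @ x_adv + b)) for W, b in defenses],
      "L2:", round(float(np.linalg.norm(x_adv - x)), 3))
```

Summing the component losses lets one perturbation satisfy all of them together, which loosely matches the abstract's finding that the combined defenses can still be defeated with low distortion.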


01 Jan 2017
TL;DR: In this article, the authors show that an adaptive adversary can create adversarial examples with low distortion, implying that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.
Abstract: Ongoing research has proposed several methods to defend neural networks against adversarial examples, many of which researchers have shown to be ineffective. We ask whether a strong defense can be created by combining multiple (possibly weak) defenses. To answer this question, we study three defenses that follow this approach. Two of these are recently proposed defenses that intentionally combine components designed to work well together. A third defense combines three independent defenses. For all the components of these defenses and the combined defenses themselves, we show that an adaptive adversary can create adversarial examples successfully with low distortion. Thus, our work implies that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.

231 citations


Posted Content
TL;DR: It is found that adversarial examples can be constructed that defeat MagNet and "Efficient Defenses..." with only a slight increase in distortion.
Abstract: MagNet and "Efficient Defenses..." were recently proposed as defenses against adversarial examples. We find that we can construct adversarial examples that defeat these defenses with only a slight increase in distortion.

217 citations


Posted Content
TL;DR: In this article, the authors survey ten recent proposals that are designed for detection of adversarial examples and compare their efficacy, concluding that all of them can be defeated by constructing new loss functions.
Abstract: Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.

164 citations


Posted Content
TL;DR: It is demonstrated that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.
Abstract: The ability to deploy neural networks in real-world, safety-critical systems is severely limited by the presence of adversarial examples: slightly perturbed inputs that are misclassified by the network. In recent years, several techniques have been proposed for increasing robustness to adversarial examples --- and yet most of these have been quickly shown to be vulnerable to future attacks. For example, over half of the defenses proposed by papers accepted at ICLR 2018 have already been broken. We propose to address this difficulty through formal verification techniques. We show how to construct provably minimally distorted adversarial examples: given an arbitrary neural network and input sample, we can construct adversarial examples which we prove are of minimal distortion. Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining, provably succeeds at increasing the distortion required to construct adversarial examples by a factor of 4.2.

115 citations
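The phrase "provably minimally distorted" in the entry above means the distortion bound comes from a verifier rather than from the failure of any particular attack: the paper uses the Reluplex solver to prove that no adversarial example exists within a given radius, and searches over that radius. The sketch below shows only the search structure. A toy linear model is used so that the robustness check has an exact closed form; this closed form is an illustrative stand-in for the SMT-based verification the paper performs on small ReLU networks, and the L-infinity metric and search bounds are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(10, 32)); b = rng.normal(size=10)   # toy linear "network" (stand-in)

def predict(x):
    return int(np.argmax(W @ x + b))

def certified_robust(x, eps):
    """Exact check for the toy linear model: True iff no perturbation with
    ||delta||_inf <= eps changes the predicted label. (A verifier such as
    Reluplex answers the same question for small ReLU networks.)"""
    i = predict(x)
    z = W @ x + b
    for j in range(len(b)):
        if j == i:
            continue
        worst_margin = (z[i] - z[j]) - eps * np.sum(np.abs(W[i] - W[j]))
        if worst_margin <= 0:
            return False
    return True

def minimal_distortion(x, lo=0.0, hi=10.0, tol=1e-4):
    """Binary search for the smallest eps at which robustness fails, i.e. the
    provably minimal L-infinity distortion of any adversarial example
    (assumes some adversarial example exists within radius hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if certified_robust(x, mid):
            lo = mid   # still robust at mid: the minimal distortion is larger
        else:
            hi = mid   # an adversarial example exists within radius mid
    return hi

x = rng.normal(size=32)
print("predicted class:", predict(x),
      "provably minimal L_inf distortion (toy model):", round(minimal_distortion(x), 4))
```

With a sound and complete verifier in place of certified_robust, the same binary search yields a distortion bound that holds against every possible attack, which is what lets the paper state that adversarial retraining provably increases the required distortion.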