Top 10 papers published by Nicholas Carlini from Google in 2018

Posted Content•

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

[...]

Anish Athalye¹, Nicholas Carlini², David Wagner²•Institutions (2)

Massachusetts Institute of Technology¹, University of California, Berkeley²

01 Feb 2018-arXiv: Learning

TL;DR: This work identifies obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples, and develops attack techniques to overcome this effect.

...read moreread less

Abstract: We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

...read moreread less

1,757 citations

Proceedings Article•DOI•

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

[...]

Nicholas Carlini¹, David Wagner¹•Institutions (1)

University of California, Berkeley¹

24 May 2018

TL;DR: A white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end has a 100% success rate, and the feasibility of this attack introduce a new domain to study adversarial examples.

...read moreread less

Abstract: We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduce a new domain to study adversarial examples.

...read moreread less

837 citations

Proceedings Article•

Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

[...]

Anish Athalye¹, Nicholas Carlini², David Wagner²•Institutions (2)

Massachusetts Institute of Technology¹, University of California, Berkeley²

01 Feb 2018

TL;DR: In this article, the authors identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples.

...read moreread less

Abstract: We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

...read moreread less

718 citations

Posted Content•

The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets

[...]

Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, Dawn Song - Show less +1 more

22 Feb 2018-arXiv: Learning

TL;DR: Exposure, a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets, is presented and how to extract secrets efficiently using black-box API access is shown.

...read moreread less

Abstract: Machine learning models based on neural networks and deep learning are being rapidly adopted for many purposes. What those models learn, and what they may share, is a significant concern when the training data may contain secrets and the models are public -- e.g., when a model helps users compose text messages using models trained on all users' messages. This paper presents exposure: a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to over-fitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users' credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility.

...read moreread less

182 citations

Posted Content•

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

[...]

Anish Athalye, Nicholas Carlini

10 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This note evaluates the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, one can reduce the accuracy of the defended models to 0%.

...read moreread less

Abstract: Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.

...read moreread less

145 citations

Posted Content•

Unrestricted Adversarial Examples.

[...]

Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul F. Christiano, Ian Goodfellow - Show less +2 more

22 Sep 2018-arXiv: Machine Learning

TL;DR: This work introduces a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool, and shifts the focus to unconstrained adversaries.

...read moreread less

Abstract: We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models, and try to achieve high accuracy and coverage on non-adversarial data while making no confident mistakes on adversarial inputs. Attackers try to subvert defenses by finding arbitrary unambiguous inputs where the model assigns an incorrect label with high confidence. We propose a simple unambiguous dataset ("bird-or- bicycle") to use as part of this contest. We hope this contest will help to more comprehensively evaluate the worst-case adversarial risk of machine learning models.

...read moreread less

83 citations

Posted Content•

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

[...]

Nicholas Carlini¹, David Wagner¹•Institutions (1)

University of California, Berkeley¹

05 Jan 2018-arXiv: Learning

TL;DR: In this article, a white-box iterative optimization-based attack was applied to Mozilla's DeepSpeech end-to-end speech recognition system, achieving a 100% success rate.

...read moreread less

Abstract: We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (recognizing up to 50 characters per second of audio). We apply our white-box iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduce a new domain to study adversarial examples.

...read moreread less

79 citations

Posted Content•

Ground-Truth Adversarial Examples

[...]

Nicholas Carlini, Guy Katz, Clark Barrett¹, David L. Dill¹•Institutions (1)

Stanford University¹

15 Feb 2018

TL;DR: Ground truths are constructed: adversarial examples with a provably-minimal distance from a given input point that can serve to assess the effectiveness of attack techniques and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement.

...read moreread less

Abstract: The ability to deploy neural networks in real-world, safety-critical systems is severely limited by the presence of adversarial examples: slightly perturbed inputs that are misclassified by the network. In recent years, several techniques have been proposed for training networks that are robust to such examples; and each time stronger attacks have been devised, demonstrating the shortcomings of existing defenses. This highlights a key difficulty in designing an effective defense: the inability to assess a network's robustness against future attacks. We propose to address this difficulty through formal verification techniques. We construct ground truths: adversarial examples with a provably-minimal distance from a given input point. We demonstrate how ground truths can serve to assess the effectiveness of attack techniques, by comparing the adversarial examples produced by those attacks to the ground truths; and also of defense techniques, by computing the distance to the ground truths before and after the defense is applied, and measuring the improvement. We use this technique to assess recently suggested attack and defense techniques.

...read moreread less

71 citations

Posted Content•

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

[...]

Nicholas Carlini¹, Chang Liu², Úlfar Erlingsson¹, Jernej Kos³, Dawn Song² - Show less +1 more•Institutions (3)

Google¹, University of California, Berkeley², National University of Singapore³

22 Feb 2018-arXiv: Learning

TL;DR: In this article, the authors describe a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine learning model.

...read moreread less

Abstract: This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models---a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.

...read moreread less

48 citations

Prototypical Examples in Deep Learning: Metrics, Characteristics, and Utility

[...]

Nicholas Carlini, Ulfar Erlingsson, Nicolas Papernot

27 Sep 2018

21 citations

Showing papers by "Nicholas Carlini published in 2018"