MagNet: A Two-Pronged Defense against Adversarial Examples

doi:10.1145/3133956.3134057

Proceedings ArticleDOI

MagNet: A Two-Pronged Defense against Adversarial Examples

Dongyu Meng, +1 more

- pp 135-147

Chats0

TLDR

MagNet, a framework for defending neural network classifiers against adversarial examples, is proposed and it is shown empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples.

Abstract:

Deep learning has shown impressive performance on hard perceptual problems. However, researchers found deep learning systems to be vulnerable to small, specially crafted perturbations that are imperceptible to humans. Such perturbations cause deep learning systems to mis-classify adversarial examples, with potentially disastrous consequences where safety or security is crucial. Prior defenses against adversarial examples either targeted specific attacks or were shown to be ineffective. We propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet neither modifies the protected classifier nor requires knowledge of the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. The detector networks learn to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since they assume no specific process for generating adversarial examples, they generalize well. The reformer network moves adversarial examples towards the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbation. We discuss the intrinsic difficulties in defending against whitebox attack and propose a mechanism to defend against graybox attack. Inspired by the use of randomness in cryptography, we use diversity to strengthen MagNet. We show empirically that MagNet is effective against the most advanced state-of-the-art attacks in blackbox and graybox scenarios without sacrificing false positive rate on normal examples.

Citations

PDF

Open Access

More filters

Journal ArticleDOI

Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

Naveed Akhtar, +1 more

- 19 Feb 2018 -

IEEE Access

TL;DR: A comprehensive survey on adversarial attacks on deep learning in computer vision can be found in this paper, where the authors review the works that design adversarial attack, analyze the existence of such attacks and propose defenses against them.

...read moreread less

Journal ArticleDOI

Adversarial Examples: Attacks and Defenses for Deep Learning

Xiaoyong Yuan, +3 more

- 14 Jan 2019 -

IEEE Transactions on Neural Networks

TL;DR: In this paper, the authors review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial samples, and propose a taxonomy of these methods.

...read moreread less

Proceedings ArticleDOI

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks.

Weilin Xu, +2 more

Abstract: Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by \emph{adversarial examples} that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, \emph{feature squeezing}, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.

...read moreread less

Proceedings ArticleDOI

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

Bolun Wang, +6 more

TL;DR: This work presents the first robust and generalizable detection and mitigation system for DNN backdoor attacks, and identifies multiple mitigation techniques via input filters, neuron pruning and unlearning.

...read moreread less

Proceedings ArticleDOI

Trojaning Attack on Neural Networks

Yingqi Liu, +6 more

TL;DR: A trojaning attack on neuron networks that can be successfully triggered without affecting its test accuracy for normal input data, and it only takes a small amount of time to attack a complex neuron network model.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Dissertation

Learning Multiple Layers of Features from Tiny Images

Alex Krizhevsky

TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images, described in the next section.

...read moreread less

Posted Content

Distilling the Knowledge in a Neural Network

Geoffrey E. Hinton, +2 more

- 09 Mar 2015 -

arXiv: Machine Learning

TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

...read moreread less

Proceedings Article

Intriguing properties of neural networks

Christian Szegedy, +7 more

TL;DR: It is found that there is no distinction between individual highlevel units and random linear combinations of high level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks.

...read moreread less

Journal ArticleDOI

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton, +10 more

- 18 Oct 2012 -

IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

...read moreread less

Collapse

MagNet: A Two-Pronged Defense against Adversarial Examples

Citations

Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

Adversarial Examples: Attacks and Defenses for Deep Learning

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks.

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

Trojaning Attack on Neural Networks

References

Deep Residual Learning for Image Recognition

Learning Multiple Layers of Features from Tiny Images

Distilling the Knowledge in a Neural Network

Intriguing properties of neural networks

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Related Papers (5)

Explaining and Harnessing Adversarial Examples

Towards Evaluating the Robustness of Neural Networks

Intriguing properties of neural networks

Towards Deep Learning Models Resistant to Adversarial Attacks.

DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks