Universal adversarial perturbations
Seyed-Mohsen Moosavi-Dezfooli
seyed.moosavi@epfl.ch
Alhussein Fawzi
hussein.fawzi@gmail.com
Omar Fawzi
omar.fawzi@ens-lyon.fr
Pascal Frossard
pascal.frossard@epfl.ch
Abstract

Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
1. Introduction
Can we find a single small image perturbation that fools a state-of-the-art deep neural network classifier on all natural images? We show in this paper the existence of such quasi-imperceptible universal perturbation vectors that cause natural images to be misclassified with high probability. Specifically, by adding such a quasi-imperceptible perturbation to natural images, the label estimated by the deep neural network is changed with high probability (see Fig. 1). Such perturbations are dubbed universal, as they are image-agnostic. The existence of these perturbations is problematic when the classifier is deployed in real-world (and possibly hostile) environments, as they can be exploited by adversaries to break the classifier.
The first two authors contributed equally to this work.
École Polytechnique Fédérale de Lausanne, Switzerland.
ENS de Lyon, LIP, UMR 5668 ENS Lyon - CNRS - UCBL - INRIA, Université de Lyon, France.
The code is available for download at https://github.com/LTS4/universal. A demo can be found at https://youtu.be/jhOu5yhe0rc.
Figure 1: When added to a natural image, a universal perturbation image causes the image to be misclassified by the deep neural network with high probability. Left images: original natural images; the labels are shown on top of each arrow. Central image: universal perturbation. Right images: perturbed images; the estimated labels of the perturbed images are shown on top of each arrow. (Labels appearing in the figure include Joystick, Whiptail lizard, Balloon, Lycaenid, Tibetan mastiff, Thresher, Grille, Flagpole, Face powder, Labrador, Chihuahua, Jay, Brabancon griffon, and Border terrier.)

Indeed, the perturbation process involves the mere addition of one very small perturbation to all natural images, and can be relatively straightforward to implement by adversaries in real-world environments, while being relatively difficult to detect, as such perturbations are very small and thus do not significantly affect data distributions. The surprising existence of universal perturbations further reveals new insights into the topology of the decision boundaries of deep neural networks. We summarize the main contributions of this paper as follows:
- We show the existence of universal image-agnostic perturbations for state-of-the-art deep neural networks.
- We propose an algorithm for finding such perturbations. The algorithm seeks a universal perturbation for a set of training points, and proceeds by aggregating atomic perturbation vectors that send successive datapoints to the decision boundary of the classifier.
- We show that universal perturbations have a remarkable generalization property, as perturbations computed for a rather small set of training points fool new images with high probability.
- We show that such perturbations are not only universal across images, but also generalize well across deep neural networks. Such perturbations are therefore doubly universal, both with respect to the data and the network architectures.
- We explain and analyze the high vulnerability of deep neural networks to universal perturbations by examining the geometric correlation between different parts of the decision boundary.
The robustness of image classifiers to structured and unstructured perturbations has recently attracted a lot of attention [2, 20, 17, 21, 4, 5, 13, 14, 15]. Despite the impressive performance of deep neural network architectures on challenging visual classification benchmarks [7, 10, 22, 11], these classifiers were shown to be highly vulnerable to perturbations. In [20], such networks are shown to be unstable to very small and often imperceptible additive adversarial perturbations. Such carefully crafted perturbations are either estimated by solving an optimization problem [20, 12, 1] or through one step of gradient ascent [6], and result in a perturbation that fools a specific data point. A fundamental property of these adversarial perturbations is their intrinsic dependence on datapoints: the perturbations are specifically crafted for each data point independently. As a result, the computation of an adversarial perturbation for a new data point requires solving a data-dependent optimization problem from scratch, which uses the full knowledge of the classification model. This is different from the universal perturbation considered in this paper, as we seek a single perturbation vector that fools the network on most natural images. Perturbing a new datapoint then only involves the mere addition of the universal perturbation to the image (and does not require solving an optimization problem or computing gradients). Finally, we emphasize that our notion of universal perturbation differs from the generalization of adversarial perturbations studied in [20], where perturbations computed on the MNIST task were shown to generalize well across different models. Instead, we examine the existence of universal perturbations that are common to most data points belonging to the data distribution.
2. Universal perturbations
We formalize in this section the notion of universal perturbations, and propose a method for estimating such perturbations. Let $\mu$ denote a distribution of images in $\mathbb{R}^d$, and $\hat{k}$ define a classification function that outputs for each image $x \in \mathbb{R}^d$ an estimated label $\hat{k}(x)$. The main focus of this paper is to seek perturbation vectors $v \in \mathbb{R}^d$ that fool the classifier $\hat{k}$ on almost all datapoints sampled from $\mu$. That is, we seek a vector $v$ such that

$$\hat{k}(x + v) \neq \hat{k}(x) \quad \text{for ``most'' } x \sim \mu.$$

We coin such a perturbation universal, as it represents a fixed image-agnostic perturbation that causes label change for most images sampled from the data distribution $\mu$. We focus here on the case where the distribution $\mu$ represents the set of natural images, hence containing a huge amount of variability. In that context, we examine the existence of small universal perturbations (in terms of the $\ell_p$ norm with $p \in [1, \infty)$) that misclassify most images. The goal is therefore to find $v$ that satisfies the following two constraints:

1. $\|v\|_p \leq \xi$,
2. $\mathbb{P}_{x \sim \mu}\big(\hat{k}(x + v) \neq \hat{k}(x)\big) \geq 1 - \delta$.

The parameter $\xi$ controls the magnitude of the perturbation vector $v$, and $\delta$ quantifies the desired fooling rate for all images sampled from the distribution $\mu$.
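In practice, the probability in the second constraint is estimated empirically on a finite sample of images. The following is a minimal sketch (not the paper's released code), assuming a hypothetical `classifier` callable that maps a batch of images (a NumPy array) to an array of predicted labels:

```python
import numpy as np

def empirical_fooling_rate(classifier, images, v):
    """Estimate P_{x~mu}[ k(x + v) != k(x) ] on a finite sample of images.

    classifier: callable mapping a batch of images to an array of labels (assumed).
    images:     array of shape (n, ...) sampled from the data distribution mu.
    v:          candidate universal perturbation, broadcastable to a single image.
    """
    clean_labels = classifier(images)         # k(x)
    fooled_labels = classifier(images + v)    # k(x + v)
    return float(np.mean(fooled_labels != clean_labels))

# A perturbation v is admissible when, e.g. for p = inf,
# np.abs(v).max() <= xi and empirical_fooling_rate(classifier, images, v) >= 1 - delta.
```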
Algorithm. Let $X = \{x_1, \ldots, x_m\}$ be a set of images sampled from the distribution $\mu$. Our proposed algorithm seeks a universal perturbation $v$, such that $\|v\|_p \leq \xi$, while fooling most images in $X$. The algorithm proceeds iteratively over the data in $X$ and gradually builds the universal perturbation (see Fig. 2). At each iteration, the minimal perturbation $\Delta v_i$ that sends the current perturbed point, $x_i + v$, to the decision boundary of the classifier is computed, and aggregated to the current instance of the universal perturbation. In more detail, provided the current universal perturbation $v$ does not fool data point $x_i$, we seek the extra perturbation $\Delta v_i$ with minimal norm that allows us to fool data point $x_i$ by solving the following optimization problem:

$$\Delta v_i \leftarrow \arg\min_{r} \|r\|_2 \quad \text{s.t.} \quad \hat{k}(x_i + v + r) \neq \hat{k}(x_i). \tag{1}$$

Figure 2: Schematic representation of the proposed algorithm used to compute universal perturbations. In this illustration, data points $x_1$, $x_2$ and $x_3$ are super-imposed, and the classification regions $\mathcal{R}_i$ (i.e., regions of constant estimated label) are shown in different colors. Our algorithm proceeds by aggregating sequentially the minimal perturbations sending the current perturbed points $x_i + v$ outside of the corresponding classification region $\mathcal{R}_i$.
To ensure that the constraint $\|v\|_p \leq \xi$ is satisfied, the updated universal perturbation is further projected on the $\ell_p$ ball of radius $\xi$ and centered at 0. That is, let $\mathcal{P}_{p,\xi}$ be the projection operator defined as follows:

$$\mathcal{P}_{p,\xi}(v) = \arg\min_{v'} \|v - v'\|_2 \quad \text{subject to} \quad \|v'\|_p \leq \xi.$$

Then, our update rule is given by $v \leftarrow \mathcal{P}_{p,\xi}(v + \Delta v_i)$.
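For $p = 2$ and $p = \infty$ this projection has a simple closed form (rescaling onto the $\ell_2$ ball, or coordinate-wise clipping). A minimal NumPy sketch covering only these two cases, not the general $\ell_p$ projection:

```python
import numpy as np

def project_lp_ball(v, xi, p):
    """Projection P_{p,xi}(v): closest point to v (in l2 distance) inside the l_p ball of radius xi.

    Closed forms are used for p = 2 (rescale when the norm exceeds xi) and
    p = inf (clip every coordinate to [-xi, xi]); other values of p are not handled here.
    """
    if p == 2:
        norm = np.linalg.norm(v.ravel())
        return v if norm <= xi else v * (xi / norm)
    if p == np.inf:
        return np.clip(v, -xi, xi)
    raise NotImplementedError("this sketch only handles p = 2 and p = inf")

# Update rule of the algorithm: v = project_lp_ball(v + dv_i, xi, p)
```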
Several passes on the data set $X$ are performed to improve the quality of the universal perturbation. The algorithm is terminated when the empirical "fooling rate" on the perturbed data set $X_v := \{x_1 + v, \ldots, x_m + v\}$ exceeds the target threshold $1 - \delta$. That is, we stop the algorithm whenever

$$\text{Err}(X_v) := \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}_{\hat{k}(x_i + v) \neq \hat{k}(x_i)} \geq 1 - \delta.$$

The detailed algorithm is provided in Algorithm 1. Interestingly, in practice, the number of data points $m$ in $X$ need not be large to compute a universal perturbation that is valid for the whole distribution $\mu$. In particular, we can set $m$ to be much smaller than the number of training points (see Section 3).
The proposed algorithm involves solving at most $m$ instances of the optimization problem in Eq. (1) for each pass. While this optimization problem is not convex when $\hat{k}$ is a standard classifier (e.g., a deep neural network), several efficient approximate methods have been devised for solving this problem [20, 12, 8]. We use in the following the approach in [12] for its efficiency. It should further be noticed that the objective of Algorithm 1 is not to find the smallest universal perturbation that fools most data points sampled from the distribution, but rather to find one such perturbation with sufficiently small norm.
Algorithm 1 Computation of universal perturbations.
1: input: Data points $X$, classifier $\hat{k}$, desired $\ell_p$ norm of the perturbation $\xi$, desired accuracy on perturbed samples $\delta$.
2: output: Universal perturbation vector $v$.
3: Initialize $v \leftarrow 0$.
4: while $\text{Err}(X_v) < 1 - \delta$ do
5:   for each datapoint $x_i \in X$ do
6:     if $\hat{k}(x_i + v) = \hat{k}(x_i)$ then
7:       Compute the minimal perturbation that sends $x_i + v$ to the decision boundary:
         $\Delta v_i \leftarrow \arg\min_{r} \|r\|_2$ s.t. $\hat{k}(x_i + v + r) \neq \hat{k}(x_i)$.
8:       Update the perturbation:
         $v \leftarrow \mathcal{P}_{p,\xi}(v + \Delta v_i)$.
9:     end if
10:  end for
11: end while
In particular, different random shufflings of the set $X$ naturally lead to a diverse set of universal perturbations $v$ satisfying the required constraints. The proposed algorithm can therefore be leveraged to generate multiple universal perturbations for a deep neural network (see the next section for visual examples).
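The overall procedure fits in a few lines of Python. The sketch below is not the authors' released implementation; it assumes a hypothetical `classifier` callable returning the label of a single image and a hypothetical `minimal_perturbation` routine that approximately solves Eq. (1) (e.g., a DeepFool-style solver, as referenced in the text):

```python
import numpy as np

def universal_perturbation(X, classifier, minimal_perturbation,
                           xi=10.0, p=np.inf, delta=0.2, max_passes=10):
    """Sketch of Algorithm 1: accumulate minimal per-image perturbations and
    project the running universal perturbation onto the l_p ball of radius xi.

    X:                    array of sampled images, shape (m, ...).
    classifier:           callable returning the estimated label of one image (assumed).
    minimal_perturbation: callable (image, classifier) -> r approximately solving Eq. (1) (assumed).
    """
    def project(v):                                   # P_{p, xi} for p = 2 or inf
        if p == 2:
            n = np.linalg.norm(v.ravel())
            return v if n <= xi else v * (xi / n)
        return np.clip(v, -xi, xi)

    def fooling_rate(v):                              # Err(X_v)
        return float(np.mean([classifier(x + v) != classifier(x) for x in X]))

    v = np.zeros_like(X[0], dtype=np.float64)
    for _ in range(max_passes):                       # several passes over X
        for i in np.random.permutation(len(X)):       # random shuffling of X
            x = X[i]
            if classifier(x + v) == classifier(x):    # v does not fool x_i yet
                dv = minimal_perturbation(x + v, classifier)
                v = project(v + dv)
        if fooling_rate(v) >= 1 - delta:              # stop once Err(X_v) >= 1 - delta
            break
    return v
```

Different random permutations of $X$ yield different (diverse) perturbations, as noted above.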
3. Universal perturbations for deep nets
We now analyze the robustness of state-of-the-art deep neural network classifiers to universal perturbations using Algorithm 1.
In a first experiment, we assess the estimated universal perturbations for different recent deep neural networks on the ILSVRC 2012 [16] validation set (50,000 images), and report the fooling ratio, that is, the proportion of images that change labels when perturbed by our universal perturbation. Results are reported for p = 2 and p = ∞, where we respectively set ξ = 2000 and ξ = 10. These numerical values were chosen in order to obtain a perturbation whose norm is significantly smaller than the image norms, such that the perturbation is quasi-imperceptible when added to natural images. Results are listed in Table 1. Each result is reported on the set X, which is used to compute the perturbation, as well as on the validation set (which is not used in the process of computing the universal perturbation). Observe that for all networks, the universal perturbation achieves very high fooling rates on the validation set.
For comparison, the average ℓ2 and ℓ∞ norm of an image in the validation set is respectively 5 × 10^4 and 250.

              CaffeNet [9]   VGG-F [3]   VGG-16 [18]   VGG-19 [18]   GoogLeNet [19]   ResNet-152 [7]
ℓ2    X       85.4%          85.9%       90.7%         86.9%         82.9%            89.7%
      Val.    85.6%          87.0%       90.3%         84.5%         82.0%            88.5%
ℓ∞    X       93.1%          93.8%       78.5%         77.8%         80.8%            85.4%
      Val.    93.3%          93.7%       78.3%         77.8%         78.9%            84.0%

Table 1: Fooling ratios on the set X, and the validation set.
Specifically, the universal perturbations computed for CaffeNet and VGG-F fool more than 90% of the validation set (for p = ∞). In other words, for any natural image in the validation set, the mere addition of our universal perturbation fools the classifier more than 9 times out of 10. This result is moreover not specific to such architectures, as we can also find universal perturbations that cause VGG, GoogLeNet and ResNet classifiers to be fooled on natural images with probability edging 80%. These results have an element of surprise, as they show the existence of single universal perturbation vectors that cause natural images to be misclassified with high probability, albeit being quasi-imperceptible to humans. To verify this latter claim, we show visual examples of perturbed images in Fig. 3, where the GoogLeNet architecture is used. These images are either taken from the ILSVRC 2012 validation set, or captured using a mobile phone camera. Observe that in most cases, the universal perturbation is quasi-imperceptible, yet this powerful image-agnostic perturbation is able to misclassify any image with high probability for state-of-the-art classifiers. We refer to the supp. material for the original (unperturbed) images. We visualize the universal perturbations corresponding to different networks in Fig. 4. It should be noted that such universal perturbations are not unique, as many different universal perturbations (all satisfying the two required constraints) can be generated for the same network. In Fig. 5, we visualize five different universal perturbations obtained by using different random shufflings of X. Observe that such universal perturbations are different, although they exhibit a similar pattern. This is moreover confirmed by computing the normalized inner products between pairs of perturbation images: the normalized inner products do not exceed 0.1, which shows that one can find diverse universal perturbations.
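The similarity measure used here is simply a normalized inner product between two flattened perturbation images; a small sketch, assuming the perturbations are available as NumPy arrays:

```python
import numpy as np

def normalized_inner_product(v1, v2):
    """Normalized inner product <v1, v2> / (||v1||_2 ||v2||_2) between two perturbation images."""
    a, b = np.ravel(v1), np.ravel(v2)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Values close to 0 (below 0.1 in the experiments above) indicate nearly
# orthogonal, hence diverse, universal perturbations.
```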
While the above universal perturbations are computed for a set X of 10,000 images from the training set (i.e., on average 10 images per class), we now examine the influence of the size of X on the quality of the universal perturbation. We show in Fig. 6 the fooling rates obtained on the validation set for different sizes of X for GoogLeNet. Note for example that with a set X containing only 500 images, we can fool more than 30% of the images in the validation set. This result is significant when compared to the number of classes in ImageNet (1,000), as it shows that we can fool a large set of unseen images, even when using a set X containing less than one image per class! The universal perturbations computed using Algorithm 1 therefore have a remarkable generalization power over unseen data points, and can be computed on a very small set of training images.
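This experiment amounts to recomputing the perturbation for increasing subset sizes and measuring the fooling rate on held-out images. A hedged sketch, assuming hypothetical `compute_perturbation` and `fooling_rate` callables such as the ones sketched earlier:

```python
import numpy as np

def fooling_rate_vs_set_size(train_images, val_images, compute_perturbation, fooling_rate,
                             sizes=(500, 1000, 2000, 4000, 10000), seed=0):
    """Fooling rate on unseen validation images as a function of the size of X.

    compute_perturbation: callable X -> universal perturbation v (assumed, e.g. Algorithm 1).
    fooling_rate:         callable (images, v) -> fraction of labels changed (assumed).
    """
    rng = np.random.default_rng(seed)
    rates = {}
    for m in sizes:
        idx = rng.choice(len(train_images), size=m, replace=False)  # random subset X of size m
        v = compute_perturbation(train_images[idx])
        rates[m] = fooling_rate(val_images, v)
    return rates
```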
Cross-model universality. While the computed perturbations are universal across unseen data points, we now examine their cross-model universality. That is, we study to what extent universal perturbations computed for a specific architecture (e.g., VGG-19) are also valid for another architecture (e.g., GoogLeNet). Table 2 displays a matrix summarizing the universality of such perturbations across six different architectures. For each architecture, we compute a universal perturbation and report the fooling ratios on all other architectures; we report these in the rows of Table 2. Observe that, for some architectures, the universal perturbations generalize very well across other architectures. For example, universal perturbations computed for the VGG-19 network have a fooling ratio above 53% for all other tested architectures. This result shows that our universal perturbations are, to some extent, doubly-universal, as they generalize well across data points and very different architectures. It should be noted that, in [20], adversarial perturbations were previously shown to generalize well, to some extent, across different neural networks on the MNIST problem. Our results are however different, as we show the generalizability of universal perturbations across different architectures on the ImageNet data set. This result shows that such perturbations are of practical relevance, as they generalize well across data points and architectures. In particular, in order to fool a new image on an unknown neural network, a simple addition of a universal perturbation computed on the VGG-19 architecture is likely to misclassify the data point.
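Computing a cross-model matrix like Table 2 only requires evaluating each perturbation under every classifier. A minimal sketch, assuming dictionaries of precomputed perturbations and of hypothetical classifier callables that map image batches to label arrays:

```python
import numpy as np

def cross_model_fooling_matrix(perturbations, classifiers, images):
    """Fooling ratio of each universal perturbation on every architecture.

    perturbations: dict  source architecture name -> perturbation v (assumed precomputed).
    classifiers:   dict  architecture name -> callable(images) -> labels (assumed).
    images:        validation images, shape (n, ...).
    """
    matrix = {}
    for src, v in perturbations.items():
        matrix[src] = {}
        for dst, clf in classifiers.items():
            clean = clf(images)                                   # labels on clean images
            matrix[src][dst] = float(np.mean(clf(images + v) != clean))
    return matrix
```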
Visualization of the effect of universal perturbations. To gain insights on the effect of universal perturbations on natural images, we now visualize the distribution of labels on the ImageNet validation set. Specifically, we build a directed graph G = (V, E), whose vertices denote the labels, and whose directed edges e = (i → j) indicate that the majority of images of class i are fooled into label j when applying the universal perturbation. The existence of edges i → j therefore suggests that the preferred fooling label for images of class i is j. We construct this graph for GoogLeNet, and visualize the full graph in the supp. material due to space constraints. The visualization of this graph shows a very peculiar topology.

Figure 3: Examples of perturbed images and their corresponding labels. The first 8 images belong to the ILSVRC 2012 validation set, and the last 4 are images taken by a mobile phone camera. See supp. material for the original images. (Labels appearing in the figure include wool, Indian elephant, African grey, tabby, common newt, carousel, grey fox, macaw, and three-toed sloth.)
Figure 4: Universal perturbations computed for different deep neural network architectures: (a) CaffeNet, (b) VGG-F, (c) VGG-16, (d) VGG-19, (e) GoogLeNet, (f) ResNet-152. Images generated with p = ∞, ξ = 10. The pixel values are scaled for visibility.
In particular, the graph is a union of disjoint components, where all edges in one component mostly connect to one target label. See Fig. 7 for an illustration of two connected components. This visualization clearly shows the existence of several dominant labels, and that universal perturbations mostly make natural images classified with such labels. We hypothesize that these dominant labels occupy large regions in the image space, and therefore represent good candidate labels for fooling most natural images. Note that these dominant labels are automatically found and are not imposed a priori in the computation of perturbations.
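The graph of fooling labels can be built directly from pairs of clean and perturbed predictions. A sketch using networkx, with the simplifying assumption that an edge i → j points to the most frequent fooling label of class i (the paper draws the edge when j is the majority target):

```python
from collections import Counter, defaultdict
import networkx as nx

def fooling_label_graph(clean_labels, fooled_labels):
    """Directed graph G = (V, E) whose vertices are labels and whose edge i -> j
    points to the most common label j that images of class i are fooled into."""
    targets = defaultdict(Counter)
    for i, j in zip(clean_labels, fooled_labels):
        if i != j:
            targets[i][j] += 1
    graph = nx.DiGraph()
    for i, counts in targets.items():
        j, _ = counts.most_common(1)[0]   # most frequent fooling target of class i
        graph.add_edge(i, j)
    return graph

# Dominant labels then appear as high in-degree vertices, and the disjoint
# components can be inspected with nx.weakly_connected_components(graph).
```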
Fine-tuning with universal perturbations. We now examine the effect of fine-tuning the networks with perturbed images. We use the VGG-F architecture, and fine-tune the network based on a modified training set where universal perturbations are added to a fraction of (clean) training samples: for each training point, a universal perturbation is added with probability 0.5, and the original sample is preserved with probability 0.5.³ To account for the diversity

³ In this fine-tuning experiment, we use a slightly modified notion of universal perturbations, where the direction of the universal vector v is fixed for all data points, while its magnitude is adaptive. That is, for each
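The modified training set described above can be constructed with a simple sampling rule. A minimal NumPy sketch, assuming a precomputed pool of universal perturbations and leaving out the adaptive-magnitude variant mentioned in the footnote:

```python
import numpy as np

def perturb_training_batch(images, perturbation_pool, rng=None):
    """For each training image, add one universal perturbation from the pool with
    probability 0.5, and keep the clean sample otherwise."""
    if rng is None:
        rng = np.random.default_rng()
    out = images.astype(np.float64, copy=True)
    for idx in range(len(out)):
        if rng.random() < 0.5:
            v = perturbation_pool[rng.integers(len(perturbation_pool))]
            out[idx] = out[idx] + v
    return out
```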

References

- Deep Residual Learning for Image Recognition: proposes a residual learning framework to ease the training of networks substantially deeper than those used previously; winner of the ILSVRC 2015 classification task.
- ImageNet Classification with Deep Convolutional Neural Networks: a deep convolutional network with five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
- Very Deep Convolutional Networks for Large-Scale Image Recognition: investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, showing that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 layers.
- Going Deeper with Convolutions: introduces the Inception architecture, which set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
- ImageNet Large Scale Visual Recognition Challenge: describes the ILSVRC benchmark for object category classification and detection on hundreds of object categories and millions of images, run annually from 2010 to present.