
Membership Inference Attacks Against
Machine Learning Models
Reza Shokri
Cornell Tech
shokri@cornell.edu
Marco Stronati
INRIA (research performed while the author was at Cornell Tech)
marco@stronati.org
Congzheng Song
Cornell
cs2296@cornell.edu
Vitaly Shmatikov
Cornell Tech
shmat@cs.cornell.edu
Abstract—We quantitatively investigate how machine learning
models leak information about the individual data records on
which they were trained. We focus on the basic membership
inference attack: given a data record and black-box access to
a model, determine if the record was in the model’s training
dataset. To perform membership inference against a target model,
we make adversarial use of machine learning and train our own
inference model to recognize differences in the target model’s
predictions on the inputs that it trained on versus the inputs
that it did not train on.
We empirically evaluate our inference techniques on classi-
fication models trained by commercial “machine learning as a
service” providers such as Google and Amazon. Using realistic
datasets and classification tasks, including a hospital discharge
dataset whose membership is sensitive from the privacy perspec-
tive, we show that these models can be vulnerable to membership
inference attacks. We then investigate the factors that influence
this leakage and evaluate mitigation strategies.
I. INTRODUCTION
Machine learning is the foundation of popular Internet
services such as image and speech recognition and natural lan-
guage translation. Many companies also use machine learning
internally, to improve marketing and advertising, recommend
products and services to users, or better understand the data
generated by their operations. In all of these scenarios, ac-
tivities of individual users—their purchases and preferences,
health data, online and offline transactions, photos they take,
commands they speak into their mobile phones, locations they
travel to—are used as the training data.
Internet giants such as Google and Amazon are already
offering “machine learning as a service.” Any customer in
possession of a dataset and a data classification task can upload
this dataset to the service and pay it to construct a model.
The service then makes the model available to the customer,
typically as a black-box API. For example, a mobile-app maker
can use such a service to analyze users’ activities and query
the resulting model inside the app to promote in-app purchases
to users when they are most likely to respond. Some machine-
learning services also let data owners expose their models to
external users for querying or even sell them.
Our contributions. We focus on the fundamental question
known as membership inference: given a machine learning
model and a record, determine whether this record was used as
part of the model’s training dataset or not. We investigate this
question in the most difficult setting, where the adversary’s
access to the model is limited to black-box queries that
return the model’s output on a given input. In summary,
we quantify membership information leakage through the
prediction outputs of machine learning models.
To answer the membership inference question, we turn
machine learning against itself and train an attack model
whose purpose is to distinguish the target model’s behavior
on the training inputs from its behavior on the inputs that it
did not encounter during training. In other words, we turn the
membership inference problem into a classification problem.
Attacking black-box models such as those built by com-
mercial “machine learning as a service” providers requires
more sophistication than attacking white-box models whose
structure and parameters are known to the adversary. To
construct our attack models, we invented a shadow training
technique. First, we create multiple “shadow models” that
imitate the behavior of the target model, but for which we
know the training datasets and thus the ground truth about
membership in these datasets. We then train the attack model
on the labeled inputs and outputs of the shadow models.
We developed several effective methods to generate training
data for the shadow models. The first method uses black-box
access to the target model to synthesize this data. The second
method uses statistics about the population from which the
target’s training dataset was drawn. The third method assumes
that the adversary has access to a potentially noisy version
of the target’s training dataset. The first method does not
assume any prior knowledge about the distribution of the target
model’s training data, while the second and third methods
allow the attacker to query the target model only once before
inferring whether a given record was in its training dataset.
Our inference techniques are generic and not based on any
particular dataset or model type. We evaluate them against
neural networks, as well as black-box models trained using
Amazon ML and Google Prediction API. All of our experi-
ments on Amazon’s and Google’s platforms were done without
knowing the learning algorithms used by these services, nor
the architecture of the resulting models, since Amazon and
Google don’t reveal this information to the customers. For our
evaluation, we use realistic classification tasks and standard
model-training procedures on concrete datasets of images,
retail purchases, location traces, and hospital inpatient stays. In
addition to demonstrating that membership inference attacks
are successful, we quantify how their success relates to the
classification tasks and the standard metrics of overfitting.
Inferring information about the model’s training dataset
should not be confused with techniques such as model in-
version that use a model’s output on a hidden input to infer
something about this input [17] or to extract features that
characterize one of the model’s classes [16]. As explained
in [27] and Section IX, model inversion does not produce an
actual member of the model’s training dataset, nor, given a
record, does it infer whether this record was in the training
dataset. By contrast, the membership inference problem we
study in this paper is essentially the same as the well-known
problem of identifying the presence of an individual’s data in a
mixed pool given some statistics about the pool [3], [15], [21],
[29]. In our case, however, the goal is to infer membership
given a black-box API to a model of unknown structure, as
opposed to explicit statistics.
Our experimental results show that models created using
machine-learning-as-a-service platforms can leak a lot of in-
formation about their training datasets. For multi-class clas-
sification models trained on 10,000-record retail transaction
datasets using Google’s and Amazon’s services in default
configurations, our membership inference achieves median
accuracy of 94% and 74%, respectively. Even if we make
no prior assumptions about the distribution of the target
model’s training data and use fully synthetic data for our
shadow models, the accuracy of membership inference against
Google-trained models is 90%. Our results for the Texas
hospital discharge dataset (over 70% accuracy) indicate that
membership inference can present a risk to health-care datasets
if these datasets are used to train machine learning models
and access to the resulting models is open to the public.
Membership in such datasets is highly sensitive.
We discuss the root causes that make these attacks possi-
ble and quantitatively compare mitigation strategies such as
limiting the model’s predictions to top k classes, decreasing
the precision of the prediction vector, increasing its entropy,
or using regularization while training the model.
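As an illustration of the first three mitigations, the following minimal sketch (not the configuration evaluated in this paper; the function name, parameter values, and use of NumPy are assumptions) post-processes a prediction vector by keeping only the top k classes, coarsening the confidence values, and raising their entropy with a softmax temperature:

import numpy as np

def mitigate_prediction(probs, top_k=3, digits=2, temperature=1.0):
    """Post-process a prediction vector: keep only the top-k classes,
    coarsen the confidence values, and optionally raise their entropy
    with a softmax temperature > 1. Illustrative sketch only."""
    probs = np.asarray(probs, dtype=float)

    # Increase entropy by re-normalizing the log-probabilities at a temperature > 1.
    if temperature != 1.0:
        logits = np.log(probs + 1e-12) / temperature
        probs = np.exp(logits) / np.exp(logits).sum()

    # Restrict the output to the top-k classes and renormalize.
    if top_k is not None and top_k < probs.size:
        keep = np.argsort(probs)[-top_k:]
        truncated = np.zeros_like(probs)
        truncated[keep] = probs[keep]
        probs = truncated / truncated.sum()

    # Decrease precision by rounding each confidence value.
    return np.round(probs, digits)

# Example: a sharp 5-class prediction becomes coarser and flatter.
print(mitigate_prediction([0.90, 0.05, 0.03, 0.01, 0.01], top_k=3, digits=2, temperature=2.0))

Such post-processing limits what the prediction API reveals without retraining the underlying model.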
In summary, this paper demonstrates and quantifies the
problem of machine learning models leaking information
about their training datasets. To create our attack models, we
developed a new shadow learning technique that works with
minimal knowledge about the target model and its training
dataset. Finally, we quantify how the leakage of membership
information is related to model overfitting.
II. MACHINE LEARNING BACKGROUND
Machine learning algorithms help us better understand and
analyze complex data. When the model is created using
unsupervised training, the objective is to extract useful features
from the unlabeled data and build a model that explains its
hidden structure. When the model is created using supervised
training, which is the focus of this paper, the training records
(as inputs of the model) are assigned labels or scores (as
outputs of the model). The goal is to learn the relationship
between the data and the labels and construct a model that can
generalize to data records beyond the training set [19]. Model-
training algorithms aim to minimize the model’s prediction er-
ror on the training dataset and thus may overfit to this dataset,
producing models that perform better on the training inputs
than on the inputs drawn from the same population but not
used during the training. Many regularization techniques have
been proposed to prevent models from becoming overfitted
to their training datasets while minimizing their prediction
error [19].
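A minimal sketch of the resulting train-test accuracy gap, a standard way to measure overfitting (the dataset, model, and scikit-learn setup below are illustrative assumptions, not this paper's experimental configuration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Build a synthetic multi-class task and deliberately give the model room to overfit.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500,
                      random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)   # accuracy on the training inputs
test_acc = model.score(X_test, y_test)      # accuracy on unseen inputs from the same population
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}, "
      f"overfitting gap {train_acc - test_acc:.2f}")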
Supervised training is often used for classification and other
prediction tasks. For example, a retailer may train a model
that predicts a customer’s shopping style in order to offer her
suitable incentives, while a medical researcher may train a
model to predict which treatment is most likely to succeed
given a patient’s clinical symptoms or genetic makeup.
Machine learning as a service. Major Internet companies
now offer machine learning as a service on their cloud
platforms. Examples include Google Prediction API,
1
Amazon
Machine Learning (Amazon ML),
2
Microsoft Azure Machine
Learning (Azure ML),
3
and BigML.
4
These platforms provide simple APIs for uploading the data
and for training and querying models, thus making machine
learning technologies available to any customer. For example,
a developer may create an app that gathers data from users,
uploads it into the cloud platform to train a model (or update
an existing model with new data), and then uses the model’s
predictions inside the app to improve its features or better
interact with the users. Some platforms even envision data
holders training a model and then sharing it with others
through the platform’s API for profit (e.g., the Google Prediction API
gallery, https://cloud.google.com/prediction/docs/gallery).
The details of the models and the training algorithms are
hidden from the data owners. The type of the model may be
chosen by the service adaptively, depending on the data and
perhaps accuracy on validation subsets. Service providers do
not warn customers about the consequences of overfitting and
provide little or no control over regularization. For example,
Google Prediction API hides all details, while Amazon ML
provides only a very limited set of pre-defined options (L1- or
L2-norm regularization). The models cannot be downloaded
and are accessed only through the service’s API. Service
providers derive revenue mainly by charging customers for
queries through this API. Therefore, we treat “machine learn-
ing as a service” as a black box. All inference attacks we
demonstrate in this paper are performed entirely through the
services’ standard APIs.
III. PRIVACY IN MACHINE LEARNING
Before dealing with inference attacks, we need to define
what privacy means in the context of machine learning or,
alternatively, what it means for a machine learning model to
breach privacy.
A. Inference about members of the population
A plausible notion of privacy, known in statistical disclosure
control as the “Dalenius desideratum,” states that the model
should reveal no more about the input to which it is applied
than would have been known about this input without applying
the model. This cannot be achieved by any useful model [14].
A related notion of privacy appears in prior work on model
inversion [17]: a privacy breach occurs if an adversary can
use the model’s output to infer the values of unintended
(sensitive) attributes used as input to the model. As observed
in [27], it may not be possible to prevent this “breach” if
the model is based on statistical facts about the population.
For example, suppose that training the model has uncovered
a high correlation between a person’s externally observable
phenotype features and their genetic predisposition to a certain
disease. This correlation is now a publicly known scientific
fact that allows anyone to infer information about the person’s
genome after observing that person.
Critically, this correlation applies to all members of a given
population. Therefore, the model breaches “privacy” not just of
the people whose data was used to create the model, but also of
other people from the same population, even those whose data
was not used and whose identities may not even be known to
the model’s creator (i.e., this is “spooky action at a distance”).
Valid models generalize, i.e., they make accurate predictions
on inputs that were not part of their training datasets. This
means that the creator of a generalizable model cannot do
anything to protect “privacy” as defined above because the
correlations on which the model is based—and the inferences
that these correlations enable—hold for the entire population,
regardless of how the training sample was chosen or how the
model was created from this sample.
B. Inference about members of the training dataset
To bypass the difficulties inherent in defining and protecting
privacy of the entire population, we focus on protecting privacy
of the individuals whose data was used to train the model. This
motivation is closely related to the original goals of differential
privacy [13].
Of course, members of the training dataset are members
of the population, too. We investigate what the model reveals
about them beyond what it reveals about an arbitrary member
of the population. Our ultimate goal is to measure the mem-
bership risk that a person incurs if they allow their data to be
used to train a model.
The basic attack in this setting is membership inference,
i.e., determining whether a given data record was part of the
model’s training dataset or not. When a record is fully known
to the adversary, learning that it was used to train a particular
model is an indication of information leakage through the
model. In some cases, it can directly lead to a privacy breach.
For example, knowing that a certain patient’s clinical record
was used to train a model associated with a disease (e.g., to
determine the appropriate medicine dosage or to discover the
genetic basis of the disease) can reveal that the patient has this
disease.
We investigate the membership inference problem in the
black-box scenario where the adversary can only supply inputs
to the model and receive the model’s output(s). In some
situations, the model is available to the adversary indirectly.
For example, an app developer may use a machine-learning
service to construct a model from the data collected by the app
and have the app make API calls to the resulting model. In this
case, the adversary would supply inputs to the app (rather than
directly to the model) and receive the app’s outputs (which are
based on the model’s outputs). The details of internal model
usage vary significantly from app to app. For simplicity and
generality, we will assume that the adversary directly supplies
inputs to and receives outputs from the black-box model.
IV. PROBLEM STATEMENT
Consider a set of labeled data records sampled from some
population and partitioned into classes. We assume that a
machine learning algorithm is used to train a classification
model that captures the relationship between the content of
the data records and their labels.
For any input data record, the model outputs the prediction
vector of probabilities, one per class, that the record belongs
to a certain class. We will also refer to these probabilities
as confidence values. The class with the highest confidence
value is selected as the predicted label for the data record.
The accuracy of the model is evaluated by measuring how it
generalizes beyond its training set and predicts the labels of
other data records from the same population.
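For example, a minimal sketch with made-up numbers (not taken from any dataset used in this paper):

import numpy as np

# A 3-class prediction vector of confidence values; entries lie in [0, 1] and sum to 1.
prediction_vector = np.array([0.07, 0.81, 0.12])

predicted_label = int(np.argmax(prediction_vector))        # class with the highest confidence
confidence = float(prediction_vector[predicted_label])
print(predicted_label, confidence)                         # -> 1 0.81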
We assume that the attacker has query access to the model
and can obtain the model’s prediction vector on any data
record. The attacker knows the format of the inputs and
outputs of the model, including their number and the range of
values they can take. We also assume that the attacker either
(1) knows the type and architecture of the machine learning
model, as well as the training algorithm, or (2) has black-box
access to a machine learning oracle (e.g., a “machine learning
as a service” platform) that was used to train the model. In
the latter case, the attacker does not know a priori the model’s
structure or meta-parameters.
The attacker may have some background knowledge about
the population from which the target model’s training dataset
was drawn. For example, he may have independently drawn
samples from the population, disjoint from the target model’s
training dataset. Alternatively, the attacker may know some
general statistics about the population, for example, the
marginal distribution of feature values.
The setting for our inference attack is as follows. The
attacker is given a data record and black-box query access
to the target model. The attack succeeds if the attacker can
correctly determine whether this data record was part of the
model’s training dataset or not. The standard metrics for attack
accuracy are precision (what fraction of records inferred as
members are indeed members of the training dataset) and
recall (what fraction of the training dataset’s members are
correctly inferred as members by the attacker).

Fig. 1: Membership inference attack in the black-box setting. The
attacker queries the target model with a data record and obtains
the model’s prediction on that record. The prediction is a vector of
probabilities, one per class, that the record belongs to a certain class.
This prediction vector, along with the label of the target record, is
passed to the attack model, which infers whether the record was in
or out of the target model’s training dataset.

Fig. 2: Training shadow models using the same machine learning
platform as was used to train the target model. The training datasets
of the target and shadow models have the same format but are disjoint.
The training datasets of the shadow models may overlap. All models’
internal parameters are trained independently.
V. MEMBERSHIP INFERENCE
A. Overview of the attack
Our membership inference attack exploits the observation
that machine learning models often behave differently on the
data that they were trained on versus the data that they “see”
for the first time. Overfitting is a common reason but not the
only one (see Section VII). The objective of the attacker is to
construct an attack model that can recognize such differences
in the target model’s behavior and use them to distinguish
members from non-members of the target model’s training
dataset based solely on the target model’s output.
Our attack model is a collection of models, one for each
output class of the target model. This increases the accuracy of the
attack because the target model produces different distributions
over its output classes depending on the input’s true class.
To train our attack model, we build multiple “shadow”
models intended to behave similarly to the target model. In
contrast to the target model, we know the ground truth for each
shadow model, i.e., whether a given record was in its training
dataset or not. Therefore, we can use supervised training on
the inputs and the corresponding outputs (each labeled “in” or
“out”) of the shadow models to teach the attack model how to
distinguish the shadow models’ outputs on members of their
training datasets from their outputs on non-members.
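A minimal sketch of this labeling step (an illustration; the helper name attack_training_records, the predict_proba interface, and the data layout are assumptions, not the paper's code):

import numpy as np

def attack_training_records(shadow_models, shadow_splits):
    """shadow_models[i] exposes predict_proba(X) (e.g., a scikit-learn classifier);
    shadow_splits[i] = (X_in, y_in, X_out, y_out), where X_in/y_in were used to train
    shadow model i and X_out/y_out were held out from it."""
    rows, membership = [], []
    for model, (X_in, y_in, X_out, y_out) in zip(shadow_models, shadow_splits):
        for X, y, label in ((X_in, y_in, 1), (X_out, y_out, 0)):
            for vector, true_class in zip(model.predict_proba(X), y):
                rows.append((int(true_class), vector))   # attack input: true class + prediction vector
                membership.append(label)                 # ground truth: 1 = "in", 0 = "out"
    return rows, np.array(membership)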
Formally, let f_target() be the target model, and let D_target^train
be its private training dataset, which contains labeled data
records (x^(i), y^(i))_target. A data record x_target^(i) is the input to
the model, and y_target^(i) is the true label that can take values
from a set of classes of size c_target. The output of the target
model is a probability vector of size c_target. The elements of
this vector are in [0, 1] and sum up to 1.
Let f_attack() be the attack model. Its input x_attack is composed
of a correctly labeled record and a prediction vector of size
c_target. Since the goal of the attack is decisional membership
inference, the attack model is a binary classifier with two
output classes, “in” and “out.”
Figure 1 illustrates our end-to-end attack process. For a
labeled record (x, y), we use the target model to compute
the prediction vector 𝐲 = f_target(x). The distribution of 𝐲
(classification confidence values) depends heavily on the true
class of x. This is why we pass the true label y of x in
addition to the model’s prediction vector 𝐲 to the attack
model. Given how the probabilities in 𝐲 are distributed around
y, the attack model computes the membership probability
Pr{(x, y) ∈ D_target^train}, i.e., the probability that ((x, y), 𝐲)
belongs to the “in” class or, equivalently, that x is in the
training dataset of f_target().
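Continuing the illustrative sketch above, the per-class attack models can be realized, for example, with off-the-shelf binary classifiers (the choice of MLPClassifier and its hyperparameters are assumptions, not the attack-model architecture evaluated in this paper):

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_attack_models(rows, membership, num_classes):
    """rows: list of (true_class, prediction_vector); membership: 1 = "in", 0 = "out"."""
    models = {}
    for c in range(num_classes):
        idx = [i for i, (true_class, _) in enumerate(rows) if true_class == c]
        X = np.array([rows[i][1] for i in idx])
        y = membership[idx]
        # One binary in/out classifier per output class of the target model.
        models[c] = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                  random_state=0).fit(X, y)
    return models

def membership_probability(attack_models, true_class, prediction_vector):
    # Probability that (x, y) was a member of the target model's training dataset.
    return attack_models[true_class].predict_proba([prediction_vector])[0, 1]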
The main challenge is how to train the attack model to
distinguish members from non-members of the target model’s
training dataset when the attacker has no information about the
internal parameters of the target model and only limited query
access to it through the public API. To solve this conundrum,
we developed a shadow training technique that lets us train
the attack model on proxy targets for which we do know the
training dataset and can thus perform supervised training.
B. Shadow models
The attacker creates k shadow models f_shadow^i(). Each
shadow model i is trained on a dataset D_shadow_i^train of the same
format as and distributed similarly to the target model’s train-
ing dataset. These shadow training datasets can be generated
using one of the methods described in Section V-C. We assume
that the datasets used for training the shadow models are
disjoint from the private dataset used to train the target model
(∀i, D_shadow_i^train ∩ D_target^train = ∅). This is the worst case for the
attacker; the attack will perform even better if the training
datasets happen to overlap.
The shadow models must be trained in a similar way to
the target model. This is easy if the target’s training algorithm
(e.g., neural networks, SVM, logistic regression) and model
structure (e.g., the wiring of a neural network) are known.
Machine learning as a service is more challenging. Here the
type and structure of the target model are not known, but
the attacker can use exactly the same service (e.g., Google
Prediction API) to train the shadow model as was used to
train the target model—see Figure 2.

Algorithm 1 Data synthesis using the target model
procedure SYNTHESIZE(class c)
    x ← RANDRECORD()                    ▷ initialize a record randomly
    y*_c ← 0
    j ← 0
    k ← k_max
    for iteration = 1 ... iter_max do
        y ← f_target(x)                 ▷ query the target model
        if y_c ≥ y*_c then              ▷ accept the record
            if y_c > conf_min and c = arg max(y) then
                if rand() < y_c then    ▷ sample
                    return x            ▷ synthetic data
                end if
            end if
            x* ← x
            y*_c ← y_c
            j ← 0
        else
            j ← j + 1
            if j > rej_max then         ▷ many consecutive rejects
                k ← max(k_min, k/2)
                j ← 0
            end if
        end if
        x ← RANDRECORD(x*, k)           ▷ randomize k features of x*
    end for
    return ⊥                            ▷ failed to synthesize
end procedure
The more shadow models, the more accurate the attack
model will be. As described in Section V-D, the attack model
is trained to recognize differences in shadow models’ behavior
when these models operate on inputs from their own training
datasets versus inputs they did not encounter during training.
Therefore, more shadow models provide more training fodder
for the attack model.
C. Generating training data for shadow models
To train shadow models, the attacker needs training data
that is distributed similarly to the target model’s training data.
We developed several methods for generating such data.
Model-based synthesis. If the attacker has neither real
training data nor any statistics about its distribution, he can
generate synthetic training data for the shadow models using
the target model itself. The intuition is that records that are
classified by the target model with high confidence should
be statistically similar to the target’s training dataset and thus
provide good fodder for shadow models.
The synthesis process runs in two phases: (1) search, using
a hill-climbing algorithm, the space of possible data records
to find inputs that are classified by the target model with high
confidence; (2) sample synthetic data from these records. After
this process synthesizes a record, the attacker can repeat it until
the training dataset for shadow models is full.
See Algorithm 1 for the pseudocode of our synthesis
procedure. First, fix class c for which the attacker wants to
generate synthetic data. The first phase is an iterative process.
Start by randomly initializing a data record x. Assuming that
the attacker knows only the syntactic format of data records,
sample the value for each feature uniformly at random from
among all possible values of that feature. In each iteration,
propose a new record. A proposed record is accepted only
if it increases the hill-climbing objective: the probability of
being classified by the target model as class c.
Each iteration involves proposing a new candidate record by
changing k randomly selected features of the latest accepted
record x*. This is done by flipping binary features or resam-
pling new values for features of other types. We initialize k to
k_max and divide it by 2 when rej_max subsequent proposals
are rejected. This controls the diameter of the search around the
accepted record when proposing a new record. We set the
minimum value of k to k_min, which controls the speed of the
search for new records with a potentially higher classification
probability y_c.
The second, sampling phase starts when the target model’s
probability y_c that the proposed data record is classified as
belonging to class c is larger than the probabilities for all
other classes and also larger than a threshold conf_min. This
ensures that the predicted label for the record is c, and that the
target model is sufficiently confident in its label prediction. We
select such a record for the synthetic dataset with probability y_c
and, if selection fails, repeat until a record is selected.
This synthesis procedure works only if the adversary can
efficiently explore the space of possible inputs and discover
inputs that are classified by the target model with high confi-
dence. For example, it may not work if the inputs are high-
resolution images and the target model performs a complex
image classification task.
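A minimal, runnable rendering of Algorithm 1 (a sketch assuming binary features and a query function that returns the target model's prediction vector; the default parameter values are illustrative, not those used in the experiments):

import random

def synthesize(query, num_features, c, k_max=16, k_min=1,
               iter_max=1000, conf_min=0.8, rej_max=10):
    """Hill-climbing search followed by sampling, mirroring Algorithm 1.
    query(x) must return the target model's prediction vector for record x."""
    x = [random.randint(0, 1) for _ in range(num_features)]   # random initial record
    x_star, y_c_star = list(x), 0.0                           # last accepted record and its score
    j, k = 0, min(k_max, num_features)
    for _ in range(iter_max):
        y = query(x)                              # query the target model
        if y[c] >= y_c_star:                      # accept the proposal
            if y[c] > conf_min and c == max(range(len(y)), key=lambda i: y[i]):
                if random.random() < y[c]:        # sample with probability y_c
                    return x                      # synthetic record
            x_star, y_c_star, j = list(x), y[c], 0
        else:
            j += 1
            if j > rej_max:                       # too many consecutive rejects
                k, j = max(k_min, k // 2), 0      # shrink the search diameter
        x = list(x_star)
        for i in random.sample(range(num_features), k):
            x[i] = 1 - x[i]                       # randomize k features of the accepted record
    return None                                   # failed to synthesize

For instance, synthesize(query, num_features=600, c=3) would attempt to produce one record that the target model confidently assigns to class 3.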
Statistics-based synthesis. The attacker may have some statis-
tical information about the population from which the target
model’s training data was drawn. For example, the attacker
may have prior knowledge of the marginal distributions of
different features. In our experiments, we generate synthetic
training records for the shadow models by independently
sampling the value of each feature from its own marginal
distribution. The resulting attack models are very effective.
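A minimal sketch of this statistics-based synthesis (illustrative; the per-feature marginals format and the function name are assumptions):

import numpy as np

def sample_from_marginals(marginals, n_records, seed=0):
    """marginals: one dict per feature, mapping feature value -> marginal probability."""
    rng = np.random.default_rng(seed)
    records = []
    for _ in range(n_records):
        # Draw each feature independently from its own marginal distribution.
        record = [int(rng.choice(list(m.keys()), p=list(m.values()))) for m in marginals]
        records.append(record)
    return records

# Example: two binary features with different marginal frequencies.
print(sample_from_marginals([{0: 0.7, 1: 0.3}, {0: 0.1, 1: 0.9}], n_records=3))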
Noisy real data. The attacker may have access to some data
that is similar to the target model’s training data and can be
considered as a “noisy” version thereof. In our experiments
with location datasets, we simulate this by flipping the (bi-
nary) values of 10% or 20% randomly selected features, then

Citations
Proceedings ArticleDOI
30 Oct 2017
TL;DR: In this paper, the authors proposed a protocol for secure aggregation of high-dimensional data in federated deep learning, which allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner without learning each user's individual contribution.
Abstract: We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner (i.e. without learning each user's individual contribution), and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and active adversary settings, and show that security is maintained even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors over sending data in the clear.

1,890 citations

Journal ArticleDOI
TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.
Abstract: Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems-patient classification, fundamental biological processes and treatment of patients-and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

1,491 citations

Journal ArticleDOI
TL;DR: In this paper, the authors review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial samples, and propose a taxonomy of these methods.
Abstract: With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks (DNNs) have been recently found vulnerable to well-designed input samples called adversarial examples . Adversarial perturbations are imperceptible to human but can easily fool DNNs in the testing/deploying stage. The vulnerability to adversarial examples becomes one of the major risks for applying DNNs in safety-critical environments. Therefore, attacks and defenses on adversarial examples draw great attention. In this paper, we review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications for adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples. In addition, three major challenges in adversarial examples and the potential solutions are discussed.

1,203 citations

Posted Content
TL;DR: Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

1,107 citations


Cites background from "Membership Inference Attacks Against Machine Learning Models":

  • ...For classic (non-federated) models of computation, understanding a model’s susceptibility to attacks is an active and challenging research area [167, 357, 91, 293]....

Proceedings ArticleDOI
19 May 2019
TL;DR: In this article, passive and active inference attacks are proposed to exploit the leakage of information about participants' training data in federated learning, where each participant can infer the presence of exact data points and properties that hold only for a subset of the training data and are independent of the properties of the joint model.
Abstract: Collaborative machine learning and related techniques such as federated learning allow multiple participants, each with his own training dataset, to build a joint model by training locally and periodically exchanging model updates. We demonstrate that these updates leak unintended information about participants' training data and develop passive and active inference attacks to exploit this leakage. First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference). Then, we show how this adversary can infer properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture. For example, he can infer when a specific person first appears in the photos used to train a binary gender classifier. We evaluate our attacks on a variety of tasks, datasets, and learning configurations, analyze their limitations, and discuss possible defenses.

1,084 citations

References
Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations


"Membership Inference Attacks Agains..." refers background in this paper

  • ...Regularization techniques such as dropout [31] can help defeat overfitting and also strengthen...

    [...]

Book
28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for ``wide'' data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

Posted Content
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
Abstract: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.

12,857 citations


"Membership Inference Attacks Agains..." refers methods in this paper

  • ...This technique, also used in knowledge distillation and information transfer between models [20], would increase the entropy of the prediction vector....

    [...]

Journal ArticleDOI
TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction is a widely used reference for statistical learning, data mining, and prediction.
Abstract: (2004). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association: Vol. 99, No. 466, pp. 567-567.

10,549 citations

Book ChapterDOI
04 Mar 2006
TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also show the separation results showing the increased value of interactive sanitization mechanisms over non-interactive.
Abstract: We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = ∑ig(xi), where xi denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

6,211 citations