Towards Making Systems Forget with Machine Unlearning
Yinzhi Cao and Junfeng Yang
Columbia University
{yzcao, junfeng}@cs.columbia.edu
Abstract—Today’s systems produce a rapidly exploding
amount of data, and the data further derives more data, forming
a complex data propagation network that we call the data’s
lineage. There are many reasons that users want systems to forget
certain data including its lineage. From a privacy perspective,
users who become concerned with new privacy risks of a system
often want the system to forget their data and lineage. From a
security perspective, if an attacker pollutes an anomaly detector
by injecting manually crafted data into the training data set,
the detector must forget the injected data to regain security.
From a usability perspective, a user can remove noise and
incorrect entries so that a recommendation engine gives useful
recommendations. Therefore, we envision forgetting systems,
capable of forgetting certain data and their lineages, completely
and quickly.
This paper focuses on making learning systems forget, the
process of which we call machine unlearning, or simply unlearning.
We present a general, efficient unlearning approach
by transforming learning algorithms used by a system into a
summation form. To forget a training data sample, our approach
simply updates a small number of summations, which is asymptotically
faster than retraining from scratch. Our approach is general,
because the summation form comes from statistical query learning,
in which many machine learning algorithms can be implemented.
Our approach also applies to all stages of machine learning,
including feature selection and modeling. Our evaluation, on four
diverse learning systems and real-world workloads, shows that
our approach is general, effective, fast, and easy to use.
I. INTRODUCTION
A. The Need for Systems to Forget
Today’s systems produce a rapidly exploding amount of
data, ranging from personal photos and office documents to
logs of user clicks on a website or mobile device [15]. From
this data, the systems perform a myriad of computations to
derive even more data. For instance, backup systems copy data
from one place (e.g., a mobile device) to another (e.g., the
cloud). Photo storage systems re-encode a photo into different
formats and sizes [23, 53]. Analytics systems aggregate raw
data such as click logs into insightful statistics. Machine learn-
ing systems extract models and properties (e.g., the similarities
of movies) from training data (e.g., historical movie ratings)
using advanced algorithms. This derived data can recursively
derive more data, such as a recommendation system predicting
a user’s rating of a movie based on movie similarities. In short,
a piece of raw data in today’s systems often goes through
a series of computations, “creeping” into many places and
appearing in many forms. The data, computations, and derived
data together form a complex data propagation network that
we call the data’s lineage.
For a variety of reasons, users want a system to forget
certain sensitive data and its complete lineage. Consider pri-
vacy first. After Facebook changed its privacy policy, many
users deleted their accounts and the associated data [69].
The iCloud photo hacking incident [8] led to online articles
teaching users how to completely delete iOS photos including
the backups [79]. New privacy research revealed that machine
learning models for personalized warfarin dosing leak patients’
genetic markers [43], and a small set of statistics on genet-
ics and diseases suffices to identify individuals [78]. Users
unhappy with these newfound risks naturally want their data
and its influence on the models and statistics to be completely
forgotten. System operators or service providers have strong
incentives to honor users’ requests to forget data, both to keep
users happy and to comply with the law [72]. For instance,
Google had removed 171,183 links [50] by October 2014
under the “right to be forgotten” ruling of the highest court in
the European Union.
Security is another reason that users want data to be
forgotten. Consider anomaly detection systems. The security
of these systems hinges on the model of normal behaviors extracted
from the training data. By polluting¹ the training data,
attackers pollute the model, thus compromising security. For
instance, Perdisci et al. [56] show that PolyGraph [55], a worm
detection engine, fails to generate useful worm signatures if
the training data is injected with well-crafted fake network
flows. Once the polluted data is identified, the system must
completely forget the data and its lineage to regain security.
Usability is a third reason. Consider the recommendation
or prediction system Google Now [7]. It infers a user’s
preferences from her search history, browsing history, and
other analytics. It then pushes recommendations, such as news
about a show, to the user. Noise or incorrect entries in analytics
can seriously degrade the quality of the recommendation. One
of our lab members experienced this problem first-hand. He
loaned his laptop to a friend who searched for a TV show
(“Jeopardy!”) on Google [1]. He then kept getting news about
this show on his phone, even after he deleted the search record
from his search history.
We believe that systems must be designed under the core
principle of completely and quickly forgetting sensitive data
and its lineage for restoring privacy, security, and usability.
Such forgetting systems must carefully track data lineage
even across statistical processing or machine learning, and
make this lineage visible to users. They let users specify
¹ In this paper, we use the term pollute [56] instead of poison [47, 77].

the data to forget with different levels of granularity. For
instance, a privacy-conscious user who accidentally searches
for a sensitive keyword without concealing her identity can
request that the search engine forget that particular search
record. These systems then remove the data and revert its
effects so that all future operations run as if the data had never
existed. They collaborate to forget data if the lineage spans
across system boundaries (e.g., in the context of web mashup
services). This collaborative forgetting potentially scales to
the entire Web. Users trust forgetting systems to comply
with requests to forget, because the aforementioned service
providers have strong incentives to comply, but other trust
models are also possible. The usefulness of forgetting systems
can be evaluated with two metrics: how completely they can
forget data (completeness) and how quickly they can do so
(timeliness). The higher these metrics, the better the systems are
at restoring privacy, security, and usability.
We foresee easy adoption of forgetting systems because they
benefit both users and service providers. With the flexibility
to request that systems forget data, users have more control
over their data, so they are more willing to share data with the
systems. More data also benefit the service providers, because
they have more profit opportunities from their services and fewer legal
risks. In addition, we envision forgetting systems playing a
crucial role in emerging data markets [3, 40, 61] where users
trade data for money, services, or other data because the
mechanism of forgetting enables a user to cleanly cancel a
data transaction or rent out the use rights of her data without
giving up the ownership.
Forgetting systems are complementary to much existing
work [55, 75, 80]. Systems such as Google Search [6] can
forget a user’s raw data upon request, but they ignore the
lineage. Secure deletion [32, 60, 70] prevents deleted data from
being recovered from the storage media, but it largely ignores
the lineage, too. Information flow control [41, 67] can be
leveraged by forgetting systems to track data lineage. However,
it typically tracks only direct data duplication, not statistical
processing or machine learning, to avoid taint explosion.
Differential privacy [75, 80] preserves the privacy of each indi-
vidual item in a data set equally and invariably by restricting
accesses only to the whole data set’s statistics fuzzed with
noise. This restriction is at odds with today’s systems such
as Facebook and Google Search which, authorized by billions
of users, routinely access personal data for accurate results.
Unsurprisingly, it is impossible to strike a balance between
utility and privacy in state-of-the-art implementations [43]. In
contrast, forgetting systems aim to restore privacy on select
data. Although private data may still propagate, the lineage of
this data within the forgetting systems is carefully tracked and
removed completely and in a timely manner upon request. In
addition, this fine-grained data removal caters to an individual
user’s privacy consciousness and the data item’s sensitivity.
Forgetting systems conform to the trust and usage models of
today’s systems, representing a more practical privacy vs util-
ity tradeoff. Researchers also proposed mechanisms to make
systems more robust against training data pollution [27, 55].
Fig. 1: Unlearning idea. Instead of making a model directly depend
on each training data sample (left), we convert the learning algorithm
into a summation form (right). Specifically, each summation is the
sum of transformed data samples, where the transformation functions
g_i are efficiently computable. There are only a small number of
summations, and the learning algorithm depends only on summations.
To forget a data sample, we simply update the summations and then
compute the updated model. This approach is asymptotically much
faster than retraining from scratch.
However, despite these mechanisms (and the others discussed
so far such as differential privacy), users may still request
systems to forget data due to, for example, policy changes and
new attacks against the mechanisms [43, 56]. These requests
can be served only by forgetting systems.
B. Machine Unlearning
While there are numerous challenges in making systems
forget, this paper focuses on one of the most difficult chal-
lenges: making machine learning systems forget. These sys-
tems extract features and models from training data to answer
questions about new data. They are widely used in many
areas of science [25, 35, 37, 46, 55, 63–65]. To forget a piece
of training data completely, these systems need to revert the
effects of the data on the extracted features and models. We
call this process machine unlearning, or unlearning for short.
A naïve approach to unlearning is to retrain the features
and models from scratch after removing the data to forget.
However, when the set of training data is large, this approach
is quite slow, increasing the timing window during which the
system is vulnerable. For instance, with a real-world data set
from Huawei (see §VII), it takes Zozzle [35], a JavaScript
malware detector, over a day to retrain and forget a polluted
sample.
We present a general approach to efficient unlearning, with-
out retraining from scratch, for a variety of machine learning
algorithms widely used in real-world systems. To prepare for
unlearning, we transform learning algorithms in a system to
a form consisting of a small number of summations [33].
Each summation is the sum of some efficiently computable
transformation of the training data samples. The learning
algorithms depend only on the summations, not individual
data. These summations are saved together with the trained
model. (The rest of the system may still ask for individual data
and there is no injected noise as there is in differential privacy.)
Then, in the unlearning process, we subtract the data to forget

from each summation, and then update the model. As Figure 1
illustrates, forgetting a data item now requires recomputing
only a small number of terms, asymptotically faster than
retraining from scratch by a factor equal to the size of the
training data set. For the aforementioned Zozzle example, our
unlearning approach takes less than a second, compared to
a day for retraining. It is general because the summation form
is from statistical query (SQ) learning [48]. Many machine
learning algorithms, such as naïve Bayes classifiers, support
vector machines, and k-means clustering, can be implemented
as SQ learning. Our approach also applies to all stages of
machine learning, including feature selection and modeling.
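
To make the summation form concrete, here is a minimal sketch (our illustration in Python, not the paper's code; the class and function names are ours) of a linear least-squares learner, in the spirit of the summation-form examples in [33]. The model depends on the training data only through the summations S_xx = sum_i x_i x_i^T and S_xy = sum_i y_i x_i, so forgetting one sample subtracts that sample's contribution and re-solves a small linear system, without rescanning the remaining samples.

# Illustrative summation-form learner (assumed names; not from the paper).
import numpy as np

class SummationLeastSquares:
    def __init__(self, dim):
        self.S_xx = np.zeros((dim, dim))   # sum over samples of x x^T
        self.S_xy = np.zeros(dim)          # sum over samples of y * x

    def learn(self, X, y):
        # X: (n, dim) training samples, y: (n,) targets.
        self.S_xx += X.T @ X
        self.S_xy += X.T @ y

    def unlearn(self, x, y):
        # Forget one sample: a constant-size update, independent of the training set size.
        self.S_xx -= np.outer(x, x)
        self.S_xy -= y * x

    def model(self):
        # Recompute the weights from the summations alone (assumes S_xx is invertible).
        return np.linalg.solve(self.S_xx, self.S_xy)

Retraining would rebuild S_xx and S_xy from all n samples; unlearning touches only the forgotten sample, matching the asymptotic speedup by a factor of the training data size described above.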
We evaluated our unlearning approach on four diverse
learning systems including (1) LensKit [39], an open-source
recommendation system used by several websites for confer-
ence [5], movie [14], and book [4] recommendations; (2) an
independent re-implementation of Zozzle, the aforementioned
closed-source JavaScript malware detector whose algorithm
was adopted by Microsoft Bing [42]; (3) an open-source online
social network (OSN) spam filter [46]; and (4) PJScan, an
open-source PDF malware detector [51]. We also used real-
world workloads such as more than 100K JavaScript malware
samples from Huawei. Our evaluation shows:
All four systems are prone to attacks targeting learn-
ing. For LensKit, we reproduced an existing privacy
attack [29]. For each of the other three systems, because
there is no known attack, we created a new, practical data
pollution attack to decrease the detection effectiveness.
One particular attack requires careful injection of mul-
tiple features in the training data set to mislead feature
selection and model training (see §VII).
Our unlearning approach applies to all learning algo-
rithms in LensKit, Zozzle, and PJScan. In particular,
enabled by our approach, we created the first effi-
cient unlearning algorithm for normalized cosine similar-
ity [37, 63] commonly used by recommendation systems
(e.g., LensKit) and for one-class support vector machine
(SVM) [71] commonly used by classification/anomaly
detection systems (e.g., PJScan uses it to learn a model of
malicious PDFs). We show analytically that, for all these
algorithms, our approach is both complete (completely
removing a data sample’s lineage) and timely (asymptot-
ically much faster than retraining). For the OSN spam
filter, we leveraged existing techniques for unlearning.
Using real-world data, we show empirically that unlearn-
ing prevents the attacks and the speedup over retraining
is often huge, matching our analytical results.
Our approach is easy to use. It is straightforward to
modify the systems to support unlearning. For each
system, we modified from 20 to 300 lines of code, less
than 1% of the system.
C. Contributions and Paper Organization
This paper makes four main contributions:
The concept of forgetting systems that restore privacy, se-
curity, and usability by forgetting data lineage completely
and quickly;
A general unlearning approach that converts learning al-
gorithms into a summation form for efficiently forgetting
data lineage;
An evaluation of our approach on real-world systems/al-
gorithms demonstrating that it is practical, complete, fast,
and easy to use; and
The practical data pollution attacks we created against
real-world systems/algorithms.
While prior work proposed incremental machine learning
for several specific learning algorithms [31, 62, 73], the key
difference in our work is that we propose a general efficient
unlearning approach applicable to any algorithm that can be
converted to the summation form, including some that cur-
rently have no incremental versions, such as normalized cosine
similarity and one-class SVM. In addition, our unlearning
approach handles all stages of learning, including feature
selection and modeling. We also demonstrated our approach
on real systems.
Our unlearning approach is inspired by prior work on speed-
ing up machine learning algorithms with MapReduce [33]. We
believe we are the first to establish the connection between
unlearning and the summation form. In addition, we are the
first to convert non-standard real-world learning algorithms
such as normalized cosine similarity to the summation form.
The conversion is complex and challenging (see §VI). In con-
trast, the prior work converts nine standard machine learning
algorithms using only simple transformations.
The rest of the paper is organized as follows. In §II, we
present some background on machine learning systems and
the extended motivation of unlearning. In §III, we present the
goals and work flow of unlearning. In §IV, we present the core
approach of unlearning, i.e., transforming a system into the
summation form, and its formal backbone. In §V, we overview
our evaluation methodology and summarize results. In §VI–
§IX, we report detailed case studies on four real-world learning
systems. In §X and §XI, we discuss some issues in unlearning
and related work, and in §XII, we conclude.
II. BACKGROUND AND ADVERSARIAL MODEL
This section presents some background on machine learning
(§II-A) and the extended motivation of unlearning (§II-B).
A. Machine Learning Background
Figure 2 shows a general machine learning system with
three processing stages.
Feature selection. During this stage, the system selects,
from all features of the training data, a set of features
most crucial for classifying data. The selected feature
set is typically small to make later stages more accurate
and efficient. Feature selection can be (1) manual where
system builders carefully craft the feature set or (2) au-
tomatic where the system runs some learning algorithms

Fig. 2: A General Machine Learning System. Given a set of training
data including both malicious (+) and benign (−) samples, the system
first selects a set of features most crucial for classifying data. It then
uses the training data to construct a model. To process an unknown
sample, the system examines the features in the sample and uses
the model to predict the sample as malicious or benign. The lineage
of the training data thus flows to the feature set, the model, and the
prediction results. An attacker can feed different samples to the model
and observe the results to steal private information from every step
along the lineage, including the training data set (system inference
attack). She can pollute the training data and subsequently every step
along the lineage to alter prediction results (training data pollution
attack).
such as clustering and the chi-squared test to compute how
crucial the features are and select the most crucial ones.
Model training. The system extracts the values of the
selected features from each training data sample into
a feature vector. It feeds the feature vectors and the
malicious or benign labels of all training data samples
into some machine learning algorithm to construct a
succinct model.
Prediction. When the system receives an unknown data
sample, it extracts the sample’s feature vector and uses
the model to predict whether the sample is malicious or
benign.
Note that a learning system may or may not contain all
three stages, work with labeled training data, or classify data
as malicious or benign. We present the system in Figure 2 be-
cause it matches many machine learning systems for security
purposes such as Zozzle. Without loss of generality, we refer
to this system as an example in the later sections of the paper.
B. Adversarial Model
To further motivate the need for unlearning, we describe
several practical attacks in the literature that target learning
systems. They either violate privacy by inferring private in-
formation in the trained models (§II-B1), or reduce security
by polluting the prediction (detection) results of anomaly
detection systems (§II-B2).
1) System Inference Attacks: The training data sets, such
as movie ratings, online purchase histories, and browsing
histories, often contain private data. As shown in Figure 2,
the private data lineage flows through the machine learning
algorithms into the feature set, the model, and the prediction
results. By exploiting this lineage, an attacker gains an oppor-
tunity to infer private data by feeding samples into the system
and observing the prediction results. Such an attack is called
a system inference attack [29].²
Consider a recommendation system that uses item-item
collaborative filtering which learns item-item similarities from
users’ purchase histories and recommends to a user the items
most similar to the ones she previously purchased. Calandrino
et al. [29] show that once an attacker learns (1) the item-item
similarities, (2) the list of recommended items for a user before
she purchased an item, and (3) the list after, the attacker can
accurately infer what the user purchased by essentially invert-
ing the computation done by the recommendation algorithm.
For example, on LibraryThing [12], a book cataloging service
and recommendation engine, this attack successfully inferred
six book purchases per user with 90% accuracy for over one
million users!
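
As context for this attack, here is a minimal sketch of item-item collaborative filtering (our illustration in Python with assumed names, not LensKit's code): item similarities are computed from a user-item matrix, and a user is recommended the items most similar to those she already purchased, which is exactly the state the attack observes and inverts.

# Minimal item-item collaborative filtering sketch (illustrative only).
import numpy as np

def item_similarities(R):
    # R[u, i] = 1 if user u purchased item i (or a rating); columns are items.
    norms = np.linalg.norm(R, axis=0) + 1e-12
    return (R.T @ R) / np.outer(norms, norms)   # cosine similarity between item columns

def recommend(R, sims, user, k=5):
    scores = sims @ R[user]            # aggregate similarity to the user's items
    scores[R[user] > 0] = -np.inf      # never re-recommend items the user already has
    return np.argsort(scores)[::-1][:k]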
Similarly, consider a personalized warfarin dosing system
that guides medical treatments based on a patient’s genotype
and background. Fredrikson et al. [43] show that with the
model and some demographic information about a patient,
an attacker can infer the genetic markers of the patient with
accuracy as high as 75%.
2) Training Data Pollution Attacks: Another way to exploit
the lineage in Figure 2 is using training data pollution attacks.
An attacker injects carefully polluted data samples into a
learning system, misleading the algorithms to compute an in-
correct feature set and model. Subsequently, when processing
unknown samples, the system may flag a large number of benign
samples as malicious, generating too many false positives,
or it may flag a large number of malicious samples as benign, so true
malicious samples evade detection.
Unlike system inference in which an attacker exploits an
easy-to-access public interface of a learning system, data
pollution requires an attacker to tackle two relatively difficult
issues. First, the attacker must trick the learning system into
including the polluted samples in the training data set. There
are a number of reported ways to do so [54, 56, 77]. For
instance, she may sign up as a crowdsourcing worker and
intentionally mislabel benign emails as spam [77]. She may
also attack the honeypots or other baiting traps intended for
collecting malicious samples, such as sending polluted emails
to a spamtrap [17], or compromising a machine in a honeynet
and sending packets with polluted protocol header fields [56].
Second, the attacker must carefully pollute enough data to
mislead the machine learning algorithms. In the crowdsourcing
case, she, as the administrator of the crowdsourcing sites, directly
pollutes the labels of some training data [77]; mislabeling 3% of the
training data turned out to be enough to significantly decrease
detection efficacy. In the honeypot cases [17, 56], the attacker
cannot change the labels of the polluted data samples because
the honeypot automatically labels them as malicious. However,
² In this paper, we use system inference instead of model inversion [43].

she controls what features appear in the samples, so she
can inject benign features into these samples, misleading the
system into relying on these features for detecting malicious
samples. For instance, Nelson et al. injected words that also
occur in benign emails into the emails sent to a spamtrap,
causing a spam detector to classify 60% of the benign emails
as spam. Perdisci et al. injected many packets with the same
randomly generated strings into a honeynet, so that true
malicious packets without these strings evade detection.
III. OVERVIEW
This section presents the goals (§III-A) and work flow
(§III-B) of unlearning.
A. Unlearning Goals
Recall that forgetting systems have two goals: (1) com-
pleteness, or how completely they can forget data; and (2)
timeliness, or how quickly they can forget. We discuss what
these goals mean in the context of unlearning.
1) Completeness: Intuitively, completeness requires that
once a data sample is removed, all its effects on the feature set
and the model are also cleanly reversed. It essentially captures
how consistent an unlearned system is with the system that
has been retrained from scratch. If, for every possible sample,
the unlearned system gives the same prediction result as the
retrained system, then an attacker, operator, or user has no
way of discovering that the unlearned data and its lineage
existed in the system by feeding input samples to the unlearned
system or even observing its features, model, and training
data. Such unlearning is complete. To empirically measure
completeness, we quantify the percentage of input samples that
receive the same prediction results from both the unlearned
and the retrained system using a representative test data set.
The higher the percentage, the more complete the unlearning.
Note that completeness does not depend on the correctness
of prediction results: an incorrect but consistent prediction by
both systems does not decrease completeness.
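
As a minimal sketch of this measurement (ours; the model and test-set interfaces are assumptions), completeness is simply the agreement rate between the two systems:

# Empirical completeness: fraction of test samples on which the unlearned
# and the retrained system give the same prediction (illustrative sketch).
def completeness(unlearned_model, retrained_model, test_samples):
    agree = sum(1 for x in test_samples
                if unlearned_model.predict(x) == retrained_model.predict(x))
    return agree / len(test_samples)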
Our notion of completeness is subject to such factors as
how representative the test data set is and whether the learning
algorithm is randomized. In particular, given the same training
data set, the same randomized learning algorithm may compute
different models which subsequently predict differently. Thus,
we consider unlearning complete as long as the unlearned
system is consistent with one of the retrained systems.
2) Timeliness: Timeliness in unlearning captures how much
faster unlearning is than retraining at updating the features
and the model in the system. The more timely the unlearning,
the faster the system is at restoring privacy, security, and
usability. Analytically, unlearning updates only a small number
of summations and then runs a learning algorithm on these
summations, whereas retraining runs the learning algorithm
on the entire training data set, so unlearning is asymptotically
faster by a factor of the training data size. To empirically mea-
sure timeliness, we quantify the speedup of unlearning over
retraining. Unlearning does not replace retraining. Unlearning
works better when the data to forget is small compared to the
training set. This case is quite common. For instance, a single
user’s private data is typically small compared to the whole
training data of all users. Similarly, an attacker needs only a
small amount of data to pollute a learning system (e.g., 1.75%
in the OSN spam filter [46] as shown in §VIII). When the data
to forget becomes large, retraining may work better.
B. Unlearning Work Flow
Given a training data sample to forget, unlearning updates
the system in two steps, following the learning process shown
in Figure 2. First, it updates the set of selected features. The
inputs at this step are the sample to forget, the old feature
set, and the summations previously computed for deriving the
old feature set. The outputs are the updated feature set and
summations. For example, Zozzle selects features using the
chi-squared test, which scores a feature based on four counts
(the simplest form of summations): how many malicious or
benign samples contain or do not contain this feature. To
support unlearning, we augmented Zozzle to store the score
and these counts for each feature. To unlearn a sample,
we update these counts to exclude this sample, re-score the
features, and select the top scored features as the updated
feature set. This process does not depend on the training data
set, and is much faster than retraining which has to inspect
each sample for each feature. The updated feature set in our
experiments is very similar to the old one with a couple of
features removed and added.
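
The sketch below illustrates this bookkeeping (our code with assumed names and the standard 2x2 chi-squared statistic, not Zozzle's implementation): the four counts are stored per feature, adjusted to exclude the forgotten sample, and the features are then re-scored and re-selected without touching the training set.

# Per-feature counts kept alongside the feature set (illustrative sketch):
#   A: malicious samples containing the feature,  B: benign samples containing it,
#   C: malicious samples lacking it,              D: benign samples lacking it.
def chi_squared(A, B, C, D):
    n = A + B + C + D
    denom = (A + B) * (C + D) * (A + C) * (B + D)
    return 0.0 if denom == 0 else n * (A * D - B * C) ** 2 / denom

def exclude_sample(counts, sample_features, is_malicious):
    # counts: dict feature -> [A, B, C, D]; remove one sample's contribution.
    for f, (A, B, C, D) in counts.items():
        present = f in sample_features
        if is_malicious:
            A, C = (A - 1, C) if present else (A, C - 1)
        else:
            B, D = (B - 1, D) if present else (B, D - 1)
        counts[f] = [A, B, C, D]

def reselect_features(counts, k):
    # Re-score every feature from its counts and keep the top k.
    return sorted(counts, key=lambda f: chi_squared(*counts[f]), reverse=True)[:k]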
Second, unlearning updates the model. The inputs at this
step are the sample to forget, the old feature set, the updated
feature set, the old model, and the summations previously
computed for deriving the old model. The outputs are the
updated model and summations. If a feature is removed from
the feature set, we simply splice out the feature’s data from
the model. If a feature is added, we compute its data in the
model. In addition, we update summations that depend on
the sample to forget, and update the model accordingly. For
Zozzle, which classifies data as malicious or benign using naïve
Bayes, the summations are probabilities (e.g., the probability
that a training data sample is malicious given that it contains
a certain feature) computed using the counts recorded in the
first step. Updating the probabilities and the model is thus
straightforward, and much faster than retraining.
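
Continuing the Zozzle example, the sketch below (ours, with assumed names and Laplace smoothing; not the authors' code) rebuilds the naïve Bayes probabilities directly from the per-feature counts updated in the first step, so the model update likewise avoids any pass over the training data.

# Rebuild the naive Bayes model from the per-feature counts (A, B, C, D above);
# illustrative sketch: no pass over the training data set is needed.
import math

def rebuild_model(counts, selected_features):
    A, B, C, D = counts[next(iter(counts))]
    n_mal, n_ben = A + C, B + D                 # class totals are the same for every feature
    model = {
        "log_prior_mal": math.log(n_mal / (n_mal + n_ben)),
        "log_prior_ben": math.log(n_ben / (n_mal + n_ben)),
        "feature_probs": {},
    }
    for f in selected_features:
        A, B, C, D = counts[f]
        model["feature_probs"][f] = {           # Laplace-smoothed conditional probabilities
            "p_contains_given_mal": (A + 1) / (n_mal + 2),
            "p_contains_given_ben": (B + 1) / (n_ben + 2),
        }
    return model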
IV. UNLEARNING APPROACH
As previously depicted in Figure 1, our unlearning approach
introduces a layer of a small number of summations between
the learning algorithm and the training data to break down
the dependencies. Now, the learning algorithm depends only
on the summations, each of which is the sum of some
efficiently computable transformations of the training data
samples. Chu et al. [33] show that many popular machine
learning algorithms, such as naïve Bayes, can be represented
in this form. To remove a data sample, we simply remove
the transformations of this data sample from the summations
that depend on this sample, which has O(1) complexity, and
