Towards Making Systems Forget with Machine Unlearning
Citations
Trojaning Attack on Neural Networks
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
LEMNA: Explaining Deep Learning based Security Applications
Auror: defending against poisoning attacks in collaborative deep learning systems
References
Induction of Decision Trees
Item-based collaborative filtering recommendation algorithms
TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones
Support Vector Data Description
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning
Frequently Asked Questions (13)
Q2. What do the authors state as future work in "Towards making systems forget with machine unlearning"?
The authors plan to build full-fledged forgetting systems that carefully track data lineage at many levels of granularity, across all operations, and at potentially the Web scale.
Q3. How do they exploit the lineage in Figure 2?
Fredrikson et al. [43] show that with the model and some demographic information about a patient, an attacker can infer the patient's genetic markers with accuracy as high as 75%. Another way to exploit the lineage in Figure 2 is through training data pollution attacks.
Q4. How many iterations does the algorithm need to converge?
The number of iterations required for the algorithm to converge depends on the algorithm, the initial state selected, and the training data.
Q5. What is the way to make a system forget?
To forget a piece of training data completely, these systems need to revert the effects of the data on the extracted features and models.
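The paper's approach is to convert learning into a summation form, so that forgetting a sample means subtracting its contribution from a few summations rather than retraining on all remaining data. A minimal sketch of this idea, using a toy naive-Bayes-style counter (the class and names are illustrative, not from the paper's implementation):

```python
# Sketch: summation-form unlearning for a toy count-based classifier.
# All names here are illustrative assumptions, not the paper's code.

class SummationNB:
    def __init__(self, n_features):
        self.class_counts = {0: 0, 1: 0}          # summation: samples per class
        self.feat_counts = {0: [0] * n_features,  # summation: feature tallies
                            1: [0] * n_features}

    def learn(self, x, y):
        """Add one sample's contribution to the summations."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feat_counts[y][i] += v

    def unlearn(self, x, y):
        """Forget a sample by subtracting its contribution --
        no pass over the remaining training data is needed."""
        self.class_counts[y] -= 1
        for i, v in enumerate(x):
            self.feat_counts[y][i] -= v

nb = SummationNB(3)
nb.learn([1, 0, 1], 1)    # polluted sample
nb.learn([0, 1, 0], 0)
nb.unlearn([1, 0, 1], 1)  # forget it with O(1) summation updates
print(nb.class_counts[1], nb.feat_counts[1])  # → 0 [0, 0, 0]
```

Because the model is derived only from the summations, updating them fully reverts the forgotten sample's effect on the model.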
Q6. How can an attacker infer what a user purchased?
Calandrino et al. [29] show that once an attacker learns (1) the item-item similarities, (2) the list of recommended items for a user before she purchased an item, and (3) the list after, the attacker can accurately infer what the user purchased by essentially inverting the computation done by the recommendation algorithm.
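The inversion step can be sketched with a toy recommender: the attacker replays each candidate purchase through the (known) scoring rule and keeps the one that reproduces the observed change in recommendations. The similarity values, item names, and scoring rule below are simplified assumptions, not the system attacked in [29]:

```python
# Toy sketch of inferring a purchase from recommendation-list changes.
# Public item-item similarities the attacker has learned.
sim = {("A", "B"): 0.9, ("A", "C"): 0.1,
       ("D", "C"): 1.5, ("E", "B"): 0.3}

def score(purchases, candidate):
    return sum(sim.get((p, candidate), 0.0) for p in purchases)

def recommend(purchases):
    """Recommend whichever catalog item scores higher."""
    return max(["B", "C"], key=lambda c: score(purchases, c))

before = recommend(["A"])       # recommendation before the hidden purchase
after = recommend(["A", "D"])   # hidden purchase "D" changes the list

# Replay every possible purchase; the one that reproduces the observed
# change is the inferred purchase.
inferred = [p for p in ["D", "E"] if recommend(["A", p]) == after]
print(before, after, inferred)  # → B C ['D']
```

The real attack is harder (many items, noisy similarities, partial observations), but the principle is the same: the recommender's computation is deterministic enough to invert.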
Q7. How long does it take to forget a polluted sample?
For instance, with a real-world data set from Huawei (see §VII), it takes Zozzle [35], a JavaScript malware detector, over a day to retrain and forget a polluted sample.
Q8. How many PDFs did it have to run?
Unlearning obtained up to 104× speedup, except for PJScan: its largest data set has only 65 PDFs, so the execution time was dominated by program startup and shutdown rather than learning.
Q9. What is the simplest way to unlearn with an iterative learning algorithm?
Unlearning can simply "resume" the iterative learning algorithm from the saved state on the updated training data set; it should take far fewer iterations to converge than restarting from the original or a newly generated initial state.
Q10. What is the way to reduce the detection effectiveness of the other three systems?
For each of the other three systems, because there is no known attack, the authors created a new, practical data pollution attack to decrease the detection effectiveness.
Q11. Why do the authors expect these scenarios to be rare?
The authors expect these scenarios to be rare because adaptive algorithms need to be robust anyway for convergence during normal operations.
Q12. How much data does an attacker need to pollute a learning system?
An attacker needs only a small amount of data to pollute a learning system (e.g., 1.75% in the OSN spam filter [46], as shown in §VIII).
Q13. How many links were removed by Google by October 2014?
For instance, Google had removed 171,183 links [50] by October 2014 under the “right to be forgotten” ruling of the highest court in the European Union.