Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.

Home
/
Papers
/
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.

Posted Content•

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.

Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein - Show less +5 more

18 Dec 2020-arXiv: Learning-

TL;DR: In this article, the authors systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space.

read less

Abstract: As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.

...read moreread less

Citations

PDF

Open Access

More filters

Posted Content•

Evaluating Large Language Models Trained on Code

[...]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth A. Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew M. Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Samuel McCandlish, Ilya Sutskever, Wojciech Zaremba - Show less +54 more

07 Jul 2021-arXiv: Learning

TL;DR: Codex as discussed by the authors is a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities, showing that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts.

...read moreread less

Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.

...read moreread less

57 citations

Proceedings Article•DOI•

Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

[...]

Eitan Borgnia¹, Valeriia Cherepanova¹, Liam Fowl¹, Amin Ghiasi¹, Jonas Geiping², Micah Goldblum¹, Tom Goldstein¹, Arjun Gupta¹ - Show less +4 more•Institutions (2)

University of Maryland, College Park¹, University of Siegen²

06 Jun 2021

TL;DR: In this paper, strong data augmentations, such as mixup and CutMix, can significantly diminish the threat of poisoning and backdoor attacks without trading off performance, and they further verify the effectiveness of this simple defense against adaptive poisoning methods, and compare to baselines including the popular differentially private SGD (DP-SGD) defense.

...read moreread less

Abstract: Data poisoning and backdoor attacks manipulate victim models by maliciously modifying training data. In light of this growing threat, a recent survey of industry professionals revealed heightened fear in the private sector regarding data poisoning. Many previous defenses against poisoning either fail in the face of increasingly strong attacks, or they significantly degrade performance. However, we find that strong data augmentations, such as mixup and CutMix, can significantly diminish the threat of poisoning and backdoor attacks without trading off performance. We further verify the effectiveness of this simple defense against adaptive poisoning methods, and we compare to baselines including the popular differentially private SGD (DP-SGD) defense. In the context of backdoors, CutMix greatly mitigates the attack while simultaneously increasing validation accuracy by 9%.

...read moreread less

22 citations

Posted Content•

Property Inference From Poisoning

[...]

Melissa Chase¹, Esha Ghosh¹, Saeed Mahloujifar²•Institutions (2)

Microsoft¹, Princeton University²

26 Jan 2021-arXiv: Learning

TL;DR: In this paper, the authors proposed a poisoning attack that allows the adversary to learn the prevalence in the training data of any property it chooses, which can boost the information leakage significantly and should be considered as a stronger threat model in sensitive applications.

...read moreread less

Abstract: Property inference attacks consider an adversary who has access to the trained model and tries to extract some global statistics of the training data. In this work, we study property inference in scenarios where the adversary can maliciously control part of the training data (poisoning data) with the goal of increasing the leakage. Previous work on poisoning attacks focused on trying to decrease the accuracy of models either on the whole population or on specific sub-populations or instances. Here, for the first time, we study poisoning attacks where the goal of the adversary is to increase the information leakage of the model. Our findings suggest that poisoning attacks can boost the information leakage significantly and should be considered as a stronger threat model in sensitive applications where some of the data sources may be malicious. We describe our \emph{property inference poisoning attack} that allows the adversary to learn the prevalence in the training data of any property it chooses. We theoretically prove that our attack can always succeed as long as the learning algorithm used has good generalization properties. We then verify the effectiveness of our attack by experimentally evaluating it on two datasets: a Census dataset and the Enron email dataset. We were able to achieve above $90\%$ attack accuracy with $9-10\%$ poisoning in all of our experiments.

...read moreread less

15 citations

Journal Article•DOI•

Adversarial XAI Methods in Cybersecurity

[...]

Aditya Kuppa¹, Nhien-An Le-Khac¹•Institutions (1)

University College Dublin¹

01 Oct 2021-IEEE Transactions on Information Forensics and Security

TL;DR: In this paper, a black-box attack that leverages explainable artificial intelligence (XAI) methods to compromise the confidentiality and privacy properties of underlying classifiers is proposed, which can also facilitate powerful evasion attacks such as poisoning and back door attacks.

...read moreread less

Abstract: Machine Learning methods are playing a vital role in combating ever-evolving threats in the cybersecurity domain. Explanation methods that shed light on the decision process of black-box classifiers are one of the biggest drivers in the successful adoption of these models. Explaining predictions that address ‘Why?/Why Not?’ questions help users/stakeholders/analysts understand and accept the predicted outputs with confidence and build trust. Counterfactual explanations are gaining popularity as an alternative method to help users to not only understand the decisions of black-box models (why?) but also to provide a mechanism to highlight mutually exclusive data instances that would change the outcomes (why not?). Recent Explainable Artificial Intelligence literature has focused on three main areas: (a) creating and improving explainability methods that help users better understand how the internal of ML models work as well as their outputs; (b) attacks on interpreters with a white-box setting; (c) defining the relevant properties, metrics of explanations generated by models. Nevertheless, there is no thorough study of how the model explanations can introduce new attack surfaces to the underlying systems. A motivated adversary can leverage the information provided by explanations to launch membership inference, and model extraction attacks to compromise the overall privacy of the system. Similarly, explanations can also facilitate powerful evasion attacks such as poisoning and back door attacks. In this paper, we cover this gap by tackling various cybersecurity properties and threat models related to counterfactual explanations. We propose a new black-box attack that leverages Explainable Artificial Intelligence (XAI) methods to compromise the confidentiality and privacy properties of underlying classifiers. We validate our approach with datasets and models used in the cyber security domain to demonstrate that our method achieves the attacker’s goal under threat models which reflect the real-world settings.

...read moreread less

12 citations

Posted Content•

What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors.

[...]

Jonas Geiping, Liam Fowl, Gowthami Somepalli, Micah Goldblum, Michael Moeller, Tom Goldstein - Show less +2 more

26 Feb 2021-arXiv: Learning

TL;DR: In this paper, the authors extend the adversarial training framework to instead defend against (training-time) poisoning and backdoor attacks, and they show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.

...read moreread less

Abstract: Data poisoning is a threat model in which a malicious actor tampers with training data to manipulate outcomes at inference time. A variety of defenses against this threat model have been proposed, but each suffers from at least one of the following flaws: they are easily overcome by adaptive attacks, they severely reduce testing performance, or they cannot generalize to diverse data poisoning threat models. Adversarial training, and its variants, is currently considered the only empirically strong defense against (inference-time) adversarial attacks. In this work, we extend the adversarial training framework to instead defend against (training-time) poisoning and backdoor attacks. Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches. We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.

...read moreread less

10 citations

1
2
3
4
…
5
6
7
8
9

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Generative Adversarial Nets

[...]

Ian Goodfellow¹, Jean Pouget-Abadie¹, Mehdi Mirza¹, Bing Xu¹, David Warde-Farley¹, Sherjil Ozair², Aaron Courville¹, Yoshua Bengio¹ - Show less +4 more•Institutions (2)

Université de Montréal¹, Indian Institute of Technology Delhi²

08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

...read moreread less

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

...read moreread less

38,211 citations

Dissertation•

Learning Multiple Layers of Features from Tiny Images

[...]

Alex Krizhevsky¹•Institutions (1)

University of Toronto¹

01 Jan 2009

TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images, described in the next section.

...read moreread less

Abstract: In this work we describe how to train a multi-layer generative model of natural images. We use a dataset of millions of tiny colour images, described in the next section. This has been attempted by several groups but without success. The models on which we focus are RBMs (Restricted Boltzmann Machines) and DBNs (Deep Belief Networks). These models learn interesting-looking filters, which we show are more useful to a classifier than the raw pixels. We train the classifier on a labeled subset that we have collected and call the CIFAR-10 dataset.

...read moreread less

15,005 citations

Proceedings Article•

Intriguing properties of neural networks

[...]

Christian Szegedy¹, Wojciech Zaremba², Ilya Sutskever¹, Joan Bruna², Dumitru Erhan¹, Ian Goodfellow³, Rob Fergus², Rob Fergus⁴ - Show less +4 more•Institutions (4)

Google¹, New York University², Université de Montréal³, Facebook⁴

01 Jan 2014

TL;DR: It is found that there is no distinction between individual highlevel units and random linear combinations of high level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks.

...read moreread less

Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extend. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

...read moreread less

9,561 citations

Proceedings Article•DOI•

Extracting and composing robust features with denoising autoencoders

[...]

Pascal Vincent¹, Hugo Larochelle¹, Yoshua Bengio¹, Pierre-Antoine Manzagol¹•Institutions (1)

Université de Montréal¹

05 Jul 2008

TL;DR: This work introduces and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.

...read moreread less

Abstract: Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.

...read moreread less

6,816 citations

Book Chapter•DOI•

Calibrating noise to sensitivity in private data analysis

[...]

Cynthia Dwork¹, Frank McSherry¹, Kobbi Nissim², Adam Smith³•Institutions (3)

Microsoft¹, Ben-Gurion University of the Negev², Weizmann Institute of Science³

04 Mar 2006

TL;DR: In this article, the authors show that for several particular applications substantially less noise is needed than was previously understood to be the case, and also show the separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

...read moreread less

Abstract: We continue a line of research initiated in [10,11]on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = ∑ig(xi), where xi denotes the ith row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.

...read moreread less

6,211 citations