Posted Content

Learning under $p$-Tampering Attacks

TL;DR: It is shown that PAC learning is possible under $p$-tampering poisoning attacks essentially whenever it is possible in the realizable setting without the attacks, and that PAC learning under `no-mistake' adversarial noise is not possible when the adversary can choose which $p$ fraction of the examples to tamper with.
Abstract: Mahloujifar and Mahmoody (TCC'17) studied attacks against learning algorithms using a special case of Valiant's malicious noise, called $p$-tampering, in which the adversary could change training examples with independent probability $p$ but only using correct labels. They showed the power of such attacks by increasing the error probability in the so-called `targeted' poisoning model in which the adversary's goal is to increase the loss of the generated hypothesis over a particular test example. At the heart of their attack was an efficient algorithm to bias the average output of any bounded real-valued function through $p$-tampering. In this work, we present new attacks for biasing the average output of bounded real-valued functions, improving upon the biasing attacks of MM16. Our improved biasing attacks directly imply improved $p$-tampering attacks against learners in the targeted poisoning model. As a bonus, our attacks come with considerably simpler analysis compared to previous attacks. We also study the possibility of PAC learning under $p$-tampering attacks in the \emph{non-targeted} (aka indiscriminate) setting where the adversary's goal is to increase the risk of the generated hypothesis (for a random test example). We show that PAC learning is \emph{possible} under $p$-tampering poisoning attacks essentially whenever it is possible in the realizable setting without the attacks. We further show that PAC learning under `no-mistake' adversarial noise is \emph{not} possible, if the adversary could choose the (still limited to only $p$ fraction of) tampered examples that she substitutes with adversarially chosen ones. Our formal model for such `bounded-budget' tampering attackers is inspired by the notions of (strong) adaptive corruption in secure multi-party computation.
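To make the $p$-tampering model in the abstract concrete, the following is a toy sketch and not the attack from the paper. It assumes the bounded function being biased is simply the sample mean of fair coin flips, and that the adversary, whenever a position is handed to it (independently with probability $p$), substitutes the largest value in the support. The names `p_tamper_mean` and `best_value` are hypothetical and introduced only for this illustration.

```python
import random


def p_tamper_mean(samples, p, best_value, rng):
    """Toy p-tampering: each position is independently handed to the
    adversary with probability p, who replaces it with best_value
    (a value that still lies in the support of the distribution)."""
    tampered = []
    for x in samples:
        if rng.random() < p:
            tampered.append(best_value)  # adversary controls this position
        else:
            tampered.append(x)  # honest sample is kept untouched
    return tampered


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p = 2000, 0.2

# Honest data: fair coin flips in {0, 1}, so the true mean is 0.5.
clean = [rng.randint(0, 1) for _ in range(n)]
tampered = p_tamper_mean(clean, p, best_value=1, rng=rng)

clean_mean = mean(clean)
tampered_mean = mean(tampered)
print("clean mean:   ", clean_mean)
print("tampered mean:", tampered_mean)
```

Under these assumptions the expected upward shift of the mean is about $p \cdot (1 - 0.5) = 0.1$, since a tampered position changes the value only when the honest flip was 0.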
Citations
Posted Content
TL;DR: The authors present an optimization-based method for crafting poisons, show that a single poison image can control classifier behavior when transfer learning is used, and demonstrate the method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.
Abstract: Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use "clean-labels"; they don't require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a $\textit{specific}$ test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by leaving them on the web and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that just one single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a "watermarking" strategy that makes poisoning reliable using multiple ($\approx$50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.

260 citations

Posted Content
TL;DR: A new "polytope attack" is proposed in which poison images are designed to surround the targeted image in feature space, and it is demonstrated that using Dropout during poison creation helps to enhance transferability of this attack.
Abstract: Clean-label poisoning attacks inject innocuous looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network's outputs, architecture, or (in some cases) training data. To achieve this, we propose a new "polytope attack" in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set.

160 citations

Posted Content
TL;DR: This work leverages adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks, based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
Abstract: Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are---often blatantly---mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency---the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.

156 citations


Cites background from "Learning under $p$-Tampering Attack..."

  • ...Attacks restricted to only using correctly label poisoned samples have been explored in prior work, being referred to as “defensible” (Mahloujifar & Mahmoody, 2017; Mahloujifar et al., 2017), “plausible” (Mahloujifar et al....

    [...]

  • ...…to only using correctly label poisoned samples have been explored in prior work, being referred to as “defensible” (Mahloujifar & Mahmoody, 2017; Mahloujifar et al., 2017), “plausible” (Mahloujifar et al., 2018; Mahloujifar & Mahmoody, 2018), “visually indistinguishable” (Koh & Liang, 2017),…...

    [...]

Posted Content
TL;DR: This work undertakes a rigorous study of defenses against data poisoning for online learning, studies four standard defenses in a powerful threat model, and provides conditions under which they can allow or resist rapid poisoning.
Abstract: Data poisoning attacks -- where an adversary can modify a small fraction of training data, with the goal of forcing the trained classifier to high loss -- are an important threat for machine learning in many applications. While a body of prior work has developed attacks and defenses, there is not much general understanding on when various attacks and defenses are effective. In this work, we undertake a rigorous study of defenses against data poisoning for online learning. First, we study four standard defenses in a powerful threat model, and provide conditions under which they can allow or resist rapid poisoning. We then consider a weaker and more realistic threat model, and show that the success of the adversary in the presence of data poisoning defenses there depends on the "ease" of the learning problem.

5 citations

Journal ArticleDOI
TL;DR: This survey summarizes and categorizes existing attack methods and corresponding defenses, and demonstrates compelling application scenarios, thus providing a unified framework to analyze poisoning attacks and laying the foundation for a more standardized approach to reproducible studies.
Abstract: Machine learning (ML) has been universally adopted for automated decisions in a variety of fields, including recognition and classification applications, recommendation systems, natural language processing, and so on. However, in light of high expenses on training data and computing resources, recent years have witnessed a rapid increase in outsourced ML training, either partially or completely, which provides vulnerabilities for adversaries to exploit. A prime threat in training phase is called poisoning attack, where adversaries strive to subvert the behavior of machine learning systems by poisoning training data or other means of interference. Although a growing number of relevant studies have been proposed, the research among poisoning attack is still overly scattered, with each paper focusing on a particular task in a specific domain. In this survey, we summarize and categorize existing attack methods and corresponding defenses, as well as demonstrate compelling application scenarios, thus providing a unified framework to analyze poisoning attacks. Besides, we also discuss the main limitations of current works, along with the corresponding future directions to facilitate further researches. Our ultimate motivation is to provide a comprehensive and self-contained survey of this growing field of research and lay the foundation for a more standardized approach to reproducible studies.

5 citations

References
Proceedings Article
01 Jan 2014
TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

9,561 citations

Book ChapterDOI
TL;DR: Upper bounds are derived for the probability that the sum S of n independent, bounded random variables exceeds its mean ES by a positive number nt, and analogous inequalities are obtained for certain sums of dependent random variables such as U statistics.
Abstract: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.

8,655 citations
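The bound summarized in the abstract above is commonly stated, for summands that each lie in an interval $[a, b]$, as $\Pr\{S - ES \ge nt\} \le \exp(-2nt^2/(b-a)^2)$. The following is a minimal sketch that evaluates this form of the bound and compares it with a simulated tail frequency. It assumes fair coin flips in $\{0, 1\}$ as the summands, and the helper names `hoeffding_bound` and `empirical_tail` are hypothetical, chosen only for this illustration.

```python
import math
import random


def hoeffding_bound(n, t, a=0.0, b=1.0):
    """Upper bound on Pr{S - ES >= n*t} for a sum S of n independent
    random variables that each take values in [a, b]."""
    return math.exp(-2.0 * n * t * t / ((b - a) ** 2))


def empirical_tail(n, t, trials, rng):
    """Fraction of simulated sums of n fair coin flips whose deviation
    above the mean n/2 is at least n*t."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.randint(0, 1) for _ in range(n))
        if s - 0.5 * n >= n * t:
            hits += 1
    return hits / trials


rng = random.Random(1)  # fixed seed so the run is reproducible
n, t = 100, 0.1

bound = hoeffding_bound(n, t)  # equals exp(-2) for these parameters
freq = empirical_tail(n, t, trials=2000, rng=rng)
print("Hoeffding bound:    ", bound)
print("empirical frequency:", freq)
```

The simulated frequency should sit below the bound, which holds for every distribution on $[a, b]$ and is therefore not tight for any particular one.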

Proceedings Article
20 Mar 2015
TL;DR: It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Abstract: Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.

7,994 citations


"Learning under $p$-Tampering Attack..." refers background in this paper

  • ...Such resulting misclassified perturbed instances are called adversarial examples and attacks aimed at finding such examples are called evasion attacks [8,13,22,28,30,44]....

    [...]

Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.

5,311 citations


"Learning under $p$-Tampering Attack..." refers background or methods in this paper

  • ...In his seminal work [39], Valiant introduced the Probably Approximately Correct (PAC) model of learning that triggered a significant amount of work in the the-...

    [...]

  • ...For example, properly learning monomials [39], or using 3-CNF formulae to learn 3-term DNF formulae [31]; the latter Definition 11 (Efficient Realizability)....

    [...]

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work proposes the DeepFool algorithm to efficiently compute perturbations that fool deep networks, and thus to reliably quantify the robustness of these classifiers.
Abstract: State-of-the-art deep neural networks have achieved impressive results on many image classification tasks. However, these same architectures have been shown to be unstable to small, well sought, perturbations of the images. Despite the importance of this phenomenon, no effective methods have been proposed to accurately compute the robustness of state-of-the-art deep classifiers to such perturbations on large-scale datasets. In this paper, we fill this gap and propose the DeepFool algorithm to efficiently compute perturbations that fool deep networks, and thus reliably quantify the robustness of these classifiers. Extensive experimental results show that our approach outperforms recent methods in the task of computing adversarial perturbations and making classifiers more robust.

4,505 citations