Open Access · Posted Content

Adversarial Training with Rectified Rejection

TLDR
This paper proposes using true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, learning to predict T-Con by rectifying confidence, and shows that under mild conditions a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
Abstract
Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, the vanilla confidence can overestimate the model's certainty if the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., the predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks and improves robustness with little extra computation.
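To make the mechanism concrete, here is a minimal inference-time sketch of the idea, assuming a classifier with an auxiliary rectifying head whose output in [0, 1] scales the vanilla confidence into R-Con; the function name and threshold below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def rectified_rejection_inference(logits, rectifier_out, threshold=0.5):
    """Sketch of inference with a rectified rejection (RR) module.

    Args:
        logits:        (batch, num_classes) classifier outputs.
        rectifier_out: (batch,) auxiliary head output in [0, 1], trained so
                       that the rectified confidence aligns with T-Con
                       (the probability of the true class).
        threshold:     rejection threshold on R-Con (hypothetical value).

    Returns:
        preds:  predicted class index per input.
        accept: boolean mask; False means the input is rejected.
    """
    probs = F.softmax(logits, dim=1)
    confidence, preds = probs.max(dim=1)   # vanilla confidence (Con)
    r_con = confidence * rectifier_out     # rectified confidence (R-Con)
    accept = r_con >= threshold            # reject low-R-Con inputs
    return preds, accept
```

Under this sketch, the rectifying head is fitted during training so that R-Con tracks T-Con; at test time, inputs whose R-Con falls below the threshold are rejected rather than classified.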


Citations
Posted Content

RobustBench: a standardized adversarial robustness benchmark.

TL;DR: This work evaluates the robustness of models on its benchmark with AutoAttack, an ensemble of white-box and black-box attacks that was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
Posted Content

Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks

TL;DR: Long-term cross adversarial training (LCAT) is a meta-learning method for adversarially robust neural networks that updates model parameters across both the natural and adversarial sample distributions over long-term training, improving both adversarial and clean few-shot classification accuracy.
Posted Content

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

TL;DR: In this article, a general hardness reduction between detection and classification of adversarial examples is established: given a robust detector for attacks at distance ε in some metric, one can construct a similarly robust (but inefficient) classifier for attacks at distance ε/2.
Posted Content

Accumulative Poisoning Attacks on Real-time Data

TL;DR: In this article, the authors propose an attack strategy that associates an accumulative phase with poisoning attacks to secretly magnify the destructive effect of a (poisoned) trigger batch.
Posted Content

Machine Learning with a Reject Option: A survey.

TL;DR: This paper surveys machine learning with a reject option: the authors define the conditions leading to two types of rejection, ambiguity rejection and novelty rejection, describe the standard learning strategies for training such models, and relate traditional machine learning techniques to rejection, the simplest form of which is sketched below.
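For context, ambiguity rejection in its simplest form amounts to confidence thresholding; the following is a minimal illustrative sketch, with a hypothetical function name and threshold not taken from the survey.

```python
import numpy as np

def predict_with_reject(probs, threshold=0.9):
    """Ambiguity rejection: abstain when the top-class probability is low.

    probs: (batch, num_classes) array of softmax probabilities.
    Returns the predicted class per row, or -1 to signal rejection.
    """
    preds = probs.argmax(axis=1)          # most likely class per input
    top = probs.max(axis=1)               # its probability (confidence)
    return np.where(top >= threshold, preds, -1)
```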
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Dissertation

Learning Multiple Layers of Features from Tiny Images

TL;DR: In this dissertation, the author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.
Proceedings Article

Intriguing properties of neural networks

TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Proceedings Article

Explaining and Harnessing Adversarial Examples

TL;DR: It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature; this view is supported by new quantitative results and gives the first explanation of the most intriguing fact about adversarial examples: their generalization across architectures and training sets.