Open Access · Posted Content

Adversarial Training with Rectified Rejection

TLDR
This paper proposes using true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, learning to predict T-Con by rectifying confidence, and shows that under mild conditions a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
Abstract
Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, the vanilla confidence can overestimate the model's certainty if the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., the predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks and improves robustness with little extra computation.
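To make the mechanism concrete, here is a minimal inference-time sketch of the idea, assuming a classifier with an auxiliary rectifying head whose output in [0, 1] scales the vanilla confidence into R-Con; the function name and threshold below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def rectified_rejection_inference(logits, rectifier_out, threshold=0.5):
    """Sketch of inference with a rectified rejection (RR) module.

    Args:
        logits:        (batch, num_classes) classifier outputs.
        rectifier_out: (batch,) auxiliary head output in [0, 1], trained so
                       that the rectified confidence aligns with T-Con
                       (the probability of the true class).
        threshold:     rejection threshold on R-Con (hypothetical value).

    Returns:
        preds:  predicted class index per input.
        accept: boolean mask; False means the input is rejected.
    """
    probs = F.softmax(logits, dim=1)
    confidence, preds = probs.max(dim=1)   # vanilla confidence (Con)
    r_con = confidence * rectifier_out     # rectified confidence (R-Con)
    accept = r_con >= threshold            # reject low-R-Con inputs
    return preds, accept
```

Under this sketch, the rectifying head is fitted during training so that R-Con tracks T-Con; at test time, inputs whose R-Con falls below the threshold are rejected rather than classified.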


Citations
Posted Content

RobustBench: a standardized adversarial robustness benchmark.

TL;DR: This work evaluates the robustness of models on its benchmark with AutoAttack, an ensemble of white-box and black-box attacks that was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
Posted Content

Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks

TL;DR: Long-term cross adversarial training (LCAT) is a meta-learning method for adversarially robust neural networks that updates model parameters across both the natural and adversarial sample distributions over long-term training, improving both adversarial and clean few-shot classification accuracy.
Posted Content

Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

TL;DR: In this article, a general hardness reduction between detection and classification of adversarial examples is established: given a robust detector for attacks at distance ε in some metric, one can construct a similarly robust (but inefficient) classifier for attacks at distance ε/2.
Posted Content

Accumulative Poisoning Attacks on Real-time Data

TL;DR: In this article, the authors propose an attack strategy that associates an accumulative phase with poisoning attacks to secretly magnify the destructive effect of a (poisoned) trigger batch.
Posted Content

Machine Learning with a Reject Option: A survey.

TL;DR: This paper surveys machine learning with a reject option: the authors define the conditions leading to two types of rejection, ambiguity rejection and novelty rejection, describe the standard learning strategies for training such models, and relate traditional machine learning techniques to rejection, the simplest form of which is sketched below.
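For context, ambiguity rejection in its simplest form amounts to confidence thresholding; the following is a minimal illustrative sketch, with a hypothetical function name and threshold not taken from the survey.

```python
import numpy as np

def predict_with_reject(probs, threshold=0.9):
    """Ambiguity rejection: abstain when the top-class probability is low.

    probs: (batch, num_classes) array of softmax probabilities.
    Returns the predicted class per row, or -1 to signal rejection.
    """
    preds = probs.argmax(axis=1)          # most likely class per input
    top = probs.max(axis=1)               # its probability (confidence)
    return np.where(top >= threshold, preds, -1)
```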
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Dissertation

Learning Multiple Layers of Features from Tiny Images

TL;DR: In this dissertation, the author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.
Proceedings Article

Intriguing properties of neural networks

TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Proceedings Article

Explaining and Harnessing Adversarial Examples

TL;DR: It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature; this view is supported by new quantitative results and gives the first explanation of the most intriguing fact about adversarial examples: their generalization across architectures and training sets.