Open Access · Posted Content
Adversarial Training with Rectified Rejection
TLDR
This paper proposes to use true confidence (T-Con) as a certainty oracle and to learn to predict T-Con by rectifying confidence, and shows that under mild conditions a rectified confidence rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
Abstract
Adversarial training (AT) is one of the most effective strategies for promoting model robustness, whereas even the state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, vanilla confidence can overestimate model certainty when the input is wrongly classified. To this end, we propose to use true confidence (T-Con) (i.e., the predicted probability of the true class) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also quantify that training R-Con to be aligned with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks and improves robustness with little extra computation.
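The coupled rejection rule described in the abstract (accept only when both the vanilla confidence and the rectified confidence are high) can be sketched as follows. This is a schematic illustration, not the authors' implementation: the thresholds and the way the rectified confidence is supplied are assumptions for the example.

```python
import numpy as np

def coupled_reject(probs, rcon, con_thresh=0.5, rcon_thresh=0.5):
    """Schematic coupled rejection rule: reject an input unless BOTH
    the vanilla confidence (max predicted probability) and the
    rectified confidence (R-Con) pass their thresholds."""
    confidence = np.max(probs)
    return confidence < con_thresh or rcon < rcon_thresh

# Toy example: a confident prediction whose rectified confidence is low
# is still rejected, which is the point of coupling the two rejectors.
probs = np.array([0.05, 0.90, 0.05])   # vanilla confidence 0.90
print(coupled_reject(probs, rcon=0.2))  # high confidence, low R-Con -> rejected
print(coupled_reject(probs, rcon=0.8))  # both high -> accepted
```

In the paper, R-Con is produced by an auxiliary head trained to align with T-Con; here it is simply passed in as a number to keep the rule itself visible.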
Citations
Posted Content
RobustBench: a standardized adversarial robustness benchmark.
Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein
TL;DR: This work evaluates robustness of models for their benchmark with AutoAttack, an ensemble of white- and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
Posted Content
Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks
TL;DR: Long-term cross adversarial training (LCAT) is a meta-learning method for adversarially robust neural networks that updates the model parameters across both the natural and the adversarial sample distributions over the long term, improving both adversarial and clean few-shot classification accuracy.
Posted Content
Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
TL;DR: In this article, a general hardness reduction between detection and classification of adversarial examples is proposed: given a robust detector for attacks at distance ε (in some metric), one can construct a similarly robust, though computationally inefficient, classifier for attacks at distance ε/2.
Posted Content
Accumulative Poisoning Attacks on Real-time Data
TL;DR: In this article, the authors propose an attack strategy that associates an accumulative phase with poisoning attacks to secretly magnify the destructive effect of a (poisoned) trigger batch.
Posted Content
Machine Learning with a Reject Option: A survey.
TL;DR: This paper surveys machine learning with a reject option: the authors define the conditions leading to the two types of rejection, ambiguity rejection and novelty rejection, describe the standard learning strategies used to train such models, and relate traditional machine learning techniques to rejection.
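The two rejection types named in the survey can be illustrated with a small sketch. The thresholds and the use of the maximum logit as a novelty proxy are illustrative assumptions, not prescriptions from the survey.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reject_decision(logits, ambiguity_thresh=0.7, novelty_thresh=2.0):
    """Illustrative split into the two rejection types: ambiguity rejection
    (probability mass spread over several classes) and novelty rejection
    (input far from the training distribution, proxied here by a low
    maximum logit)."""
    probs = softmax(logits)
    if probs.max() < ambiguity_thresh:
        return "reject-ambiguity"
    if logits.max() < novelty_thresh:
        return "reject-novelty"
    return "accept"

print(reject_decision(np.array([5.0, 0.1, 0.1])))   # confident, in-distribution
print(reject_decision(np.array([1.0, 0.9, 0.8])))   # classes nearly tied
print(reject_decision(np.array([1.5, -3.0, -3.0]))) # confident but low-energy
```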
References
Journal Article
Visualizing Data using t-SNE
TL;DR: t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
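The pairwise similarities at the heart of t-SNE (and SNE before it) are Gaussian conditional probabilities computed in the high-dimensional space. A minimal NumPy sketch of that step, with a fixed bandwidth for simplicity (real t-SNE tunes a per-point sigma to match a user-chosen perplexity):

```python
import numpy as np

def conditional_affinities(X, sigma=1.0):
    """Gaussian conditional probabilities p(j|i) measuring pairwise
    similarity in the high-dimensional space. Fixed bandwidth sketch;
    full t-SNE searches sigma_i per point via the perplexity."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # a point is not its own neighbour
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = conditional_affinities(X)
# Each row sums to 1; nearby points claim most of each other's mass,
# so P[0, 1] is close to 1 while P[0, 2] is near 0.
```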
Dissertation
Learning Multiple Layers of Features from Tiny Images
TL;DR: In this dissertation, the author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.
Proceedings Article
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
Proceedings Article
Intriguing properties of neural networks
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
TL;DR: It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Proceedings Article
Explaining and Harnessing Adversarial Examples
TL;DR: It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
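The linearity argument summarized above is what motivates the paper's fast gradient sign method. A minimal NumPy sketch in the logistic-regression setting; the model, data, and epsilon here are synthetic assumptions for illustration:

```python
import numpy as np

def fgsm_linear(x, w, b, y, eps):
    """Fast gradient sign method on a logistic-regression model. For
    logistic loss the gradient w.r.t. x is (p - y) * w, so the attack
    takes one signed gradient step of size eps per coordinate."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad = (p - y) * w                       # d(loss)/dx
    return x + eps * np.sign(grad)           # signed step maximizes the loss

rng = np.random.default_rng(0)
w = rng.normal(size=20)
x = rng.normal(size=20)
x_adv = fgsm_linear(x, w, b=0.0, y=1.0, eps=0.1)
# For true label 1, the step moves x against sign(w), lowering w @ x by
# eps * sum(|w|) and hence lowering the predicted probability of class 1.
```

The sketch shows why linear models are so sensitive: the loss change grows with the number of dimensions even though each coordinate moves by only eps.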