Open Access Posted Content

Towards Out-Of-Distribution Generalization: A Survey

TLDR
A comprehensive survey of OOD generalization methods, including a formal definition of the OOD problem; the summary of reviewed methods is available at http://out-of-distribution-generalization.com.
Abstract
Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios the $i.i.d.$ assumption can hardly be satisfied, and the performance of classic machine learning algorithms drops sharply under distribution shifts, which highlights the importance of investigating the Out-of-Distribution (OOD) generalization problem. The OOD generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from its definition, methodology, and evaluation to its implications and future directions. Firstly, we provide a formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections between the different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the literature and raise some future directions for the OOD generalization problem. The summary of OOD generalization methods reviewed in this survey can be found at http://out-of-distribution-generalization.com.
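The formal definition referenced in the abstract is commonly stated as a worst-case risk minimization. A minimal sketch of that formulation, with assumed notation ($\mathcal{F}$ a hypothesis space, $\ell$ a loss function, $\mathcal{P}$ a set of possible test distributions):

```latex
% Worst-case formulation of OOD generalization (sketch; notation assumed):
%   \mathcal{F} : hypothesis space
%   \ell        : loss function
%   \mathcal{P} : set of possible (unseen) test distributions
f^{*} \;=\; \arg\min_{f \in \mathcal{F}} \;
       \max_{P_{te} \in \mathcal{P}} \;
       \mathbb{E}_{(X, Y) \sim P_{te}}\!\left[ \ell\!\left(f(X), Y\right) \right]
```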


Citations
Posted Content

OoD-Bench: Benchmarking and Understanding Out-of-Distribution Generalization Datasets and Algorithms

TL;DR: In this article, the authors identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets, and compare various OoD generalization algorithms on a new benchmark dominated by these two distribution shifts.
Posted Content

Deep Long-Tailed Learning: A Survey

TL;DR: A comprehensive survey on deep long-tailed learning can be found in this paper, where the authors group existing long-tailed learning studies into three main categories (i.e., class re-balancing, information augmentation, and module improvement).
Posted Content DOI

Constructing benchmark test sets for biological sequence analysis using independent set algorithms

TL;DR: In this paper, independent set graph algorithms are used to split sequence data into dissimilar training and test sets, such that each test sequence is less than p% identical to any individual training sequence.
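To illustrate the cross-set dissimilarity constraint described above (this is a toy greedy sketch, not the paper's independent-set algorithm; the percent-identity function is a hypothetical stand-in for proper sequence alignment):

```python
def identity(a: str, b: str) -> float:
    """Toy percent identity over aligned positions (illustrative only;
    real pipelines compute identity from proper sequence alignments)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(min(len(a), len(b)), 1)

def dissimilar_split(seqs, p=25.0, test_fraction=0.3):
    """Greedy sketch: a sequence joins the test set only if it is < p% identical
    to every sequence already in the training set, and vice versa; sequences
    too similar to both sides are dropped so the cross-set constraint holds."""
    train, test = [], []
    target_test = int(test_fraction * len(seqs))
    for s in seqs:
        if len(test) < target_test and all(identity(s, t) < p for t in train):
            test.append(s)
        elif all(identity(s, t) < p for t in test):
            train.append(s)
    return train, test
```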
Posted Content

Confounder Identification-free Causal Visual Feature Learning

TL;DR: In this article, a Confounder Identification-free Causal Visual Feature Learning (CICF) method is proposed to learn causal features that are free of interference from confounders.
Posted Content

A benchmark with decomposed distribution shifts for 360 monocular depth estimation

TL;DR: In this paper, a distribution shift benchmark for monocular depth estimation is proposed, which decomposes the wider distribution shift of uncontrolled testing on in-the-wild data, to three distinct distribution shifts.
References
Proceedings Article DOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Journal Article DOI

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
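A short scikit-learn example of the lasso on synthetic data (the alpha value and data are illustrative; scikit-learn solves the penalized form equivalent to the constrained problem described above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[:3] = [2.0, -1.5, 0.5]          # only three informative features
y = X @ true_coef + 0.1 * rng.normal(size=200)

# Penalized form solved by scikit-learn (equivalent to the constrained form):
#   min_w  (1 / (2 * n_samples)) * ||y - Xw||_2^2 + alpha * ||w||_1
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)    # most coefficients are shrunk exactly to zero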
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
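A minimal PyTorch sketch of where batch normalization typically sits in a convolutional block (layer sizes here are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

# Batch normalization placed after the convolution and before the nonlinearity.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),   # normalizes each channel over the mini-batch
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)   # mini-batch of 8 RGB images
print(block(x).shape)           # torch.Size([8, 16, 32, 32])
```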
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
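A minimal sketch of the fine-tuning setup, using the Hugging Face transformers library (the library and checkpoint name are assumptions for illustration, not part of the original paper): the pre-trained bidirectional encoder plus a single task-specific output layer.

```python
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# One additional output layer (a classification head) on top of the pre-trained encoder.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Out-of-distribution generalization is hard.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)   # torch.Size([1, 2])
```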