Open access · Posted Content

Generalizing to Unseen Domains: A Survey on Domain Generalization

02 Mar 2021 - arXiv: Learning
Abstract: Machine learning systems generally assume that the training and testing distributions are the same; when that assumption fails, a key requirement is to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting where one or several different but related domains are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been made in this area in recent years, and this paper presents the first review of those advances. First, we provide a formal definition of domain generalization and discuss several related fields. Second, we thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. Third, we categorize recent algorithms into three classes (data manipulation, representation learning, and learning strategy) and present several popular algorithms in detail for each category. Fourth, we introduce the commonly used datasets and applications. Finally, we summarize the existing literature and present some potential research topics for the future.



Citations: 17 results found

Open access · Posted Content
Abstract: Despite remarkable success in a variety of applications, it is well-known that deep learning can fail catastrophically when presented with out-of-distribution data. Toward addressing this challenge, we consider the domain generalization problem, wherein predictors are trained using data drawn from a family of related training domains and then evaluated on a distinct and unseen test domain. We show that under a natural model of data generation and a concomitant invariance condition, the domain generalization problem is equivalent to an infinite-dimensional constrained statistical learning problem; this problem forms the basis of our approach, which we call Model-Based Domain Generalization. Due to the inherent challenges in solving constrained optimization problems in deep learning, we exploit nonconvex duality theory to develop unconstrained relaxations of this statistical problem with tight bounds on the duality gap. Based on this theoretical motivation, we propose a novel domain generalization algorithm with convergence guarantees. In our experiments, we report improvements of up to 30 percentage points over state-of-the-art domain generalization baselines on several benchmarks including ColoredMNIST, Camelyon17-WILDS, FMoW-WILDS, and PACS.
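The constrained-then-relaxed formulation described above can be sketched with a toy primal-dual loop: minimize average risk subject to an invariance-style constraint, relaxed into a Lagrangian whose multiplier is updated by dual ascent. Everything here (the one-parameter model, the hand-made constraint, the step sizes) is an illustrative assumption, not the paper's actual Model-Based Domain Generalization algorithm.

```python
# Illustrative primal-dual sketch of constrained learning with a Lagrangian
# relaxation. The one-parameter "model", the per-domain risks, and the
# invariance constraint below are all made-up toy choices.

domain_optima = [1.0, 1.5, 2.0]   # each training domain prefers a different w
eps = 0.2                          # allowed spread of per-domain risks

def risks(w):
    return [(w - mu) ** 2 for mu in domain_optima]

def constraint(w):
    # "invariance" constraint: per-domain risks should stay close together
    r = risks(w)
    mean = sum(r) / len(r)
    return max(abs(ri - mean) for ri in r) - eps   # satisfied when <= 0

w, lam, lr = 0.0, 0.0, 0.05
for _ in range(500):
    def lagrangian(x):
        return sum(risks(x)) / len(domain_optima) + lam * constraint(x)
    # numerical gradient keeps the sketch dependency-free
    grad = (lagrangian(w + 1e-5) - lagrangian(w - 1e-5)) / 2e-5
    w -= lr * grad                                 # primal descent step
    lam = max(0.0, lam + lr * constraint(w))       # dual ascent step

print(round(w, 2))   # → 1.5 (the constraint is satisfied at the optimum)
```

The dual variable `lam` grows only while the constraint is violated, which mirrors the unconstrained relaxation with a bounded duality gap that the abstract refers to.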


Topics: Generalization (60%), Domain (software engineering) (59%), Duality gap (52%)

7 Citations

Open access · Posted Content
Zheyan Shen, Jiashuo Liu, Yue He, Xingxuan Zhang, +3 more · Institutions (1)
31 Aug 2021 - arXiv: Learning
Abstract: Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. In real scenarios, however, this assumption can hardly be satisfied, causing a sharp drop in the performance of classic machine learning algorithms under distributional shifts; this motivates the study of the Out-of-Distribution (OOD) generalization problem, which addresses the challenging setting where the testing distribution is unknown and different from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from definition and methodology to evaluation, implications, and future directions. First, we provide a formal definition of the OOD generalization problem. Second, we categorize existing methods into three parts based on their positions in the learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and discuss typical methods for each category in detail. We then demonstrate the theoretical connections between the categories and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the literature and raise some future directions for the OOD generalization problem. A summary of the OOD generalization methods reviewed in this survey can be found at


Topics: Generalization (57%), Feature learning (53%)

6 Citations

Open access · Posted Content
Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, +2 more · Institutions (2)
08 Jun 2021 - arXiv: Learning
Abstract: Generalization to out-of-distribution (OOD) data, or domain generalization, is one of the central problems in modern machine learning. Recently, there has been a surge of attempts to propose algorithms for OOD generalization, mainly built on the idea of extracting invariant features. Although intuitively reasonable, theoretical understanding of what kind of invariance can guarantee OOD generalization is still limited, and generalization to arbitrary out-of-distribution data is clearly impossible. In this work, we take a first step toward rigorous and quantitative definitions of 1) what OOD is, and 2) what it means for an OOD problem to be learnable. We also introduce a new concept, the expansion function, which characterizes the extent to which the variance is amplified in the test domains over the training domains, and therefore gives a quantitative meaning to invariant features. Based on these, we prove OOD generalization error bounds; it turns out that OOD generalization largely depends on the expansion function. As recently pointed out by Gulrajani and Lopez-Paz (2020), any OOD learning algorithm without a model selection module is incomplete. Our theory naturally induces a model selection criterion, and extensive experiments on benchmark OOD datasets demonstrate that this criterion has a significant advantage over baselines.


Topics: Generalization (56%)

4 Citations

Open access · Posted Content · DOI: 10.1101/2021.05.06.443032
Min Oh, Liqing Zhang · Institutions (1)
07 May 2021 - bioRxiv
Abstract: Recent studies revealed that gut microbiota modulates the response to cancer immunotherapy and that fecal microbiota transplantation has clinical benefit for melanoma patients during treatment. Understanding which microbiota affect individual response is crucial to advancing precision oncology; however, it is challenging to identify the key microbial taxa with limited data, as statistical and machine learning models often lose their generalizability. In this study, DeepGeni, a deep generalized interpretable autoencoder, is proposed to improve the generalizability and interpretability of microbiome profiles by augmenting data and by introducing interpretable links in the autoencoder. A DeepGeni-based machine learning classifier outperforms state-of-the-art classifiers in the microbiome-driven prediction of the responsiveness of melanoma patients treated with immune checkpoint inhibitors. In addition, the interpretable links of DeepGeni elucidate the most informative microbiota associated with cancer immunotherapy response.


Topics: Autoencoder (52%)

1 Citation

Open access · Posted Content · DOI: 10.1101/2021.05.25.445658
26 May 2021 - bioRxiv
Abstract: Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different transfer learning methods have been proposed to address this discrepancy; they generally use cell lines as source domains and patients, patient-derived xenografts, or other cell lines as target domains. However, they assume that the target domain is accessible during training or fine-tuning, and they can only take labeled source domains as input. The former is a strong assumption that is not satisfied when these models are deployed in the clinic; the latter means these methods rely on labeled source domains, which are of limited size. To avoid these assumptions, we formulate drug response prediction as an out-of-distribution generalization problem, which does not assume that the target domain is accessible during training. Moreover, to exploit unlabeled source domain data, which tend to be much more plentiful than labeled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization, and a consistency loss to incorporate unlabeled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts, and patients. Finally, we show that Velodrome models generalize to different tissue types that were well-represented, under-represented, or completely absent in the training data. Overall, our results suggest that Velodrome may guide precision oncology more accurately.
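The three-part objective named above can be sketched in a few lines. This is a hedged illustration, not Velodrome's implementation: the toy mean-squared terms, the function names, and the example numbers are assumptions; the actual method applies analogous losses to deep-network predictions and representations.

```python
# Toy sketch of an objective combining a supervised loss, an alignment loss
# across source domains, and a consistency loss on unlabeled samples.
# All names and values are illustrative.

def sup_loss(preds, labels):
    # supervised term: mean squared error on labeled samples
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

def align_loss(feats_a, feats_b):
    # alignment term: penalize distance between domain feature means
    mean_a = sum(feats_a) / len(feats_a)
    mean_b = sum(feats_b) / len(feats_b)
    return (mean_a - mean_b) ** 2

def cons_loss(preds_1, preds_2):
    # consistency term: two predictors should agree on unlabeled samples
    return sum((a - b) ** 2 for a, b in zip(preds_1, preds_2)) / len(preds_1)

def combined_objective(preds, labels, feats_a, feats_b, u1, u2,
                       lam_align=1.0, lam_cons=1.0):
    return (sup_loss(preds, labels)
            + lam_align * align_loss(feats_a, feats_b)
            + lam_cons * cons_loss(u1, u2))

total = combined_objective(
    preds=[0.9, 2.1], labels=[1.0, 2.0],      # labeled source-domain data
    feats_a=[0.2, 0.4], feats_b=[0.3, 0.5],   # features from two source domains
    u1=[1.0, 1.2], u2=[1.1, 1.2])             # two predictors on unlabeled data
print(round(total, 3))   # → 0.025
```

The weights `lam_align` and `lam_cons` trade prediction accuracy against cross-domain generalization and use of unlabeled data, mirroring the balance the abstract describes.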


Topics: Generalization (50%)

1 Citation


References: 169 results found

Open access · Journal Article · DOI: 10.1023/A:1022627411411
Corinna Cortes, Vladimir Vapnik · Institutions (1)
15 Sep 1995 - Machine Learning
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space, and in this feature space a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors; here we extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
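The kernel idea in this abstract (a non-linear map to a high-dimensional space where the decision surface is linear) can be illustrated with a minimal decision function. The support vectors, multipliers, and bias below are invented values for illustration, not a trained support-vector network.

```python
# Minimal sketch of a kernelized decision function. A polynomial kernel
# plays the role of the non-linear map: it computes inner products in the
# implicit high-dimensional feature space without constructing it.

def poly_kernel(x, z, degree=2):
    # inner product in the implicit polynomial feature space
    return (1 + sum(xi * zi for xi, zi in zip(x, z))) ** degree

support_vectors = [(0.0, 1.0), (1.0, 0.0)]   # hypothetical support vectors
alphas = [1.0, -1.0]                          # signed multipliers alpha_i * y_i
bias = 0.0

def decision(x):
    # f(x) = sum_i alpha_i y_i K(sv_i, x) + b; the sign gives the class
    s = sum(a * poly_kernel(sv, x) for a, sv in zip(alphas, support_vectors))
    return s + bias

print(decision((0.0, 2.0)) > 0)   # → True: nearer the positive support vector
```

Only the support vectors enter the decision function, which is what makes the construction tractable even though the implicit feature space is very high-dimensional.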


Topics: Feature learning (63%), Active learning (machine learning) (62%), Feature vector (62%)

35,157 Citations

Proceedings Article · DOI: 10.18653/V1/N19-1423
11 Oct 2018
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5 (7.7 point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).


Topics: Question answering (54%), Language model (52%)

24,672 Citations

Journal Article · DOI: 10.1109/TKDE.2009.191
Sinno Jialin Pan, Qiang Yang · Institutions (1)
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. In many real-world applications, however, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest but sufficient training data only in another domain, where the data may lie in a different feature space or follow a different distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. We discuss the relationship between transfer learning and related machine learning techniques such as domain adaptation, multitask learning, sample selection bias, and covariate shift. We also explore some potential future issues in transfer learning research.


Topics: Semi-supervised learning (69%), Inductive transfer (68%), Multi-task learning (67%)

13,267 Citations

Open access · Journal Article · DOI: 10.1109/TPAMI.2013.50
Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.


8,575 Citations

Open access · Proceedings Article · DOI: 10.1109/CVPR.2016.350
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, +5 more · Institutions (3)
01 Jun 2016
Abstract: Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. 5,000 of these images have high-quality pixel-level annotations, and 20,000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.


5,848 Citations

Network Information
Related Papers (5)
Domain Generalization with Adversarial Feature Learning - 01 Jun 2018

Haoliang Li, Sinno Jialin Pan +2 more

84% related
A theory of learning from different domains - 01 May 2010, Machine Learning

Shai Ben-David, John Blitzer +4 more

80% related
Domain Generalization: A Survey - 01 Jan 2021

Kaiyang Zhou, Ziwei Liu +3 more

80% related
Generalizing from Several Related Classification Tasks to a New Unlabeled Sample - 12 Dec 2011

Gilles Blanchard, Gyemin Lee +1 more

75% related
Analysis of Representations for Domain Adaptation - 04 Dec 2006

Shai Ben-David, John Blitzer +2 more

75% related