Open Access, Posted Content
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
John J. Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt
TL;DR: In this article, the authors empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts, and provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
Abstract:
For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
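As an illustration of the kind of measurement the abstract describes, the following is a minimal sketch of how one might quantify the linear relationship between in-distribution and out-of-distribution accuracy across a set of models. The accuracy values, variable names, and helper functions here are hypothetical and are not taken from the paper; the paper's own evaluation protocol may differ (for example in how accuracies are scaled before fitting).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical accuracies for four models (illustrative values only).
id_acc = [0.70, 0.80, 0.90, 0.95]   # in-distribution test accuracy
ood_acc = [0.51, 0.61, 0.74, 0.80]  # out-of-distribution test accuracy

r = pearson_r(id_acc, ood_acc)
slope, intercept = fit_line(id_acc, ood_acc)
print(f"Pearson r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A correlation close to 1 on such data would mean that a model's in-distribution accuracy is a good linear predictor of its out-of-distribution accuracy, which is the pattern the abstract reports for many, though not all, of the shifts studied.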
Citations
Posted Content
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Andreas Fürst, Elisabeth Rumetshofer, Viet Hung Tran, Hubert Ramsauer, Fei Tang, Johannes M. Lehner, David P. Kreil, Michael K Kopp, Günter Klambauer, Angela Bitto-Nemling, Sepp Hochreiter
TL;DR: This article proposes Contrastive Leave One Out Boost (CLOOB), which replaces the original embeddings with retrieved embeddings in the InfoLOOB objective, thereby stabilizing that objective.
Posted Content
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias.
TL;DR: The authors theoretically and empirically show that MLM pretraining makes models robust to lexicon-level spurious features, and they also explore the efficacy of pretrained masked language models in causal settings.
Proceedings Article
On the Robustness of Reading Comprehension Models to Entity Renaming
Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren. 2019 Conference of the Association for Computational Linguistics: Human Language Technologies.
Posted Content
On the Robustness of Reading Comprehension Models to Entity Renaming.
TL;DR: The authors proposed a general and scalable method to replace person names with names from a variety of sources, ranging from common English names to names from other languages to arbitrary strings, and found that this can further improve the robustness of MRC models.
References
Posted Content
Do Image Classifiers Generalize Across Time
TL;DR: This work systematically analyzes the robustness of image classifiers to temporal perturbations in videos and constructs two new datasets, ImageNet-Vid-Robust and YTBB-Robust, containing a total of 57,897 images grouped into 3,139 sets of perceptually similar images.
Proceedings Article
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
TL;DR: In this paper, a pre-training task of predicting which caption goes with which image is used to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
Proceedings Article
In Search of Lost Domain Generalization
Ishaan Gulrajani, David Lopez-Paz
TL;DR: DomainBed is a testbed for domain generalization that includes seven benchmarks, fourteen algorithms, and three model selection criteria. The authors find that, when carefully implemented and tuned, empirical risk minimization (ERM) outperforms the state of the art in terms of average performance.
Proceedings Article
Cold Case: The Lost MNIST Digits
Chhavi Yadav, Léon Bottou
TL;DR: In this article, the authors reconstruct the MNIST dataset from its NIST source and its rich metadata, such as writer identifiers and partition identifiers, and recover the complete MNIST test set with 60,000 samples instead of the usual 10,000. They then investigate the impact of twenty-five years of MNIST experiments on the reported testing performances.
Proceedings Article
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang
TL;DR: WILDS is a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification, across camera traps for wildlife monitoring, and across time and location in satellite imaging and poverty mapping.