Open Access · Posted Content
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang
TL;DR
WILDS is presented, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, intended to encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.
Abstract
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity, these real-world distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts which naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training results in substantially lower out-of-distribution than in-distribution performance, and that this gap remains even with models trained by existing methods for handling distribution shifts. This underscores the need for new training methods that produce models which are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at this https URL.
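As a toy illustration of the in-distribution (ID) vs. out-of-distribution (OOD) accuracy gap the abstract describes, here is a minimal sketch in Python. The datasets, the `predict` function, and the `accuracy` helper are hypothetical stand-ins, not the WILDS package itself.

```python
# Sketch of the ID-vs-OOD accuracy gap described in the abstract.
# All data and the model below are illustrative toy values.

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

# Toy model: predicts the sign of a single scalar feature.
predict = lambda x: int(x > 0)

# In-distribution test set: drawn from the same domain as training.
id_test = [(0.9, 1), (-0.7, 0), (0.3, 1), (-0.2, 0)]
# Out-of-distribution test set: a shifted domain (e.g. a new hospital),
# where the decision rule the model learned no longer fully holds.
ood_test = [(0.1, 0), (-0.4, 1), (0.8, 1), (-0.6, 0)]

gap = accuracy(predict, id_test) - accuracy(predict, ood_test)
print(gap)  # 0.5 -- a positive gap: performance degrades under shift
```

Benchmarks like WILDS standardize exactly this kind of comparison, so that methods are judged on the OOD metric rather than the ID one.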
Citations
Integrative analysis of 111 reference human epigenomes
Anshul Kundaje, Wouter Meuleman, Jason Ernst, Angela Yen, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Lucas D. Ward, Abhishek Sarkar, Gerald Quon, Matthew L. Eaton, Yi-Chieh Wu, Andreas R. Pfenning, Xinchen Wang, Melina Claussnitzer, Yaping Liu, Mukul S. Bansal, Soheil Feizi-Khankandi, Ah Ram Kim, Richard C Sallari, Nicholas A Sinnott-Armstrong, Laurie A. Boyer, Elizabeta Gjoneska, Li-Huei Tsai, Manolis Kellis
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Posted Content
RobustBench: a standardized adversarial robustness benchmark
Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein
TL;DR: This work evaluates robustness of models for their benchmark with AutoAttack, an ensemble of white- and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.
Proceedings ArticleDOI
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
TL;DR: The model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks.
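The weight-averaging idea in the TL;DR above can be sketched in a few lines: take several fine-tuned models and average their parameters element-wise (a "uniform soup"). The parameter names and values below are illustrative, not from the paper, and real implementations operate on framework state dicts of tensors.

```python
# Minimal sketch of a uniform "model soup": element-wise mean of the
# weights of several fine-tuned models. Toy parameters, plain Python.

def average_weights(state_dicts):
    """Uniform soup: element-wise mean of matching parameter vectors."""
    soup = {}
    for k in state_dicts[0]:
        vals = [sd[k] for sd in state_dicts]
        soup[k] = [sum(col) / len(col) for col in zip(*vals)]
    return soup

# Three fine-tuned models, each a dict of flattened parameter vectors.
models = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]

soup = average_weights(models)
print(soup)  # {'w': [3.0, 4.0], 'b': [1.0]}
```

Because a single averaged model is produced, inference cost is unchanged; the paper's "greedy soup" variant additionally keeps a model in the average only if it improves held-out accuracy.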
Proceedings Article
Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
TL;DR: This paper shows that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.
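Linear probing, as contrasted with fine-tuning in the TL;DR above, can be sketched as follows: the pretrained feature extractor is frozen and only a linear head on top of it is trained. The extractor, data, and learning rate below are toy stand-ins, not the paper's setup.

```python
# Sketch of linear probing: gradient descent on a linear head over
# frozen pretrained features. Everything here is a toy illustration.

def features(x):
    """Pretrained feature extractor (frozen under linear probing)."""
    return [x, x * x]

def probe_step(w, data, lr=0.05):
    """One squared-loss gradient step on the linear head only;
    the feature extractor is never updated."""
    grad = [0.0, 0.0]
    for x, y in data:
        f = features(x)
        err = sum(wi * fi for wi, fi in zip(w, f)) - y
        for i in range(2):
            grad[i] += 2 * err * f[i] / len(data)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

data = [(1.0, 1.0), (2.0, 4.0)]  # y = x^2: linear in the frozen features
w = [0.0, 0.0]
for _ in range(2000):
    w = probe_step(w, data)
# The probe recovers w close to [0, 1]: when frozen features are this
# good, full fine-tuning risks distorting them and hurting OOD accuracy.
```

The paper's practical takeaway, LP-FT, is to linear-probe first and only then fine-tune, so the head is already sensible when the features start moving.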
Proceedings ArticleDOI
MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation
TL;DR: This work adopts zero-shot cross-dataset transfer as a benchmark to systematically evaluate a model’s robustness and shows that MSeg training yields substantially more robust models in comparison to training on individual datasets or naive mixing of datasets without the presented contributions.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.
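The residual formulation can be illustrated in miniature: a block learns a residual function F(x) and outputs F(x) + x, so the identity mapping is the easy default. The "layers" below are plain Python functions over lists, not the paper's convolutional blocks.

```python
# Toy illustration of residual learning: y = F(x) + x.

def residual_block(x, f):
    """The shortcut carries x past the learned residual branch f."""
    return [fi + xi for fi, xi in zip(f(x), x)]

# If the residual branch outputs zero, the block is exactly the
# identity -- which is what makes very deep stacks easy to optimize.
zero_branch = lambda x: [0.0] * len(x)
x = [1.0, -2.0, 3.0]
assert residual_block(x, zero_branch) == x

# A non-trivial branch only needs to model the *difference* from identity.
small_correction = lambda x: [0.5 * xi for xi in x]
print(residual_block(x, small_correction))  # [1.5, -3.0, 4.5]
```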
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
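The "adaptive estimates of lower-order moments" in the TL;DR above are the exponentially decayed first moment m and second moment v of the gradient, with bias correction for their zero initialization. A minimal one-step sketch, using the paper's default hyperparameters:

```python
# Minimal sketch of a single Adam update for a list of parameters.
import math

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Return updated (w, m, v) after step t (1-indexed)."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]   # bias-corrected mean
    v_hat = [vi / (1 - b2 ** t) for vi in v]   # bias-corrected 2nd moment
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v

# First step: after bias correction the update magnitude is close to
# lr, regardless of the raw gradient's scale.
w, m, v = adam_step([0.0], [10.0], [0.0], [0.0], t=1)
print(w)  # approximately [-0.001]
```

The normalization by sqrt(v_hat) is what makes the effective step size roughly invariant to gradient rescaling, one of the properties the paper highlights.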
Book ChapterDOI
U-Net: Convolutional Networks for Biomedical Image Segmentation
TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Proceedings ArticleDOI
Densely Connected Convolutional Networks
TL;DR: DenseNet connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
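Dense connectivity can be sketched in miniature: each layer receives the concatenation of all preceding feature maps and contributes its own output to that growing collection. The "layers" below are plain functions over lists, not the paper's convolutions.

```python
# Toy sketch of a dense block: every layer sees all earlier features.

def dense_block(x, layers):
    """Each layer receives [x, out_1, ..., out_{i-1}] concatenated."""
    features = list(x)
    for layer in layers:
        out = layer(features)      # new features from everything so far
        features = features + out  # concatenate; nothing is overwritten
    return features

# Each toy layer emits one new feature: the sum of all inputs it sees.
sum_layer = lambda feats: [sum(feats)]

out = dense_block([1.0, 2.0], [sum_layer, sum_layer, sum_layer])
print(out)  # [1.0, 2.0, 3.0, 6.0, 12.0]
```

Because earlier features are reused rather than recomputed, each layer can be narrow, which is where the parameter savings come from.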
Journal ArticleDOI
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.
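The RPN scores a fixed grid of reference boxes ("anchors"): at every feature-map cell, boxes of several scales and aspect ratios are proposed. A rough sketch of that anchor grid; the grid size, stride, scales, and ratios below are illustrative, not the paper's exact configuration.

```python
# Rough sketch of the anchor grid an RPN scores: at each feature-map
# cell, one box per (scale, aspect ratio) pair, centered on the cell.
import math

def make_anchors(grid_w, grid_h, stride, scales, ratios):
    """Return (cx, cy, w, h) anchors centered on each feature-map cell."""
    anchors = []
    for gy in range(grid_h):
        for gx in range(grid_w):
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride
            for s in scales:
                for r in ratios:        # r = height / width
                    w = s / math.sqrt(r)  # keeps area w*h == s*s
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors

anchors = make_anchors(grid_w=2, grid_h=2, stride=16,
                       scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2*2 cells x 2 scales x 3 ratios = 24 anchors
```

The RPN then predicts, for every anchor, an objectness score and box offsets; high-scoring refined anchors become the "nearly cost-free" proposals fed to the detection head.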