WILDS: A Benchmark of in-the-Wild Distribution Shifts

Open Access · Posted Content

14 Dec 2020

TLDR

This paper presents WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications. The authors hope it will encourage the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.

Abstract:

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity, these real-world distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts which naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training results in substantially lower out-of-distribution than in-distribution performance, and that this gap remains even with models trained by existing methods for handling distribution shifts. This underscores the need for new training methods that produce models which are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at this https URL.
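The in-distribution versus out-of-distribution gap described in the abstract can be made concrete with a small evaluation sketch. The code below is not the WILDS package or its API; the function names (`accuracy_by_domain`, `id_ood_gap`), the domain names, and the toy predictions are all hypothetical and invented for illustration. It assumes each example is tagged with a domain (for instance a hospital), computes accuracy per domain, and compares domains seen during training with unseen ones.

```python
from collections import defaultdict


def accuracy_by_domain(preds, labels, domains):
    """Return {domain: accuracy} for parallel lists of predictions, labels, and domain ids."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, d in zip(preds, labels, domains):
        total[d] += 1
        correct[d] += int(p == y)
    return {d: correct[d] / total[d] for d in total}


def id_ood_gap(per_domain_acc, train_domains):
    """Mean accuracy over training (ID) domains, over unseen (OOD) domains, and their difference."""
    id_accs = [a for d, a in per_domain_acc.items() if d in train_domains]
    ood_accs = [a for d, a in per_domain_acc.items() if d not in train_domains]
    if not id_accs or not ood_accs:
        raise ValueError("need at least one ID domain and one OOD domain")
    id_acc = sum(id_accs) / len(id_accs)
    ood_acc = sum(ood_accs) / len(ood_accs)
    return id_acc, ood_acc, id_acc - ood_acc


# Toy values, invented for illustration only (not results from the paper).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 1, 0, 0]
domains = ["hospital_A"] * 4 + ["hospital_B"] * 4

per_domain = accuracy_by_domain(preds, labels, domains)
id_acc, ood_acc, gap = id_ood_gap(per_domain, train_domains={"hospital_A"})
print(per_domain)
print(f"ID accuracy={id_acc:.2f}, OOD accuracy={ood_acc:.2f}, gap={gap:.2f}")
```

Averaging per domain rather than per example is one design choice among several; a benchmark may instead report worst-domain accuracy or another metric, so treat this only as a sketch of the general idea.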
