Underspecification Presents Challenges for Credibility in Modern Machine Learning

Alexander D'Amour and 39 co-authors

06 Nov 2020

arXiv: Learning

TLDR

This work shows that underspecification appears in a wide variety of practical ML pipelines, and argues that modeling pipelines intended for real-world deployment in any domain must explicitly account for it.

Abstract:

ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
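
The definition above can be made concrete with a deliberately tiny toy. This is a sketch under our own assumptions, not an experiment from the paper: the data generator, the two-feature linear model, and the functions make_data, train, accuracy and seed_sweep are all hypothetical. In the simulated training domain, feature x2 is an exact copy of x1, so every way of splitting weight between them fits the data equally well, and the only thing that differs between runs of the pipeline is the random seed used for initialization. Predictors from different seeds therefore tie on held-out training-domain data, but once x2 turns into noise in a simulated deployment domain their accuracies diverge.

```python
import random


def make_data(n, shifted, rng):
    """Toy binary task with labels y in {-1, +1} (hypothetical setup).

    x1 = y + Gaussian noise is a genuinely predictive feature.
    Training domain (shifted=False): x2 is an exact copy of x1, so the data
    cannot tell the pipeline how to split weight between x1 and x2.
    Simulated deployment domain (shifted=True): x2 is pure noise.
    """
    data = []
    for _ in range(n):
        y = rng.choice([-1.0, 1.0])
        x1 = y + rng.gauss(0.0, 0.5)
        x2 = rng.gauss(0.0, 1.0) if shifted else x1
        data.append((x1, x2, y))
    return data


def train(data, seed, steps=100, lr=0.1):
    """Least-squares linear model fit by full-batch gradient descent.

    The pipeline is fixed; only the random initialization depends on `seed`.
    Because x1 == x2 in training, both weights receive identical gradients,
    so w1 - w2 stays at its random starting value (up to rounding) while
    w1 + w2 converges to the same least-squares solution for every seed.
    """
    rng = random.Random(seed)
    w1, w2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    n = len(data)
    for _ in range(steps):
        g1 = g2 = 0.0
        for x1, x2, y in data:
            err = w1 * x1 + w2 * x2 - y
            g1 += err * x1
            g2 += err * x2
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return w1, w2


def accuracy(weights, data):
    """Fraction of examples whose predicted sign matches the label."""
    w1, w2 = weights
    hits = sum(1 for x1, x2, y in data if (w1 * x1 + w2 * x2) * y > 0)
    return hits / len(data)


def seed_sweep(seeds):
    """Train one predictor per seed; score each in-domain and after the shift."""
    train_set = make_data(500, shifted=False, rng=random.Random(0))
    iid_test = make_data(2000, shifted=False, rng=random.Random(1))
    shifted_test = make_data(2000, shifted=True, rng=random.Random(2))
    results = []
    for seed in seeds:
        w = train(train_set, seed)
        results.append((accuracy(w, iid_test), accuracy(w, shifted_test)))
    return results


if __name__ == "__main__":
    for seed, (iid, shifted) in zip(range(10), seed_sweep(range(10))):
        print(f"seed={seed}  iid_acc={iid:.3f}  shifted_acc={shifted:.3f}")
```

Running the sweep should print the same in-domain accuracy for every seed alongside widely varying shifted-domain accuracies. The duplicated feature is an extreme stand-in for the many near-equivalent solutions a deep network can reach; it was chosen only because identical gradients mean training cannot break the tie, which keeps the toy small and deterministic.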
