Top 5 papers published by Been Kim from Google in 2020

Posted Content•

Pang Wei Koh¹, Thao Nguyen², Yew Siang Tang¹, Stephen Mussmann¹, Emma Pierson¹, Been Kim², Percy Liang¹•Institutions (2)

Stanford University¹, Google²

09 Jul 2020-arXiv: Learning

TL;DR: On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts (“bone spurs”) or bird attributes (“wing color”).

Abstract: We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts ("bone spurs") or bird attributes ("wing color"). These models also allow for richer human-model interaction: accuracy improves significantly if we can correct model mistakes on concepts at test time.
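The intervention idea in the abstract above can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the two-stage structure (input to concepts, concepts to label) follows the abstract, while the function names, the linear/sigmoid form of each stage, and all weights and inputs are assumptions made up for the example.

```python
import math


def sigmoid(z):
    """Logistic function, used here to turn a score into a concept probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x, concept_weights):
    """Stage 1 (assumed linear + sigmoid): raw input -> one probability per concept."""
    return [
        sigmoid(sum(w_i * x_i for w_i, x_i in zip(weights, x)))
        for weights in concept_weights
    ]


def predict_label(concepts, label_weights, bias):
    """Stage 2 (assumed linear): concept values -> scalar score, e.g. severity."""
    return sum(w * c for w, c in zip(label_weights, concepts)) + bias


def intervene(concepts, corrections):
    """Return a copy of the predicted concepts with some values overwritten.

    `corrections` maps a concept index to the value a human supplies.
    """
    edited = list(concepts)
    for index, value in corrections.items():
        edited[index] = value
    return edited


# Hypothetical toy parameters: 2 input features, 2 concepts.
CONCEPT_WEIGHTS = [[2.0, -1.0], [-1.0, 3.0]]
LABEL_WEIGHTS = [1.5, 0.5]
LABEL_BIAS = 0.0

x = [1.0, 0.5]
c_hat = predict_concepts(x, CONCEPT_WEIGHTS)
y_before = predict_label(c_hat, LABEL_WEIGHTS, LABEL_BIAS)

# "If the model did not think concept 0 was present, what would it predict?"
c_edited = intervene(c_hat, {0: 0.0})
y_after = predict_label(c_edited, LABEL_WEIGHTS, LABEL_BIAS)

print(f"score before intervention: {y_before:.3f}")
print(f"score after intervention:  {y_after:.3f}")
```

Because the label is computed only from the concept values, editing a concept and re-running the second stage is all an intervention requires; a standard end-to-end model has no such handle.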

226 citations

Proceedings Article•

On Completeness-aware Concept-Based Explanations in Deep Neural Networks

Chih-Kuan Yeh¹, Been Kim², Sercan O. Arik², Chun-Liang Li¹, Tomas Pfister², Pradeep Ravikumar¹•Institutions (2)

Carnegie Mellon University¹, Google²

01 Jan 2020

TL;DR: The notion of completeness is defined, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior based on the assumption that complete concept scores are sufficient statistics of the model prediction.

Abstract: Human explanations of high-level decisions are often expressed in terms of key concepts the decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior based on the assumption that complete concept scores are sufficient statistics of the model prediction. Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, which addresses the limitations of existing methods on concept explanations. To define an importance score for each discovered concept, we adapt game-theoretic notions to aggregate over sets and propose ConceptSHAP. Via proposed metrics and user studies, on a synthetic dataset with a priori known concept explanations, as well as on real-world image and language datasets, we validate the effectiveness of our method in finding concepts that are both complete in explaining the decisions and interpretable. (The code is released at this https URL)
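The game-theoretic aggregation over sets mentioned in the abstract can be illustrated with an exact Shapley-value computation in which each "player" is a concept and the value of a subset is a completeness-style score. This is only a sketch under stated assumptions: the Shapley formula is standard, but `toy_completeness`, the concept names, and its scores are hypothetical stand-ins, and the paper's actual completeness definition and ConceptSHAP procedure are not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`.

    phi_i = sum over subsets S of the other players of
            |S|! (n - |S| - 1)! / n! * (value(S + {i}) - value(S)).
    Enumerates all subsets, so it is only practical for a handful of players.
    """
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {player}) - value(coalition)
                total += weight * marginal
        phi[player] = total
    return phi


def toy_completeness(concepts):
    """Hypothetical score in [0, 1] for how well a concept subset explains a model.

    Made-up numbers: two useful concepts that also interact, and one
    concept ("background") that adds nothing.
    """
    score = 0.0
    if "stripes" in concepts:
        score += 0.5
    if "wing color" in concepts:
        score += 0.25
    if "stripes" in concepts and "wing color" in concepts:
        score += 0.25
    return score


CONCEPTS = ["stripes", "wing color", "background"]
importance = shapley_values(CONCEPTS, toy_completeness)

for concept in CONCEPTS:
    print(f"{concept}: {importance[concept]:.3f}")
```

In this toy setup the scores sum to the completeness of the full concept set minus that of the empty set, and the concept that never changes the score receives zero importance, which are the properties that make Shapley-style attribution attractive for ranking concepts.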

124 citations

Proceedings Article•

Debugging Tests for Model Explanations

Julius Adebayo¹, Michael Muelly², Ilaria Liccardi¹, Been Kim²•Institutions (2)

Massachusetts Institute of Technology¹, Google²

01 Jan 2020

TL;DR: It is found that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions.

Abstract: We investigate whether post-hoc model explanations are effective for diagnosing model errors, that is, for model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods have been proposed. Despite increasing use, it is unclear if they are effective. To start, we categorize bugs, based on their source, into data, model, and test-time contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm are invariant to the higher layer parameters of a deep network and hence ineffective for diagnosing model contamination. We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.

92 citations

Proceedings Article•

Concept Bottleneck Models

Pang Wei Koh¹, Thao Nguyen², Yew Siang Tang¹, Stephen Mussmann¹, Emma Pierson¹, Been Kim², Percy Liang¹•Institutions (2)

Stanford University¹, Google²

12 Jul 2020

TL;DR: In this paper, the authors revisit concept bottleneck models, which first predict concepts that are provided at training time and then use those predicted concepts to predict the label; on x-ray grading and bird identification, these models achieve accuracy competitive with standard end-to-end models.

Abstract: We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like “the existence of bone spurs”, as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts (“bone spurs”) or bird attributes (“wing color”). These models also allow for richer human-model interaction: accuracy improves significantly if we can correct model mistakes on concepts at test time.

50 citations

Posted Content•

Debugging Tests for Model Explanations

Julius Adebayo¹, Michael Muelly², Ilaria Liccardi¹, Been Kim²•Institutions (2)

Massachusetts Institute of Technology¹, Google²

10 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors investigate whether post-hoc model explanations are effective for diagnosing model errors, and find that the methods tested can diagnose a spurious background bug but cannot conclusively identify mislabeled training examples.

Abstract: We investigate whether post-hoc model explanations are effective for diagnosing model errors, that is, for model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods have been proposed. Despite increasing use, it is unclear if they are effective. To start, we categorize bugs, based on their source, into data, model, and test-time contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm are invariant to the higher layer parameters of a deep network and hence ineffective for diagnosing model contamination. We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.

13 citations
