Open Access Proceedings Article
Contrastive Explanations for Model Interpretability
Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
- pp 1597-1611
TLDR
This paper proposes a method for producing contrastive explanations in the latent space by projecting the input representation so that only the features that differentiate two potential decisions are captured, and demonstrates that label-contrastive explanations can provide fine-grained interpretability of model decisions.
Abstract:
Contrastive explanations clarify why an event occurred in contrast to another. Such explanations are intuitive for humans both to produce and to comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. This modification restricts model behavior to contrastive reasoning and uncovers which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, a given input feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
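The projection described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the model ends in a linear classification layer, so that the only direction in the latent space separating a chosen label (the "fact") from an alternative (the "foil") is the difference of their weight vectors. The function names, the weight table `W`, the label names, and the vector `h` are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a contrastive projection, ASSUMING a linear
# classification head. Names and numbers are illustrative only.

def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_projection(h, w_fact, w_foil):
    """Project representation h onto the direction w_fact - w_foil.

    Under the linear-head assumption, this direction is the only part of
    h that changes the difference between the two labels' logits.
    """
    d = [a - b for a, b in zip(w_fact, w_foil)]
    norm_sq = dot(d, d)
    if norm_sq == 0.0:
        raise ValueError("fact and foil weight vectors are identical")
    scale = dot(h, d) / norm_sq
    return [scale * x for x in d]

def logit_gap(h, w_fact, w_foil):
    """Difference between the fact logit and the foil logit (bias omitted)."""
    return dot(h, w_fact) - dot(h, w_foil)

# Toy weights for a three-label classifier (assumed values).
W = {
    "entailment": [1.0, 0.0, 2.0],
    "neutral": [0.0, 1.0, 2.0],
    "contradiction": [-1.0, 0.5, 0.0],
}
h = [0.5, -1.0, 3.0]  # toy latent representation of one input

h_c = contrastive_projection(h, W["entailment"], W["neutral"])
residual = [a - b for a, b in zip(h, h_c)]

gap_full = logit_gap(h, W["entailment"], W["neutral"])
gap_proj = logit_gap(h_c, W["entailment"], W["neutral"])
gap_residual = logit_gap(residual, W["entailment"], W["neutral"])
```

In this toy setting the projected vector `h_c` preserves the gap between the two logits, while the residual `h - h_c` contributes nothing to that gap. This is one way to read "only the features that differentiate two potential decisions are captured"; a model with a non-linear head would need a different construction.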