Open Access Proceedings Article

Contrastive Explanations for Model Interpretability

TLDR
This paper proposes a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured, and demonstrates the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
Abstract
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive for humans to both produce and comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows model behavior to be analyzed through contrastive reasoning alone, uncovering which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, a given input feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
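To make the idea concrete, here is a minimal sketch of one plausible reading of the abstract's "projection of the input representation": with a linear classification head over a shared latent space, keep only the component of the latent vector along the difference of the two labels' weight vectors, which is exactly the direction that differentiates the two decisions. The abstract does not spell out the projection, so the function and variable names below are illustrative assumptions, not the paper's verbatim method.

```python
import numpy as np

def contrastive_projection(h, W, label, foil):
    """Sketch (assumed form): project a latent representation h onto the
    direction separating `label` from `foil` under a linear classifier W.

    h:     (d,) latent representation of the input
    W:     (num_labels, d) linear classification head
    label: index of the predicted (fact) label
    foil:  index of the alternative (foil) label

    Returns the component of h along W[label] - W[foil]; directions that
    do not differentiate the two labels are projected out.
    """
    u = W[label] - W[foil]           # direction that separates the two decisions
    return (h @ u) / (u @ u) * u     # orthogonal projection of h onto u

# Toy usage: contrast label 0 against label 2.
rng = np.random.default_rng(0)
h = rng.normal(size=8)               # latent representation from an encoder
W = rng.normal(size=(3, 8))          # linear classification head
h_contrast = contrastive_projection(h, W, label=0, foil=2)

# The logit *difference* between the two labels is preserved by the projection,
# while information relevant only to other labels is discarded.
print(W[0] @ h - W[2] @ h)                    # original logit gap
print(W[0] @ h_contrast - W[2] @ h_contrast)  # identical gap after projection
```

Attribution methods (e.g., gradient-based saliency over tokens or concepts) can then be run on `h_contrast` instead of `h`, so that scores reflect only evidence for one label against the other.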

