Contrastive Explanations for Model Interpretability

Proceedings Article (Open Access)


Alon Jacovi, +5 more

pp. 1597-1611


TLDR

This paper proposes a method for producing contrastive explanations in the latent space via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. It demonstrates that label-contrastive explanations provide fine-grained interpretability of model decisions.

Abstract:

Contrastive explanations clarify why an event occurred in contrast to another. Humans find them inherently intuitive, both to produce and to comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification restricts model behavior to contrastive reasoning alone and uncovers which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, a given input feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
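The abstract does not spell out the projection, so the following is only a hedged illustration of the idea, not the authors' implementation. It assumes a linear classification head (logits = W·h + b) and takes the contrastive space for a fact label y and a foil label y' to be the single direction u = w_y - w_y'. Under that assumption the fact-vs-foil logit gap is u·h + (b_y - b_y'), so projecting h onto u keeps the gap unchanged while discarding every feature that moves both labels equally. The helper names (`contrastive_direction`, `project`, `token_contrast_scores`) are hypothetical, and the bag-of-embeddings encoder (h is the sum of token vectors) is a toy stand-in for a real encoder, chosen because it makes per-token contributions to the gap exact.

```python
# Hedged sketch: contrastive projection under an ASSUMED linear head.
# Assumptions (not from the source): logits = W @ h + b, and the contrastive
# space for fact vs. foil is span(u) with u = W[fact] - W[foil].
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logits(W: Matrix, b: Vector, h: Vector) -> Vector:
    """Linear head: one score per label."""
    return [dot(w, h) + bi for w, bi in zip(W, b)]


def contrastive_direction(W: Matrix, fact: int, foil: int) -> Vector:
    """Hypothetical helper: difference of the fact and foil weight vectors."""
    return [wf - wo for wf, wo in zip(W[fact], W[foil])]


def project(h: Vector, u: Vector) -> Vector:
    """Orthogonal projection of h onto span(u): (u.h / u.u) * u."""
    norm_sq = dot(u, u)
    if norm_sq == 0.0:
        raise ValueError("fact and foil weight vectors are identical")
    scale = dot(u, h) / norm_sq
    return [scale * ui for ui in u]


def token_contrast_scores(token_embs: Matrix, u: Vector) -> Vector:
    """Toy encoder h = sum(token_embs): token i adds exactly u . e_i to the
    fact-minus-foil gap. Positive favours the fact, negative the foil."""
    return [dot(u, e) for e in token_embs]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.5, 0.5, 0.0]]
    b = [0.0, 0.0, 0.0]
    tokens = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.5, 0.5, 3.0]]
    h = [sum(col) for col in zip(*tokens)]  # bag of embeddings: [1.5, 2.5, 4.0]

    u = contrastive_direction(W, fact=0, foil=1)
    full = logits(W, b, h)
    contrastive = logits(W, b, project(h, u))
    # The fact-minus-foil gap survives the projection unchanged.
    print(f"{full[0] - full[1]:.2f} {contrastive[0] - contrastive[1]:.2f}")  # -1.00 -1.00
    print(token_contrast_scores(tokens, u))  # [1.0, -2.0, 0.0]
```

In this toy example the third feature raises both label scores by the same amount, so it vanishes from the contrastive view, and token 2 (which carries most of that feature) scores 0.0. For a real encoder, the exact decomposition no longer holds. One would instead apply an attribution method to the projected fact-minus-foil gap.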
