Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?
Alon Jacovi, Yoav Goldberg
pp. 4198–4205
TLDR
The current binary definition of faithfulness sets a potentially unrealistic bar for being considered faithful; the authors call for discarding the binary notion of faithfulness in favor of a more graded one, which would be of greater practical utility.

Abstract
With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criteria. We survey the literature with respect to faithfulness evaluation, and arrange the current approaches around three assumptions, providing an explicit form to how faithfulness is "defined" by the community. We provide concrete guidelines on how evaluation of interpretation methods should and should not be conducted. Finally, we claim that the current binary definition for faithfulness sets a potentially unrealistic bar for being considered faithful. We call for discarding the binary notion of faithfulness in favor of a more graded one, which we believe will be of greater practical utility.
Citations
Posted Content
Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
TL;DR: This work discusses a model of trust inspired by, but not identical to, interpersonal trust as defined by sociologists, and incorporates a formalization of 'contractual trust', such that trust between a user and an AI model is trust that some implicit or explicit contract will hold.
Proceedings ArticleDOI
Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
TL;DR: In this paper, the authors discuss a model of trust inspired by sociologists' notion of interpersonal trust (i.e., trust between people) and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
Posted Content
Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking
TL;DR: This work introduces a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges and uses this technique as an attribution method to analyze GNN models for two tasks -- question answering and semantic role labeling -- providing insights into the information flow in these models.
Posted Content
On the Opportunities and Risks of Foundation Models.
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang +113 more
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content
Influence Functions in Deep Learning Are Fragile
TL;DR: This work suggests that influence functions in deep learning are, in general, fragile, and calls for developing improved influence-estimation methods to mitigate these issues in non-convex setups.
References
Proceedings ArticleDOI
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Proceedings Article
A unified approach to interpreting model predictions
Scott M. Lundberg, Su-In Lee
TL;DR: In this article, a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), is presented, which assigns each feature an importance value for a particular prediction.
Journal ArticleDOI
A Survey of Methods for Explaining Black Box Models
Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, Dino Pedreschi
TL;DR: In this paper, the authors provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black-box decision support system. Given a problem definition, a black-box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work.
Posted Content
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim
TL;DR: This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.
Journal ArticleDOI
The mythos of model interpretability
TL;DR: In machine learning, the concept of interpretability is both important and slippery; this work examines the motivations underlying interest in interpretability and the feasibility and desirability of its different notions.