Proceedings ArticleDOI (Open Access)

Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?

TLDR
The current binary definition of faithfulness sets a potentially unrealistic bar for being considered faithful; the authors call for discarding the binary notion of faithfulness in favor of a more graded one, which is of greater practical utility.
Abstract
With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criterion. We survey the literature with respect to faithfulness evaluation, and arrange the current approaches around three assumptions, providing an explicit form to how faithfulness is "defined" by the community. We provide concrete guidelines on how evaluation of interpretation methods should and should not be conducted. Finally, we claim that the current binary definition for faithfulness sets a potentially unrealistic bar for being considered faithful. We call for discarding the binary notion of faithfulness in favor of a more graded one, which we believe will be of greater practical utility.
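To make the idea of a graded (rather than binary) faithfulness measure concrete, here is a minimal sketch, not the evaluation procedure proposed in the paper: the predictor interface (`predict_proba`), the token-masking scheme, and the scoring choices are assumptions for illustration only. It scores an explanation by how much the predicted-class probability drops when the tokens the explanation marks as important are erased.

```python
# A minimal sketch of a graded, erasure-based faithfulness score.
# Assumption: `predict_proba(texts)` returns class probabilities for a batch
# of strings; `explanation` is a list of (token_index, weight) pairs.
import numpy as np

def erasure_faithfulness(text, explanation, predict_proba, top_k=3, mask="[MASK]"):
    """Score in [0, 1]: how much the predicted-class probability drops
    when the top-k tokens flagged by the explanation are masked out."""
    tokens = text.split()
    original = predict_proba([text])[0]
    predicted_class = int(np.argmax(original))

    # Mask the k tokens the explanation considers most important.
    top_indices = [i for i, _ in sorted(explanation, key=lambda p: -abs(p[1]))[:top_k]]
    masked_tokens = [mask if i in top_indices else t for i, t in enumerate(tokens)]
    masked = predict_proba([" ".join(masked_tokens)])[0]

    # Larger probability drops suggest the explanation points at tokens the
    # model actually relied on; clip to keep the score in [0, 1].
    drop = original[predicted_class] - masked[predicted_class]
    return float(np.clip(drop / original[predicted_class], 0.0, 1.0))
```

A score like this is continuous rather than all-or-nothing, which is the practical shift the abstract argues for.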



Citations
Proceedings ArticleDOI

Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI

TL;DR: The authors discuss a model of trust inspired by, but not identical to, sociologists' notion of interpersonal trust (i.e., trust between people), incorporating a formalization of 'contractual trust' in which a user trusts an AI model to uphold some implicit or explicit contract, and consider how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
Posted Content

Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking

TL;DR: This work introduces a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges and uses this technique as an attribution method to analyze GNN models for two tasks -- question answering and semantic role labeling -- providing insights into the information flow in these models.
Posted Content

On the Opportunities and Risks of Foundation Models.

Rishi Bommasani, +113 more
16 Aug 2021
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content

Influence Functions in Deep Learning Are Fragile

TL;DR: The results suggest that influence functions in deep learning are, in general, fragile, and the authors call for developing improved influence estimation methods to mitigate these issues in non-convex setups.
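For context, the influence-function estimate that these fragility results concern (following Koh and Liang, 2017) approximates the effect of up-weighting a training point z on the loss at a test point z_test via a Hessian-weighted inner product of gradients:

```latex
% Influence of training point z on the loss at test point z_test,
% with \hat{\theta} the trained parameters and H the empirical Hessian
% of the training loss (assumed invertible, which is exactly the
% assumption that becomes shaky in non-convex deep networks).
\mathcal{I}(z, z_{\mathrm{test}})
  = -\,\nabla_{\theta} L\!\left(z_{\mathrm{test}}, \hat{\theta}\right)^{\top}
      H_{\hat{\theta}}^{-1}\,
      \nabla_{\theta} L\!\left(z, \hat{\theta}\right),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2} L\!\left(z_i, \hat{\theta}\right)
```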
References
Proceedings ArticleDOI

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

TL;DR: In this article, the authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around each prediction, together with a method for selecting representative individual predictions and their explanations in a non-redundant way, framing that selection as a submodular optimization problem.
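As a usage sketch only, assuming the open-source `lime` package and a scikit-learn text pipeline (the toy training data and label names below are placeholders), a local explanation for a single prediction can be obtained roughly like this:

```python
# Sketch: explaining one text-classification prediction with the `lime` package.
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy classifier so the example is self-contained; any text classifier
# exposing a batch predict_proba(texts) -> probabilities can be substituted.
train_texts = ["great acting, loved it", "dull plot, waste of time",
               "wonderful and moving", "boring and predictable"]
train_labels = [1, 0, 1, 0]
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "the plot was dull but the acting was wonderful",
    pipeline.predict_proba,   # list of texts -> (n_samples, n_classes) probabilities
    num_features=4,           # number of tokens in the local explanation
)
print(explanation.as_list())  # [(token, weight), ...] from the local linear surrogate
```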
Proceedings Article

A unified approach to interpreting model predictions

TL;DR: In this article, a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), is presented, which assigns each feature an importance value for a particular prediction.
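As a brief, hedged usage sketch, assuming the open-source `shap` package and a tree-ensemble model (the dataset and model below are placeholders, not anything from the paper), per-feature Shapley value attributions can be computed roughly like this:

```python
# Sketch: additive feature attributions with the `shap` package.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])   # shape: (5, n_features)

# Each row assigns one additive importance value per feature; the values plus
# the expected model output reconstruct the prediction for that instance.
print(shap_values[0])
```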
Journal ArticleDOI

A Survey of Methods for Explaining Black Box Models

TL;DR: In this paper, the authors provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black-box decision support system; given a problem definition, a black-box type, and a desired explanation, the survey should help researchers find the proposals most useful for their own work.
Posted Content

Towards A Rigorous Science of Interpretable Machine Learning

TL;DR: This position paper defines interpretability, describes when interpretability is needed (and when it is not), suggests a taxonomy for rigorous evaluation, and exposes open questions towards a more rigorous science of interpretable machine learning.
Journal ArticleDOI

The mythos of model interpretability

TL;DR: In machine learning, the concept of interpretability is both important and slippery; this article examines the diverse motivations for interpretability and the properties ascribed to interpretable models, seeking to refine the discourse around the term.