David Alvarez-Melis

Researcher at Massachusetts Institute of Technology

Publications - 38
Citations - 2328

David Alvarez-Melis is an academic researcher at the Massachusetts Institute of Technology. He has contributed to research on topics including interpretability and nonlinear dimensionality reduction. He has an h-index of 19 and has co-authored 36 publications receiving 1675 citations. His previous affiliations include Microsoft.

Papers
Proceedings Article

Towards robust interpretability with self-explaining neural networks

TL;DR: This work designs self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models, and proposes three desiderata for explanations in general – explicitness, faithfulness, and stability.
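The model form the summary describes can be sketched concretely: a prediction that stays linear in interpretable concepts while the coefficients themselves depend on the input. The shapes, parameter names, and the choice of identity concepts h(x) = x below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of a self-explaining model f(x) = theta(x) . h(x):
# linear in the concepts h(x), with input-dependent coefficients theta(x)
# produced by a small network. (Names/shapes are illustrative assumptions.)

rng = np.random.default_rng(0)
d = 4
W = 0.1 * rng.normal(size=(d, d))  # parameters of the coefficient network
b = rng.normal(size=d)

def theta(x):
    """Input-dependent coefficients: these are the explanation."""
    return np.tanh(W @ x) + b

def predict(x):
    """Generalized linear prediction with concepts h(x) = x."""
    return float(theta(x) @ x)

x = rng.normal(size=d)
print(predict(x), theta(x))
```

With W = 0 the coefficients are constant and the model reduces to an ordinary linear classifier; letting theta vary with x is the progressive generalization the summary refers to, and the stability desideratum amounts to keeping theta(x) close to theta(x') for nearby inputs.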
Posted Content

On the Robustness of Interpretability Methods

TL;DR: It is argued that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability; metrics to quantify robustness are introduced, and current methods are shown to perform poorly according to them.
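One natural way to make the robustness desideratum quantitative is a local Lipschitz estimate of the explanation map e(.): how much the explanation can change relative to how much the input changes. The random-sampling estimator below is an illustrative assumption, not necessarily the paper's exact metric.

```python
import numpy as np

# Sampling-based estimate of a local Lipschitz constant for an
# explanation method e(.):
#   L(x) ~= max over x' near x of ||e(x') - e(x)|| / ||x' - x||.
# (Illustrative estimator; the paper's exact metric may differ.)

def local_lipschitz(explain, x, radius=0.1, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    e0 = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform(0.1, 1.0) / np.linalg.norm(delta)
        worst = max(worst, np.linalg.norm(explain(x + delta) - e0)
                           / np.linalg.norm(delta))
    return worst

# A linear model's gradient "explanation" is constant in x, so it is
# perfectly robust: the estimate is exactly zero.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, -0.2])
L = local_lipschitz(lambda z: w, x)
print(L)  # 0.0
```

A large value of L flags an explanation method that can change sharply under imperceptible input perturbations, which is exactly the failure mode the metric is meant to expose.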
Proceedings Article

Gromov-Wasserstein Alignment of Word Embedding Spaces

TL;DR: The authors cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms, and exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages.
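The key property the summary points to is that Gromov-Wasserstein compares distances *within* each space, so no shared coordinate system between the two embedding spaces is needed. The compact entropic solver below follows the general projected-mirror-descent scheme for GW; it is a simplified illustrative sketch, not the paper's implementation.

```python
import numpy as np

# Compact entropic Gromov-Wasserstein sketch: align two point sets using
# only their intra-space distance matrices C1, C2 — which is what lets
# embedding spaces be matched without shared coordinates.
# (Simplified illustration, not the paper's implementation.)

def sinkhorn(p, q, K, n_iter=300):
    """Project kernel K onto couplings with marginals p and q."""
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, p, q, eps=0.05, n_outer=50):
    T = np.outer(p, q)  # start from the independent coupling
    # Constant part of the gradient for the squared-loss GW objective.
    constC = np.outer((C1 ** 2) @ p, np.ones_like(q)) \
           + np.outer(np.ones_like(p), (C2 ** 2) @ q)
    for _ in range(n_outer):
        grad = constC - 2.0 * C1 @ T @ C2       # gradient at current T
        K = np.exp(-(grad - grad.min()) / eps)  # mirror step
        T = sinkhorn(p, q, K)                   # marginal projection
    return T

# Two "embedding spaces": the same five points on a line, the second
# space shuffled, so only the distance matrices relate them.
x = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
perm = np.array([2, 0, 4, 1, 3])
y = x[perm]
C1 = np.abs(x[:, None] - x[None, :]) / 10.0
C2 = np.abs(y[:, None] - y[None, :]) / 10.0
p = np.full(5, 0.2)
q = np.full(5, 0.2)
T = entropic_gw(C1, C2, p, q)
print(np.round(T, 3))  # coupling; large entries indicate matched points
```

The returned coupling T transports mass between the two spaces; in the bilingual-lexicon setting, large entries of T are read off as candidate word translations.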
Proceedings Article

A causal framework for explaining the predictions of black-box sequence-to-sequence models

TL;DR: The method returns an "explanation" consisting of groups of causally related input-output tokens, inferred by querying the model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components.
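The querying step can be illustrated on a toy scale: treat the sequence-to-sequence model as a black box, perturb one input token at a time, and record which output tokens change. The "model" below (string reversal) and the mask token are stand-in assumptions for illustration, and the subsequent graph-partitioning step is omitted.

```python
# Toy version of the perturb-and-query step: intervene on one input
# token at a time and record which output positions change. The
# resulting input -> output dependency structure is the graph the
# method then partitions. ("model" here is an illustrative stand-in.)

def model(tokens):
    """Black-box seq2seq stand-in: reverse the input sequence."""
    return list(reversed(tokens))

def dependency_graph(tokens, mask="<unk>"):
    base = model(tokens)
    deps = {}  # input position -> set of output positions it influences
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        out = model(perturbed)
        deps[i] = {j for j in range(len(base)) if out[j] != base[j]}
    return deps

print(dependency_graph(["a", "b", "c"]))  # {0: {2}, 1: {1}, 2: {0}}
```

Each edge is a causal claim backed by an intervention; on a real model one would sample many perturbations and weight edges by how often each output token changes.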
