Towards Ground Truth Evaluation of Visual Explanations

Citations

PDF

Open Access

More filters

Posted Content•

Toward Interpretable Machine Learning: Transparent Deep Neural Networks and Beyond

[...]

Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, Klaus-Robert Müller - Show less +1 more

17 Mar 2020-arXiv: Learning

TL;DR: This work aims to provide a timely overview of this active emerging field of machine learning and explain its theoretical foundations, put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, and outline best practice aspects.

...read moreread less

Abstract: With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem solving abilities and strategies of nonlinear Machine Learning such as Deep Learning (DL), LSTMs, and kernel methods are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field and explain its theoretical foundations, (2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, (3) outline best practice aspects i.e. how to best include interpretation methods into the standard usage of machine learning and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning.

...read moreread less

75 citations

Proceedings Article•

Evaluating Attribution for Graph Neural Networks

[...]

Benjamin Sanchez-Lengeling¹, Jennifer N. Wei¹, Brian Lee², Emily Reif², Peter Y. Wang³, Wesley Wei Qian⁴, Kevin McCloskey², Lucy J. Colwell⁵, Alexander B. Wiltschko² - Show less +5 more•Institutions (5)

Harvard University¹, Google², Columbia University³, University of Illinois at Urbana–Champaign⁴, University of Cambridge⁵

01 Jan 2020

TL;DR: This work adapt commonly-used attribution methods for GNNs and quantitatively evaluate them using the axes of attribution accuracy, stability, faithfulness and consistency, and makes concrete recommendations for which attribution methods to use.

...read moreread less

Abstract: Interpretability of machine learning models is critical to scientific understanding, AI safety, and debugging. Attribution is one approach to interpretability, which highlights input dimensions that are influential to a neural network’s prediction. Evaluation of these methods is largely qualitative for image and text models, because acquiring ground truth attributions requires expensive and unreliable human judgment. Attribution has been comparatively understudied for graph neural networks (GNNs), a model class of growing importance that makes predictions on arbitrarily-sized graphs. Graph-valued data offer an opportunity to quantitatively benchmark attribution methods, because challenging synthetic graph problems have computable ground-truth attributions. In this work we adapt commonly-used attribution methods for GNNs and quantitatively evaluate them using the axes of attribution accuracy, stability, faithfulness and consistency. We make concrete recommendations for which attribution methods to use, and provide the data and code for our benchmarking suite. Rigorous and open source benchmarking of attribution methods in graphs could enable new methods development and broader use of attribution in real-world ML tasks.

...read moreread less

51 citations

Cites background from "Towards Ground Truth Evaluation of ..."

...Efforts to quantify the utility of attribution methods or apply sanity checks have been undertaken in input domains where human intuition is usually used to evaluate attribution quality [4, 50, 7, 32, 21, 5], like images and text....
[...]

Posted Content•

Deep weakly-supervised learning methods for classification and localization in histology images: a survey.

[...]

Jérôme Rony, Soufiane Belharbi, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey, Eric Granger - Show less +2 more

08 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: Results indicate that several deep learning models, and in particular WILDCAT and deep MIL can provide a high level of classification accuracy, although pixel-wise localization of cancer regions remains an issue for such images.

...read moreread less

Abstract: Using state-of-the-art deep learning models for cancer diagnosis presents several challenges related to the nature and availability of labeled histology images. In particular, cancer grading and localization in these images normally relies on both image- and pixel-level labels, the latter requiring a costly annotation process. In this survey, deep weakly-supervised learning (WSL) models are investigated to identify and locate diseases in histology images, without the need for pixel-level annotations. Given training data with global image-level labels, these models allow to simultaneously classify histology images and yield pixel-wise localization scores, thereby identifying the corresponding regions of interest (ROI). Since relevant WSL models have mainly been investigated within the computer vision community, and validated on natural scene images, we assess the extent to which they apply to histology images which have challenging properties, e.g. very large size, similarity between foreground/background, highly unstructured regions, stain heterogeneity, and noisy/ambiguous labels. The most relevant models for deep WSL are compared experimentally in terms of accuracy (classification and pixel-wise localization) on several public benchmark histology datasets for breast and colon cancer -- BACH ICIAR 2018, BreaKHis, CAMELYON16, and GlaS. Furthermore, for large-scale evaluation of WSL models on histology images, we propose a protocol to construct WSL datasets from Whole Slide Imaging. Results indicate that several deep learning models can provide a high level of classification accuracy, although accurate pixel-wise localization of cancer regions remains an issue for such images. Code is publicly available.

...read moreread less

48 citations

Posted Content•

Deep Interpretable Classification and Weakly-Supervised Segmentation of Histology Images via Max-Min Uncertainty

[...]

Soufiane Belharbi, Jérôme Rony, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey¹, Eric Granger - Show less +2 more•Institutions (1)

McGill University¹

14 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: High uncertainty is introduced as a criterion to localize non-discriminative regions that do not affect classifier decision, and is described with original Kullback-Leibler (KL) divergence losses evaluating the deviation of posterior predictions from the uniform distribution.

...read moreread less

Abstract: Weakly supervised learning (WSL) has recently triggered substantial interest as it mitigates the lack of pixel-wise annotations, while enabling interpretable models. Given global image labels, WSL methods yield pixel-level predictions (segmentations). Despite their recent success, mostly with natural images, such methods could be seriously challenged when the foreground and background regions have similar visual cues, yielding high false-positive rates in segmentations, as is the case of challenging histology images. WSL training is commonly driven by standard classification losses, which implicitly maximize model confidence and find the discriminative regions linked to classification decisions. Therefore, they lack mechanisms for modeling explicitly non-discriminative regions and reducing false-positive rates. We propose new regularization terms, which enable the model to seek both non-discriminative and discriminative regions, while discouraging unbalanced segmentations. We introduce high uncertainty as a criterion to localize non-discriminative regions that do not affect classifier decision, and describe it with original Kullback-Leibler (KL) divergence losses evaluating the deviation of posterior predictions from the uniform distribution. Our KL terms encourage high uncertainty of the model when the latter takes the latent non-discriminative regions as input. Our loss integrates: (i) a cross-entropy seeking a foreground, where model confidence about class prediction is high; (ii) a KL regularizer seeking a background, where model uncertainty is high; and (iii) log-barrier terms discouraging unbalanced segmentations. Comprehensive experiments and ablation studies over the public GlaS colon cancer data show substantial improvements over state-of-the-art WSL methods, and confirm the effect of our new regularizers. Our code is publicly available.

...read moreread less

34 citations

Cites background from "Towards Ground Truth Evaluation of ..."

...It is worth noting that such interpretability aspects are also attracting wide interest in computer vision (Bach et al., 2015; Bau et al., 2017; Bhatt et al., 2020; Dabkowski and Gal, 2017; Escalante et al., 2018; Fong et al., 2019; Fong and Vedaldi, 2017; Goh et al., 2020; Osman et al., 2020; Murdoch et al., 2019; Petsiuk et al., 2020; 2018; Ribeiro et al., 2016; Samek et al., 2020; 2017; Zhang et al., 2020; Belharbi et al., 2021) and medical imaging (de La Torre et al....
[...]

Posted Content•

Evaluating and Aggregating Feature-based Model Explanations

[...]

Umang Bhatt¹, Umang Bhatt², Adrian Weller¹, Adrian Weller³, José M. F. Moura² - Show less +1 more•Institutions (3)

University of Cambridge¹, Carnegie Mellon University², The Turing Institute³

01 May 2020-arXiv: Learning

TL;DR: This paper develops a procedure for learning an aggregate explanation function with lower complexity and then derive a new aggregate Shapley value explanation function that minimizes sensitivity.

...read moreread less

Abstract: A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feature-based explanations: low sensitivity, high faithfulness, and low complexity. We devise a framework for aggregating explanation functions. We develop a procedure for learning an aggregate explanation function with lower complexity and then derive a new aggregate Shapley value explanation function that minimizes sensitivity.

...read moreread less

25 citations

Cites methods from "Towards Ground Truth Evaluation of ..."

...Note we omit evaluation criteria that assume access to ground-truth explanations for training points; for a thorough treatment on this topic, see [Hind et al., 2019; Osman et al., 2020]....
[...]

Towards Ground Truth Evaluation of Visual Explanations

Citations

Cites background from "Towards Ground Truth Evaluation of ..."

Cites background from "Towards Ground Truth Evaluation of ..."

Cites methods from "Towards Ground Truth Evaluation of ..."

References

"Towards Ground Truth Evaluation of ..." refers methods in this paper

Related Papers (5)

Trending Questions (1)