A unified approach to interpreting model predictions

Home
/
Papers
/
A unified approach to interpreting model predictions

Proceedings Article•

A unified approach to interpreting model predictions

Scott M. Lundberg¹, Su-In Lee¹•Institutions (1)

04 Dec 2017-Vol. 30, pp 4768-4777

TL;DR: In this article, a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), is presented, which assigns each feature an importance value for a particular prediction.

read less

Abstract: Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

From Local Explanations to Global Understanding with Explainable AI for Trees.

[...]

Scott M. Lundberg¹, Scott M. Lundberg², Gabriel G. Erion², Hugh Chen², Alex J. DeGrave², Jordan M. Prutkin², Bala G. Nair², Ronit Katz², Jonathan Himmelfarb², Nisha Bansal², Su-In Lee² - Show less +7 more•Institutions (2)

Microsoft¹, University of Washington²

17 Jan 2020-Nature Machine Intelligence

TL;DR: An explanation method for trees is presented that enables the computation of optimal local explanations for individual predictions, and the authors demonstrate their method on three medical datasets.

...read moreread less

Abstract: Tree-based machine learning models such as random forests, decision trees and gradient boosted trees are popular nonlinear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here we improve the interpretability of tree-based models through three main contributions. (1) A polynomial time algorithm to compute optimal explanations based on game theory. (2) A new type of explanation that directly measures local feature interaction effects. (3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (1) identify high-magnitude but low-frequency nonlinear mortality risk factors in the US population, (2) highlight distinct population subgroups with shared risk characteristics, (3) identify nonlinear interaction effects among risk factors for chronic kidney disease and (4) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains. Tree-based machine learning models are widely used in domains such as healthcare, finance and public services. The authors present an explanation method for trees that enables the computation of optimal local explanations for individual predictions, and demonstrate their method on three medical datasets.

...read moreread less

2,548 citations

Journal Article•DOI•

Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)

[...]

Amina Adadi¹, Mohammed Berrada¹•Institutions (1)

SIDI¹

17 Sep 2018-IEEE Access

TL;DR: This survey provides an entry point for interested researchers and practitioners to learn key aspects of the young and rapidly growing body of research related to XAI, and review the existing approaches regarding the topic, discuss trends surrounding its sphere, and present major research trajectories.

...read moreread less

Abstract: At the dawn of the fourth industrial revolution, we are witnessing a fast and widespread adoption of artificial intelligence (AI) in our daily life, which contributes to accelerating the shift towards a more algorithmic society. However, even with such unprecedented advancements, a key impediment to the use of AI-based systems is that they often lack transparency. Indeed, the black-box nature of these systems allows powerful predictions, but it cannot be directly explained. This issue has triggered a new debate on explainable AI (XAI). A research field holds substantial promise for improving trust and transparency of AI-based systems. It is recognized as the sine qua non for AI to continue making steady progress without disruption. This survey provides an entry point for interested researchers and practitioners to learn key aspects of the young and rapidly growing body of research related to XAI. Through the lens of the literature, we review the existing approaches regarding the topic, discuss trends surrounding its sphere, and present major research trajectories.

...read moreread less

2,258 citations

Posted Content•

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.

[...]

Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez¹, Javier Del Ser², Javier Del Ser³, Adrien Bennetot⁴, Adrien Bennetot¹, Siham Tabik⁵, Alberto Barbado⁶, Salvador García⁵, Sergio Gil-Lopez, Daniel Molina⁵, Richard Benjamins⁶, Raja Chatila⁴, Francisco Herrera⁵ - Show less +10 more•Institutions (6)

French Institute for Research in Computer Science and Automation¹, University of the Basque Country², Basque Center for Applied Mathematics³, University of Paris⁴, University of Granada⁵, Telefónica⁶

22 Oct 2019-arXiv: Artificial Intelligence

TL;DR: Previous efforts to define explainability in Machine Learning are summarized, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought, and a taxonomy of recent contributions related to the explainability of different Machine Learning models are proposed.

...read moreread less

Abstract: In the last years, Artificial Intelligence (AI) has achieved a notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss about a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to XAI with a reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.

...read moreread less

1,602 citations

Proceedings Article•DOI•

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

[...]

Bolun Wang¹, Yuanshun Yao², Shawn Shan², Huiying Li², Bimal Viswanath³, Haitao Zheng², Ben Y. Zhao² - Show less +3 more•Institutions (3)

University of California, Santa Barbara¹, University of Chicago², Virginia Tech³

19 May 2019

TL;DR: This work presents the first robust and generalizable detection and mitigation system for DNN backdoor attacks, and identifies multiple mitigation techniques via input filters, neuron pruning and unlearning.

...read moreread less

Abstract: Lack of transparency in deep neural networks (DNNs) make them susceptible to backdoor attacks, where hidden associations or triggers override normal classification to produce unexpected results. For example, a model with a backdoor always identifies a face as Bill Gates if a specific symbol is present in the input. Backdoors can stay hidden indefinitely until activated by an input, and present a serious security risk to many security or safety related applications, e.g. biometric authentication systems or self-driving cars. We present the first robust and generalizable detection and mitigation system for DNN backdoor attacks. Our techniques identify backdoors and reconstruct possible triggers. We identify multiple mitigation techniques via input filters, neuron pruning and unlearning. We demonstrate their efficacy via extensive experiments on a variety of DNNs, against two types of backdoor injection methods identified by prior work. Our techniques also prove robust against a number of variants of the backdoor attack.

...read moreread less

929 citations

Posted Content•

Sanity Checks for Saliency Maps

[...]

Julius Adebayo¹, Justin Gilmer², Michael Muelly², Ian Goodfellow², Moritz Hardt², Been Kim² - Show less +2 more•Institutions (2)

Massachusetts Institute of Technology¹, Google²

08 Oct 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is shown that some existing saliency methods are independent both of the model and of the data generating process, and methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model.

...read moreread less

Abstract: Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as, finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.

...read moreread less

927 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book Chapter•DOI•

A Value for n-person Games

[...]

Lloyd S. Shapley

18 Mar 1952

TL;DR: In this paper, an examination of elementary properties of a value for the essential case is presented, which is deduced from a set of three axioms, having simple intuitive interpretations.

...read moreread less

Abstract: An examination of a number of elementary properties of a value for the essential case. This value is deduced from a set of three axioms, having simple intuitive interpretations.

...read moreread less

7,208 citations

Journal Article•DOI•

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.

[...]

Sebastian Bach¹, Alexander Binder², Grégoire Montavon¹, Frederick Klauschen³, Klaus-Robert Müller¹, Wojciech Samek¹ - Show less +2 more•Institutions (3)

Technical University of Berlin¹, Singapore University of Technology and Design², Charité³

10 Jul 2015-PLOS ONE

TL;DR: This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers by introducing a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.

...read moreread less

Abstract: Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows to verify the reasoning of the system and provides additional information to the human expert. Although machine learning methods are solving very successfully a plethora of tasks, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers. We introduce a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks. These pixel contributions can be visualized as heatmaps and are provided to a human expert who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest. We evaluate our method for classifiers trained on PASCAL VOC 2009 images, synthetic image data containing geometric shapes, the MNIST handwritten digits data set and for the pre-trained ImageNet model available as part of the Caffe open source package.

...read moreread less

3,330 citations

Journal Article•DOI•

Explaining prediction models and individual predictions with feature contributions

[...]

Erik Štrumbelj¹, Igor Kononenko¹•Institutions (1)

University of Ljubljana¹

01 Dec 2014-Knowledge and Information Systems

TL;DR: A sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model, and which is equivalent to commonly used additive model-specific methods when explaining an additive model.

...read moreread less

Abstract: We present a sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model. Its advantage over existing general methods is that all subsets of input features are perturbed, so interactions and redundancies between features are taken into account. Furthermore, when explaining an additive model, the method is equivalent to commonly used additive model-specific methods. We illustrate the method's usefulness with examples from artificial and real-world data sets and an empirical analysis of running times. Results from a controlled experiment with 122 participants suggest that the method's explanations improved the participants' understanding of the model.

...read moreread less

1,024 citations

Posted Content•

Learning Important Features Through Propagating Activation Differences

[...]

Avanti Shrikumar¹, Peyton Greenside¹, Anshul Kundaje¹•Institutions (1)

Stanford University¹

10 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: DeepLIFT as mentioned in this paper decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input.

...read moreread less

Abstract: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: this http URL, ICML slides: this http URL, ICML talk: this https URL, code: this http URL.

...read moreread less

1,014 citations

Journal Article•DOI•

Monotonic solutions of cooperative games

[...]

H. P. Young¹•Institutions (1)

University of Maryland, College Park¹

01 Jun 1985-International Journal of Game Theory

TL;DR: In this article, the Shapley value for cooperative games is characterized and shown to be monotonic in the sense that if a game changes so that some player's contribution to all coalitions increases or stays the same then the player's allocation should not decrease.

...read moreread less

Abstract: The principle of monotonicity for cooperative games states that if a game changes so that some player's contribution to all coalitions increases or stays the same then the player's allocation should not decrease. There is a unique symmetric and efficient solution concept that is monotonic in this most general sense — the Shapley value. Monotonicity thus provides a simple characterization of the value without resorting to the usual “additivity” and “dummy” assumptions, and lends support to the use of the value in applications where the underlying “game” is changing, e.g. in cost allocation problems.

...read moreread less

780 citations