"What is relevant in a text document?": An interpretable machine learning approach

RESEARCH ARTICLE
Leila Arras¹, Franziska Horn², Grégoire Montavon², Klaus-Robert Müller²,³,⁴*, Wojciech Samek¹*
1 Machine Learning Group, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
2 Machine Learning Group, Technische Universität Berlin, Berlin, Germany
3 Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
4 Max Planck Institute for Informatics, Saarbrücken, Germany
* klaus-robert.mueller@tu-berlin.de (KRM); wojciech.samek@hhi.fraunhofer.de (WS)
Abstract
Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than a human could process in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.
PLOS ONE | https://doi.org/10.1371/journal.pone.0181142 | August 11, 2017
OPEN ACCESS
Citation: Arras L, Horn F, Montavon G, Müller K-R, Samek W (2017) "What is relevant in a text document?": An interpretable machine learning approach. PLoS ONE 12(8): e0181142. https://doi.org/10.1371/journal.pone.0181142
Editor: Grigori Sidorov, MEXICO
Received: December 23, 2016
Accepted: June 26, 2017
Published: August 11, 2017
Copyright: © 2017 Arras et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: Data are available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups.
Funding: This work was supported by the German Ministry for Education and Research as Berlin Big Data Center BBDC, funding mark 01IS14013A, by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451) and by DFG. KRM thanks for partial funding by the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology in the BK21 program.
Competing interests: The authors have declared that no competing interests exist.

1 Introduction
A number of real-world problems related to text data have been studied under the framework of natural language processing (NLP). Examples of such problems include topic categorization, sentiment analysis, machine translation, structured information extraction, and automatic summarization. Due to the overwhelming amount of text data available on the Internet from various sources such as user-generated content or digitized books, methods to automatically and intelligently process large collections of text documents are in high demand. For several text applications, machine learning (ML) models based on global word statistics like TFIDF [1, 2] or linear classifiers are known to perform remarkably well, e.g. for unsupervised keyword extraction [3] or document classification [4]. More recently, however, neural network models based on vector space representations of words (like [5]) have been shown to be of great benefit to a large number of tasks. The trend was initiated by the seminal work of [6] and [7], who introduced word-based neural networks to perform various NLP tasks such as language modeling, chunking, named entity recognition, and semantic role labeling. A number of recent works (e.g. [7, 8]) also refined the basic neural network architecture by incorporating useful structures such as convolution, pooling, and parse tree hierarchies, leading to further improvements in model predictions. Overall, these ML models have made it possible to automatically and accurately assign concepts to entire documents or to sub-document levels like phrases; the assigned information can then be mined on a large scale.
In parallel, a set of techniques was developed in the context of image categorization to explain the predictions of convolutional neural networks (a state-of-the-art ML model in this field) or related models. These techniques were able to associate with each prediction of the model a meaningful pattern in the space of input features [9–11] or to perform a decomposition of the model output onto the input pixels [12–14]. In this paper, we will make use of the layer-wise relevance propagation (LRP) technique [13], which has already been substantially tested on various datasets and ML models [15–18].
In the present work, we propose a method to identify which words in a text document are important to explain the category associated with it. The approach consists of using an ML classifier to predict the categories as accurately as possible and, in a second step, decomposing the ML prediction onto the input domain, thus assigning a relevance score to each word in the document. The ML model of study will be a word-embedding based convolutional neural network that we train on a text classification task, namely topic categorization of newsgroup documents. As a second ML model we consider a classical bag-of-words support vector machine (BoW/SVM) classifier.
We contribute the following:
1. The LRP technique [13] is brought to the NLP domain and its suitability for identifying relevant words in text documents is demonstrated.
2. LRP relevances are validated, at the document level, by building document heatmap visualizations, and at the dataset level, by compiling representative words for a text category. It is also shown quantitatively that LRP better identifies relevant words than sensitivity analysis.
3. A novel way of generating vector-based document representations is introduced and it is verified that these document vectors present semantic regularities within their original feature space akin to word vector representations.
4. A measure for model explanatory power is proposed and it is shown that two ML models, a neural network and a BoW/SVM classifier, although presenting similar classification performance, may substantially differ in terms of explainability.
The work is organized as follows. In Section 2 we describe the related work for explaining classifier decisions with respect to input space variables. In Section 3 we introduce our neural network ML model for document classification, as well as the LRP decomposition procedure associated with its predictions. We describe how LRP relevance scores can be used to identify important words in documents and introduce a novel way of condensing the semantic information of a text document into a single document vector. Likewise, in Section 3 we introduce a baseline ML model for document classification, as well as a gradient-based alternative for assigning relevance scores to words. In Section 4 we define objective criteria for evaluating word relevance scores, as well as for assessing model explanatory power. In Section 5 we introduce the dataset and experimental setup, and in Section 6 we present the results. Finally, Section 7 concludes our work.
2 Related work
Explanation of individual classification decisions in terms of input variables has been studied for a variety of machine learning classifiers such as additive classifiers [19], kernel-based classifiers [20] or hierarchical networks [12]. Model-agnostic methods for explanations relying on random sampling have also been proposed [21–23]. Despite their generality, however, the latter incur an additional computational cost due to the need to process the whole sample to provide a single explanation. Other methods are more specific to deep convolutional neural networks used in computer vision: the authors of [9] proposed a network propagation technique based on deconvolutions to reconstruct input image patterns that are linked to a particular feature map activation or prediction. The work of [10] is aimed at revealing salient structures within images related to a specific class by computing the corresponding prediction score derivative with respect to the input image. The latter method is based on gradient magnitude, and thus reveals the sensitivity of the classifier decision to some local variation of the input image; this technique is related to sensitivity analysis [24, 25].
In contrast, the LRP method of [13] corresponds to a full decomposition of the classifier's actual prediction score value for the current input image. One can show that sensitivity analysis decomposes the gradient square norm of the function $f$, i.e., $\sum_i R_i = \|\nabla_x f(x)\|^2$, whereas LRP decomposes the function value itself, $\sum_i R_i = f(x)$. Intuitively, when the classifier detects, for example, cars in images, then sensitivity analysis answers the question "what makes this car image more or less a car?", whereas LRP answers the more fundamental question "what makes this image a car at all?". Note that the LRP framework can be applied to various models such as kernel support vector machines and deep neural networks [13, 18]. We refer the reader to [15] for a comparison of the three explanation methods, and to [14] for a view of particular instances of LRP as a "deep Taylor decomposition" of the decision function. A tutorial on methods for interpreting and understanding deep neural networks can be found in [26].
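The difference between the two conservation properties can be illustrated on a toy model. The following is a minimal sketch, assuming a bias-free linear scorer f(x) = w · x, for which the natural decomposition of the output is R_i = w_i x_i; the weights and input values are made up for illustration and are not taken from the paper.

```python
import numpy as np

# Toy bias-free linear scorer f(x) = w . x (illustrative values only).
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 4.0])
f_x = float(w @ x)

# Sensitivity analysis: squared partial derivatives.
# For a linear function the gradient is simply the weight vector.
grad = w
R_sa = grad ** 2

# Decomposition of the function value: each input's contribution w_i * x_i.
R_lrp = w * x

print("SA scores: ", R_sa, "sum =", R_sa.sum(), "(squared gradient norm)")
print("LRP scores:", R_lrp, "sum =", R_lrp.sum(), "(function value)")
```

In this toy case the sensitivity scores are non-negative and do not depend on the input at all, whereas the decomposition is signed and sums to the score of the specific input being explained.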
In the context of neural networks for text classification, [27] proposed to extract salient sentences from text documents using loss gradient magnitudes. In order to validate the pertinence of the sentences extracted via the neural network classifier, the latter work proposed to subsequently use these sentences as an input to an external classifier and compare the resulting classification performance to random and heuristic sentence selection. The work by [28] also employs gradient magnitudes to identify salient words within sentences, analogously to the method proposed in computer vision by [10]. However, their analysis is based on qualitative interpretation of saliency heatmaps for exemplary sentences. In addition to the heatmap visualizations, we provide a classifier-intrinsic quantitative validation of the word-level relevances. We furthermore extend previous work from [29] by adding a BoW/SVM baseline to the experiments and proposing a new criterion for assessing model explanatory power. Recent work from [30, 31] uses LRP to explain recurrent neural network predictions in sentiment analysis and machine translation.
3 Interpretable text classification
In this Section we describe our method for identifying words in a text document that are relevant with respect to a given category of a classification problem. For this, we assume that we are given a vector-based word representation and a convolutional neural network that has already been trained to accurately map documents to their actual category. Our method can be divided into four steps: (1) Compute an input representation of a text document based on word vectors. (2) Forward-propagate the input representation through the convolutional neural network until the output is reached. (3) Backward-propagate the output through the network using the layer-wise relevance propagation (LRP) method, until the input is reached. (4) Pool the relevance scores associated with each input variable of the network onto the words to which they belong. As a result of this four-step procedure, a decomposition of the prediction score for a category onto the words of the document is obtained. The decomposed terms are called relevance scores. These relevance scores can be viewed as highlighted text or can be used to form a list of top-words in the document. The whole procedure is also described visually in Fig 1. While we detail in this Section the LRP method for a specific network architecture and with predefined choices of layers, the method can in principle be extended to any architecture composed of a similar or larger number of layers.
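Step (4) can be sketched in a few lines. The example below assumes that pooling means summing the input-level relevances over the embedding dimensions of each word, and that the relevance matrix has one column per word; the function names, the words, and the relevance values are hypothetical and serve only as an illustration.

```python
import numpy as np

def pool_word_relevance(R):
    """Sum input-level relevances over the embedding dimension (rows).

    R has shape (D, L): one column of D relevance values per word.
    Returns one relevance score per word, shape (L,).
    """
    return np.asarray(R).sum(axis=0)

def top_words(words, word_scores, k=3):
    """Return the k words with the highest relevance, most relevant first."""
    order = np.argsort(word_scores)[::-1][:k]
    return [(words[t], float(word_scores[t])) for t in order]

# Hypothetical document of L = 4 words with D = 2 embedding dimensions.
words = ["the", "shuttle", "reached", "orbit"]
R = np.array([[0.0, 0.6,  0.1, 0.3],
              [0.1, 0.4, -0.1, 0.5]])

scores = pool_word_relevance(R)
print(top_words(words, scores, k=2))
```

Because the pooling is a plain sum, the word-level scores add up to the same total as the input-level relevances, so the decomposition of the prediction score is preserved at the word level.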
At the end of this Section we introduce different methods which will serve as baselines for comparison. A baseline for the convolutional neural network model is the BoW/SVM classifier, with the LRP procedure adapted accordingly [13]. A baseline for the LRP relevance decomposition procedure is gradient-based sensitivity analysis (SA), a technique which assigns sensitivity scores to individual words. In the vector-based document representation experiments, we will also compare LRP to uniform and TFIDF baselines.
3.1 Representing words and documents
Prior to training the neural network and using it for prediction and explanation, we first derive a numerical representation of the text documents that will serve as an input to the neural classifier. To this end, we map each individual word in the document to a vector embedding, and concatenate these embeddings to form a matrix whose size is the number of words in the document times the dimension of the word embeddings. A distributed representation of words can be learned from scratch, or fine-tuned simultaneously with the classification task of interest. In the present work, we use only pre-training as it was shown that, even without fine-tuning, this leads to good neural network classification performance for a variety of tasks such as part-of-speech tagging or sentiment analysis [7, 32].

Fig 1. Diagram of a CNN-based interpretable machine learning system. It consists of a forward processing that computes for each input document a high-level concept (e.g. semantic category or sentiment), and a redistribution procedure that explains the prediction in terms of words.
https://doi.org/10.1371/journal.pone.0181142.g001
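As a minimal sketch of this input representation, the snippet below stacks per-word vectors into a document matrix. The embedding table and its values are hypothetical placeholders rather than real word2vec vectors, every token is assumed to be in the vocabulary, and the helper name `document_matrix` is introduced here for illustration only.

```python
import numpy as np

def document_matrix(tokens, embeddings):
    """Stack the embedding of each token, in order, into an (L, D) matrix."""
    return np.stack([embeddings[tok] for tok in tokens])

# Hypothetical 3-dimensional embeddings (illustrative values only).
toy_embeddings = {
    "space":   np.array([0.2, -0.1, 0.7]),
    "shuttle": np.array([0.5,  0.3, 0.1]),
    "launch":  np.array([-0.4, 0.8, 0.0]),
}

doc = ["space", "shuttle", "launch", "shuttle"]
X = document_matrix(doc, toy_embeddings)
print(X.shape)    # (number of words, embedding dimension)
print(X.T.shape)  # transposed layout, (embedding dimension, number of words)
```

The transposed layout corresponds to the D × L convention used for the network input in Section 3.2.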
One shallow neural network model for learning word embeddings from unlabeled text sources is the continuous bag-of-words (CBOW) model of [33], which is similar to the log-bilinear language model from [34, 35] but ignores the order of context words. In the CBOW model, the objective is to predict a target middle word from the average of the embeddings of the context words that are surrounding the middle word, by means of direct dot products between word embeddings. During training, a set of word embeddings for context words $v$ and for target words $v'$ are learned separately. After training is completed, only the context word embeddings $v$ will be retained for further applications. The CBOW objective has a simple maximum likelihood formulation, where one maximizes over the training data the sum of the logarithm of probabilities of the form:
$$P(w_t \mid w_{t-n:t+n}) = \frac{\exp\Big(\big(\frac{1}{2n} \sum_{-n \le j \le n,\, j \ne 0} v_{w_{t+j}}\big)^{\top} v'_{w_t}\Big)}{\sum_{w \in V} \exp\Big(\big(\frac{1}{2n} \sum_{-n \le j \le n,\, j \ne 0} v_{w_{t+j}}\big)^{\top} v'_{w}\Big)}$$
where the softmax normalization runs over all words $w$ in the vocabulary $V$, $2n$ is the number of context words per training text window, $w_t$ represents the target word at the $t$-th position in the training data and $w_{t-n:t+n}$ represent the corresponding context words.
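The probability above can be evaluated directly with a few array operations. The following is a minimal sketch using randomly initialized embedding tables; the vocabulary size, dimension, word indices, and the function name `cbow_distribution` are assumptions made for illustration, and no training step is shown.

```python
import numpy as np

def cbow_distribution(context_ids, V_ctx, V_tgt):
    """Softmax over the vocabulary given the averaged context embedding.

    V_ctx: (|V|, dim) context embeddings v.
    V_tgt: (|V|, dim) target embeddings v'.
    """
    h = V_ctx[context_ids].mean(axis=0)   # average of the 2n context vectors
    scores = V_tgt @ h                    # dot product with every target vector
    scores = scores - scores.max()        # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
V_ctx = rng.normal(size=(vocab_size, dim))
V_tgt = rng.normal(size=(vocab_size, dim))

context_ids = [1, 3, 5, 7]   # 2n = 4 context words around the target
target_id = 4
p = cbow_distribution(context_ids, V_ctx, V_tgt)
print(p[target_id])          # P(w_t | context) for the chosen target word
```

Subtracting the maximum score before exponentiating leaves the softmax unchanged and only guards against overflow.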
In the present work, we utilize pre-trained word embeddings obtained with the CBOW architecture and the negative sampling training procedure [5]. We will refer to these embeddings as word2vec embeddings.
3.2 Predicting category with a convolutional neural network
Our ML model for classifying text documents is a word-embedding based convolutional neural network (CNN) model similar to the one proposed in [32] for sentence classification, which itself is a slight variant of the model introduced in [7] for semantic role labeling. This architecture is depicted in Fig 1 (left) and is composed of several layers.
As previously described, in a first step we map each word in the document to its word2vec vector. Denoting by $D$ the word embedding dimension and by $L$ the document length, our input is a matrix of shape $D \times L$ (e.g., for the purpose of illustration, in Fig 1 we have $D = 8$ and $L = 6$). We denote by $x_{i,t}$ the value of the $i$-th component of the word2vec vector representing the $t$-th word in the document. The convolution/detection layer produces a new representation composed of $F$ sequences indexed by $j$, where each element of the sequence is computed as:
$$\forall j, t: \quad x_{j,t} = \max\Big(0, \sum_{i,\tau} x_{i,t-\tau}\, w^{(1)}_{i,j,\tau} + b^{(1)}_{j}\Big) = \max\Big(0, \sum_{i} \big(x_i \ast w^{(1)}_{i,j}\big)_t + b^{(1)}_{j}\Big)$$
where $t$ indicates a position within the text sequence, $j$ designates a feature map, and $\tau \in \{0, 1, \ldots, H-1\}$ is a delay with range $H$, the filter size of the one-dimensional convolutional operation $\ast$. After the convolutional operation, which yields $F$ feature maps of length $L - H + 1$, we apply the ReLU non-linearity element-wise (e.g., in Fig 1, we have $F = 5$ feature maps and a filter size $H = 2$, hence we use $\tau \in \{0, 1\}$ and the resulting feature maps have a length of 5). Note that the trainable parameters $w^{(1)}$ and $b^{(1)}$ do not depend on the position $t$ in the text document, hence the convolutional processing is equivariant with this physical dimension. The next layer computes, for each dimension $j$ of the previous representation, the
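The convolution/detection layer described above can be sketched with explicit loops. This is an illustrative NumPy version, not the authors' implementation: it assumes an input of shape (D, L), weights of shape (D, F, H), a bias of shape (F,), and the delay convention $x_{i,t-\tau}$ restricted to positions where the whole filter fits. The random values and the function name `conv_relu` are placeholders.

```python
import numpy as np

def conv_relu(x, w, b):
    """One-dimensional convolution over word positions followed by ReLU.

    x: (D, L) document matrix, w: (D, F, H) filters, b: (F,) biases.
    Returns F feature maps of length L - H + 1.
    """
    D, L = x.shape
    _, F, H = w.shape
    out = np.zeros((F, L - H + 1))
    for j in range(F):
        # Valid positions t = H-1, ..., L-1 so that t - tau never leaves the text.
        for s, t in enumerate(range(H - 1, L)):
            acc = b[j]
            for tau in range(H):
                acc += x[:, t - tau] @ w[:, j, tau]
            out[j, s] = max(0.0, acc)   # ReLU detection
    return out

# Sizes from the illustration in Fig 1: D = 8, L = 6, F = 5, H = 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6))
w = rng.normal(size=(8, 5, 2))
b = rng.normal(size=5)

feature_maps = conv_relu(x, w, b)
print(feature_maps.shape)   # F maps, each of length L - H + 1
```

The same filter weights are reused at every position `t`, which is what makes the layer independent of where a pattern occurs in the document.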
