"What is relevant in a text document?": An interpretable machine learning approach

doi:10.1371/JOURNAL.PONE.0181142

RESEARCH ARTICLE

Leila Arras 1, Franziska Horn 2, Grégoire Montavon 2, Klaus-Robert Müller 2,3,4*, Wojciech Samek 1*

1 Machine Learning Group, Fraunhofer Heinrich Hertz Institute, Berlin, Germany, 2 Machine Learning Group, Technische Universität Berlin, Berlin, Germany, 3 Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea, 4 Max Planck Institute for Informatics, Saarbrücken, Germany

* klaus-robert.mueller@tu-berlin.de (KRM); wojciech.samek@hhi.fraunhofer.de (WS)

Abstract

Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text’s category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.

PLOS ONE | https://doi.org/10.1371/journal.pone.0181142 | August 11, 2017

OPEN ACCESS

Citation: Arras L, Horn F, Montavon G, Müller K-R, Samek W (2017) "What is relevant in a text document?": An interpretable machine learning approach. PLoS ONE 12(8): e0181142. https://doi.org/10.1371/journal.pone.0181142

Editor: Grigori Sidorov, MEXICO

Received: December 23, 2016

Accepted: June 26, 2017

Published: August 11, 2017

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data are available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups.

Funding: This work was supported by the German Ministry for Education and Research as Berlin Big Data Center BBDC, funding mark 01IS14013A, by the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451) and by DFG. KRM thanks for partial funding by the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology in the BK21 program.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

A number of real-world problems related to text data have been studied under the framework of natural language processing (NLP). Examples of such problems include topic categorization, sentiment analysis, machine translation, structured information extraction, and automatic summarization. Due to the overwhelming amount of text data available on the Internet from various sources such as user-generated content or digitized books, methods to automatically and intelligently process large collections of text documents are in high demand. For several text applications, machine learning (ML) models based on global word statistics like TFIDF [1, 2] or linear classifiers are known to perform remarkably well, e.g. for unsupervised keyword extraction [3] or document classification [4]. More recently, however, neural network models based on vector space representations of words (like [5]) have been shown to be of great benefit to a large number of tasks. The trend was initiated by the seminal work of [6] and [7], who introduced word-based neural networks to perform various NLP tasks such as language modeling, chunking, named entity recognition, and semantic role labeling. A number of recent works (e.g. [7, 8]) also refined the basic neural network architecture by incorporating useful structures such as convolution, pooling, and parse tree hierarchies, leading to further improvements in model predictions. Overall, these ML models have made it possible to assign concepts automatically and accurately to entire documents or to sub-document levels like phrases; the assigned information can then be mined on a large scale.

In parallel, a set of techniques was developed in the context of image categorization to explain the predictions of convolutional neural networks (a state-of-the-art ML model in this field) or related models. These techniques were able to associate with each prediction of the model a meaningful pattern in the space of input features [9–11] or to decompose the model output onto the input pixels [12–14]. In this paper, we make use of the layer-wise relevance propagation (LRP) technique [13], which has already been substantially tested on various datasets and ML models [15–18].

In the present work, we propose a method to identify which words in a text document are important to explain the category associated with it. The approach consists of using an ML classifier to predict the categories as accurately as possible and, in a second step, decomposing the ML prediction onto the input domain, thus assigning a relevance score to each word in the document. The ML model under study is a word-embedding based convolutional neural network that we train on a text classification task, namely topic categorization of newsgroup documents. As a second ML model we consider a classical bag-of-words support vector machine (BoW/SVM) classifier.

We contribute the following:

1. The LRP technique [13] is brought to the NLP domain and its suitability for identifying relevant words in text documents is demonstrated.

2. LRP relevances are validated, at the document level, by building document heatmap visualizations, and at the dataset level, by compiling representative words for a text category. It is also shown quantitatively that LRP better identifies relevant words than sensitivity analysis.

3. A novel way of generating vector-based document representations is introduced and it is verified that these document vectors present semantic regularities within their original feature space akin to word vector representations.

4. A measure for model explanatory power is proposed and it is shown that two ML models, a neural network and a BoW/SVM classifier, although presenting similar classification performance, may substantially differ in terms of explainability.

The work is organized as follows. In Section 2 we describe the related work for explaining classifier decisions with respect to input space variables. In Section 3 we introduce our neural network ML model for document classification, as well as the LRP decomposition procedure associated with its predictions. We describe how LRP relevance scores can be used to identify important words in documents and introduce a novel way of condensing the semantic information of a text document into a single document vector. Likewise in Section 3 we introduce a baseline ML model for document classification, as well as a gradient-based alternative for assigning relevance scores to words. In Section 4 we define objective criteria for evaluating word relevance scores, as well as for assessing model explanatory power. In Section 5 we introduce the dataset and experimental setup, and in Section 6 we present the results. Finally, Section 7 concludes our work.

2 Related work

Explanation of individual classification decisions in terms of input variables has been studied for a variety of machine learning classifiers such as additive classifiers [19], kernel-based classifiers [20] or hierarchical networks [12]. Model-agnostic methods for explanations relying on random sampling have also been proposed [21–23]. Despite their generality, the latter however incur an additional computational cost due to the need to process the whole sample to provide a single explanation. Other methods are more specific to deep convolutional neural networks used in computer vision: the authors of [9] proposed a network propagation technique based on deconvolutions to reconstruct input image patterns that are linked to a particular feature map activation or prediction. The work of [10] is aimed at revealing salient structures within images related to a specific class by computing the corresponding prediction score derivative with respect to the input image. The latter method is based on gradient magnitude, and thus reveals the sensitivity of the classifier decision to some local variation of the input image; this technique is related to sensitivity analysis [24, 25].

In contrast, the LRP method of [13] corresponds to a full decomposition of the classifier’s actual prediction score value for the current input image. One can show that sensitivity analysis decomposes the squared gradient norm of the function f, i.e., $\sum_i R_i = \|\nabla_x f(x)\|^2$, whereas LRP decomposes the function value itself, $\sum_i R_i = f(x)$. Intuitively, when the classifier detects, e.g., cars in images, sensitivity analysis answers the question “what makes this car image more or less a car?”, whereas LRP answers the more fundamental question “what makes this image a car at all?”. Note that the LRP framework can be applied to various models such as kernel support vector machines and deep neural networks [13, 18]. We refer the reader to [15] for a comparison of the three explanation methods, and to [14] for a view of particular instances of LRP as a “deep Taylor decomposition” of the decision function. A tutorial on methods for interpreting and understanding deep neural networks can be found in [26].
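To make the difference between the two conservation properties concrete, the following minimal sketch compares them on a toy linear scorer. The bias-free linear model, its weights and the input values are illustrative assumptions and are not taken from the paper; for such a model the per-feature contributions $w_i x_i$ are used here as the relevance decomposition.

```python
# Toy linear scorer f(x) = sum_i w_i * x_i (no bias); weights and inputs are
# made-up values used only to illustrate the two conservation properties.
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]

f_x = sum(wi * xi for wi, xi in zip(w, x))

# Sensitivity analysis: R_i = (df/dx_i)^2 = w_i^2, which ignores the input x.
sa_scores = [wi ** 2 for wi in w]

# Decomposition of the function value (assumed form for a bias-free linear
# model): R_i = w_i * x_i, which can be negative and depends on the input.
lrp_scores = [wi * xi for wi, xi in zip(w, x)]

grad_sq_norm = sum(wi ** 2 for wi in w)

print("f(x) =", f_x)
print("SA scores sum to the squared gradient norm:", sum(sa_scores), grad_sq_norm)
print("Decomposition scores sum to f(x):", sum(lrp_scores))
```

In this toy case the sensitivity scores are identical for every input, while the decomposition scores change with the input and carry a sign, which is the property that later allows words to speak for or against a category.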

In the context of neural networks for text classification, [27] proposed to extract salient sentences from text documents using loss gradient magnitudes. In order to validate the pertinence of the sentences extracted via the neural network classifier, the latter work proposed to subsequently use these sentences as an input to an external classifier and compare the resulting classification performance to random and heuristic sentence selection. The work by [28] also employs gradient magnitudes to identify salient words within sentences, analogously to the method proposed in computer vision by [10]. However, their analysis is based on qualitative interpretation of saliency heatmaps for exemplary sentences. In addition to the heatmap visualizations, we provide a classifier-intrinsic quantitative validation of the word-level relevances. We furthermore extend previous work from [29] by adding a BoW/SVM baseline to the experiments and proposing a new criterion for assessing model explanatory power. Recent work from [30, 31] uses LRP to explain recurrent neural network predictions in sentiment analysis and machine translation.

3 Interpretable text classification

In this section we describe our method for identifying words in a text document that are relevant with respect to a given category of a classification problem. For this, we assume that we are given a vector-based word representation and a convolutional neural network that has already been trained to accurately map documents to their actual category. Our method can be divided into four steps: (1) Compute an input representation of a text document based on word vectors. (2) Forward-propagate the input representation through the convolutional neural network until the output is reached. (3) Backward-propagate the output through the network using the layer-wise relevance propagation (LRP) method, until the input is reached. (4) Pool the relevance scores associated with each input variable of the network onto the words to which they belong. As a result of this four-step procedure, a decomposition of the prediction score for a category onto the words of the document is obtained. The decomposed terms are called relevance scores. These relevance scores can be viewed as highlighted text or can be used to form a list of top-words in the document. The whole procedure is also described visually in Fig 1. While we detail in this section the LRP method for a specific network architecture and with predefined choices of layers, the method can in principle be extended to any architecture composed of a similar or larger number of layers.
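Step (4) and the resulting top-word list can be illustrated with a short sketch. It assumes that pooling means summing the relevance of all embedding components belonging to the same word; the helper names and the relevance values below are hypothetical and serve only as an example.

```python
def pool_relevance_onto_words(relevance, words):
    """Sum input-level relevance over embedding components for each word.

    relevance[i][t] is the relevance of embedding component i of word t,
    i.e. a D x L table; summation is the pooling rule assumed in this sketch.
    """
    scores = [sum(row[t] for row in relevance) for t in range(len(words))]
    return list(zip(words, scores))


def top_words(scored_words, k):
    """Return the k words with the highest relevance score."""
    return sorted(scored_words, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up example: 4 words, embedding dimension D = 2.
words = ["the", "rocket", "reached", "orbit"]
relevance = [
    [0.0, 0.5, 0.1, 0.3],
    [0.1, 0.25, -0.1, 0.2],
]

scored = pool_relevance_onto_words(relevance, words)
print(top_words(scored, 2))
```

The same per-word scores could equally be rendered as a text heatmap by mapping each score to a color intensity.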

At the end of this section we introduce different methods which will serve as baselines for comparison. A baseline for the convolutional neural network model is the BoW/SVM classifier, with the LRP procedure adapted accordingly [13]. A baseline for the LRP relevance decomposition procedure is gradient-based sensitivity analysis (SA), a technique which assigns sensitivity scores to individual words. In the vector-based document representation experiments, we will also compare LRP to uniform and TFIDF baselines.

3.1 Representing words and documents

Prior to training the neural network and using it for prediction and explanation, we first derive a numerical representation of the text documents that will serve as an input to the neural classifier. To this end, we map each individual word in the document to a vector embedding, and concatenate these embeddings to form a matrix whose size is the number of words in the document times the dimension of the word embeddings. A distributed representation of words can be learned from scratch, or fine-tuned simultaneously with the classification task of interest. In the present work, we use only pre-training as it was shown that, even without fine-tuning, this leads to good neural network classification performance for a variety of tasks such as part-of-speech tagging or sentiment analysis [7, 32].

Fig 1. Diagram of a CNN-based interpretable machine learning system. It consists of a forward processing that computes for each input document a high-level concept (e.g. semantic category or sentiment), and a redistribution procedure that explains the prediction in terms of words.
https://doi.org/10.1371/journal.pone.0181142.g001
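As a concrete illustration of the input representation described in this section, the sketch below stacks one embedding per word into a matrix. The three-dimensional vectors, the `EMBEDDINGS` table and the choice to skip words without an embedding are assumptions made for the example and are not taken from the paper.

```python
# Hypothetical 3-dimensional embeddings; real pre-trained vectors are far larger.
EMBEDDINGS = {
    "space": [0.9, 0.1, 0.0],
    "shuttle": [0.8, 0.2, 0.1],
    "launch": [0.7, 0.0, 0.3],
}


def document_matrix(tokens, embeddings):
    """Return one embedding row per token (shape: words x embedding dimension).

    Tokens without an embedding are skipped; this is a simplification of the
    sketch, not a statement about how the paper handles unknown words.
    """
    return [list(embeddings[tok]) for tok in tokens if tok in embeddings]


tokens = "space shuttle launch".split()
X = document_matrix(tokens, EMBEDDINGS)
print(len(X), "words x", len(X[0]), "dimensions")
```

Transposing this matrix gives the layout with one column per word, which is the convention used when the embedding dimension is listed first.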

One shallow neural network model for learning word embeddings from unlabeled text sources is the continuous bag-of-words (CBOW) model of [33], which is similar to the log-bilinear language model from [34, 35] but ignores the order of context words. In the CBOW model, the objective is to predict a target middle word from the average of the embeddings of the context words that surround the middle word, by means of direct dot products between word embeddings. During training, a set of word embeddings for context words $v$ and for target words $v'$ are learned separately. After training is completed, only the context word embeddings $v$ will be retained for further applications. The CBOW objective has a simple maximum likelihood formulation, where one maximizes over the training data the sum of the logarithm of probabilities of the form:

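$$
P(w_t \mid w_{t-n:t+n}) = \frac{\exp\Big( \big( \tfrac{1}{2n} \sum_{-n \le j \le n,\, j \ne 0} v_{w_{t+j}} \big)^{\top} v'_{w_t} \Big)}{\sum_{w \in V} \exp\Big( \big( \tfrac{1}{2n} \sum_{-n \le j \le n,\, j \ne 0} v_{w_{t+j}} \big)^{\top} v'_{w} \Big)}
$$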

where the softmax normalization runs over all words $w$ in the vocabulary $V$, $2n$ is the number of context words per training text window, $w_t$ represents the target word at the $t$-th position in the training data, and $w_{t-n:t+n}$ represents the corresponding context words.
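The CBOW probability above can be evaluated directly for a toy vocabulary, as in the following sketch. The four-word vocabulary, the two-dimensional vectors and the function name `cbow_probabilities` are assumptions made for illustration; the sketch computes the full softmax once and does not implement any training procedure.

```python
import math


def cbow_probabilities(context_ids, v_context, v_target):
    """Softmax over the vocabulary of (mean context embedding) . (target embedding)."""
    dim = len(v_context[0])
    # Average of the 2n context word embeddings.
    mean = [sum(v_context[c][d] for c in context_ids) / len(context_ids)
            for d in range(dim)]
    # Dot product of the averaged context with every target embedding v'_w.
    scores = [sum(mean[d] * vp[d] for d in range(dim)) for vp in v_target]
    # Subtracting the maximum leaves the softmax unchanged but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up embeddings for a vocabulary of 4 words.
V_CONTEXT = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
V_TARGET = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]

probs = cbow_probabilities([0, 1], V_CONTEXT, V_TARGET)
print(probs)
```

The denominator requires one dot product per vocabulary word, which becomes expensive for large vocabularies when the full softmax is evaluated.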

In the present work, we utilize pre-trained word embeddings obtained with the CBOW architecture and the negative sampling training procedure [5]. We will refer to these embeddings as word2vec embeddings.

3.2 Predicting category with a convolutional neural network

Our ML model for classifying text documents is a word-embedding based convolutional neural network (CNN) model similar to the one proposed in [32] for sentence classification, which itself is a slight variant of the model introduced in [7] for semantic role labeling. This architecture is depicted in Fig 1 (left) and is composed of several layers.

As previously described, in a first step we map each word in the document to its word2vec vector. Denoting by D the word embedding dimension and by L the document length, our input is a matrix of shape D × L (e.g., for the purpose of illustration, in Fig 1 we have D = 8 and L = 6). We denote by $x_{i,t}$ the value of the $i$-th component of the word2vec vector representing the $t$-th word in the document. The convolution/detection layer produces a new representation composed of F sequences indexed by j, where each element of the sequence is computed as:

$$
\forall j, t: \quad x_{j,t} = \max\Big(0, \sum_{i,\tau} x_{i,t-\tau}\, w^{(1)}_{i,j,\tau} + b^{(1)}_{j}\Big) = \max\Big(0, \sum_{i} \big(x_{i} * w^{(1)}_{i,j}\big)_{t} + b^{(1)}_{j}\Big)
$$

where t indicates a position within the text sequence, j designates a feature map, and τ ∈ {0, 1, …, H − 1} is a delay with range H, the filter size of the one-dimensional convolutional operation $*$. After the convolutional operation, which yields F feature maps of length L − H + 1, we apply the ReLU non-linearity element-wise (e.g., in Fig 1, we have F = 5 feature maps and a filter size H = 2, hence we use τ ∈ {0, 1} and the resulting feature maps have a length of 5). Note that the trainable parameters $w^{(1)}$ and $b^{(1)}$ do not depend on the position t in the text document, hence the convolutional processing is equivariant with respect to this physical dimension.

The next layer computes, for each dimension j of the previous representation, the
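The convolution/detection layer defined in this section can be sketched in a few lines of plain Python. The sizes follow the Fig 1 illustration (D = 8, L = 6, F = 5, H = 2); the constant inputs, weights and biases, as well as the function name `conv_relu`, are made-up assumptions chosen so that the effect of the ReLU is easy to see.

```python
def conv_relu(x, w, b):
    """1D convolution over word positions followed by ReLU.

    x: D x L input matrix, x[i][t].
    w: filters stored as w[j][i][tau] (F x D x H), a storage order chosen
       for convenience in this sketch.
    b: one bias per feature map.
    Returns F feature maps, each of length L - H + 1.
    """
    D, L = len(x), len(x[0])
    F, H = len(w), len(w[0][0])
    feature_maps = []
    for j in range(F):
        fmap = []
        # Only positions t where every delayed index t - tau is valid.
        for t in range(H - 1, L):
            z = b[j] + sum(x[i][t - tau] * w[j][i][tau]
                           for i in range(D) for tau in range(H))
            fmap.append(max(0.0, z))  # ReLU detection step
        feature_maps.append(fmap)
    return feature_maps


D, L, F, H = 8, 6, 5, 2
x = [[1.0] * L for _ in range(D)]
w = [[[0.5] * H for _ in range(D)] for _ in range(F)]
b = [-4.0 * j for j in range(F)]

maps = conv_relu(x, w, b)
print(len(maps), "feature maps of length", len(maps[0]))
```

Because the same filter weights are applied at every position t, shifting the input along the word axis shifts each feature map by the same amount.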

