
Complex Linguistic Features for Text Classification: A Comprehensive Study

Abstract
Previous research on advanced representations for document retrieval has shown that statistical state-of-the-art models are not improved by a variety of different linguistic representations. Phrases, word senses and syntactic relations derived by Natural Language Processing (NLP) techniques were observed to be ineffective in increasing retrieval accuracy. For Text Categorization (TC), fewer and less definitive studies on the use of advanced document representations are available, as it is a relatively new research area (compared to document retrieval).


Advanced Structural Representations for Question Classification and Answer Re-ranking

Silvia Quarteroni¹, Alessandro Moschitti², Suresh Manandhar¹, and Roberto Basili²

¹ The University of York, York YO10 5DD, United Kingdom
{silvia,suresh}@cs.york.ac.uk
² University of Rome “Tor Vergata”, Via del Politecnico 1, 00133 Rome, Italy
{moschitti,basili}@info.uniroma2.it
Abstract. In this paper, we study novel structures to represent information in three vital tasks in question answering: question classification, answer classification and answer reranking. We define a new tree structure called PAS to represent predicate-argument relations, as well as a new kernel function to exploit its representative power. Our experiments with Support Vector Machines and several tree kernel functions suggest that syntactic information helps specific tasks such as question classification, whereas, when data sparseness is higher, as in answer classification, studying coarse semantic information like PAS is a promising research direction.
1 Introduction
Question answering (QA) can be seen as a form of information retrieval where, given a question expressed in natural language, one or more answers in the form of sentences (or paragraphs, or phrases) are returned. The typical architecture of a QA system is organized in three phases: question processing, document retrieval and answer extraction [1].
In question processing, useful information is gathered from the question and a query is created; this is then submitted to an information retrieval engine, which provides a ranked list of relevant documents. From these, the QA system must extract one or more candidate answers, which can then be reranked according to various criteria such as their similarity to the query. Question processing is usually centered around question classification (QC), the task that maps a question into one of k expected answer classes. This is a crucial task as it constrains the search space of possible answers and contributes to selecting answer extraction strategies specific to a given answer class. Most accurate QC systems apply supervised machine learning techniques, e.g. Support Vector Machines (SVMs) [2] or the SNoW model [3], where questions are encoded using a variety of lexical, syntactic and semantic features; here, it has been shown that the question’s syntactic structure contributes remarkably to the classification accuracy.
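As a toy illustration of the lexical side of such encodings (this is not the authors' code; the feature-name scheme and the wh-word heuristic are our own assumptions), a question can be mapped to sparse features for a supervised classifier:

```python
# Hypothetical sketch: encode a question into sparse lexical features
# (unigrams, bigrams, wh-word) of the kind an SVM-based question
# classifier could consume. Feature names are illustrative only.

def lexical_features(question):
    """Map a question string to a sparse bag of lexical features."""
    tokens = question.lower().rstrip("?").split()
    feats = {}
    for tok in tokens:                    # unigrams: the bag-of-words part
        feats["w=" + tok] = feats.get("w=" + tok, 0) + 1
    for a, b in zip(tokens, tokens[1:]):  # bigrams add minimal word order
        feats["b=" + a + "_" + b] = 1
    if tokens:                            # the wh-word strongly hints at the class
        feats["wh=" + tokens[0]] = 1
    return feats

feats = lexical_features("Where is the Taj Mahal?")
```

A kernel machine would then operate on dot products of such vectors; the paper's point is that purely lexical features of this kind can be complemented by syntactic and semantic structures.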
The retrieval and answer extraction phases consist in retrieving relevant documents [4] and selecting candidate answer passages [5,1] from them. A further phase called answer re-ranking is optionally applied. It is especially relevant in the case of non-factoid questions, such as those requiring definitions, where the answer can be a whole sentence or a paragraph. Here, the syntactic structure of a sentence appears once again to provide more useful information than a bag of words for such a complex task.

An effective way to integrate syntactic structures in machine learning algorithms is the use of tree kernel functions [6]. Successful applications of these have been reported for question classification [2,7] and other tasks, e.g. relation extraction [8,7]. However, such an approach may not be sufficient to encode syntactic structures in more complex tasks such as computing the relationships between questions and answers in answer reranking. The information provided by parse trees may prove too sparse: the same concept, expressed in two different sentences, will produce different, unmatching parses. One way to overcome this issue is to try to capture semantic relations by processing shallow representations like the predicate argument structures proposed in the PropBank³ (PB) project [9]. We argue that such semantic structures can be used to characterize the relation between a question and a candidate answer.
In this paper, we extensively study advanced structural representations, namely parse trees, bag-of-words, Part-of-Speech tags and predicate argument structures, for question classification and answer re-ranking. We encode such information by combining tree kernels with linear kernels. Moreover, to exploit predicate argument information - which we can automatically derive with our state-of-the-art software - we have defined a new tree structure for its representation and a new kernel function able to process its semantics. Additionally, for the purpose of answer classification and re-ranking, we have created a corpus of answers to TREC-QA 2001 description questions obtained using a Web-based QA system.
Our experiments with SVMs and the above kernels show that (a) our approach reaches state-of-the-art accuracy on question classification and (b) PB predicative structures are not effective for question classification but show promising results for answer classification. Overall, our answer classifier increases the ranking accuracy of a basic QA system by about 20 absolute percent points.
This paper is structured as follows: in Section 2, we introduce advanced models to
represent syntactic and semantic information in a QA context; Section 3 explains how
such information is exploited in an SVM learning framework by introducing novel tree
kernel functions; Section 4 reports our experiments on question classification, answer
classification and answer reranking; finally, Section 5 concludes on the utility of the
newly introduced structure representations and sets the basis for further work.
2 Advanced Models for Sentence/Question Representation
Traditionally, the majority of information retrieval tasks have been solved by means of
the so-called bag-of-words approach augmented by language modeling [10]. However,
when the task requires the use of more complex semantics, the above approach does not appear to be effective, as it is inadequate to perform fine-level textual analysis. To
overcome this, QA systems use linguistic processing tools such as syntactic parsers.
In our study we exploited two sources of syntactic information: deep syntactic parsers
and shallow semantic parsers. While parsing produces parse trees, shallow semantic
parsing detects and labels a proposition with the relations between its components, i.e.
predicates and arguments. While the former technology is well-studied [6,11], the latter
has only recently been the object of a consistent body of work.
³ www.cis.upenn.edu/ace

2.1 Syntactic Structures
The syntactic parse tree of a sentence is a hierarchical representation of the syntactic
relationships between its words. In such a tree, each node with its children is associated
with a grammar production rule, where the symbol at the left-hand side corresponds to
the parent and the symbols at the right-hand side are associated with the children. The
terminal symbols of the grammar are always associated with the leaves of the tree.
Parse trees have often been applied in natural language processing applications requiring the use of grammatical relations, e.g. extraction of subject/object relations. Recently, it has been shown [2,7] that syntactic information outperformed bag-of-words and bag-of-n-grams on the classification of Question Type in QA. The advantage of computing sentence similarity based on parse trees with respect to purely lexical approaches is that trees provide structural relations hard to compute with other methods.
However, when approaching complex QA tasks, the use of parse trees has some limitations. For instance, in definitional QA candidate answers can be expressed by long and articulated definitions spanning one or more sentences. Here, since the information encoded in a parse tree is intrinsically sparse, it does not contribute well to computing the similarity between long sentences or paragraphs. In this case, it makes sense to investigate more “compact” forms of information representation: shallow semantics could be an answer to prevent the sparseness of deep structural approaches and the noise of bag-of-words models.
2.2 Semantic Structures
Initiatives such as PropBank (PB) [9] have led to the creation of vast and accurate
resources of manually annotated predicate argument structures. Using these, machine
learning techniques have proven successful in Semantic Role Labeling (SRL), the task
of attaching semantic roles to predicates and their arguments. SRL is a fully exploitable
technology: our SRL system based on SVMs is able to achieve an accuracy of 76% on
PB data, among the highest in CoNLL [12]. Attempting an application of SRL in the
context of QA hence appears natural, as understanding a question and pinpointing its
answer relies on a deep understanding of the question and answer’s semantics.
The PB corpus is one of the largest resources of manually annotated predicate argument structures⁴; for any given predicate, the expected arguments are labeled sequentially from ARG0 to ARG5, ARGA and ARGM. For example, the following is a typical PB annotation of a sentence: [ARG0 Compounded interest] [predicate computes] [ARG1 the effective interest rate for an investment] [ARGM-TMP during the current year]. Such shallow semantic annotation is quite useful to harvest information. For instance, the predicative annotation of a very similar sentence would be: [ARGM-TMP In a year] [ARG1 the bank interest rate] is [predicate evaluated] by [ARG0 the compounded interest].
The above annotations can be represented by using tree structures like in Figure 1,
which we call PASs. These attempt to capture the semantics of both sentences.
⁴ It contains 300,000 words annotated with predicative information on top of the Penn Treebank 2 Wall Street Journal texts.

[PAS [ARG0 compounded interest] [rel compute] [ARG1 the effective interest rate for an investment] [ARGM-TMP during a year]]

[PAS [ARG0 compounded interest] [rel evaluate] [ARG1 bank interest rate] [ARGM-TMP in a year]]

Fig. 1. Predicate argument structures of two different sentences expressing similar semantics.
We can improve this representation by substituting the arguments with their most important word - often referred to as the semantic head - as in Figure 2. It seems intuitive that data sparseness can be remarkably reduced by using this shallow representation instead of the BOW representation.
[PAS [ARG0 interest] [rel compute] [ARG1 rate] [ARGM-TMP year]]

[PAS [ARG0 interest] [rel evaluate] [ARG1 rate] [ARGM-TMP year]]

Fig. 2. Improved predicate argument structures of two different sentences.
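The reduction from full argument spans (Figure 1) to semantic heads (Figure 2) can be emulated as a toy sketch; this is not the authors' procedure, and the word lists plus the "cut at the first preposition" heuristic are naive simplifications chosen so the running example works:

```python
# Simplified assumption: collapse each PropBank-style argument span to a
# single "semantic head" word. Real systems use syntactic head-finding
# rules; here we cut the span at the first preposition (the NP head
# precedes any PP modifier) and take the rightmost non-determiner word.

PREPOSITIONS = {"for", "in", "during", "by", "of"}
DETERMINERS = {"the", "a", "an"}

def semantic_head(phrase):
    """Naive head of an argument span (toy rule, not a real head finder)."""
    words = phrase.lower().split()
    for i, w in enumerate(words):
        if w in PREPOSITIONS and i > 0:
            words = words[:i]          # drop the trailing PP modifier
            break
    for w in reversed(words):
        if w not in DETERMINERS:
            return w
    return words[-1]

def compact_pas(annotation):
    """Map {role: span} to {role: head}, keeping the predicate as-is."""
    return {role: (span if role == "rel" else semantic_head(span))
            for role, span in annotation.items()}

pas = compact_pas({
    "ARG0": "compounded interest",
    "rel": "compute",
    "ARG1": "the effective interest rate for an investment",
    "ARGM-TMP": "during a year",
})
```

On the first sentence of Figure 1 this yields exactly the compact PAS of Figure 2: interest, compute, rate, year.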
Knowing that syntactic trees and PASs may improve the simple BOW representation, we now face the problem of representing tree structures in learning machines. Section 3 introduces a viable structure representation approach based on tree kernels.
3 Syntactic and Semantic Tree Kernels
As mentioned above, encoding syntactic/semantic information represented by means
of tree structures in the learning algorithm is problematic. One possible solution is to
use as features of a structure all its possible substructures. Given the combinatorial
explosion of considering the subparts, the resulting feature space is usually very large.
To manage such complexity we can define kernel functions that implicitly evaluate the
scalar product between two feature vectors without explicitly computing such vectors.
In the following subsections, we report the tree kernel function devised in [6] computing
the number of common subtrees between two syntactic parse trees and a new modified
version that evaluates the number of semantic structures shared between two PASs.
3.1 Syntactic Tree Kernel
Given two trees $T_1$ and $T_2$, let $\{f_1, f_2, \ldots\} = F$ be the set of substructures (fragments) and let $I_i(n)$ be equal to 1 if $f_i$ is rooted at node $n$ and 0 otherwise. We define

$$K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2) \quad (1)$$

T1: [NP [DT a] [NN dog]]    T2: [NP [DT a] [NN cat]]

f1: [NP [DT a] [NN]]    f2: [NP [DT] [NN]]    f3: [DT a]

Fig. 3. Input trees T1 and T2 with their fragments f1, f2 and f3 derived by the kernel function.
where $N_{T_1}$ and $N_{T_2}$ are the sets of nodes in $T_1$ and $T_2$, respectively, and $\Delta(n_1, n_2) = \sum_{i=1}^{|F|} I_i(n_1)\, I_i(n_2)$. The latter is equal to the number of common fragments rooted in nodes $n_1$ and $n_2$. We can compute $\Delta$ as follows:
1. if the productions at $n_1$ and $n_2$ are different then $\Delta(n_1, n_2) = 0$;
2. if the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ only have leaf children (i.e. they are pre-terminal symbols) then $\Delta(n_1, n_2) = 1$;
3. if the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ are not pre-terminals, then

$$\Delta(n_1, n_2) = \prod_{j=1}^{nc(n_1)} \bigl(1 + \Delta(c^j_{n_1}, c^j_{n_2})\bigr) \quad (2)$$

where $nc(n_1)$⁵ is the number of children of $n_1$ and $c^j_n$ is the $j$-th child of node $n$. As proved in [6], the above algorithm allows us to evaluate Eq. 1 in $O(|N_{T_1}| \times |N_{T_2}|)$.
Moreover, a decay factor $\lambda$ is usually added by changing the formulae in (2) and (3) to⁶:

2. $\Delta(n_1, n_2) = \lambda$,
3. $\Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} \bigl(1 + \Delta(c^j_{n_1}, c^j_{n_2})\bigr)$.
As an example, Figure 3 shows two trees and the substructures they have in common. It is worth noting that the fragments of the above Syntactic Tree Kernel (STK) are such that any node contains either all or none of its children. Consequently, [NP [DT]] and [NP [NN]] are not valid fragments. This limitation makes it unsuitable to derive important substructures from the PAS tree. The next section shows a new tree kernel that takes this into account.
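The $\Delta$ recursion above (rules 1-3, with the decay factor $\lambda$ and the normalization of footnote 6) can be sketched in a few lines of Python; the tuple-based tree encoding is our own illustrative choice, not the authors' implementation:

```python
# Sketch of the Collins-Duffy style syntactic tree kernel (Eq. 1-2).
# A tree node is a (label, children) tuple; a leaf (word) is a bare string.
import math

def production(node):
    """A node's production: its label plus the labels of its children."""
    label, children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))

def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1])

def delta(n1, n2, lam=1.0):
    """Weighted count of common fragments rooted at n1 and n2 (rules 1-3)."""
    if production(n1) != production(n2):
        return 0.0                       # rule 1
    if is_preterminal(n1):
        return lam                       # rule 2 (with decay)
    prod = lam                           # rule 3 (with decay)
    # zip is safe: equal productions imply equal child counts (footnote 5)
    for c1, c2 in zip(n1[1], n2[1]):
        prod *= 1.0 + delta(c1, c2, lam)
    return prod

def nodes(tree):
    """All internal nodes; leaves carry no production."""
    result = [tree]
    for c in tree[1]:
        if not isinstance(c, str):
            result.extend(nodes(c))
    return result

def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

def normalized_kernel(t1, t2, lam=1.0):
    """Score in [0, 1], as in footnote 6."""
    return tree_kernel(t1, t2, lam) / math.sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))

# The Fig. 3 example: T1 and T2 share fragments f1, f2 and f3.
T1 = ("NP", [("DT", ["a"]), ("NN", ["dog"])])
T2 = ("NP", [("DT", ["a"]), ("NN", ["cat"])])
```

With $\lambda = 1$, `tree_kernel(T1, T2)` returns 3, one per shared fragment of Figure 3.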
3.2 Semantic Tree Kernel
As mentioned above, the kernel function introduced in Section 2 is not sufficient to
derive all the required information from trees such as the PAS in Fig. 2: we would like
to have fragments that contain nodes with only part of the children, e.g. to neglect the
⁵ Note that, since the productions are the same, $nc(n_1) = nc(n_2)$.
⁶ To have a similarity score between 0 and 1, we also apply the normalization in the kernel space, i.e. $K'(T_1, T_2) = \frac{K(T_1, T_2)}{\sqrt{K(T_1, T_1) \times K(T_2, T_2)}}$.

References

- The Nature of Statistical Learning Theory (book)
- Machine Learning (journal article)
- WordNet: An Electronic Lexical Database, Christiane Fellbaum, Sep 2000 (journal article)
- Term Weighting Approaches in Automatic Text Retrieval (journal article)
- Advances in Kernel Methods: Support Vector Learning (proceedings article)
Frequently Asked Questions

Q1. What are the contributions in "Advanced structural representations for question classification and answer re-ranking"?

In this paper, the authors study novel structures to represent information in three vital tasks in question answering: question classification, answer classification and answer reranking. The authors define a new tree structure called PAS to represent predicate-argument relations, as well as a new kernel function to exploit its representative power. Their experiments with Support Vector Machines and several tree kernel functions suggest that syntactic information helps specific tasks such as question classification, whereas, when data sparseness is higher, as in answer classification, studying coarse semantic information like PAS is a promising research area.

In the future, the authors will investigate the utility of PASs for similar tasks affected by noisy data and apply a true SVM re-ranker trained with the proposed advanced information.

In this paper, the authors extensively study advanced structural representations, namely parse trees, bag-of-words, Part-of-Speech tags and predicate argument structures, for question classification and answer re-ranking.

Knowing that syntactic trees and PASs may improve the simple BOW representation, the authors now face the problem of representing tree structures in learning machines.

In order to gather more statistically significant data, the authors ran five-fold cross-validation, with the constraint that two pairs 〈q, a1〉 and 〈q, a2〉 associated with the same question q could not be split between training and testing.

One way to overcome this issue is to try to capture semantic relations by processing shallow representations like the predicate argument structures proposed in the PropBank (PB) project [9].

The advantage of computing sentence similarity based on parse trees with respect to purely lexical approaches is that trees provide structural relations hard to compute with other methods.

The retrieval and answer extraction phases consist in retrieving relevant documents [4] and selecting candidate answer passages [5,1] from them.

Their experiments with Support Vector Machines and such new functions suggest that syntactic information helps specific tasks such as question classification.

Their higher results with respect to [2] are explained by a highly performing BOW, the use of parameterization and, most importantly, the fact that their model is obtained by summing two separate kernel spaces (with separate normalization), as mixing BOW with tree kernels does not allow SVMs to exploit all of its representational power.

The performance of the multi-classifier and the individual binary classifiers is measured using accuracy and F1-measure, respectively.

The authors collected a corpus containing 1123 sentences, 401 of which - labeled as "+1" - answered the question either concisely or with noise; the rest - labeled as "-1" - were either irrelevant to the question or contained hints relating to the question but could not be judged as valid answers.

Each sentence in each document is compared to the question to compute the Jaccard similarity, which, in the answer extraction phase, is used to select the most relevant sentence.

The PB corpus contains 300,000 words annotated with predicative information on top of the Penn Treebank 2 Wall Street Journal texts. The authors can improve such representation by substituting the arguments with their most important word - often referred to as the semantic head - as in Figure 2.

On the other hand, the coarse-grained semantic information contained in the PAS gives promising results in answer classification, which suffers more from data sparseness.
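The baseline answer extraction step mentioned above (Jaccard similarity between the question and each candidate sentence) can be sketched as follows; the word-set tokenization is a simplifying assumption:

```python
# Sketch of Jaccard similarity over word sets, |Q ∩ S| / |Q ∪ S|, used
# here to rank candidate answer sentences against the question.

def jaccard(question, sentence):
    q = set(question.lower().split())
    s = set(sentence.lower().split())
    if not q | s:
        return 0.0
    return len(q & s) / len(q | s)

def best_sentence(question, sentences):
    """Pick the candidate sentence most similar to the question."""
    return max(sentences, key=lambda s: jaccard(question, s))
```

This purely lexical baseline is exactly what the paper's structural kernels are meant to improve upon.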