
Complex Linguistic Features for Text Classification: A Comprehensive Study

Abstract
Previous research on advanced representations for document retrieval has shown that statistical state-of-the-art models are not improved by a variety of different linguistic representations. Phrases, word senses and syntactic relations derived by Natural Language Processing (NLP) techniques were observed to be ineffective in increasing retrieval accuracy. For Text Categorization (TC), fewer and less definitive studies on the use of advanced document representations are available, as it is a relatively new research area (compared to document retrieval).


Advanced Structural Representations for Question Classification and Answer Re-ranking

Silvia Quarteroni¹, Alessandro Moschitti², Suresh Manandhar¹, and Roberto Basili²

¹ The University of York, York YO10 5DD, United Kingdom
{silvia,suresh}@cs.york.ac.uk
² University of Rome “Tor Vergata”, Via del Politecnico 1, 00133 Rome, Italy
{moschitti,basili}@info.uniroma2.it
Abstract. In this paper, we study novel structures to represent information in three vital tasks in question answering: question classification, answer classification and answer reranking. We define a new tree structure called PAS to represent predicate-argument relations, as well as a new kernel function to exploit its representative power. Our experiments with Support Vector Machines and several tree kernel functions suggest that syntactic information helps specific tasks such as question classification, whereas, when data sparseness is higher, as in answer classification, studying coarse semantic information like PAS is a promising research direction.
1 Introduction
Question answering (QA) can be seen as a form of information retrieval where, given a question expressed in natural language, one or more answers in the form of sentences (or paragraphs, or phrases) are returned. The typical architecture of a QA system is organized in three phases: question processing, document retrieval and answer extraction [1].
In question processing, useful information is gathered from the question and a query is created; this is then submitted to an information retrieval engine, which provides a ranked list of relevant documents. From these, the QA system must extract one or more candidate answers, which can then be reranked according to various criteria such as their similarity to the query. Question processing is usually centered around question classification (QC), the task that maps a question into one of k expected answer classes. This is a crucial task as it constrains the search space of possible answers and contributes to selecting answer extraction strategies specific to a given answer class. Most accurate QC systems apply supervised machine learning techniques, e.g. Support Vector Machines (SVMs) [2] or the SNoW model [3], where questions are encoded using a variety of lexical, syntactic and semantic features; here, it has been shown that the question’s syntactic structure contributes remarkably to the classification accuracy.
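As a toy illustration of the lexical side of such encodings (this is not the authors' code; the feature-name scheme and the wh-word heuristic are our own assumptions), a question can be mapped to sparse features for a supervised classifier:

```python
# Hypothetical sketch: encode a question into sparse lexical features
# (unigrams, bigrams, wh-word) of the kind an SVM-based question
# classifier could consume. Feature names are illustrative only.

def lexical_features(question):
    """Map a question string to a sparse bag of lexical features."""
    tokens = question.lower().rstrip("?").split()
    feats = {}
    for tok in tokens:                    # unigrams: the bag-of-words part
        feats["w=" + tok] = feats.get("w=" + tok, 0) + 1
    for a, b in zip(tokens, tokens[1:]):  # bigrams add minimal word order
        feats["b=" + a + "_" + b] = 1
    if tokens:                            # the wh-word strongly hints at the class
        feats["wh=" + tokens[0]] = 1
    return feats

feats = lexical_features("Where is the Taj Mahal?")
```

A kernel machine would then operate on dot products of such vectors; the paper's point is that purely lexical features of this kind can be complemented by syntactic and semantic structures.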
The retrieval and answer extraction phases consist in retrieving relevant documents [4] and selecting candidate answer passages [5,1] from them. A further phase called answer re-ranking is optionally applied. It is especially relevant in the case of non-factoid questions, such as those requiring definitions, where the answer can be a whole sentence or a paragraph. Here, the syntactic structure of a sentence appears once again to provide more useful information than a bag of words for such a complex task.

An effective way to integrate syntactic structures in machine learning algorithms is the use of tree kernel functions [6]. Successful applications of these have been reported for question classification [2,7] and other tasks, e.g. relation extraction [8,7]. However, such an approach may not be sufficient to encode syntactic structures in more complex tasks such as computing the relationships between questions and answers in answer reranking. The information provided by parse trees may prove too sparse: the same concept, expressed in two different sentences, will produce different, unmatching parses. One way to overcome this issue is to try to capture semantic relations by processing shallow representations like the predicate argument structures proposed in the PropBank³ (PB) project [9]. We argue that such semantic structures can be used to characterize the relation between a question and a candidate answer.
In this paper, we extensively study advanced structural representations, namely parse trees, bag-of-words, Part-of-Speech tags and predicate argument structures, for question classification and answer re-ranking. We encode such information by combining tree kernels with linear kernels. Moreover, to exploit predicate argument information - which we can automatically derive with our state-of-the-art software - we have defined a new tree structure for its representation and a new kernel function able to process its semantics. Additionally, for the purpose of answer classification and re-ranking, we have created a corpus of answers to TREC-QA 2001 description questions obtained using a Web-based QA system.
Our experiments with SVMs and the above kernels show that (a) our approach reaches state-of-the-art accuracy on question classification and (b) PB predicative structures are not effective for question classification but show promising results for answer classification. Overall, our answer classifier increases the ranking accuracy of a basic QA system by about 20 absolute percent points.
This paper is structured as follows: in Section 2, we introduce advanced models to
represent syntactic and semantic information in a QA context; Section 3 explains how
such information is exploited in an SVM learning framework by introducing novel tree
kernel functions; Section 4 reports our experiments on question classification, answer
classification and answer reranking; finally, Section 5 concludes on the utility of the
newly introduced structure representations and sets the basis for further work.
2 Advanced Models for Sentence/Question Representation
Traditionally, the majority of information retrieval tasks have been solved by means of
the so-called bag-of-words approach augmented by language modeling [10]. However,
when the task requires the use of more complex semantics, the above approach does not appear to be effective, as it is inadequate to perform fine-level textual analysis. To
overcome this, QA systems use linguistic processing tools such as syntactic parsers.
In our study we exploited two sources of syntactic information: deep syntactic parsers
and shallow semantic parsers. While parsing produces parse trees, shallow semantic
parsing detects and labels a proposition with the relations between its components, i.e.
predicates and arguments. While the former technology is well-studied [6,11], the latter
has only recently been the object of a consistent body of work.
³ www.cis.upenn.edu/ace

2.1 Syntactic Structures
The syntactic parse tree of a sentence is a hierarchical representation of the syntactic
relationships between its words. In such a tree, each node with its children is associated
with a grammar production rule, where the symbol at the left-hand side corresponds to
the parent and the symbols at the right-hand side are associated with the children. The
terminal symbols of the grammar are always associated with the leaves of the tree.
Parse trees have often been applied in natural language processing applications requiring the use of grammatical relations, e.g. extraction of subject/object relations. Recently, it has been shown [2,7] that syntactic information outperformed bag-of-words and bag-of-n-grams on the classification of Question Type in QA. The advantage of computing sentence similarity based on parse trees with respect to purely lexical approaches is that trees provide structural relations hard to compute with other methods.
However, when approaching complex QA tasks, the use of parse trees has some limitations. For instance, in definitional QA candidate answers can be expressed by long and articulated definitions spanning one or more sentences. Here, since the information encoded in a parse tree is intrinsically sparse, it does not contribute well to computing the similarity between long sentences or paragraphs. In this case, it makes sense to investigate more “compact” forms of information representation: shallow semantics could be an answer to prevent the sparseness of deep structural approaches and the noise of bag-of-words models.
2.2 Semantic Structures
Initiatives such as PropBank (PB) [9] have led to the creation of vast and accurate
resources of manually annotated predicate argument structures. Using these, machine
learning techniques have proven successful in Semantic Role Labeling (SRL), the task
of attaching semantic roles to predicates and their arguments. SRL is a fully exploitable
technology: our SRL system based on SVMs is able to achieve an accuracy of 76% on
PB data, among the highest in CoNLL [12]. Attempting an application of SRL in the
context of QA hence appears natural, as understanding a question and pinpointing its
answer relies on a deep understanding of the question and answer’s semantics.
The PB corpus is one of the largest resources of manually annotated predicate argument structures⁴; for any given predicate, the expected arguments are labeled sequentially from ARG0 to ARG5, ARGA and ARGM. For example, the following is a typical PB annotation of a sentence: [ARG0 Compounded interest] [predicate computes] [ARG1 the effective interest rate for an investment] [ARGM-TMP during the current year]. Such shallow semantic annotation is quite useful to harvest information. For instance, the predicative annotation of a very similar sentence would be: [ARGM-TMP In a year] [ARG1 the bank interest rate] is [predicate evaluated] by [ARG0 the compounded interest].
The above annotations can be represented by using tree structures like in Figure 1,
which we call PASs. These attempt to capture the semantics of both sentences.
⁴ It contains 300,000 words annotated with predicative information on top of the Penn Treebank 2 Wall Street Journal texts.

[PAS [ARG0 compounded interest] [rel compute] [ARG1 the effective interest rate for an investment] [ARGM-TMP during a year]]

[PAS [ARG0 compounded interest] [rel evaluate] [ARG1 bank interest rate] [ARGM-TMP in a year]]

Fig. 1. Predicate argument structures of two different sentences expressing similar semantics.
We can improve this representation by substituting the arguments with their most important word - often referred to as the semantic head - as in Figure 2. It seems intuitive that data sparseness can be remarkably reduced by using this shallow representation instead of the BOW representation.
[PAS [ARG0 interest] [rel compute] [ARG1 rate] [ARGM-TMP year]]

[PAS [ARG0 interest] [rel evaluate] [ARG1 rate] [ARGM-TMP year]]

Fig. 2. Improved predicate argument structures of two different sentences.
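The reduction from full argument spans (Figure 1) to semantic heads (Figure 2) can be emulated as a toy sketch; this is not the authors' procedure, and the word lists plus the "cut at the first preposition" heuristic are naive simplifications chosen so the running example works:

```python
# Simplified assumption: collapse each PropBank-style argument span to a
# single "semantic head" word. Real systems use syntactic head-finding
# rules; here we cut the span at the first preposition (the NP head
# precedes any PP modifier) and take the rightmost non-determiner word.

PREPOSITIONS = {"for", "in", "during", "by", "of"}
DETERMINERS = {"the", "a", "an"}

def semantic_head(phrase):
    """Naive head of an argument span (toy rule, not a real head finder)."""
    words = phrase.lower().split()
    for i, w in enumerate(words):
        if w in PREPOSITIONS and i > 0:
            words = words[:i]          # drop the trailing PP modifier
            break
    for w in reversed(words):
        if w not in DETERMINERS:
            return w
    return words[-1]

def compact_pas(annotation):
    """Map {role: span} to {role: head}, keeping the predicate as-is."""
    return {role: (span if role == "rel" else semantic_head(span))
            for role, span in annotation.items()}

pas = compact_pas({
    "ARG0": "compounded interest",
    "rel": "compute",
    "ARG1": "the effective interest rate for an investment",
    "ARGM-TMP": "during a year",
})
```

On the first sentence of Figure 1 this yields exactly the compact PAS of Figure 2: interest, compute, rate, year.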
Knowing that syntactic trees and PASs may improve the simple BOW representation, we now face the problem of representing tree structures in learning machines. Section 3 introduces a viable structure representation approach based on tree kernels.
3 Syntactic and Semantic Tree Kernels
As mentioned above, encoding syntactic/semantic information represented by means
of tree structures in the learning algorithm is problematic. One possible solution is to
use as features of a structure all its possible substructures. Given the combinatorial
explosion of considering the subparts, the resulting feature space is usually very large.
To manage such complexity we can define kernel functions that implicitly evaluate the
scalar product between two feature vectors without explicitly computing such vectors.
In the following subsections, we report the tree kernel function devised in [6] computing
the number of common subtrees between two syntactic parse trees and a new modified
version that evaluates the number of semantic structures shared between two PASs.
3.1 Syntactic Tree Kernel
Given two trees $T_1$ and $T_2$, let $\{f_1, f_2, \ldots\} = F$ be the set of substructures (fragments) and let $I_i(n)$ be equal to 1 if $f_i$ is rooted at node $n$ and 0 otherwise. We define

$$K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2) \quad (1)$$

T1: [NP [DT a] [NN dog]]    T2: [NP [DT a] [NN cat]]

f1: [NP [DT a] [NN]]    f2: [NP [DT] [NN]]    f3: [DT a]

Fig. 3. Input trees T1 and T2 with their fragments f1, f2 and f3 derived by the kernel function.
where $N_{T_1}$ and $N_{T_2}$ are the sets of nodes in $T_1$ and $T_2$, respectively, and $\Delta(n_1, n_2) = \sum_{i=1}^{|F|} I_i(n_1)\, I_i(n_2)$. The latter is equal to the number of common fragments rooted in nodes $n_1$ and $n_2$. We can compute $\Delta$ as follows:
1. if the productions at $n_1$ and $n_2$ are different then $\Delta(n_1, n_2) = 0$;
2. if the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ only have leaf children (i.e. they are pre-terminal symbols) then $\Delta(n_1, n_2) = 1$;
3. if the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ are not pre-terminals, then

$$\Delta(n_1, n_2) = \prod_{j=1}^{nc(n_1)} \bigl(1 + \Delta(c^j_{n_1}, c^j_{n_2})\bigr) \quad (2)$$

where $nc(n_1)$⁵ is the number of children of $n_1$ and $c^j_n$ is the $j$-th child of node $n$. As proved in [6], the above algorithm allows us to evaluate Eq. 1 in $O(|N_{T_1}| \times |N_{T_2}|)$.
Moreover, a decay factor $\lambda$ is usually added by changing the formulae in (2) and (3) to⁶:

2. $\Delta(n_1, n_2) = \lambda$,
3. $\Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} \bigl(1 + \Delta(c^j_{n_1}, c^j_{n_2})\bigr)$.
As an example, Figure 3 shows two trees and the substructures they have in common. It is worth noting that the fragments of the above Syntactic Tree Kernel (STK) are such that any node contains either all or none of its children. Consequently, [NP [DT]] and [NP [NN]] are not valid fragments. This limitation makes it unsuitable to derive important substructures from the PAS tree. The next section shows a new tree kernel that takes this into account.
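The $\Delta$ recursion above (rules 1-3, with the decay factor $\lambda$ and the normalization of footnote 6) can be sketched in a few lines of Python; the tuple-based tree encoding is our own illustrative choice, not the authors' implementation:

```python
# Sketch of the Collins-Duffy style syntactic tree kernel (Eq. 1-2).
# A tree node is a (label, children) tuple; a leaf (word) is a bare string.
import math

def production(node):
    """A node's production: its label plus the labels of its children."""
    label, children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))

def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1])

def delta(n1, n2, lam=1.0):
    """Weighted count of common fragments rooted at n1 and n2 (rules 1-3)."""
    if production(n1) != production(n2):
        return 0.0                       # rule 1
    if is_preterminal(n1):
        return lam                       # rule 2 (with decay)
    prod = lam                           # rule 3 (with decay)
    # zip is safe: equal productions imply equal child counts (footnote 5)
    for c1, c2 in zip(n1[1], n2[1]):
        prod *= 1.0 + delta(c1, c2, lam)
    return prod

def nodes(tree):
    """All internal nodes; leaves carry no production."""
    result = [tree]
    for c in tree[1]:
        if not isinstance(c, str):
            result.extend(nodes(c))
    return result

def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

def normalized_kernel(t1, t2, lam=1.0):
    """Score in [0, 1], as in footnote 6."""
    return tree_kernel(t1, t2, lam) / math.sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))

# The Fig. 3 example: T1 and T2 share fragments f1, f2 and f3.
T1 = ("NP", [("DT", ["a"]), ("NN", ["dog"])])
T2 = ("NP", [("DT", ["a"]), ("NN", ["cat"])])
```

With $\lambda = 1$, `tree_kernel(T1, T2)` returns 3, one per shared fragment of Figure 3.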
3.2 Semantic Tree Kernel
As mentioned above, the kernel function introduced in Section 2 is not sufficient to
derive all the required information from trees such as the PAS in Fig. 2: we would like
to have fragments that contain nodes with only part of the children, e.g. to neglect the
⁵ Note that, since the productions are the same, $nc(n_1) = nc(n_2)$.
⁶ To have a similarity score between 0 and 1, we also apply the normalization in the kernel space, i.e. $K'(T_1, T_2) = \frac{K(T_1, T_2)}{\sqrt{K(T_1, T_1) \times K(T_2, T_2)}}$.

References

- The Nature of Statistical Learning Theory (book)
- Machine Learning (journal article)
- WordNet: An Electronic Lexical Database, Christiane Fellbaum, Sep 2000 (journal article)
- Term Weighting Approaches in Automatic Text Retrieval (journal article)
- Advances in Kernel Methods: Support Vector Learning (proceedings article)
Frequently Asked Questions

Q1. What are the contributions in "Advanced structural representations for question classification and answer re-ranking"?

In this paper, the authors study novel structures to represent information in three vital tasks in question answering: question classification, answer classification and answer reranking. The authors define a new tree structure called PAS to represent predicate-argument relations, as well as a new kernel function to exploit its representative power. Their experiments with Support Vector Machines and several tree kernel functions suggest that syntactic information helps specific tasks such as question classification, whereas, when data sparseness is higher, as in answer classification, studying coarse semantic information like PAS is a promising research area.

In the future, the authors will investigate the utility of PASs for similar tasks affected by noisy data and apply a true SVM re-ranker trained with the proposed advanced information.

In this paper, the authors extensively study advanced structural representations, namely parse trees, bag-of-words, Part-of-Speech tags and predicate argument structures, for question classification and answer re-ranking.

Knowing that syntactic trees and PASs may improve the simple BOW representation, the authors now face the problem of representing tree structures in learning machines.

In order to gather more statistically significant data, the authors ran five-fold cross-validation, with the constraint that two pairs 〈q, a1〉 and 〈q, a2〉 associated with the same question q could not be split between training and testing.

One way to overcome this issue is to try to capture semantic relations by processing shallow representations like the predicate argument structures proposed in the PropBank (PB) project [9].

The advantage of computing sentence similarity based on parse trees with respect to purely lexical approaches is that trees provide structural relations hard to compute with other methods.

The retrieval and answer extraction phases consist in retrieving relevant documents [4] and selecting candidate answer passages [5,1] from them.

Their experiments with Support Vector Machines and such new functions suggest that syntactic information helps specific tasks such as question classification.

Their higher results with respect to [2] are explained by a highly performing BOW, the use of parameterization and, most importantly, the fact that their model is obtained by summing two separate kernel spaces (with separate normalization), as mixing BOW with tree kernels does not allow SVMs to exploit all of its representational power.

The performance of the multi-classifier and the individual binary classifiers is measured using accuracy and F1-measure, respectively.

The authors collected a corpus containing 1123 sentences, 401 of which - labeled as "+1" - answered the question either concisely or with noise; the rest - labeled as "-1" - were either irrelevant to the question or contained hints relating to the question but could not be judged as valid answers.

Each sentence in each document is compared to the question to compute the Jaccard similarity, which, in the answer extraction phase, is used to select the most relevant sentence.

The PB corpus contains 300,000 words annotated with predicative information on top of the Penn Treebank 2 Wall Street Journal texts. The authors can improve such representation by substituting the arguments with their most important word - often referred to as the semantic head - as in Figure 2.

On the other hand, the coarse-grained semantic information contained in the PAS gives promising results in answer classification, which suffers more from data sparseness.
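The baseline answer extraction step mentioned above (Jaccard similarity between the question and each candidate sentence) can be sketched as follows; the word-set tokenization is a simplifying assumption:

```python
# Sketch of Jaccard similarity over word sets, |Q ∩ S| / |Q ∪ S|, used
# here to rank candidate answer sentences against the question.

def jaccard(question, sentence):
    q = set(question.lower().split())
    s = set(sentence.lower().split())
    if not q | s:
        return 0.0
    return len(q & s) / len(q | s)

def best_sentence(question, sentences):
    """Pick the candidate sentence most similar to the question."""
    return max(sentences, key=lambda s: jaccard(question, s))
```

This purely lexical baseline is exactly what the paper's structural kernels are meant to improve upon.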