Proceedings Article

Answer Extraction as Sequence Tagging with Tree Edit Distance

TL;DR: A linear-chain Conditional Random Field is constructed over pairs of questions and their possible answer sentences, learning the association between questions and answer types and casting answer extraction as an answer sequence tagging problem for the first time.
Abstract: Our goal is to extract answers from preretrieved sentences for Question Answering (QA). We construct a linear-chain Conditional Random Field based on pairs of questions and their possible answer sentences, learning the association between questions and answer types. This casts answer extraction as an answer sequence tagging problem for the first time, where knowledge of shared structure between question and source sentence is incorporated through features based on Tree Edit Distance (TED). Our model is free of manually created question and answer templates, fast to run (processing 200 QA pairs per second excluding parsing time), and yields an F1 of 63.3% on a new public dataset based on prior TREC QA evaluations. The developed system is open-source, and includes an implementation of the TED model that is state of the art in the task of ranking QA pairs.
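The sequence-tagging formulation described in the abstract can be illustrated with a minimal sketch (hypothetical code, not the paper's jacana implementation): each token of a candidate sentence is labeled Beginning, Inside, or Outside of an answer, and the extracted answer is read off the tagged span.

```python
# Hypothetical sketch of the BIO answer-tagging formulation.
# A trained tagger (e.g. a linear-chain CRF) would predict the tags;
# here gold tags are supplied to show how the answer span is recovered.

def extract_answer(tokens, tags):
    """Recover the answer span from a BIO tag sequence."""
    answer = []
    for token, tag in zip(tokens, tags):
        if tag == "B":               # start of an answer span
            answer = [token]
        elif tag == "I" and answer:  # continue the current span
            answer.append(token)
    return " ".join(answer)

# Question: "When was Mozart born?" -> candidate sentence with BIO tags
tokens = ["Mozart", "was", "born", "in", "1756", "."]
tags   = ["O",      "O",   "O",    "O",  "B",    "O"]
print(extract_answer(tokens, tags))  # -> 1756
```

In the paper's actual model the tag at each token is predicted jointly over the sequence, with features drawn from POS/NER/dependency labels and the TED edit script aligning the sentence to the question.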


Citations
Proceedings ArticleDOI
21 Sep 2015
TL;DR: The WIKIQA dataset is described, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering, which is more than an order of magnitude larger than the previous dataset.
Abstract: We describe the WIKIQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WIKIQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WIKIQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WIKIQA dataset.

853 citations


Cites background from "Answer Extraction as Sequence Taggi..."

  • ...Answer sentence selection is a crucial subtask of the open-domain question answering (QA) problem, with the goal of extracting answers from a set of pre-selected sentences (Heilman and Smith, 2010; Yao et al., 2013; Severyn and Moschitti, 2013)....


Proceedings ArticleDOI
09 Aug 2015
TL;DR: This paper presents a convolutional neural network architecture for reranking pairs of short texts, where the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data are learned.
Abstract: Learning a similarity function between pairs of objects is at the core of learning to rank approaches. In information retrieval tasks we typically deal with query-document pairs; in question answering, question-answer pairs. However, before learning can take place, such pairs need to be mapped from the original space of symbolic words into some feature space encoding various aspects of their relatedness, e.g. lexical, syntactic and semantic. Feature engineering is often a laborious task and may require external knowledge sources that are not always available or difficult to obtain. Recently, deep learning approaches have gained a lot of attention from the research community and industry for their ability to automatically learn optimal feature representations for a given task, while claiming state-of-the-art performance in many tasks in computer vision, speech recognition and natural language processing. In this paper, we present a convolutional neural network architecture for reranking pairs of short texts, where we learn the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data. Our network takes only words in the input, thus requiring minimal preprocessing. In particular, we consider the task of reranking short text pairs where elements of the pair are sentences. We test our deep learning system on two popular retrieval tasks from TREC: Question Answering and Microblog Retrieval. Our model demonstrates strong performance on the first task, beating previous state-of-the-art systems by about 3% absolute points in both MAP and MRR, and shows comparable results on tweet reranking, while enjoying the benefits of no manual feature engineering and no additional syntactic parsers.
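The learned similarity function at the heart of such reranking architectures can be sketched as a bilinear score x^T M y over the two text embeddings. The code below is a hypothetical illustration: the random vectors stand in for CNN sentence embeddings, and M is the trainable similarity matrix.

```python
import numpy as np

# Hypothetical sketch of a learned similarity layer for text-pair reranking.
# In a real system the two vectors would come from a CNN sentence encoder
# and M would be learned jointly with it; here they are random stand-ins.

rng = np.random.default_rng(0)
dim = 8
q_vec = rng.standard_normal(dim)          # stand-in for the question embedding
a_vec = rng.standard_normal(dim)          # stand-in for the answer embedding
M = rng.standard_normal((dim, dim))       # trainable similarity matrix

score = float(q_vec @ M @ a_vec)          # higher score = more related pair
print(score)
```

Candidate answers are then ranked by this score; training adjusts M (and the encoder) so that correct answers outscore incorrect ones.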

796 citations


Cites background or methods from "Answer Extraction as Sequence Taggi..."

  • ...The model of Yao et al., 2013 [37] applies linear chain CRFs with features derived from TED to automatically learn associations between questions and candidate answers....



Posted Content
TL;DR: A general deep learning framework is applied for the answer selection task, which does not depend on manually defined features or linguistic tools, and is extended in two directions to define a more composite representation for questions and answers.
Abstract: In this paper, we apply a general deep learning (DL) framework for the answer selection task, which does not depend on manually defined features or linguistic tools. The basic framework is to build the embeddings of questions and answers based on bidirectional long short-term memory (biLSTM) models, and measure their closeness by cosine similarity. We further extend this basic model in two directions. One direction is to define a more composite representation for questions and answers by combining a convolutional neural network with the basic framework. The other direction is to utilize a simple but efficient attention mechanism in order to generate the answer representation according to the question context. Several variations of models are provided. The models are examined on two datasets, TREC-QA and InsuranceQA. Experimental results demonstrate that the proposed models substantially outperform several strong baselines.
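The "measure their closeness by cosine similarity" step of such frameworks is straightforward to sketch. The toy vectors below stand in for biLSTM question/answer embeddings (hypothetical values, not from any trained model):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings standing in for biLSTM sentence encodings
q  = np.array([1.0, 0.0, 1.0])
a1 = np.array([1.0, 0.1, 0.9])   # good answer: similar direction to q
a2 = np.array([-1.0, 1.0, 0.0])  # bad answer: dissimilar direction

assert cosine(q, a1) > cosine(q, a2)  # the good answer ranks higher
```

Ranking candidates by this score is what makes the approach feature-free: no lexical or syntactic features are extracted, only learned dense representations are compared.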

442 citations


Cites background or methods from "Answer Extraction as Sequence Taggi..."

  • ...Our LSTM implementation is similar to the one in (Graves et al., 2013) with minor modification. The data is obtained from (Yao et al., 2013): http://cs.jhu.edu/˜xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2....


  • ...Some work tried to fulfill the matching using minimal edit sequences between dependency parse trees (Heilman & Smith, 2010; Yao et al., 2013)....


  • ...The data is obtained from (Yao et al., 2013) http://cs....


Posted Content
TL;DR: This work proposes a novel approach to solving the answer sentence selection task via means of distributed representations, and learns to match questions with answers by considering their semantic encoding.
Abstract: Answer sentence selection is the task of identifying sentences that contain the answer to a given question. This is an important problem in its own right as well as in the larger context of open domain question answering. We propose a novel approach to solving this task via means of distributed representations, and learn to match questions with answers by considering their semantic encoding. This contrasts prior work on this task, which typically relies on classifiers with large numbers of hand-crafted syntactic and semantic features and various external resources. Our approach does not require any feature engineering nor does it involve specialist linguistic data, making this model easily applicable to a wide range of domains and languages. Experimental results on a standard benchmark dataset from TREC demonstrate that, despite its simplicity, our model matches state of the art performance on the answer sentence selection task.

410 citations


Cites background from "Answer Extraction as Sequence Taggi..."

  • ...Another option is discriminative models over features produced from minimal edit sequences between dependency parse trees [8, 21]....


  • ...[21] extended Heilman and Smith’s approach with the difference that they used dynamic programming to find the optimal tree edit sequences....


Proceedings ArticleDOI
01 Jul 2015
TL;DR: The proposed method uses a stacked bidirectional Long-Short Term Memory network to sequentially read words from question and answer sentences and then output their relevance scores; the system outperforms previous work that requires syntactic features and external knowledge resources.
Abstract: In this paper, we present an approach that addresses the answer sentence selection problem for question answering. The proposed method uses a stacked bidirectional Long-Short Term Memory (BLSTM) network to sequentially read words from question and answer sentences, and then outputs their relevance scores. Unlike prior work, this approach does not require any syntactic parsing or external knowledge resources such as WordNet, which may not be available in some domains or languages. The full system is based on a combination of the stacked BLSTM relevance model and keywords matching. The results of our experiments on a public benchmark dataset from TREC show that our system outperforms previous work which requires syntactic features and external knowledge resources.
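The keyword-matching component that the abstract combines with the learned relevance model can be sketched as a simple overlap score (a hypothetical baseline, not the paper's exact formula):

```python
# Hypothetical sketch of a keyword-matching signal: the fraction of
# non-stopword question terms that also appear in the candidate sentence.
# The stopword list and scoring are illustrative assumptions.

STOP = {"the", "a", "an", "of", "in", "is", "was", "when", "who", "what"}

def keyword_overlap(question, sentence):
    q = {w.lower() for w in question.split()} - STOP
    s = {w.lower() for w in sentence.split()} - STOP
    return len(q & s) / max(len(q), 1)

print(keyword_overlap("When was Mozart born", "Mozart was born in 1756"))  # -> 1.0
```

In the full system a score like this would be interpolated with the BLSTM relevance score, letting exact lexical matches back up the learned model.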

378 citations


Cites methods from "Answer Extraction as Sequence Taggi..."

  • ...Later, tree kernel functions together with a logistic regression model (Heilman and Smith, 2010) or Conditional Random Fields models (Wang and Manning, 2010; Yao et al., 2013) with extracted features were adopted to learn the associations between question and answer....


  • ...full training dataset is no longer available from the website of the lead author of (Wang et al., 2007), we obtained this data re-released from Yao et al. (2013): http://cs.jhu.edu/˜xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2....


References
Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Answer Extraction as Sequence Taggi..." refers methods in this paper

  • ...These features were then used to train a logistic regression model using Weka (Hall et al., 2009)....


Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

13,190 citations


"Answer Extraction as Sequence Taggi..." refers methods in this paper

  • ...Besides local POS/NER/DEP features, at each token we need to inspect the entire input to connect the answer sentence with the question sentence through tree edits, drawing features from the question and the edit script, motivating the use of a linear-chain CRF model (Lafferty et al., 2001) over HMMs....


  • ...We propose the use of a linear-chain Conditional Random Field (CRF) (Lafferty et al., 2001) in order to cast the problem as one of sequence tagging by labeling each token in a candidate sentence as either Beginning, Inside or Outside (BIO) of an answer....



Journal ArticleDOI
TL;DR: The results strongly suggest that DeepQA is an effective and extensible architecture that may be used as a foundation for combining, deploying, evaluating and advancing a wide range of algorithmic techniques to rapidly advance the field of QA.
Abstract: IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy! The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy! Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After 3 years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence and speed at the Jeopardy! quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that may be used as a foundation for combining, deploying, evaluating and advancing a wide range of algorithmic techniques to rapidly advance the field of QA.

1,446 citations


"Answer Extraction as Sequence Taggi..." refers background in this paper

  • ...The success of IBM’s Watson system for Question Answering (QA) (Ferrucci et al., 2010) has illustrated a continued public interest in this topic....


Journal ArticleDOI
TL;DR: Algorithms are designed to answer the following kinds of questions about ordered labeled trees: what is the distance between two trees, and the analogous questions for subtrees and prunings.
Abstract: Ordered labeled trees are trees in which the left-to-right order among siblings is significant. The distance between two ordered trees is considered to be the weighted number of edit operations (in...

1,367 citations
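The ordered-tree edit distance referenced here can be sketched with the standard rightmost-root forest recursion under unit costs. This is a hypothetical, unoptimized illustration; efficient algorithms such as Zhang-Shasha add dynamic programming over keyroots rather than plain memoized recursion.

```python
from functools import lru_cache

# Minimal sketch of ordered-tree edit distance with unit insert/delete/relabel
# costs. A tree is a (label, (children...)) tuple; a forest is a tuple of trees.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def fdist(f1, f2):
    """Edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)                  # insert every remaining node
    if not f2:
        return size(f1)                  # delete every remaining node
    (l1, c1), (l2, c2) = f1[-1], f2[-1]  # rightmost roots
    return min(
        fdist(f1[:-1] + c1, f2) + 1,     # delete rightmost root of f1
        fdist(f1, f2[:-1] + c2) + 1,     # insert rightmost root of f2
        # match the rightmost roots: relabel if labels differ,
        # then solve the child forests and the remainders independently
        fdist(f1[:-1], f2[:-1]) + fdist(c1, c2) + (l1 != l2),
    )

# born(Mozart, in(1756)) vs born(he, in(1756)): one relabel (Mozart -> he)
t1 = ("born", (("Mozart", ()), ("in", (("1756", ()),))))
t2 = ("born", (("he", ()),     ("in", (("1756", ()),))))
print(fdist((t1,), (t2,)))  # -> 1
```

In the answer-extraction model above, the edit script produced by such an alignment between the question's and sentence's dependency trees is what supplies the TED features for each token.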