Coarse-to-Fine Question Answering for Long Documents

doi:10.18653/V1/P17-1020

Home
/
Papers
/
Coarse-to-Fine Question Answering for Long Documents

Proceedings Article•DOI•

Coarse-to-Fine Question Answering for Long Documents

Eunsol Choi¹, Daniel Hewlett², Jakob Uszkoreit², Illia Polosukhin², Alexandre Lacoste², Jonathan Berant² - Show less +2 more•Institutions (2)

University of Washington¹, Google²

01 Jul 2017-Vol. 1, pp 209-220

TL;DR: A framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models is presented and sentence selection is treated as a latent variable trained jointly from the answer only using reinforcement learning.

read less

Abstract: We present a framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models. While most successful approaches for reading comprehension rely on recurrent neural networks (RNNs), running them over long documents is prohibitively slow because it is difficult to parallelize over sequences. Inspired by how people first skim the document, identify relevant parts, and carefully read these parts to produce an answer, we combine a coarse, fast model for selecting relevant sentences and a more expensive RNN for producing the answer from those sentences. We treat sentence selection as a latent variable trained jointly from the answer only using reinforcement learning. Experiments demonstrate state-of-the-art performance on a challenging subset of the WikiReading dataset and on a new dataset, while speeding up the model by 3.5x-6.7x.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Posted Content•

Deep Reinforcement Learning: An Overview

[...]

Yuxi Li

25 Jan 2017-arXiv: Learning

TL;DR: This work discusses core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration, and important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.

...read moreread less

Abstract: We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, natural language processing, including dialogue systems, machine translation, and text generation, computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation systems, and computer systems. We mention topics not reviewed yet, and list a collection of RL resources. After presenting a brief summary, we close with discussions. Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.

...read moreread less

935 citations

Proceedings Article•DOI•

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

[...]

Yen-Chun Chen¹, Mohit Bansal²•Institutions (2)

Microsoft¹, University of North Carolina at Chapel Hill²

28 May 2018

TL;DR: The authors proposed a sentence-level policy gradient method to bridge the non-differentiable computation between these two neural networks in a hierarchical way, which achieved state-of-the-art performance on the CNN/Daily Mail dataset.

...read moreread less

Abstract: Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary. We use a novel sentence-level policy gradient method to bridge the non-differentiable computation between these two neural networks in a hierarchical way, while maintaining language fluency. Empirically, we achieve the new state-of-the-art on all metrics (including human evaluation) on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores. Moreover, by first operating at the sentence-level and then the word-level, we enable parallel decoding of our neural generative model that results in substantially faster (10-20x) inference speed as well as 4x faster training convergence than previous long-paragraph encoder-decoder models. We also demonstrate the generalization of our model on the test-only DUC-2002 dataset, where we achieve higher scores than a state-of-the-art model.

...read moreread less

513 citations

Posted Content•

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

[...]

Yen-Chun Chen¹, Mohit Bansal²•Institutions (2)

Microsoft¹, University of North Carolina at Chapel Hill²

28 May 2018-arXiv: Computation and Language

TL;DR: An accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary is proposed, which achieves the new state-of-the-art on all metrics on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.

...read moreread less

272 citations

Cites background from "Coarse-to-Fine Question Answering f..."

...In question answering (QA), Choi et al. (2017) extract one sentence and then generate the answer from the sentence’s vector representation with RL bridging....
[...]

Proceedings Article•DOI•

Transfer Learning in Natural Language Processing.

[...]

Sebastian Ruder¹, Matthew E. Peters², Swabha Swayamdipta², Thomas Wolf³•Institutions (3)

Google¹, Allen Institute for Artificial Intelligence², Central European University³

01 Jun 2019

TL;DR: Transfer learning as discussed by the authors is a set of methods that extend the classical supervised machine learning paradigm by leveraging data from additional domains or tasks to train a model with better generalization properties, which can be used for NLP tasks.

...read moreread less

Abstract: The classic supervised machine learning paradigm is based on learning in isolation, a single predictive model for a task using a single dataset. This approach requires a large number of training examples and performs best for well-defined and narrow tasks. Transfer learning refers to a set of methods that extend this approach by leveraging data from additional domains or tasks to train a model with better generalization properties. Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of several transfer learning methods and architectures which significantly improved upon the state-of-the-art on a wide range of NLP tasks. These improvements together with the wide availability and ease of integration of these methods are reminiscent of the factors that led to the success of pretrained word embeddings and ImageNet pretraining in computer vision, and indicate that these methods will likely become a common tool in the NLP landscape as well as an important research direction. We will present an overview of modern transfer learning methods in NLP, how models are pre-trained, what information the representations they learn capture, and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks.

...read moreread less

267 citations

Proceedings Article•

R 3 : Reinforced Ranker-Reader for Open-Domain Question Answering.

[...]

Shuohang Wang¹, Mo Yu², Xiaoxiao Guo², Zhiguo Wang², Tim Klinger², Wei Zhang³, Shiyu Chang², Gerald Tesauro², Bowen Zhou², Jing Jiang¹ - Show less +6 more•Institutions (3)

Singapore Management University¹, IBM², Anhui University³

01 Jan 2018

TL;DR: This paper proposes a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of extracting the ground-truth answer to a given question, and proposes a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning.

...read moreread less

Abstract: In recent years researchers have achieved considerable success applying neural network methods to question answering (QA). These approaches have achieved state of the art results in simplified closed-domain settings such as the SQuAD (Rajpurkar et al. 2016) dataset, which provides a preselected passage, from which the answer to a given question may be extracted. More recently, researchers have begun to tackle open-domain QA, in which the model is given a question and access to a large corpus (e.g., wikipedia) instead of a pre-selected passage (Chen et al. 2017a). This setting is more complex as it requires large-scale search for relevant passages by an information retrieval component, combined with a reading comprehension model that “reads” the passages to generate an answer to the question. Performance in this setting lags well behind closed-domain performance. In this paper, we present a novel open-domain QA system called Reinforced Ranker-Reader (R), based on two algorithmic innovations. First, we propose a new pipeline for open-domain QA with a Ranker component, which learns to rank retrieved passages in terms of likelihood of extracting the ground-truth answer to a given question. Second, we propose a novel method that jointly trains the Ranker along with an answer-extraction Reader model, based on reinforcement learning. We report extensive experimental results showing that our method significantly improves on the state of the art for multiple open-domain QA datasets. 2

...read moreread less

266 citations

Cites background or methods from "Coarse-to-Fine Question Answering f..."

...In comparison, Choi et al. 2017 has as its main aim to speed up the RC model in the single passage setting....
[...]
...Cheng and Lapata 2016 and Choi et al. 2017 proposed coarse-to-fine frameworks with an additional sentence selection step before the original wordlevel prediction for text summarization and reading comprehension, respectively....
[...]
...From the method-perspective, our work is most close to Choi et al. 2017’s work in terms of the usage of REINFORCE....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37

Collapse

References

PDF

Open Access

More filters

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

"Coarse-to-Fine Question Answering f..." refers methods in this paper

...We use the Adam (Kingma and Ba, 2015) optimizer and TensorFlow framework (Abadi et al....
[...]

Proceedings Article•

Neural Machine Translation by Jointly Learning to Align and Translate

[...]

Dzmitry Bahdanau¹, Kyunghyun Cho², Yoshua Bengio²•Institutions (2)

Jacobs University Bremen¹, Université de Montréal²

01 Jan 2015

TL;DR: It is conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

...read moreread less

Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

...read moreread less

20,027 citations

"Coarse-to-Fine Question Answering f..." refers methods in this paper

...Soft Attention In the soft attention model (Bahdanau et al., 2015) we compute a weighted average of the tokens in the sentences according to p(s | x, d)....
[...]

Proceedings Article•DOI•

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

[...]

Kyunghyun Cho¹, Bart van Merriënboer², Caglar Gulcehre², Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio³, Yoshua Bengio⁴, Yoshua Bengio⁵ - Show less +5 more•Institutions (5)

Aalto University¹, Université de Montréal², École Polytechnique de Montréal³, Alcatel-Lucent⁴, AT&T⁵

01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

...read moreread less

Abstract: In this paper, we propose a novel neural network model called RNN Encoder‐ Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder‐Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

...read moreread less

19,998 citations

Posted Content•

Neural Machine Translation by Jointly Learning to Align and Translate

[...]

Dzmitry Bahdanau¹, Kyunghyun Cho², Yoshua Bengio²•Institutions (2)

Jacobs University Bremen¹, Université de Montréal²

01 Sep 2014-arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

...read moreread less

14,077 citations

Proceedings Article•DOI•

Convolutional Neural Networks for Sentence Classification

[...]

Yoon Kim¹•Institutions (1)

New York University¹

25 Aug 2014

TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification, and are proposed to allow for the use of both task-specific and static vectors.

...read moreread less

Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

...read moreread less

9,776 citations