The NarrativeQA Reading Comprehension Challenge
Tomáš Kočiský,Jonathan Schwarz,Phil Blunsom,Chris Dyer,Karl Moritz Hermann,Gábor Melis,Edward Grefenstette +6 more
TL;DR: A new dataset and set of tasks are presented in which the reader must answer questions about stories by reading entire books or movie scripts. The tasks are designed so that answering the questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.
Abstract: Reading comprehension (RC), in contrast to information retrieval, requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering...
Citations
Journal Article
Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski,Jennimaria Palomaki,Olivia Redfield,Michael Collins,Ankur P. Parikh,Chris Alberti,Danielle Epstein,Illia Polosukhin,Jacob Devlin,Kenton Lee,Kristina Toutanova,Llion Jones,Matthew Kelcey,Ming-Wei Chang,Andrew M. Dai,Jakob Uszkoreit,Quoc V. Le,Slav Petrov +17 more
TL;DR: The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
Journal Article
CoQA: A Conversational Question Answering Challenge
TL;DR: The CoQA dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The answers are free-form text, with their corresponding evidence highlighted in the passage.
Proceedings Article
QuAC: Question Answering in Context
Eunsol Choi,He He,Mohit Iyyer,Mark Yatskar,Wen-tau Yih,Yejin Choi,Percy Liang,Luke Zettlemoyer +9 more
TL;DR: QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.
Proceedings Article
UNIFIEDQA: Crossing Format Boundaries with a Single QA System
Daniel Khashabi,Sewon Min,Tushar Khot,Ashish Sabharwal,Oyvind Tafjord,Peter Clark,Hannaneh Hajishirzi +6 more
TL;DR: This work uses the latest advances in language modeling to build a single pre-trained QA model, UNIFIEDQA, that performs well across 19 QA datasets spanning 4 diverse formats, and results in a new state of the art on 10 factoid and commonsense question answering datasets.
Posted Content
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard,Edouard Grave +1 more
TL;DR: The performance of this method improves significantly as the number of retrieved passages increases, which is evidence that sequence-to-sequence models offer a flexible framework to efficiently aggregate and combine evidence from multiple passages.
References
Journal Article
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model is presented that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings Article
Bleu: a Method for Automatic Evaluation of Machine Translation
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, included in the ROUGE summarization evaluation package, along with their evaluations.