Showing papers on "Question answering" published in 2019

Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

23 Oct 2019-arXiv: Learning

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
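
The core idea of the unified framework is that every task becomes "input string in, output string out". The following is a minimal sketch of that conversion, assuming a short task prefix on the input and class labels rendered as words. The prefixes and label words here are illustrative assumptions in the style of the T5 release, and `to_text_to_text` is a hypothetical helper, not the paper's code.

```python
from typing import Dict, Tuple

# Hypothetical task prefixes: a short tag tells one shared model which task
# to perform, since every input and output is a plain string.
PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_de": "translate English to German: ",
    "acceptability": "cola sentence: ",
}

# Classification targets are emitted as words instead of class indices
# (assumed label words, for illustration only).
LABEL_WORDS = {"acceptability": {0: "unacceptable", 1: "acceptable"}}


def to_text_to_text(task: str, example: Dict) -> Tuple[str, str]:
    """Convert a task-specific example into an (input_text, target_text) pair."""
    source = PREFIXES[task] + example["input"]
    target = example["target"]
    if task in LABEL_WORDS:
        # Map the integer label to its textual form.
        target = LABEL_WORDS[task][target]
    return source, target


print(to_text_to_text("acceptability", {"input": "The course is jumping well.", "target": 0}))
# ('cola sentence: The course is jumping well.', 'unacceptable')
```

Because classification, summarization and translation all share this string-to-string shape, the same model, loss and decoding procedure can be reused across tasks.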

6,953 citations

Journal Article

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Jinhyuk Lee¹, Wonjin Yoon¹, Sungdong Kim², Donghyeon Kim¹, Sunkyu Kim¹, Chan Ho So¹, Jaewoo Kang¹•Institutions (2)

Korea University¹, Naver Corporation²

25 Jan 2019-Bioinformatics

TL;DR: This article proposed BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora.

Abstract: Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
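
The question answering gain above is reported in mean reciprocal rank (MRR): for each question, score 1/rank of the first correct answer in the system's ranked list (0 if no listed answer is correct), then average over questions. A minimal sketch, assuming predictions are ranked lists of strings and gold answers are sets; the exact benchmark scoring (answer normalization, list-length cutoffs) is not modeled here, and the gene names are made-up inputs.

```python
from typing import List, Set


def reciprocal_rank(ranked: List[str], gold: Set[str]) -> float:
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_ranked: List[List[str]], all_gold: List[Set[str]]) -> float:
    """Average reciprocal rank over a set of questions."""
    scores = [reciprocal_rank(r, g) for r, g in zip(all_ranked, all_gold)]
    return sum(scores) / len(scores) if scores else 0.0


# Correct answer at rank 1, at rank 2, and missing: (1 + 0.5 + 0) / 3 = 0.5
print(mean_reciprocal_rank(
    [["EGFR", "KRAS"], ["TP53", "BRCA1"], ["MYC"]],
    [{"EGFR"}, {"BRCA1"}, {"ALK"}],
))  # 0.5
```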

2,680 citations

Proceedings Article

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Hao Tan¹, Mohit Bansal¹•Institutions (1)

University of North Carolina at Chapel Hill¹

20 Aug 2019

TL;DR: The LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework, a large-scale Transformer model that consists of three encoders, achieves state-of-the-art results on two visual question answering datasets; the authors also show the generalizability of the pre-trained cross-modality model.

Abstract: Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative pre-training tasks: masked language modeling, masked object prediction (feature regression and label classification), cross-modality matching, and image question answering. These tasks help in learning both intra-modality and cross-modality relationships. After fine-tuning from our pre-trained parameters, our model achieves state-of-the-art results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our pre-trained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR2, and improve the previous best result by 22% absolute (54% to 76%). Lastly, we present detailed ablation studies showing that both our novel model components and pre-training strategies significantly contribute to our strong results. Code and pre-trained models are publicly available at: https://github.com/airsplay/lxmert
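
Three of the pre-training tasks named above (masked language modeling, masked object prediction, cross-modality matching) can be driven from one example-building step. The following is a minimal sketch under stated assumptions: the 0.15 masking rate, the 0.5 mismatch rate, zeroing a masked region's feature, and the helper name `make_pretraining_example` are illustrative choices, not LXMERT's released code. Image question answering is omitted.

```python
import random
from typing import Dict, List


def make_pretraining_example(
    tokens: List[str],
    object_feats: List[List[float]],
    other_captions: List[List[str]],
    rng: random.Random,
    mask_prob: float = 0.15,      # assumed masking rate for words and objects
    mismatch_prob: float = 0.5,   # assumed rate of mismatched (negative) pairs
) -> Dict:
    """Build inputs and targets for three LXMERT-style pre-training tasks (sketch)."""
    # Cross-modality matching: sometimes pair the image with a wrong caption.
    is_match = rng.random() >= mismatch_prob or not other_captions
    if not is_match:
        tokens = rng.choice(other_captions)

    # Masked language modeling: hide some words and remember the originals.
    masked_tokens: List[str] = []
    lm_targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked_tokens.append("[MASK]")
            lm_targets[i] = tok
        else:
            masked_tokens.append(tok)

    # Masked object prediction: blank out some region features; the model is
    # trained to regress the original feature (and classify the region label).
    masked_feats: List[List[float]] = []
    obj_targets: Dict[int, List[float]] = {}
    for j, feat in enumerate(object_feats):
        if rng.random() < mask_prob:
            masked_feats.append([0.0] * len(feat))
            obj_targets[j] = feat
        else:
            masked_feats.append(feat)

    return {
        "tokens": masked_tokens,
        "object_feats": masked_feats,
        "lm_targets": lm_targets,
        "obj_targets": obj_targets,
        "is_match": is_match,
    }


rng = random.Random(0)
example = make_pretraining_example(
    ["a", "man", "rides", "a", "horse"],
    [[0.2, 0.7], [0.9, 0.1]],
    other_captions=[["two", "cats", "sleep"]],
    rng=rng,
)
print(example["is_match"], example["tokens"])
```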

1,729 citations

Journal Article

Natural Questions: A Benchmark for Question Answering Research

Tom Kwiatkowski¹, Jennimaria Palomaki¹, Olivia Redfield¹, Michael Collins², Ankur P. Parikh¹, Chris Alberti¹, Danielle Epstein¹, Illia Polosukhin¹, Jacob Devlin¹, Kenton Lee¹, Kristina Toutanova¹, Llion Jones¹, Matthew Kelcey¹, Ming-Wei Chang¹, Andrew M. Dai¹, Jakob Uszkoreit¹, Quoc V. Le¹, Slav Petrov¹•Institutions (2)

Google¹, Columbia University²

02 Aug 2019-Transactions of the Association for Computational Linguistics

TL;DR: This paper presents the Natural Questions corpus, a question answering data set; introduces robust metrics for evaluating question answering systems; demonstrates high human upper bounds on these metrics; and establishes baseline results using competitive methods drawn from related literature.

Abstract: We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a...

1,618 citations

Posted Content

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²•Institutions (2)

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language, is presented. It extends the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.
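
In a co-attentional layer, each stream keeps its own queries but takes keys and values from the other stream, so image regions are updated from words and words from image regions. A minimal pure-Python sketch of that exchange, assuming identity query/key/value projections and a single head; the learned projections, residual connections and feed-forward sublayers of a real layer are deliberately left out, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row becomes a weighted mix of value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def co_attention(visual: Matrix, text: Matrix) -> Tuple[Matrix, Matrix]:
    """One co-attentional exchange between the two streams (simplified).

    Each stream supplies the queries, while the other stream supplies keys
    and values. Identity projections and a single head are assumed.
    """
    visual_out = attend(visual, text, text)    # image regions attend to words
    text_out = attend(text, visual, visual)    # words attend to image regions
    return visual_out, text_out


regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
new_regions, new_words = co_attention(regions, words)
print(len(new_regions), len(new_words))  # 2 3
```

Each stream keeps its own length after the exchange, which is what lets the two streams stay separate while still mixing information.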

1,241 citations

Proceedings Article

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Jiasen Lu¹, Dhruv Batra², Devi Parikh², Stefan Lee²•Institutions (2)

Salesforce.com¹, Georgia Institute of Technology²

06 Aug 2019

TL;DR: The ViLBERT model extends the BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

Abstract: We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers. We pretrain our model through two proxy tasks on the large, automatically collected Conceptual Captions dataset and then transfer it to multiple established vision-and-language tasks -- visual question answering, visual commonsense reasoning, referring expressions, and caption-based image retrieval -- by making only minor additions to the base architecture. We observe significant improvements across tasks compared to existing task-specific models -- achieving state-of-the-art on all four tasks. Our work represents a shift away from learning groundings between vision and language only as part of task training and towards treating visual grounding as a pretrainable and transferable capability.

1,069 citations

Proceedings Article

Unified Language Model Pre-training for Natural Language Understanding and Generation

Li Dong¹, Nan Yang¹, Wenhui Wang¹, Furu Wei¹, Xiaodong Liu², Yu Wang¹, Jianfeng Gao¹, Ming Zhou¹, Hsiao-Wuen Hon¹•Institutions (2)

Microsoft¹, Edinburgh Napier University²

08 May 2019

TL;DR: UniLM is a unified pre-trained language model that can be fine-tuned for both natural language understanding and generation tasks, achieving state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement).

Abstract: This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
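
Since UniLM switches between objectives only by changing the self-attention mask, the three masks can be written out directly. A minimal sketch with 1 meaning "position i may attend to position j"; in practice such a mask is commonly applied by adding a large negative number to attention logits wherever it is 0. The function names are hypothetical, and only the left-to-right form of unidirectional modeling is shown.

```python
from typing import List

Mask = List[List[int]]  # mask[i][j] == 1 means position i may attend to position j


def bidirectional_mask(n: int) -> Mask:
    """Every token sees every token (BERT-style masked LM)."""
    return [[1] * n for _ in range(n)]


def left_to_right_mask(n: int) -> Mask:
    """Each token sees itself and earlier tokens only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def seq2seq_mask(src_len: int, tgt_len: int) -> Mask:
    """Source is fully visible; target tokens see the source plus earlier targets."""
    n = src_len + tgt_len
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < src_len:
                row.append(1)   # every position may look at the whole source segment
            elif i >= src_len and j <= i:
                row.append(1)   # target positions see earlier (and current) target positions
            else:
                row.append(0)   # source never sees the target; no looking ahead
        mask.append(row)
    return mask


for row in seq2seq_mask(2, 2):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```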

1,019 citations

Posted Content

Language Models as Knowledge Bases

Fabio Petroni¹, Tim Rocktäschel¹, Patrick S. H. Lewis¹, Anton Bakhtin¹, Yuxiang Wu², Alexander H. Miller¹, Sebastian Riedel¹•Institutions (2)

Facebook¹, University College London²

03 Sep 2019-arXiv: Computation and Language

TL;DR: An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

Abstract: Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as "fill-in-the-blank" cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at this https URL.
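
Probing a language model as a knowledge base means turning a (subject, relation, object) fact into a cloze statement and checking whether the model's top filler for the blank is the gold object. A minimal sketch, assuming hand-written templates and a hypothetical `predict` callable that returns ranked fillers; `toy_model` is a hard-coded stand-in, not a real masked language model, and precision@1 is used as the score.

```python
from typing import Callable, List, Tuple

# Illustrative relation templates: [X] is the subject, [Y] the slot to predict.
TEMPLATES = {
    "place_of_birth": "[X] was born in [Y] .",
    "capital_of": "[X] is the capital of [Y] .",
}


def make_cloze(subject: str, relation: str, mask_token: str = "[MASK]") -> str:
    """Turn a (subject, relation) pair into a fill-in-the-blank query."""
    return TEMPLATES[relation].replace("[X]", subject).replace("[Y]", mask_token)


def precision_at_1(
    facts: List[Tuple[str, str, str]],
    predict: Callable[[str], List[str]],
) -> float:
    """Fraction of facts whose gold object is the model's top-ranked filler."""
    hits = 0
    for subject, relation, obj in facts:
        ranked = predict(make_cloze(subject, relation))
        hits += int(bool(ranked) and ranked[0] == obj)
    return hits / len(facts) if facts else 0.0


def toy_model(query: str) -> List[str]:
    # Hypothetical stand-in for a masked LM's ranked vocabulary predictions.
    return ["Ulm", "Munich"] if query.startswith("Albert Einstein") else ["Australia"]


facts = [("Albert Einstein", "place_of_birth", "Ulm"), ("Ottawa", "capital_of", "Canada")]
print(make_cloze("Albert Einstein", "place_of_birth"))  # Albert Einstein was born in [MASK] .
print(precision_at_1(facts, toy_model))  # 0.5
```

No schema or fine-tuning is involved: the only "query language" is the natural-language template, which is the property the abstract highlights.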

839 citations

Posted Content

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

Weijie Su¹, Xizhou Zhu¹, Yue Cao¹, Bin Li², Lewei Lu², Furu Wei², Jifeng Dai²•Institutions (2)

University of Science and Technology of China¹, Microsoft²

22 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT), which adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input.

Abstract: We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. It is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks, such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark. Code is released at this https URL.
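
Because each input element is either a word or an RoI, a single sequence can carry both modalities. A minimal sketch of assembling such a sequence. The special tokens ([CLS], [SEP], [IMG], [END]), the segment labels, and attaching the whole-image feature to word elements are assumptions modeled on common visual-linguistic BERT inputs, not a statement of VL-BERT's exact embedding scheme; position and box-geometry embeddings are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputElement:
    token: str                      # word piece, or a placeholder token for regions
    visual: Optional[List[float]]   # appearance feature attached to this element
    segment: str                    # "text" or "image" (assumed labels)


def build_input(
    words: List[str], rois: List[List[float]], whole_image: List[float]
) -> List[InputElement]:
    """Concatenate text and RoI elements into one Transformer input (sketch)."""
    seq = [InputElement("[CLS]", whole_image, "text")]
    # Words: each carries the whole-image feature as visual context (assumption).
    seq += [InputElement(w, whole_image, "text") for w in words]
    seq.append(InputElement("[SEP]", whole_image, "text"))
    # Regions: a shared placeholder token plus the RoI's own feature vector.
    seq += [InputElement("[IMG]", roi, "image") for roi in rois]
    seq.append(InputElement("[END]", whole_image, "image"))
    return seq


seq = build_input(["a", "cat"], [[0.1, 0.3], [0.2, 0.4]], [0.0, 0.0])
print([e.token for e in seq])
# ['[CLS]', 'a', 'cat', '[SEP]', '[IMG]', '[IMG]', '[END]']
```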

822 citations

Journal Article

CoQA: A Conversational Question Answering Challenge

Siva Reddy¹, Danqi Chen¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

29 May 2019-Transactions of the Association for Computational Linguistics

TL;DR: The CoQA dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains; the answers are free-form text with their corresponding evidence highlighted in the passage.

Abstract: Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa
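
The F1 figures above compare free-form predicted answers with human answers at the word level. A minimal sketch of SQuAD-style token-overlap F1, taking the best match over several human references. The normalization rules (lowercasing, stripping punctuation and English articles) are assumptions, and the official CoQA evaluation script may differ in details such as how multiple human answers are combined.

```python
import re
import string
from collections import Counter
from typing import List

PUNCT = set(string.punctuation)


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: List[str]) -> float:
    """Score against several human answers by taking the best match."""
    return max(token_f1(prediction, r) for r in references)


print(round(token_f1("in the park", "the park"), 3))  # 0.667
print(best_f1("white", ["red", "white"]))  # 1.0
```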

720 citations

Proceedings Article

Latent Retrieval for Weakly Supervised Open Domain Question Answering

Kenton Lee¹, Ming-Wei Chang¹, Kristina Toutanova¹•Institutions (1)

Google¹

01 Jun 2019

Abstract: Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.
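
The Inverse Cloze Task pre-trains the retriever without labeled questions: one sentence from a passage acts as a pseudo-question, and the rest of the passage acts as the evidence it should retrieve. A minimal sketch of building one such pair. The `keep_prob` parameter, which occasionally leaves the sentence in its context so word overlap is not useless as a signal, is an assumption about the setup, and `inverse_cloze_example` is a hypothetical name.

```python
import random
from typing import List, Tuple


def inverse_cloze_example(
    sentences: List[str], rng: random.Random, keep_prob: float = 0.1
) -> Tuple[str, str]:
    """Return (pseudo_query, pseudo_evidence) built from one passage.

    A random sentence plays the question; the surrounding sentences play the
    evidence block to be retrieved. keep_prob (assumed value) sometimes keeps
    the sentence inside its own context.
    """
    i = rng.randrange(len(sentences))
    query = sentences[i]
    if rng.random() < keep_prob:
        context = sentences
    else:
        context = sentences[:i] + sentences[i + 1:]
    return query, " ".join(context)


passage = [
    "Zebras have black and white stripes.",
    "The stripes may deter biting flies.",
    "They live in eastern and southern Africa.",
]
print(inverse_cloze_example(passage, random.Random(3)))
```

During pre-training, a retriever would then be trained so that each pseudo-query scores its own evidence block above the other blocks in the batch.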

Posted Content

ERNIE: Enhanced Representation through Knowledge Integration

Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu¹•Institutions (1)

Baidu¹

19 Apr 2019-arXiv: Computation and Language

TL;DR: Experimental results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks including natural language inference, semantic similarity, named entity recognition, sentiment analysis and question answering.

Abstract: We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration). Inspired by the masking strategy of BERT, ERNIE is designed to learn language representation enhanced by knowledge masking strategies, which include entity-level masking and phrase-level masking. The entity-level strategy masks entities, which are usually composed of multiple words. The phrase-level strategy masks the whole phrase, which is composed of several words standing together as a conceptual unit. Experimental results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks including natural language inference, semantic similarity, named entity recognition, sentiment analysis and question answering. We also demonstrate that ERNIE has more powerful knowledge inference capacity on a cloze test.
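
Entity-level and phrase-level masking differ from token-level masking in one respect: a chosen unit is hidden all at once, so the model cannot recover part of an entity from its other half. A minimal sketch, assuming span boundaries come from an external entity recognizer or phrase chunker, spans do not overlap, and a 0.15 selection rate; the English example and the name `span_level_mask` are illustrative only.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of an entity or phrase


def span_level_mask(
    tokens: List[str],
    spans: List[Span],
    rng: random.Random,
    mask_prob: float = 0.15,
    mask_token: str = "[MASK]",
) -> Tuple[List[str], List[int]]:
    """Mask whole units: if a span is chosen, every token in it is hidden.

    Tokens outside any span are treated as single-token units.
    Spans are assumed not to overlap.
    """
    covered = set()
    units: List[Span] = []
    for start, end in spans:
        units.append((start, end))
        covered.update(range(start, end))
    units += [(i, i + 1) for i in range(len(tokens)) if i not in covered]

    masked = list(tokens)
    positions: List[int] = []
    for start, end in sorted(units):
        if rng.random() < mask_prob:
            for i in range(start, end):
                masked[i] = mask_token
                positions.append(i)
    return masked, positions


tokens = ["Harry", "Potter", "is", "a", "series", "of", "fantasy", "novels"]
print(span_level_mask(tokens, [(0, 2), (6, 8)], random.Random(0), mask_prob=0.5))
```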

Proceedings Article

Language Models as Knowledge Bases

Fabio Petroni¹, Tim Rocktäschel¹, Patrick S. H. Lewis¹, Anton Bakhtin¹, Yuxiang Wu², Alexander H. Miller¹, Sebastian Riedel¹•Institutions (2)

Facebook¹, University College London²

01 Sep 2019

TL;DR: This article presented an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models.

Proceedings Article

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Drew A. Hudson¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

15 Jun 2019

TL;DR: GQA is a dataset for real-world visual reasoning and compositional question answering that leverages Visual Genome scene graph structures to create 22M diverse reasoning questions, all of which come with functional programs that represent their semantics.

Abstract: We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages Visual Genome scene graph structures to create 22M diverse reasoning questions, which all come with functional programs that represent their semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate question biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. A careful analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1%, and strong VQA models achieve 54.1%, human performance tops at 89.3%, offering ample opportunity for new research to explore. We hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding of vision and language.
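
Each GQA question comes with a functional program over a scene graph, so its answer can be computed step by step. A minimal sketch of executing such a program on a toy scene graph; the operation names (`select`, `relate`, `query`), the dictionary layout, and the scene itself are hypothetical and do not reproduce GQA's actual operation vocabulary or data format.

```python
from typing import Dict, List, Tuple

# Toy scene graph (hypothetical): object id -> name, attributes, outgoing relations.
SCENE: Dict[str, Dict] = {
    "o1": {"name": "table", "attrs": {"material": "wooden"}, "rels": []},
    "o2": {"name": "cup", "attrs": {"color": "red"}, "rels": [("on", "o1")]},
    "o3": {"name": "cup", "attrs": {"color": "blue"}, "rels": []},
}


def select(scene: Dict, name: str) -> List[str]:
    """All objects with the given name."""
    return [oid for oid, obj in scene.items() if obj["name"] == name]


def relate(scene: Dict, targets: List[str], relation: str, name: str) -> List[str]:
    """Objects called `name` that stand in `relation` to any of the target objects."""
    return [
        oid for oid, obj in scene.items()
        if obj["name"] == name
        and any(r == relation and t in targets for r, t in obj["rels"])
    ]


def query(scene: Dict, objs: List[str], attribute: str) -> str:
    """Read an attribute off the single object the program has narrowed down to."""
    if len(objs) != 1:
        raise ValueError(f"expected exactly one referent, got {objs}")
    return scene[objs[0]]["attrs"][attribute]


def run(scene: Dict, program: List[Tuple]) -> object:
    """Execute (operation, *arguments) steps, feeding each result into the next."""
    result = None
    for op, *args in program:
        if op == "select":
            result = select(scene, *args)
        elif op == "relate":
            result = relate(scene, result, *args)
        elif op == "query":
            result = query(scene, result, *args)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# "What color is the cup on the table?"
print(run(SCENE, [("select", "table"), ("relate", "on", "cup"), ("query", "color")]))  # red
```

Because the program is explicit, the same mechanism makes it possible to inspect which step a model gets wrong, which is the kind of fine-grained analysis by question type the abstract describes.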

Posted Content

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

Jiayuan Mao¹, Chuang Gan², Pushmeet Kohli³, Joshua B. Tenenbaum², Jiajun Wu²•Institutions (3)

Tsinghua University¹, Massachusetts Institute of Technology², Google³

26 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, the model learns by simply looking at images and reading paired questions and answers.

Abstract: We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an object-based scene representation and translates sentences into executable, symbolic programs. To bridge the learning of the two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation. Analogous to human concept learning, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide the searching over the large compositional space of images and language. Extensive experiments demonstrate the accuracy and efficiency of our model on learning visual concepts, word representations, and semantic parsing of sentences. Further, our method allows easy generalization to new object attributes, compositions, language concepts, scenes and questions, and even new program domains. It also empowers applications including visual question answering and bidirectional image-text retrieval.
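
Executing symbolic programs on a latent scene representation can be made soft: instead of a hard set of selected objects, each step carries a per-object selection weight. A minimal sketch of that idea, assuming the perception module has already produced a probability for each concept on each object; the dictionaries and operation names are hypothetical, and this simplified executor should not be read as NS-CL's exact operator definitions.

```python
from typing import Dict, List

# Hypothetical perception output: per object, probability of each concept.
Objects = List[Dict[str, float]]


def scene_mask(objects: Objects) -> List[float]:
    """Start with every object fully selected."""
    return [1.0] * len(objects)


def filter_concept(objects: Objects, mask: List[float], concept: str) -> List[float]:
    """Soft filter: keep each object in proportion to how likely it shows `concept`."""
    return [m * obj.get(concept, 0.0) for m, obj in zip(mask, objects)]


def count(mask: List[float]) -> float:
    """Expected number of selected objects."""
    return sum(mask)


def exists(mask: List[float]) -> float:
    """Soft 'is there any?': the strongest single selection."""
    return max(mask, default=0.0)


objs = [
    {"red": 0.9, "cube": 0.8},
    {"red": 0.1, "cube": 0.9},
    {"red": 0.95, "sphere": 1.0},
]
# "How many red cubes are there?"
red_cubes = filter_concept(objs, filter_concept(objs, scene_mask(objs), "red"), "cube")
print(round(count(red_cubes), 2))  # 0.81
```

Since every step is a product, sum or max over concept probabilities, the answer changes smoothly as the perception scores change, which is what lets question-answer pairs supervise concept learning.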

Posted Content

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Drew A. Hudson¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

25 Feb 2019-arXiv: Computation and Language

TL;DR: GQA is a dataset for real-world visual reasoning and compositional question answering that leverages scene graph structures to create 22M diverse reasoning questions, all of which come with functional programs that represent their semantics.

Abstract: We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, all of which come with functional programs that represent their semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate question biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. An extensive analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1%, and strong VQA models achieve 54.1%, human performance tops at 89.3%, offering ample opportunity for new research to explore. We strongly hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding for images and language.

Proceedings Article

Deep Modular Co-Attention Networks for Visual Question Answering

Zhou Yu¹, Jun Yu, Yuhao Cui¹, Dacheng Tao², Qi Tian³•Institutions (3)

Hangzhou Dianzi University¹, University of Sydney², Huawei³

15 Jun 2019

TL;DR: In this article, a modular co-attention network (MCAN) is proposed, which consists of Modular Co-Attention (MCA) layers cascaded in depth.

Abstract: Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective "co-attention" model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the question-guided-attention of images jointly using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN's effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state-of-the-art. Our best single model delivers 70.63% overall accuracy on the test-dev set.
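
An MCA layer composes two basic units: self-attention (SA), where a modality attends to itself, and guided attention (GA), where image features query the question features. A minimal pure-Python sketch of one layer and of cascading layers in depth, assuming identity projections, a single head, and a plain residual sum; the layer normalization and feed-forward sublayers of the real model are omitted, and the simple stacking shown here is only one way such layers can be cascaded.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Residual connection: element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sa(x: Matrix) -> Matrix:
    """Self-attention unit: a modality attends to itself."""
    return add(x, attend(x, x, x))


def ga(x: Matrix, y: Matrix) -> Matrix:
    """Guided-attention unit: x supplies queries, y supplies keys and values."""
    return add(x, attend(x, y, y))


def mca_layer(question: Matrix, image: Matrix) -> Tuple[Matrix, Matrix]:
    """One MCA layer: SA on the question; SA then question-guided GA on the image."""
    question = sa(question)
    image = ga(sa(image), question)
    return question, image


def mcan(question: Matrix, image: Matrix, depth: int) -> Tuple[Matrix, Matrix]:
    """Cascade `depth` MCA layers (simple stacking)."""
    for _ in range(depth):
        question, image = mca_layer(question, image)
    return question, image


print(mcan([[1.0, 0.0]], [[0.0, 1.0]], depth=1))  # ([[2.0, 0.0]], [[2.0, 2.0]])
```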

Proceedings Article

Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

Chi Sun¹, Luyao Huang¹, Xipeng Qiu¹•Institutions (1)

Fudan University¹

01 Jun 2019

TL;DR: This paper constructs an auxiliary sentence from the aspect, converts ABSA into a sentence-pair classification task similar to question answering (QA) and natural language inference (NLI), and fine-tunes the pre-trained BERT model.

Abstract: Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets. The source code is available at https://github.com/HSLCY/ABSA-BERT-pair.
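
Converting ABSA into sentence-pair classification amounts to generating a second "sentence" from the target and aspect and feeding it to BERT alongside the review. A minimal sketch of a few auxiliary-sentence styles (question-like, pseudo-sentence, and a binary per-polarity variant). The exact templates, the label set, and the helper names are assumptions for illustration and may not match the released code word for word.

```python
from typing import List, Tuple

# Assumed label set (SentiHood-style); illustrative only.
POLARITIES = ["positive", "negative", "none"]


def qa_m(target: str, aspect: str) -> str:
    """Question-style auxiliary sentence; the model predicts one polarity label."""
    return f"what do you think of the {aspect} of {target} ?"


def nli_m(target: str, aspect: str) -> str:
    """Pseudo-sentence built only from the target and aspect."""
    return f"{target} - {aspect}"


def qa_b(target: str, aspect: str) -> List[Tuple[str, str]]:
    """Binary variant: one auxiliary sentence per candidate polarity, scored yes/no."""
    return [(f"the polarity of the aspect {aspect} of {target} is {p}", p) for p in POLARITIES]


def sentence_pair(review: str, auxiliary: str) -> str:
    """Format a BERT-style sentence-pair input."""
    return f"[CLS] {review} [SEP] {auxiliary} [SEP]"


review = "LOCATION1 is central but very expensive"
print(sentence_pair(review, qa_m("LOCATION1", "price")))
# [CLS] LOCATION1 is central but very expensive [SEP] what do you think of the price of LOCATION1 ? [SEP]
```

In the binary variant, the polarity whose auxiliary sentence receives the highest "yes" score would be taken as the prediction.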

Posted Content

Unified Language Model Pre-training for Natural Language Understanding and Generation

Li Dong¹, Nan Yang¹, Wenhui Wang¹, Furu Wei¹, Xiaodong Liu¹, Yu Wang¹, Jianfeng Gao¹, Ming Zhou¹, Hsiao-Wuen Hon¹•Institutions (1)

Microsoft¹

08 May 2019-arXiv: Computation and Language

TL;DR: A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks; it compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.

Proceedings Article

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

Alon Talmor¹, Jonathan Herzig², Nicholas Lourie¹, Jonathan Berant³•Institutions (3)

Allen Institute for Artificial Intelligence¹, IBM², Tel Aviv University³

01 Jun 2019

TL;DR: The authors present CommonsenseQA, a dataset for commonsense question answering with prior knowledge, in which crowd-workers are asked to create multiple-choice questions with complex semantics that often require prior knowledge.

Abstract: When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering. To capture common sense beyond associations, we extract from ConceptNet (Speer et al., 2017) multiple target concepts that have the same semantic relation to a single source concept. Crowd-workers are asked to author multiple-choice questions that mention the source concept and discriminate in turn between each of the target concepts. This encourages workers to create questions with complex semantics that often require prior knowledge. We create 12,247 questions through this procedure and demonstrate the difficulty of our task with a large number of strong baselines. Our best baseline is based on BERT-large (Devlin et al., 2018) and obtains 56% accuracy, well below human performance, which is 89%.
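
The question-generation setup starts from groups of target concepts that share the same relation to one source concept, so that a crowd-worker can write a question separating each target from its siblings. A minimal sketch of forming those groups from (source, relation, target) triples. The group size of 3, the toy triples, and the helper name `candidate_groups` are assumptions, and real ConceptNet filtering is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)


def candidate_groups(triples: List[Triple], k: int = 3) -> Dict[Tuple[str, str], List[str]]:
    """Group targets sharing a (source, relation) pair; keep groups with at least k targets.

    Each surviving group is a prompt for crowd-workers: one question per target
    that mentions the source concept and singles out that target over the others.
    k=3 is an assumed group size.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for source, relation, target in triples:
        if target not in groups[(source, relation)]:
            groups[(source, relation)].append(target)
    return {key: targets[:k] for key, targets in groups.items() if len(targets) >= k}


triples = [
    ("river", "AtLocation", "waterfall"),
    ("river", "AtLocation", "bridge"),
    ("river", "AtLocation", "valley"),
    ("dog", "IsA", "pet"),
]
print(candidate_groups(triples))
# {('river', 'AtLocation'): ['waterfall', 'bridge', 'valley']}
```

Because the targets share a relation to the source, surface association alone cannot distinguish them, which pushes workers toward questions that need background knowledge.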

Proceedings Article•DOI•

Knowledge Graph Embedding Based Question Answering

Xiao Huang¹, Jingyuan Zhang¹, Dingcheng Li¹, Ping Li¹•Institutions (1)

Baidu¹

30 Jan 2019

TL;DR: Proposes an effective Knowledge Embedding based Question Answering (KEQA) framework that targets the most common type of question, simple questions, each of which a machine can answer directly once its single head entity and single predicate are correctly identified.

Abstract: Question answering over knowledge graph (QA-KG) aims to use facts in the knowledge graph (KG) to answer natural language questions. It helps end users more efficiently and more easily access the substantial and valuable knowledge in the KG, without knowing its data structures. QA-KG is a nontrivial problem since capturing the semantic meaning of natural language is difficult for a machine. Meanwhile, many knowledge graph embedding methods have been proposed. The key idea is to represent each predicate/entity as a low-dimensional vector, such that the relation information in the KG could be preserved. The learned vectors could benefit various applications such as KG completion and recommender systems. In this paper, we explore using them to handle the QA-KG problem. However, this remains a challenging task since a predicate could be expressed in different ways in natural language questions. Also, the ambiguity of entity names and partial names makes the number of possible answers large. To bridge the gap, we propose an effective Knowledge Embedding based Question Answering (KEQA) framework. We focus on answering the most common types of questions, i.e., simple questions, in which each question could be answered by the machine straightforwardly if its single head entity and single predicate are correctly identified. To answer a simple question, instead of inferring its head entity and predicate directly, KEQA aims to jointly recover the question's head entity, predicate, and tail entity representations in the KG embedding spaces. Based on a carefully designed joint distance metric, the fact in the KG closest to the three learned vectors is returned as the answer. Experiments on a widely adopted benchmark demonstrate that the proposed KEQA outperforms the state-of-the-art QA-KG methods.
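
The answer-selection step can be sketched under simplifying assumptions: TransE-style embeddings, where a fact (h, p, t) roughly satisfies e_h + e_p ≈ e_t, and a joint distance made of three Euclidean terms. This is not the paper's exact metric; the toy 2-d vectors, entity names, and the weights `w_head`/`w_tail` are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def translate(head, pred):
    # TransE-style relation function f(h, p) = h + p (assumed for this sketch).
    return [h + p for h, p in zip(head, pred)]

def closest_fact(facts, emb, head_hat, pred_hat, w_head=1.0, w_tail=1.0):
    """Return the KG fact whose embeddings lie closest to the predicted vectors.

    head_hat and pred_hat stand in for the head-entity and predicate vectors a
    question encoder would predict; the tail is inferred as f(head_hat, pred_hat).
    """
    tail_hat = translate(head_hat, pred_hat)

    def joint_distance(fact):
        h, p, t = fact
        return (l2(emb[p], pred_hat)
                + w_head * l2(emb[h], head_hat)
                + w_tail * l2(emb[t], tail_hat))

    return min(facts, key=joint_distance)

# Hypothetical toy embeddings and facts.
emb = {
    "alice": [1.0, 0.0], "bob": [0.0, 1.0],
    "born_in": [0.0, 0.5],
    "city_x": [1.0, 0.5], "city_y": [0.0, 1.5],
}
facts = [("alice", "born_in", "city_x"), ("bob", "born_in", "city_y")]
print(closest_fact(facts, emb, head_hat=[0.9, 0.1], pred_hat=[0.0, 0.5]))
# ('alice', 'born_in', 'city_x')
```

Because only facts already in the KG are scored, the answer is always an existing triple. Scoring every fact is linear in KG size, so a practical system would likely restrict the candidates first.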

Proceedings Article•DOI•

End-to-End Open-Domain Question Answering with BERTserini

Wei Yang¹, Yuqing Xie¹, Aileen Lin, Xingyu Li, Luchen Tan¹, Kun Xiong, Ming Li², Jimmy Lin¹•Institutions (2)

University of Waterloo¹, California State University, Fresno²

01 Feb 2019

TL;DR: Presents an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit, combining best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles.

Abstract: We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.
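
How retrieval and reading scores might be combined to choose a final answer span is sketched below, assuming a linear interpolation S = (1 − μ)·S_retriever + μ·S_reader. The candidate spans and scores are invented, and `best_answer` and `mu` are hypothetical names, not Anserini or BERT APIs.

```python
def best_answer(candidates, mu=0.5):
    """Pick the answer span with the highest interpolated score.

    candidates: list of (span_text, retriever_score, reader_score) tuples, where
    retriever_score rates the passage (e.g. a BM25 score) and reader_score rates
    the span within it. mu = 0 trusts retrieval alone; mu = 1 trusts the reader alone.
    """
    if not candidates:
        raise ValueError("no candidate spans")

    def combined(candidate):
        _, retriever_score, reader_score = candidate
        return (1 - mu) * retriever_score + mu * reader_score

    return max(candidates, key=combined)[0]

# Hypothetical spans from two retrieved Wikipedia paragraphs.
candidates = [
    ("1867", 12.0, 3.1),  # highly ranked paragraph, weak reader confidence
    ("1871", 9.5, 7.8),   # lower ranked paragraph, confident reader
]
print(best_answer(candidates, mu=0.5))  # 1871
print(best_answer(candidates, mu=0.0))  # 1867
```

Retriever scores and reader scores live on different scales, so the weight would in practice be tuned on held-out questions rather than fixed at 0.5.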

Posted Content•

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Hao Tan¹, Mohit Bansal¹•Institutions (1)

University of North Carolina at Chapel Hill¹

20 Aug 2019-arXiv: Computation and Language

TL;DR: LXMERT is a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.

Abstract: Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative pre-training tasks: masked language modeling, masked object prediction (feature regression and label classification), cross-modality matching, and image question answering. These tasks help in learning both intra-modality and cross-modality relationships. After fine-tuning from our pre-trained parameters, our model achieves state-of-the-art results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our pre-trained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR2, and improve the previous best result by 22% absolute (54% to 76%). Lastly, we present detailed ablation studies showing that both our novel model components and pre-training strategies significantly contribute to our strong results; we also present several attention visualizations for the different encoders. Code and pre-trained models are publicly available at: this https URL

Proceedings Article•DOI•

Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

Christopher Clark¹, Mark Yatskar², Luke Zettlemoyer³•Institutions (3)

Allen Institute for Artificial Intelligence¹, University of Washington², Princeton University³

01 Sep 2019

TL;DR: The authors train a naive model that makes predictions exclusively from dataset biases, then train a robust model as part of an ensemble with the naive one, encouraging it to focus on other patterns in the data that are more likely to generalize.

Abstract: State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. Our method has two stages: we (1) train a naive model that makes predictions exclusively based on dataset biases, and (2) train a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize. Experiments on five datasets with out-of-domain test sets show significantly improved robustness in all settings, including a 12 point gain on a changing priors visual question answering dataset and a 9 point gain on an adversarial question answering test set.
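
One way to realize stage (2) is a product-of-experts ensemble: the robust model's log-probabilities are added to those of the frozen naive model before the softmax, and only the combined prediction is trained. The sketch below assumes this combination rule and precomputed bias probabilities; the function names and toy numbers are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def ensemble_loss(robust_logits, bias_probs, gold):
    """Cross-entropy of softmax(log p_robust + log p_bias) for the gold label.

    bias_probs come from the frozen naive (bias-only) model; during training only
    the robust model's parameters would be updated through this loss.
    """
    combined = [lp + math.log(pb)
                for lp, pb in zip(log_softmax(robust_logits), bias_probs)]
    return -log_softmax(combined)[gold]

def predict(robust_logits):
    # At test time the naive model is discarded and the robust model predicts alone.
    return max(range(len(robust_logits)), key=lambda i: robust_logits[i])

uniform = [0.0, 0.0, 0.0]  # an untrained robust model
print(round(ensemble_loss(uniform, [0.90, 0.05, 0.05], gold=0), 3))  # 0.105
print(round(ensemble_loss(uniform, [0.05, 0.90, 0.05], gold=0), 3))  # 2.996
```

The first case illustrates the intended effect: when the naive model already explains an example, the loss is small and the robust model receives little gradient from it, so training concentrates on examples the bias cannot explain.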

Proceedings Article•DOI•

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Kenneth Marino¹, Mohammad Rastegari², Ali Farhadi², Roozbeh Mottaghi²•Institutions (2)

Carnegie Mellon University¹, Allen Institute for Artificial Intelligence²

15 Jun 2019

TL;DR: This paper proposes the OK-VQA dataset, which includes more than 14,000 questions that require external knowledge to answer, and shows that the performance of state-of-the-art VQA models degrades drastically in this new setting.

Abstract: Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object detection that do not require reasoning or knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. Our new dataset includes more than 14,000 questions that require external knowledge to answer. We show that the performance of the state-of-the-art VQA models degrades drastically in this new setting. Our analysis shows that our knowledge-based VQA task is diverse, difficult, and large compared to previous knowledge-based VQA datasets. We hope that this dataset enables researchers to open up new avenues for research in this domain.

Proceedings Article•DOI•

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering

Peng Gao¹, Zhengkai Jiang¹, Haoxuan You², Pan Lu³, Steven C. H. Hoi², Xiaogang Wang⁴, Hongsheng Li¹•Institutions (4)

The Chinese University of Hong Kong¹, Tsinghua University², Chinese Academy of Sciences³, Southern Methodist University⁴

15 Jun 2019

TL;DR: The authors propose a method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities.

Abstract: Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow conditioned on the other modality can dynamically modulate the intra-modality attention of the current modality, which is vital for multimodality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.

Proceedings Article•DOI•

Entity-Relation Extraction as Multi-Turn Question Answering

Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan¹, Duo Chai, Mingxin Zhou, Jiwei Li²•Institutions (2)

Stanford University¹, Zhejiang University²

01 Jul 2019

TL;DR: This article casts the task of entity-relation extraction as a multi-turn question answering problem, i.e., the extraction of entities and relations is transformed into the task of identifying answer spans from the context.

Abstract: In this paper, we propose a new paradigm for the task of entity-relation extraction. We cast the task as a multi-turn question answering problem, i.e., the extraction of entities and relations is transformed into the task of identifying answer spans from the context. This multi-turn QA formalization comes with several key advantages: firstly, the question query encodes important information for the entity/relation class we want to identify; secondly, QA provides a natural way of jointly modeling entity and relation; and thirdly, it allows us to exploit the well-developed machine reading comprehension (MRC) models. Experiments on the ACE and the CoNLL04 corpora demonstrate that the proposed paradigm significantly outperforms previous best models. We obtain state-of-the-art results on all of the ACE04, ACE05 and CoNLL04 datasets, increasing the SOTA results on the three datasets to 49.6 (+1.2), 60.3 (+0.7) and 69.2 (+1.4), respectively. Additionally, we construct and will release a newly developed dataset RESUME, which requires multi-step reasoning to construct entity dependencies, as opposed to the single-step dependency extraction in the triplet extraction of previous datasets. The proposed multi-turn QA model also achieves the best performance on the RESUME dataset.

Posted Content•

Document Expansion by Query Prediction.

Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

17 Apr 2019-arXiv: Information Retrieval

TL;DR: Proposes a simple method that predicts which queries will be issued for a given document and then expands the document with those predictions, using a vanilla sequence-to-sequence model trained on datasets of query and relevant-document pairs.

Abstract: One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer. Following this observation, we propose a simple method that predicts which queries will be issued for a given document and then expands it with those predictions, using a vanilla sequence-to-sequence model trained on datasets consisting of pairs of queries and relevant documents. By combining our method with a highly effective re-ranking component, we achieve the state of the art in two retrieval tasks. In a latency-critical regime, retrieval results alone (without re-ranking) approach the effectiveness of more computationally expensive neural re-rankers but are much faster.
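
The indexing-time idea can be sketched with a fixed stand-in function in place of the trained sequence-to-sequence model. `fake_predictor`, `expand_document`, and the toy term-overlap matcher are hypothetical and far simpler than a real retrieval engine.

```python
def expand_document(doc, predict_queries, n=3):
    """Append up to n predicted queries to the document text before indexing."""
    predictions = predict_queries(doc)[:n]
    return " ".join([doc] + predictions)

def term_overlap(query, text):
    # Toy lexical matcher: number of distinct query terms that occur in the text.
    terms = set(text.lower().split())
    return sum(1 for t in set(query.lower().split()) if t in terms)

# Hypothetical stand-in for a trained sequence-to-sequence query predictor.
def fake_predictor(doc):
    return ["what causes tides", "why does the ocean rise and fall"]

doc = "The gravitational pull of the moon produces a bulge in the ocean."
query = "what causes tides"
print(term_overlap(query, doc))                                   # 0
print(term_overlap(query, expand_document(doc, fake_predictor)))  # 3
```

Because the predicted queries are appended before indexing, nothing extra runs at query time, which fits the abstract's point about the latency-critical regime.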

Proceedings Article•DOI•

ELI5: Long Form Question Answering

Angela Fan¹, Yacine Jernite¹, Ethan Perez¹, David Grangier², Jason Weston¹, Michael Auli¹•Institutions (2)

Facebook¹, Google²

01 Jul 2019

TL;DR: This work introduces the first large-scale corpus for long form question answering, a task requiring elaborate and in-depth answers to open-ended questions, and shows that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline.

Abstract: We introduce the first large-scale corpus for long form question answering, a task requiring elaborate and in-depth answers to open-ended questions. The dataset comprises 270K threads from the Reddit forum “Explain Like I’m Five” (ELI5) where an online community provides answers to questions that are comprehensible to five-year-olds. Compared to existing datasets, ELI5 comprises diverse questions requiring multi-sentence answers. We provide a large set of web documents to help answer the questions. Automatic and human evaluations show that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline. However, our best model is still far from human performance since raters prefer gold responses in over 86% of cases, leaving ample opportunity for future improvement.

Proceedings Article•DOI•

PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text

Haitian Sun¹, Tania Bedrax-Weiss¹, William W. Cohen¹•Institutions (1)

Google¹

21 Apr 2019

TL;DR: The authors describe PullNet, an integrated framework for learning what to retrieve and for reasoning over the retrieved heterogeneous information (knowledge base facts and text) to find the best answer in an open-domain question answering setting.

Abstract: We consider open-domain question answering (QA) where answers are drawn from either a corpus, a knowledge base (KB), or a combination of both of these. We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., “multi-hop”) reasoning. We describe PullNet, an integrated framework for (1) learning what to retrieve and (2) reasoning with this heterogeneous information to find the best answer. PullNet uses an iterative process to construct a question-specific subgraph that contains information relevant to the question. In each iteration, a graph convolutional network (graph CNN) is used to identify subgraph nodes that should be expanded using retrieval (or “pull”) operations on the corpus and/or KB. After the subgraph is complete, another graph CNN is used to extract the answer from the subgraph. This retrieve-and-reason process allows us to answer multi-hop questions using large KBs and corpora. PullNet is weakly supervised, requiring question-answer pairs but not gold inference paths. Experimentally, PullNet improves over the prior state of the art, and in the setting where a corpus is used with an incomplete KB these improvements are often dramatic. PullNet is also often superior to prior systems in a KB-only setting or a text-only setting.
