Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Home
/
Papers
/
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

read less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Posted Content•

Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

[...]

Siamak Shakeri¹, Noah Constant¹, Mihir Kale¹, Linting Xue¹•Institutions (1)

Google¹

22 Oct 2020-arXiv: Computation and Language

TL;DR: This article proposed a method to generate multilingual question and answer pairs on a large scale through the use of a single generative model, which can be used to improve the zero-shot performance of multilingual QA models on target languages.

...read moreread less

Abstract: We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages. Our proposed multi-task training of the generative model only requires the labeled training samples in English, thus removing the need for such samples in the target languages, making it applicable to far more languages than those with labeled data. Human evaluations indicate the majority of such samples are grammatically correct and sensible. Experimental results show our proposed approach can achieve large gains on the XQuAD dataset, reducing the gap between zero-shot and supervised performance of smaller QA models on various languages.

...read moreread less

1 citations

Posted Content•

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

[...]

Siqi Sun¹, Yen-Chun Chen², Linjie Li², Shuohang Wang², Yuwei Fang², Jingjing Liu² - Show less +2 more•Institutions (2)

Toyota Technological Institute at Chicago¹, Microsoft²

16 Mar 2021-arXiv: Computation and Language

TL;DR: LightningDOT as mentioned in this paper pre-trained on three novel learning objectives, extracting feature indexes offline, and employing instant dot-product matching with further re-ranking, which significantly speeds up retrieval process.

...read moreread less

Abstract: Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, fatefully suffer from slow inference speed due to enormous computation cost mainly from cross-modal attention in Transformer architecture. When applied to real-life applications, such latency and computation demand severely deter the practical use of pre-trained models. In this paper, we study Image-text retrieval (ITR), the most mature scenario of V+L application, which has been widely studied even prior to the emergence of recent pre-trained models. We propose a simple yet highly effective approach, LightningDOT that accelerates the inference time of ITR by thousands of times, without sacrificing accuracy. LightningDOT removes the time-consuming cross-modal attention by pre-training on three novel learning objectives, extracting feature indexes offline, and employing instant dot-product matching with further re-ranking, which significantly speeds up retrieval process. In fact, LightningDOT achieves new state of the art across multiple ITR benchmarks such as Flickr30k, COCO and Multi30K, outperforming existing pre-trained models that consume 1000x magnitude of computational hours. Code and pre-training checkpoints are available at this https URL.

...read moreread less

1 citations

Posted Content•

NaturalProofs: Mathematical Theorem Proving in Natural Language

[...]

Sean Welleck¹, Jiacheng Liu, Ronan Le Bras², Hannaneh Hajishirzi¹, Yejin Choi¹, Kyunghyun Cho³ - Show less +2 more•Institutions (3)

University of Washington¹, Allen Institute for Artificial Intelligence², New York University³

24 Mar 2021-arXiv: Information Retrieval

TL;DR: NaturalProofs as discussed by the authors is a multi-domain corpus of mathematical statements and their proofs written in natural mathematical language, allowing for evaluating both in-distribution and zero-shot generalization.

...read moreread less

Abstract: Understanding and creating mathematics using natural mathematical language - the mixture of symbolic and natural language used by humans - is a challenging and important problem for driving progress in machine learning. As a step in this direction, we develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs, written in natural mathematical language. NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources, allowing for evaluating both in-distribution and zero-shot generalization. Using NaturalProofs, we benchmark strong neural methods on mathematical reference retrieval and generation tasks which test a system's ability to determine key results that appear in a proof. Large-scale sequence models show promise compared to classical information retrieval methods, yet their performance and out-of-domain generalization leave substantial room for improvement. NaturalProofs opens many avenues for research on challenging mathematical tasks.

...read moreread less

1 citations

Posted Content•

Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction.

[...]

Lingbo Mo, Ashley Lewis, Huan Sun, Michael White¹•Institutions (1)

Ohio State University¹

15 Oct 2021-arXiv: Computation and Language

TL;DR: The authors investigated an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer.

...read moreread less

Abstract: Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer. To do so, we construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset. Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to evaluate our framework w.r.t. a variety of state-of-the-art KBQA models without involving further crowdsourcing effort. The results demonstrate that our interactive semantic parsing framework promises to be effective across such models.

...read moreread less

1 citations

Proceedings Article•

Improving Unsupervised Commonsense Reasoning Using Knowledge-Enabled Natural Language Inference

[...]

Canming Huang, Weinan He, Yongmei Liu

01 Nov 2021

TL;DR: The authors used transfer learning from large NLI datasets, and injecting crucial knowledge from commonsense sources such as ATOMIC 2020 and ConceptNet, to solve diverse commonsense reasoning tasks.

...read moreread less

Abstract: Recent methods based on pre-trained language models have shown strong supervised performance on commonsense reasoning. However, they rely on expensive data annotation and time-consuming training. Thus, we focus on unsupervised commonsense reasoning. We show the effectiveness of using a common framework, Natural Language Inference (NLI), to solve diverse commonsense reasoning tasks. By leveraging transfer learning from large NLI datasets, and injecting crucial knowledge from commonsense sources such as ATOMIC 2020 and ConceptNet, our method achieved state-of-the-art unsupervised performance on two commonsense reasoning tasks: WinoWhy and CommonsenseQA. Further analysis demonstrated the benefits of multiple categories of knowledge, but problems about quantities and antonyms are still challenging.

...read moreread less

1 citations