Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
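The text-to-text framing is easy to see in code. Below is a minimal sketch, not from the paper itself, that uses the Hugging Face transformers library and the released t5-small checkpoint; the task prefixes ("translate English to German:", "summarize:") follow the conventions described in the paper, while the library calls are an assumption of this example rather than the authors' own codebase.

```python
# Minimal sketch of the text-to-text interface using a released T5 checkpoint
# through the Hugging Face `transformers` library (an assumption of this
# example; the paper's experiments use the authors' own T5 codebase).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def t5_generate(text: str) -> str:
    """Every task is the same call: text in, text out."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Translation and summarization differ only in the task prefix.
print(t5_generate("translate English to German: The house is wonderful."))
print(t5_generate("summarize: Transfer learning, where a model is first "
                  "pre-trained on a data-rich task before being fine-tuned "
                  "on a downstream task, has emerged as a powerful technique."))
```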


Citations
Posted Content
TL;DR: This paper provided the first exploration of text-to-text transformers (T5) sentence embeddings for language processing tasks and achieved state-of-the-art performance on transfer tasks and semantic textual similarity.
Abstract: We provide the first exploration of text-to-text transformers (T5) sentence embeddings. Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 sentence embeddings: two utilize only the T5 encoder and one uses the full T5 encoder-decoder model. Our encoder-only models outperform BERT-based sentence embeddings on both transfer tasks and semantic textual similarity (STS). Our encoder-decoder method achieves further improvement on STS. Scaling up T5 from millions to billions of parameters is found to produce consistent improvements on downstream tasks. Finally, we introduce a two-stage contrastive learning approach that achieves a new state of the art on STS using sentence embeddings, outperforming both Sentence-BERT and SimCSE.
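The encoder-only variant the abstract describes can be approximated with a short sketch: mean-pool the T5 encoder's token representations into a fixed-size sentence vector. The use of transformers' T5EncoderModel and mean pooling here is an assumption made for illustration, not the authors' released implementation.

```python
# Hedged sketch of an encoder-only T5 sentence embedding: mean-pool the
# encoder outputs over non-padding tokens. This approximates one of the
# strategies described in the abstract; it is not the authors' released code.
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

def sentence_embedding(sentences):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding
    summed = (hidden * mask).sum(dim=1)
    return summed / mask.sum(dim=1)                      # mean over real tokens

emb = sentence_embedding(["A dog runs in the park.", "A puppy plays outside."])
cosine = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(float(cosine))  # higher for semantically similar sentences
```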

14 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors quantify domain shifts between popular VQA datasets, in both visual and textual space, and test the robustness of different families of visual question answering methods (two-stream, transformer and neuro-symbolic) to these shifts.
Abstract: The observation that computer vision methods overfit to dataset specifics has inspired diverse attempts to make object recognition models robust to domain shifts. However, similar work on domain-robust visual question answering methods is very limited. Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity: VQA models handle multimodal inputs, methods contain multiple steps with diverse modules resulting in complex optimization, and answer spaces in different datasets are vastly different. To tackle these challenges, we first quantify domain shifts between popular VQA datasets, in both visual and textual space. To disentangle shifts between datasets arising from different modalities, we also construct synthetic shifts in the image and question domains separately. Second, we test the robustness of different families of VQA methods (classic two-stream, transformer, and neuro-symbolic methods) to these shifts. Third, we test the applicability of existing domain adaptation methods and devise a new one to bridge VQA domain gaps, adjusted to specific VQA models. To emulate the setting of real-world generalization, we focus on unsupervised domain adaptation and the open-ended classification task formulation.
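As a rough illustration of what "quantifying textual domain shift" can mean in practice, the sketch below compares the question distributions of two datasets with a simple proxy: the cosine distance between their mean TF-IDF vectors. This is a stand-in chosen for the example, not the shift measure used in the paper.

```python
# Toy proxy for textual domain shift between two VQA question sets:
# fit a shared TF-IDF vocabulary, then compare the mean vectors of each set.
# This is an illustrative stand-in, not the paper's shift measure.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

questions_a = ["What color is the bus?", "How many dogs are there?"]
questions_b = ["Is the object left of the sphere metallic?",
               "What shape is behind the cube?"]

vectorizer = TfidfVectorizer().fit(questions_a + questions_b)
mean_a = np.asarray(vectorizer.transform(questions_a).mean(axis=0))
mean_b = np.asarray(vectorizer.transform(questions_b).mean(axis=0))

shift = 1.0 - cosine_similarity(mean_a, mean_b)[0, 0]
print(f"textual shift proxy: {shift:.3f}")  # larger = more dissimilar question styles
```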

14 citations

Journal ArticleDOI
TL;DR: The authors examine how ideas from classical information retrieval and pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of domain-expert advice, arguing that current pre-trained language models do not have a true understanding of the world, are prone to hallucinating, and are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over.
Abstract: When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead. Classical information retrieval systems do not answer information needs directly, but instead provide references to (hopefully authoritative) answers. Successful question answering systems offer a limited corpus created on-demand by human experts, which is neither timely nor scalable. Pre-trained language models, by contrast, are capable of directly generating prose that may be responsive to an information need, but at present they are dilettantes rather than domain experts -- they do not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over. This paper examines how ideas from classical information retrieval and pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of domain expert advice.
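To make the contrast concrete, the toy sketch below implements the "classical" side of the argument: a retriever that does not answer the information need directly but returns references to candidate supporting documents. The corpus and the scoring choice (TF-IDF with cosine similarity) are assumptions made for illustration only.

```python
# Toy classical IR system: it does not answer the question, it ranks
# documents and returns references, which is the behavior the paper
# contrasts with directly-generating language models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "T5 casts every NLP task as text-to-text generation.",
    "BERT is a bidirectional encoder pre-trained with masked language modeling.",
    "TF-IDF weights terms by how distinctive they are within a corpus.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)

def retrieve(query: str, k: int = 2):
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = scores.argsort()[::-1][:k]
    # Return references (doc ids + snippets), not a synthesized answer.
    return [(int(i), corpus[i]) for i in ranked]

print(retrieve("How does T5 frame language tasks?"))
```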

14 citations

Proceedings ArticleDOI
Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, Chenliang Li
01 Apr 2020
TL;DR: This article proposed MATINF, the first jointly labeled large-scale dataset for classification, question answering, and summarization in NLP, which contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions.
Abstract: Recently, large-scale datasets have vastly facilitated the development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable for three major NLP tasks, including classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by MATINF.
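The "jointly labeled" idea becomes clearer with a small sketch: a single record carrying a question, a description, an answer, and a human-labeled category can be unrolled into training examples for all three tasks. The field names and task prefixes below are hypothetical, chosen to mirror the text-to-text framing of the main paper rather than MATINF's released format.

```python
# Hedged sketch: one jointly labeled record yields examples for three tasks.
# Field names and task prefixes are hypothetical, not MATINF's actual schema.
record = {
    "question": "How often should a six-month-old be fed?",
    "description": "My baby just turned six months and I am unsure about meal frequency...",
    "answer": "Roughly every three to four hours, adding solids gradually.",
    "category": "infant nutrition",
}

def to_multitask_examples(r):
    return [
        # classification: predict the human-labeled category from the question
        ("classify question: " + r["question"], r["category"]),
        # question answering: answer given the question and its description
        ("answer question: " + r["question"] + " context: " + r["description"], r["answer"]),
        # summarization: compress the long description into the short question
        ("summarize: " + r["description"], r["question"]),
    ]

for source, target in to_multitask_examples(record):
    print(source[:60], "->", target[:40])
```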

13 citations

Posted Content
TL;DR: BRIDGE as discussed by the authors represents the question and DB schema in a tagged sequence where a subset of the fields is augmented with cell values mentioned in the question, and the hybrid sequence is encoded by BERT with minimal subsequent layers.
Abstract: We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the text-DB contextualization is realized via the fine-tuned deep attention in BERT. Combined with a pointer-generator decoder with schema-consistency driven search space pruning, BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks, Spider (71.1% dev, 67.5% test with ensemble model) and WikiSQL (92.6% dev, 91.9% test). Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks. Our implementation is available at this https URL.
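The core of the abstract, a tagged question-schema sequence with anchored cell values, can be sketched in a few lines. The tag tokens and the naive substring matching below are simplifications for illustration and do not reproduce BRIDGE's actual anchor-text matching or BERT input construction.

```python
# Simplified sketch of BRIDGE-style hybrid serialization: the question is
# concatenated with the DB schema, and columns whose cell values are
# mentioned in the question are augmented with those values. The [T]/[C]/[V]
# tags and the substring matching are illustrative simplifications.
def serialize(question, schema):
    """schema: {table_name: {column_name: [cell values]}}"""
    parts = [question]
    q_lower = question.lower()
    for table, columns in schema.items():
        parts.append(f"[T] {table}")
        for column, values in columns.items():
            parts.append(f"[C] {column}")
            # Anchor cell values that the question actually mentions.
            for v in values:
                if str(v).lower() in q_lower:
                    parts.append(f"[V] {v}")
    return " ".join(parts)

schema = {"singer": {"name": ["Adele", "Prince"], "country": ["UK", "US"]}}
print(serialize("How many singers are from the UK?", schema))
# -> "How many singers are from the UK? [T] singer [C] name [C] country [V] UK"
```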

13 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.