Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
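
As a rough illustration of the text-to-text format described in the abstract, the sketch below casts three different tasks into plain (input string, target string) pairs by prepending a task prefix. The helper name `to_text_to_text`, the task keys, and the dictionary layout are assumptions made for this example. The prefixes follow the style of the examples given in the paper, but this is not the authors' released code.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair.

    Hypothetical helper: the task names and field names are illustrative only.
    """
    if task == "translation":
        source = f"translate English to German: {example['text']}"
        target = example["translation"]
    elif task == "summarization":
        source = f"summarize: {example['text']}"
        target = example["summary"]
    elif task == "acceptability":
        # Classification labels are emitted as literal text, not class indices.
        source = f"cola sentence: {example['text']}"
        target = example["label"]
    else:
        raise ValueError(f"unknown task: {task}")
    return source, target


if __name__ == "__main__":
    pair = to_text_to_text(
        "translation",
        {"text": "That is good.", "translation": "Das ist gut."},
    )
    print(pair)
```

Because every task ends up as a pair of strings, one model, one loss, and one decoding procedure can in principle be shared across all of them, which is the point of the unified framework.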

Citations

Proceedings Article

How Reliable are Model Diagnostics?

Vamsi Aribandi¹, Yi Tay², Donald Metzler²

Birla Institute of Technology and Science¹, Google²

01 Aug 2021

TL;DR: The authors examine three recent diagnostic tests for pre-trained language models, and find that likelihood-based and representation-based model diagnostics are not yet as reliable as previously assumed, and formulate recommendations for practitioners and researchers.

Abstract: In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU. This paper takes a step back and asks an important and timely question: how reliable are these diagnostics in providing insight into models and training setups? We critically examine three recent diagnostic tests for pre-trained language models, and find that likelihood-based and representation-based model diagnostics are not yet as reliable as previously assumed. Based on our empirical findings, we also formulate recommendations for practitioners and researchers.

1 citation

Posted Content

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features

Hannah Rashkin¹, David Reitter², Gaurav Singh Tomar², Dipanjan Das²

University of Washington¹, Google²

14 Jul 2021-arXiv: Computation and Language

TL;DR: This article proposed different evaluation measures to disentangle different styles of responses by quantifying the informativeness and objectivity of the responses, and used these measures to train a generative neural dialogue model that is controlled to stay faithful to the evidence.

Abstract: Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.

1 citation

Posted Content

Few-Shot Self-Rationalization with Natural Language Prompts

Ana Marasović¹, Iz Beltagy¹, Doug Downey¹, Matthew E. Peters¹

Allen Institute for Artificial Intelligence¹

16 Nov 2021-arXiv: Computation and Language

TL;DR: This article presented FEB, a standardized collection of four existing English-language datasets and associated metrics for few-shot self-rationalization, and showed that a suitable natural language prompt combined with a larger model makes progress possible with few training examples.

Abstract: Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using few training examples. We present FEB -- a standardized collection of four existing English-language datasets and associated metrics. We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible. We show there is still ample room for improvement in this task: the average plausibility of generated explanations assessed by human annotators is at most 51%, while plausibility of human explanations is 76%. We hope that FEB together with our proposed approach will spur the community to take on the few-shot self-rationalization challenge.

1 citation

French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus

Murielle Fabre, Pedro Javier Ortiz Suárez, Benoît Sagot, Éric Villemonte de la Clergerie

16 May 2020

TL;DR: It is hypothesized that a linguistically representative and balanced corpus will allow the language model to be more efficient and representative of a given language and therefore yield better evaluation scores on different evaluation sets and tasks.

Abstract: This paper describes and compares the impact of different types and sizes of training corpora on language models like ELMo. By asking the fundamental question of quality versus quantity, we evaluate four French training corpora on parsing, POS-tagging, and named-entity recognition downstream tasks. The paper studies the relevance of a new corpus, CaBeRnet, featuring a representative range of language usage, including a balanced variety of genres (oral transcriptions, newspapers, popular magazines, technical reports, fiction, academic texts), in oral and written styles. We hypothesize that a linguistically representative and balanced corpus will allow the language model to be more efficient and representative of a given language and therefore yield better evaluation scores on different evaluation sets and tasks.

1 citation

Posted Content

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

Wenhui Wang¹, Hangbo Bao¹, Li Dong¹, Furu Wei¹

Microsoft¹

03 Nov 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a Mixture-of-Modality-Experts (MoME) Transformer network is proposed, where each block contains a pool of modality-specific experts and a shared self-attention layer.

Abstract: We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce Mixture-of-Modality-Experts (MoME) Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer. Because of the modeling flexibility of MoME, pretrained VLMo can be fine-tuned as a fusion encoder for vision-language classification tasks, or used as a dual encoder for efficient image-text retrieval. Moreover, we propose a stagewise pre-training strategy, which effectively leverages large-scale image-only and text-only data besides image-text pairs. Experimental results show that VLMo achieves state-of-the-art results on various vision-language tasks, including VQA and NLVR2. The code and pretrained models are available at https://aka.ms/vlmo.

1 citation