Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
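To make the text-to-text format concrete, here is a minimal sketch assuming the Hugging Face transformers library and the publicly released t5-small checkpoint (both assumptions of this sketch, not details from the abstract): every problem, from translation to classification, becomes text in, text out, with a task prefix selecting the behavior.

```python
# Minimal sketch of the text-to-text format, assuming the Hugging Face
# "transformers" library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every problem is text in, text out; a task prefix selects the task.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # classification as text generation
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```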


Citations
Posted Content
TL;DR: This article proposed Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration, in which language modulates a low-dimensional controller, giving users a language-informed control space.
Abstract: We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller (e.g., a 2 degree-of-freedom (DoF) joystick that can move left/right and up/down) for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot's end-effector to the bowl, and the other dimension controls the robot's end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, where users can provide a language instruction while operating a 7-DoF Franka Emika Panda Arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.
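As an illustration of the architecture this abstract describes (not the authors' code), the sketch below shows one plausible way a language embedding can modulate a low-dimensional latent action space that decodes to a high-DoF robot command; the FiLM-style modulation and all dimensions are assumptions.

```python
# Illustrative sketch (assumed design, not the authors' implementation):
# a 2-DoF user input z is decoded into a 7-DoF robot action, with the
# control space modulated by a language embedding via FiLM-style scaling.
import torch
import torch.nn as nn

class LatentActionDecoder(nn.Module):
    def __init__(self, state_dim=14, lang_dim=768, z_dim=2, action_dim=7):
        super().__init__()
        self.encode_state = nn.Linear(state_dim, 64)
        self.film = nn.Linear(lang_dim, 2 * 64)   # language -> (scale, shift)
        self.decode = nn.Sequential(
            nn.Linear(64 + z_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, lang_emb, z):
        h = self.encode_state(state)
        scale, shift = self.film(lang_emb).chunk(2, dim=-1)
        h = scale * h + shift                      # language modulates the features
        return self.decode(torch.cat([h, z], dim=-1))

# Each control step: z is the 2-DoF joystick reading, lang_emb an embedding
# of the instruction (e.g., from a frozen language model).
decoder = LatentActionDecoder()
action = decoder(torch.zeros(1, 14), torch.zeros(1, 768), torch.tensor([[0.5, -0.2]]))
```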
Proceedings ArticleDOI
01 Aug 2021
TL;DR: The authors designed a neural approach that generates fallback responses that are contextually aware of the user query while still saying no to the user; such customized responses provide paraphrasing ability and contextualization.
Abstract: Despite end-to-end neural systems making significant progress in the last decade for task-oriented as well as chit-chat based dialogue systems, most dialogue systems rely on hybrid approaches which use a combination of rule-based, retrieval and generative approaches for generating a set of ranked responses. Such dialogue systems need to rely on a fallback mechanism to respond to out-of-domain or novel user queries which are not answerable within the scope of the dialogue system. While dialogue systems today rely on static and unnatural responses like “I don’t know the answer to that question” or “I’m not sure about that”, we design a neural approach which generates responses that are contextually aware of the user query and still say no to the user. Such customized responses provide paraphrasing ability and contextualization as well as improve the interaction with the user and reduce dialogue monotonicity. Our simple approach makes use of rules over dependency parses and a text-to-text transformer fine-tuned on synthetic data of question-response pairs, generating highly relevant, grammatical, and diverse questions. We perform automatic and manual evaluations to demonstrate the efficacy of the system.
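The inference step of such a system might look like the following sketch, assuming a T5-style model already fine-tuned on synthetic question-response pairs; the checkpoint name and the "generate refusal:" prefix are illustrative assumptions, not the authors' released artifacts.

```python
# Hedged sketch of the generation step: a text-to-text transformer,
# assumed to be fine-tuned on synthetic (query, contextual-refusal) pairs,
# produces a customized "no" response. Checkpoint and prefix are illustrative.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")   # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "Can you book me a table at the rooftop restaurant tonight?"
inputs = tokenizer("generate refusal: " + query, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# After the fine-tuning described above, the target output would be a
# contextual refusal such as "I'm afraid I can't make restaurant
# reservations yet." rather than a static fallback.
```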
Proceedings ArticleDOI
08 Oct 2020
TL;DR: A new memory usage metric is defined, and careful observation with this metric reveals that most memory slots remain outdated during the training of PKM-augmented models; the proposed fixes enhance memory utilization and downstream performance.
Abstract: Product key memory (PKM), proposed by Lample et al. (2019), improves prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, its empirical application has so far been limited to causal language modeling. Motivated by the recent success of pretrained language models (PLMs), we investigate how to incorporate large PKM into PLMs that can be finetuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation using this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To train better PLMs by tackling this issue, we propose simple but effective solutions: (1) initialization from the model weights pretrained without memory and (2) augmenting PKM by addition rather than replacing a feed-forward network. We verify that both of them are crucial for the pretraining of PKM-augmented PLMs, enhancing memory utilization and downstream performance. Code and pretrained weights are available at https://github.com/clovaai/pkm-transformers.
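Solution (2), adding the memory alongside the feed-forward network rather than replacing it, can be sketched as below (an assumed PyTorch formulation, not the repository's code); note how it also makes solution (1) natural, since the pretrained FFN path is kept intact.

```python
# Sketch of an additively PKM-augmented transformer sublayer (assumed
# formulation). PKM internals (Lample et al., 2019) are passed in as a module.
import torch.nn as nn

class PKMAugmentedBlock(nn.Module):
    def __init__(self, d_model, ffn: nn.Module, pkm: nn.Module):
        super().__init__()
        self.ffn = ffn            # original feed-forward sublayer (pretrained)
        self.pkm = pkm            # product-key memory layer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        # Addition keeps the pretrained FFN path intact, so the block can be
        # initialized from weights pretrained without memory (solution 1).
        return x + self.ffn(h) + self.pkm(h)
```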
Posted Content
TL;DR: This paper proposed QACE, a new metric based on Question Answering for Caption Evaluation, which generates questions about the evaluated caption and checks its content by posing those questions to either the reference caption or the source image.
Abstract: In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions on the evaluated caption and checks its content by asking the questions on either the reference caption or the source image. We first develop QACE-Ref, which compares the answers of the evaluated caption to its reference, and report competitive results with the state-of-the-art metrics. To go further, we propose QACE-Img, which asks the questions directly on the image instead of the reference. A Visual-QA system is necessary for QACE-Img. Unfortunately, standard VQA models are framed as a classification among only a few thousand categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE-Img, is multi-modal, reference-less, and explainable. Our experiments show that QACE-Img compares favorably with other reference-less metrics. We will release the pre-trained models to compute QACE.
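The reference-based variant can be summarized by the following sketch, where the question-generation, question-answering, and answer-agreement models are placeholders for whatever components implement them:

```python
# Illustrative sketch of QACE-Ref: generate questions from the candidate
# caption, answer them against both candidate and reference, and score
# answer agreement. All three model functions are placeholders.
def qace_ref(candidate, reference, gen_questions, answer, agree):
    """gen_questions: caption -> list of questions (e.g., a QG model);
    answer: (question, text) -> answer string (a QA model);
    agree: (ans_a, ans_b) -> agreement score in [0, 1]."""
    questions = gen_questions(candidate)
    if not questions:
        return 0.0
    scores = [agree(answer(q, candidate), answer(q, reference)) for q in questions]
    return sum(scores) / len(scores)

# QACE-Img replaces answer(q, reference) with a visual-QA call on the
# source image (the abstract's abstractive Visual-T5 system).
```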
Posted Content
TL;DR: This paper proposed Amortized Prompt (AP), a novel approach for domain inference in the form of prompt generation, building on foundation models such as CLIP that have been shown to be robust to many distribution shifts and should therefore lead to substantial improvements in domain generalization (DG).
Abstract: Domain generalization (DG) is a difficult transfer learning problem aiming to learn a model that generalizes to unseen domains. Recent massive pre-trained models such as CLIP and GPT-3, i.e. foundation models (FMs), have been shown to be robust to many distribution shifts and therefore should lead to substantial improvements in DG. In this work, we study generic ways to adopt CLIP for DG problems in image classification, evaluating both naive zero-shot learning and full DG learning settings. For the latter, we propose AP (Amortized Prompt), a novel approach for domain inference in the form of prompt generation. On several standard domain generalization benchmarks, namely PACS, VLCS, OfficeHome, and TerraIncognita, CLIP provides comparable performance without fine-tuning any parameters, suggesting the applicability and importance of FMs in DG. In addition, we show that combining domain prompt inference with CLIP enables AP to outperform strong baselines and the naive CLIP baselines by a large margin, raising accuracy from 71.3% to 79.3%. We hope the simplicity and success of our approach emphasize the importance of foundation models and lead to their wider adoption and analysis in the field of domain generalization.
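The zero-shot CLIP baseline this abstract evaluates can be sketched as follows, assuming the openai/CLIP package; the hand-written, domain-conditioned prompt template stands in for what AP learns to generate:

```python
# Sketch of zero-shot CLIP classification with a domain-aware text prompt,
# assuming the openai/CLIP package. The fixed prompt template here only
# illustrates the idea; AP learns to generate the prompt from the input.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]  # PACS
domain = "sketch"  # one of the PACS domains
texts = clip.tokenize([f"a {domain} of a {c}" for c in classes]).to(device)

def classify(image):  # image: preprocessed tensor of shape (3, 224, 224)
    with torch.no_grad():
        logits_per_image, _ = model(image.unsqueeze(0).to(device), texts)
    return classes[logits_per_image.argmax(dim=-1).item()]
```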