Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
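To make the text-to-text format concrete, here is a minimal sketch assuming the Hugging Face transformers library and the publicly released t5-small checkpoint (both assumptions of this sketch, not details from the abstract): every problem, from translation to classification, becomes text in, text out, with a task prefix selecting the behavior.

```python
# Minimal sketch of the text-to-text format, assuming the Hugging Face
# "transformers" library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every problem is text in, text out; a task prefix selects the task.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # classification as text generation
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```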


Citations
Posted Content
TL;DR: This article proposed Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration, in which language modulates a low-dimensional controller, giving users a language-informed control space.
Abstract: We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller (e.g., a 2 degree-of-freedom (DoF) joystick that can move left/right and up/down) for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot's end-effector to the bowl, and the other dimension controls the robot's end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, where users can provide a language instruction while operating a 7-DoF Franka Emika Panda Arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.
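As an illustration of the architecture this abstract describes (not the authors' code), the sketch below shows one plausible way a language embedding can modulate a low-dimensional latent action space that decodes to a high-DoF robot command; the FiLM-style modulation and all dimensions are assumptions.

```python
# Illustrative sketch (assumed design, not the authors' implementation):
# a 2-DoF user input z is decoded into a 7-DoF robot action, with the
# control space modulated by a language embedding via FiLM-style scaling.
import torch
import torch.nn as nn

class LatentActionDecoder(nn.Module):
    def __init__(self, state_dim=14, lang_dim=768, z_dim=2, action_dim=7):
        super().__init__()
        self.encode_state = nn.Linear(state_dim, 64)
        self.film = nn.Linear(lang_dim, 2 * 64)   # language -> (scale, shift)
        self.decode = nn.Sequential(
            nn.Linear(64 + z_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, lang_emb, z):
        h = self.encode_state(state)
        scale, shift = self.film(lang_emb).chunk(2, dim=-1)
        h = scale * h + shift                      # language modulates the features
        return self.decode(torch.cat([h, z], dim=-1))

# Each control step: z is the 2-DoF joystick reading, lang_emb an embedding
# of the instruction (e.g., from a frozen language model).
decoder = LatentActionDecoder()
action = decoder(torch.zeros(1, 14), torch.zeros(1, 768), torch.tensor([[0.5, -0.2]]))
```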
Proceedings ArticleDOI
01 Aug 2021
TL;DR: The authors designed a neural approach that generates fallback responses that are contextually aware of the user query while still saying no to the user; such customized responses provide paraphrasing ability and contextualization.
Abstract: Despite end-to-end neural systems making significant progress in the last decade for task-oriented as well as chit-chat based dialogue systems, most dialogue systems rely on hybrid approaches which use a combination of rule-based, retrieval and generative approaches for generating a set of ranked responses. Such dialogue systems need to rely on a fallback mechanism to respond to out-of-domain or novel user queries which are not answerable within the scope of the dialogue system. While dialogue systems today rely on static and unnatural responses like “I don’t know the answer to that question” or “I’m not sure about that”, we design a neural approach which generates responses that are contextually aware of the user query and still say no to the user. Such customized responses provide paraphrasing ability and contextualization as well as improve the interaction with the user and reduce dialogue monotonicity. Our simple approach makes use of rules over dependency parses and a text-to-text transformer fine-tuned on synthetic data of question-response pairs, generating highly relevant, grammatical, and diverse questions. We perform automatic and manual evaluations to demonstrate the efficacy of the system.
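The inference step of such a system might look like the following sketch, assuming a T5-style model already fine-tuned on synthetic question-response pairs; the checkpoint name and the "generate refusal:" prefix are illustrative assumptions, not the authors' released artifacts.

```python
# Hedged sketch of the generation step: a text-to-text transformer,
# assumed to be fine-tuned on synthetic (query, contextual-refusal) pairs,
# produces a customized "no" response. Checkpoint and prefix are illustrative.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")   # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

query = "Can you book me a table at the rooftop restaurant tonight?"
inputs = tokenizer("generate refusal: " + query, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))
# After the fine-tuning described above, the target output would be a
# contextual refusal such as "I'm afraid I can't make restaurant
# reservations yet." rather than a static fallback.
```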
Proceedings ArticleDOI
08 Oct 2020
TL;DR: A new memory usage metric is defined, and careful observation with this metric reveals that most memory slots remain outdated during the training of PKM-augmented models; the proposed fixes enhance memory utilization and downstream performance.
Abstract: Product key memory (PKM), proposed by Lample et al. (2019), improves prediction accuracy by increasing model capacity efficiently with insignificant computational overhead. However, its empirical application has so far been limited to causal language modeling. Motivated by the recent success of pretrained language models (PLMs), we investigate how to incorporate large PKM into PLMs that can be finetuned for a wide variety of downstream NLP tasks. We define a new memory usage metric, and careful observation using this metric reveals that most memory slots remain outdated during the training of PKM-augmented models. To train better PLMs by tackling this issue, we propose simple but effective solutions: (1) initialization from the model weights pretrained without memory and (2) augmenting PKM by addition rather than replacing a feed-forward network. We verify that both of them are crucial for the pretraining of PKM-augmented PLMs, enhancing memory utilization and downstream performance. Code and pretrained weights are available at https://github.com/clovaai/pkm-transformers.
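Solution (2), adding the memory alongside the feed-forward network rather than replacing it, can be sketched as below (an assumed PyTorch formulation, not the repository's code); note how it also makes solution (1) natural, since the pretrained FFN path is kept intact.

```python
# Sketch of an additively PKM-augmented transformer sublayer (assumed
# formulation). PKM internals (Lample et al., 2019) are passed in as a module.
import torch.nn as nn

class PKMAugmentedBlock(nn.Module):
    def __init__(self, d_model, ffn: nn.Module, pkm: nn.Module):
        super().__init__()
        self.ffn = ffn            # original feed-forward sublayer (pretrained)
        self.pkm = pkm            # product-key memory layer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        # Addition keeps the pretrained FFN path intact, so the block can be
        # initialized from weights pretrained without memory (solution 1).
        return x + self.ffn(h) + self.pkm(h)
```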
Posted Content
TL;DR: This paper proposed QACE, a new metric based on Question Answering for Caption Evaluation, which generates questions about the evaluated caption and checks its content by posing those questions to either the reference caption or the source image.
Abstract: In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions on the evaluated caption and checks its content by asking the questions on either the reference caption or the source image. We first develop QACE-Ref, which compares the answers of the evaluated caption to its reference, and report competitive results with the state-of-the-art metrics. To go further, we propose QACE-Img, which asks the questions directly on the image instead of the reference. A Visual-QA system is necessary for QACE-Img. Unfortunately, standard VQA models are framed as a classification among only a few thousand categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE-Img, is multi-modal, reference-less, and explainable. Our experiments show that QACE-Img compares favorably with other reference-less metrics. We will release the pre-trained models to compute QACE.
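The reference-based variant can be summarized by the following sketch, where the question-generation, question-answering, and answer-agreement models are placeholders for whatever components implement them:

```python
# Illustrative sketch of QACE-Ref: generate questions from the candidate
# caption, answer them against both candidate and reference, and score
# answer agreement. All three model functions are placeholders.
def qace_ref(candidate, reference, gen_questions, answer, agree):
    """gen_questions: caption -> list of questions (e.g., a QG model);
    answer: (question, text) -> answer string (a QA model);
    agree: (ans_a, ans_b) -> agreement score in [0, 1]."""
    questions = gen_questions(candidate)
    if not questions:
        return 0.0
    scores = [agree(answer(q, candidate), answer(q, reference)) for q in questions]
    return sum(scores) / len(scores)

# QACE-Img replaces answer(q, reference) with a visual-QA call on the
# source image (the abstract's abstractive Visual-T5 system).
```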
Posted Content
TL;DR: This paper proposed Amortized Prompt (AP), a novel approach for domain inference in the form of prompt generation, building on foundation models such as CLIP that have been shown to be robust to many distribution shifts and should therefore lead to substantial improvements in domain generalization (DG).
Abstract: Domain generalization (DG) is a difficult transfer learning problem aiming to learn a model that generalizes to unseen domains. Recent massive pre-trained models such as CLIP and GPT-3, i.e. foundation models (FMs), have been shown to be robust to many distribution shifts and therefore should lead to substantial improvements in DG. In this work, we study generic ways to adopt CLIP for DG problems in image classification, evaluating both naive zero-shot learning and full DG learning settings. For the latter, we propose AP (Amortized Prompt), a novel approach for domain inference in the form of prompt generation. On several standard domain generalization benchmarks, namely PACS, VLCS, OfficeHome, and TerraIncognita, CLIP provides comparable performance without fine-tuning any parameters, suggesting the applicability and importance of FMs in DG. In addition, we show that combining domain prompt inference with CLIP enables AP to outperform strong baselines and the naive CLIP baselines by a large margin, raising accuracy from 71.3% to 79.3%. We hope the simplicity and success of our approach emphasize the importance of foundation models and lead to their wider adoption and analysis in the field of domain generalization.
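The zero-shot CLIP baseline this abstract evaluates can be sketched as follows, assuming the openai/CLIP package; the hand-written, domain-conditioned prompt template stands in for what AP learns to generate:

```python
# Sketch of zero-shot CLIP classification with a domain-aware text prompt,
# assuming the openai/CLIP package. The fixed prompt template here only
# illustrates the idea; AP learns to generate the prompt from the input.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]  # PACS
domain = "sketch"  # one of the PACS domains
texts = clip.tokenize([f"a {domain} of a {c}" for c in classes]).to(device)

def classify(image):  # image: preprocessed tensor of shape (3, 224, 224)
    with torch.no_grad():
        logits_per_image, _ = model(image.unsqueeze(0).to(device), texts)
    return classes[logits_per_image.argmax(dim=-1).item()]
```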