Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
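
As a rough illustration of the text-to-text format the abstract describes, the sketch below casts examples from three kinds of task into plain (input text, target text) pairs by prepending a task prefix. The function name, task keys, and dictionary fields are hypothetical and chosen only for this example; the prefixes are written in the style the paper uses, but this is not the authors' released code.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Cast one labeled example to an (input text, target text) pair.

    Hypothetical helper: the task keys and field names are assumptions
    made for illustration, not part of any released library.
    """
    if task == "translation_en_de":
        # The target is simply the translated sentence.
        return "translate English to German: " + example["en"], example["de"]
    if task == "summarization":
        # The target is the reference summary.
        return "summarize: " + example["document"], example["summary"]
    if task == "cola":
        # Classification: the label is emitted as a word, not a class index.
        return "cola sentence: " + example["sentence"], example["label"]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    source, target = to_text_to_text(
        "cola", {"sentence": "The course is jumping well.", "label": "unacceptable"}
    )
    print(source)
    print(target)
```

Because every task reduces to the same string-to-string shape, a single model, training objective, and decoding procedure can in principle be shared across all of them.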

Citations

Posted Content

BERT got a Date: Introducing Transformers to Temporal Tagging.

Satya Almasian¹, Dennis Aumiller, Michael Gertz

Heidelberg University¹

30 Sep 2021-arXiv: Computation and Language

TL;DR: The authors propose a transformer encoder-decoder model for joint temporal tagging and type classification, built on the RoBERTa language model, that surpasses previous work, especially on rare classes.

Abstract: Temporal expressions in text play a significant role in language understanding, and correctly identifying them is fundamental to various retrieval and natural language processing systems. Previous works have slowly shifted from rule-based to neural architectures, capable of tagging expressions with higher accuracy. However, neural models cannot yet distinguish between different expression types at the same level as their rule-based counterparts. In this work, we aim to identify the most suitable transformer architecture for joint temporal tagging and type classification, and to investigate the effect of semi-supervised training on the performance of these systems. Based on our study of token classification variants and encoder-decoder architectures, we present a transformer encoder-decoder model using the RoBERTa language model as our best performing system. By supplementing training resources with weakly labeled data from rule-based systems, our model surpasses previous works in temporal tagging and type classification, especially on rare classes. Our code and pre-trained experiments are available at: this https URL

Proceedings Article

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Boseop Kim, HyoungSeok Kim, Sang Woo Lee¹, Gichang Lee, Dong-Hyun Kwak¹, Dong Hyeon Jeon, Sunghyun Park², Sungju Kim, Seonhoon Kim³, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee⁴, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park³, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo¹, Minsuk Chang⁵, Soobin Suh, Sookyo In, Jin-Seong Park⁶, Kyungduk Kim⁷, Hiun Kim, Jisu Jeong¹, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee⁸, Jae-Wook Kang⁹, Inho Kang¹, Jung-Woo Ha¹, Woo-Myoung Park⁷, Nako Sung¹

Naver Corporation¹, Amazon.com², Seoul National University³, Dong-eui University⁴, KAIST⁵, Hanyang University⁶, Samsung⁷, Yonsei University⁸, Chonbuk National University⁹

10 Sep 2021

TL;DR: HyperCLOVA is a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens; it shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.

Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performances on various downstream tasks in Korean. Also, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. Then we discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.

Posted Content

Less is More: Generating Grounded Navigation Instructions from Landmarks

Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson

25 Nov 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: MARKY-MT5 is a two-stage system for generating navigation instructions grounded in visual landmarks: a first-stage landmark detector followed by a second-stage generator, which is a multimodal, multilingual, multitask encoder-decoder.

Abstract: We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first stage landmark detector and a second stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8b images, we identify 1.1m English, Hindi and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, diverse paths obtain 61-64% SRs on three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.
