Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
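The text-to-text framing means every task is expressed as feeding the model a text string and training it to produce a text string. A minimal sketch of that casting is shown below; the task prefixes mirror those used in the T5 paper (e.g. "translate English to German:", "summarize:", "cola sentence:"), but the helper function and example records are illustrative, not the released preprocessing code.

```python
# Minimal sketch of casting different tasks into the unified text-to-text format.
# Task prefixes follow the T5 paper; the helper and examples are illustrative only.

def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example into an (input_text, target_text) string pair."""
    if task == "translation_en_de":
        return ("translate English to German: " + example["en"], example["de"])
    if task == "summarization":
        return ("summarize: " + example["document"], example["summary"])
    if task == "cola":
        # Classification targets are literal label words, not class indices.
        return ("cola sentence: " + example["sentence"],
                "acceptable" if example["label"] == 1 else "unacceptable")
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("summarization",
                      {"document": "The quick brown fox jumped over the lazy dog.",
                       "summary": "A fox jumped."}))
```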
Citations
Posted Content
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs.
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
TL;DR: The authors argue that manually constructed commonsense knowledge graphs (CSKGs) will never achieve the coverage necessary to handle all situations encountered by NLP agents, and propose a new evaluation framework that tests the utility of a KG by how effectively implicit knowledge representations can be learned from it.
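That evaluation hinges on rendering KG triples as text so a language model can be trained to complete them. The sketch below illustrates one such rendering; the relation verbalizations and the example triple are assumptions for illustration, not the COMET-ATOMIC 2020 release.

```python
# Illustrative sketch: turn commonsense KG triples into (prompt, target) text pairs
# for language-model fine-tuning. Verbalizations and the triple are assumed, not
# taken from the COMET-ATOMIC 2020 resources.

RELATION_TEMPLATES = {
    "xIntent": "because they wanted",
    "xEffect": "and as a result,",
}

def triple_to_training_pair(head: str, relation: str, tail: str) -> tuple[str, str]:
    """Render (head, relation, tail) as a text prompt and its target completion."""
    prompt = f"{head} {RELATION_TEMPLATES[relation]}"
    return prompt, tail

print(triple_to_training_pair("PersonX goes to the store", "xIntent", "to buy food"))
```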
Posted Content
ReZero is All You Need: Fast Convergence at Large Depth.
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley
TL;DR: This work shows that gating each residual connection with a single zero-initialized parameter, the simplest possible architectural change, satisfies initial dynamical isometry and outperforms more complex approaches; applied to language modeling, it can easily train 120-layer Transformers.
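The residual gating the TL;DR describes amounts to computing x + alpha * F(x) with alpha initialized to zero, so each block starts as the identity. A minimal PyTorch sketch follows; the small MLP sublayer is an arbitrary stand-in, not the architecture from the paper.

```python
# Minimal PyTorch sketch of the ReZero idea: scale each residual branch by a
# single learnable parameter initialized to zero. The MLP sublayer is a stand-in.

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.sublayer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Single zero-initialized gate; at initialization the block is the identity.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)

x = torch.randn(8, 64)
print(ReZeroBlock(64)(x).shape)  # torch.Size([8, 64])
```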
Proceedings ArticleDOI
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li, Percy Liang
TL;DR: The authors propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks that keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.
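A simplified PyTorch sketch of that idea is below: the pretrained model is frozen and only a short sequence of continuous vectors is trained. For brevity the sketch prepends the trainable vectors at the embedding layer only, whereas Li and Liang's method injects prefix activations at every Transformer layer; the tiny encoder stands in for a real pretrained language model.

```python
# Simplified sketch of prefix-tuning: freeze the base model, train only a short
# prefix of continuous vectors prepended to the input (embedding-level only here).

import torch
import torch.nn as nn

class PrefixTuned(nn.Module):
    def __init__(self, base: nn.Module, dim: int, prefix_len: int = 10):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # language model stays frozen
            p.requires_grad = False
        # The prefix is the only trainable set of parameters.
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        # embeds: (batch, seq, dim); prepend the shared prefix to every example.
        prefix = self.prefix.unsqueeze(0).expand(embeds.size(0), -1, -1)
        return self.base(torch.cat([prefix, embeds], dim=1))

base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = PrefixTuned(base, dim=64)
print(model(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 26, 64])
```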
Proceedings ArticleDOI
Document Ranking with a Pretrained Sequence-to-Sequence Model
TL;DR: Surprisingly, the choice of target tokens impacts effectiveness even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation is effective for document ranking.
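The target tokens the TL;DR refers to are the words whose generation probability the sequence-to-sequence model uses as a relevance score for a query-document pair. A hedged sketch of that scoring scheme is below; the prompt template, the "true"/"false" target words, and the use of t5-small are assumptions for illustration, not the paper's released code.

```python
# Sketch of target-token scoring for document ranking with a seq2seq model:
# score a query-document pair by the decoder's probability of a chosen target word.
# Prompt template, target words, and model choice are illustrative assumptions.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def relevance_score(query: str, doc: str,
                    pos_word: str = "true", neg_word: str = "false") -> float:
    inputs = tokenizer(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    pos_id = tokenizer(pos_word, add_special_tokens=False).input_ids[0]
    neg_id = tokenizer(neg_word, add_special_tokens=False).input_ids[0]
    # Softmax over just the two candidate target tokens; return P(positive word).
    probs = torch.softmax(logits[[pos_id, neg_id]], dim=-1)
    return probs[0].item()

print(relevance_score("what causes rain", "Rain forms when water vapor condenses."))
```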
Proceedings ArticleDOI
Reformulating Unsupervised Style Transfer as Paraphrase Generation.
TL;DR: This paper reformulates unsupervised style transfer as a paraphrase generation problem, and presents a simple methodology based on fine-tuning pretrained language models on automatically generated paraphrase data that significantly outperforms state-of-the-art style transfer systems on both human and automatic evaluations.
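The "automatically generated paraphrase data" step can be sketched as follows, under the assumption that stylized sentences are first paraphrased into a more neutral form and a model is then fine-tuned to invert that mapping; the neutral_paraphrase function is a placeholder for a real pretrained paraphrase generator.

```python
# Sketch of building pseudo-parallel data for style transfer via paraphrasing.
# neutral_paraphrase is a placeholder; a real system would call a pretrained
# paraphrase model here.

def neutral_paraphrase(sentence: str) -> str:
    """Placeholder for a paraphrase generator that strips stylistic markers."""
    return sentence.lower()

def build_pseudo_parallel_data(styled_corpus: list[str]) -> list[tuple[str, str]]:
    # Each pair trains an "inverse paraphraser": neutral input -> styled output.
    return [(neutral_paraphrase(s), s) for s in styled_corpus]

print(build_pseudo_parallel_data(["Thou art as wise as thou art beautiful."]))
```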