Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
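As a quick illustration of the text-to-text formulation, the sketch below feeds a few task-prefixed inputs through one of the released checkpoints. It assumes the Hugging Face `transformers` library and the public `t5-small` model, neither of which is specified in the abstract itself.

```python
# A minimal sketch of the text-to-text formulation, assuming the Hugging Face
# `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as "input text -> output text" via a task prefix.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The book fell of the table quickly happy.",  # linguistic acceptability (CoLA)
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because classification labels are also emitted as text (e.g. "acceptable"), the same model, loss, and decoding procedure serve every task.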
Citations
Proceedings Article
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
TL;DR: This work proposes NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models, supervised or not, to generate fluent text while satisfying complex lexical constraints; the results suggest the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
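A minimal sketch of the general idea, not the authors' full predicate-logic beam search: hypotheses are ranked by fluency plus how many constraint clauses (in conjunctive normal form) they satisfy. All names and numbers below are illustrative.

```python
# Simplified sketch: rank beam hypotheses by log-probability plus a bonus for
# each satisfied CNF clause (a clause is satisfied if ANY of its words appears).
def satisfied_clauses(text, cnf_constraints):
    """cnf_constraints: list of clauses, each a list of alternative words."""
    return sum(any(word in text for word in clause) for clause in cnf_constraints)

def rank_hypotheses(hypotheses, cnf_constraints, alpha=1.0):
    """hypotheses: list of (text, log_prob). Prefer fluent AND constraint-satisfying outputs."""
    return sorted(
        hypotheses,
        key=lambda h: h[1] + alpha * satisfied_clauses(h[0], cnf_constraints),
        reverse=True,
    )

# Example: the output must mention ("dog" or "puppy") and ("park").
beam = [("a dog runs in the park", -4.2), ("a cat sleeps indoors", -3.9)]
constraints = [["dog", "puppy"], ["park"]]
print(rank_hypotheses(beam, constraints)[0][0])  # -> "a dog runs in the park"
```

The published algorithm additionally tracks clause states during beam search and prunes hypotheses that can no longer satisfy the constraints; the sketch only shows the scoring intuition.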
Posted Content
Carbon Emissions and Large Neural Network Training.
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, Daniel Rothchild, David R. So, Maud Texier, Jeffrey Dean
TL;DR: In this article, the authors calculate the energy use and carbon footprint of several recent large models, including T5, Meena, GShard, Switch Transformer, and GPT-3, and refine earlier estimates for the neural architecture search that found the Evolved Transformer.
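The footprint arithmetic being refined is roughly the following; this is a hedged sketch with placeholder numbers, not figures from the paper.

```python
# A minimal sketch of training-footprint arithmetic: energy from hours, chip count,
# average chip power, and datacenter PUE; emissions from grid carbon intensity.
# All numbers below are illustrative placeholders, not figures from the paper.
def training_footprint(hours, num_chips, avg_chip_power_w, pue, kg_co2e_per_kwh):
    """Return (energy in kWh, emissions in tCO2e) for one training run."""
    energy_kwh = hours * num_chips * (avg_chip_power_w / 1000.0) * pue
    tco2e = energy_kwh * kg_co2e_per_kwh / 1000.0
    return energy_kwh, tco2e

# Hypothetical run: 500 hours on 512 accelerators drawing 300 W each,
# datacenter PUE of 1.1, grid intensity of 0.4 kg CO2e per kWh.
energy, emissions = training_footprint(500, 512, 300, 1.1, 0.4)
print(f"{energy:,.0f} kWh, {emissions:.1f} tCO2e")
```

The paper's refinements come from measuring these factors (actual accelerator power, datacenter PUE, and local grid mix) rather than assuming worst-case defaults.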
Posted Content
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation.
TL;DR: This work proposes a principled method for generating positive and negative samples for contrastive learning of sequence-to-sequence models, and empirically shows that the method significantly improves the generalization of seq2seq models on three text generation tasks: machine translation, text summarization, and question generation.
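A rough PyTorch sketch of the kind of contrastive objective involved. In the paper, positives and negatives come from adversarial perturbations of the target sequences; here they are simply passed in as tensors, which is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

# Sketch of a contrastive loss over pooled sequence representations: the anchor
# should be closer to its positive than to any of the negatives.
def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (batch, dim); negatives: (batch, num_neg, dim)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_sim = (anchor * positive).sum(-1, keepdim=True)        # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)    # (batch, num_neg)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)     # positive sits at index 0
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8, 256))
```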
Proceedings Article
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
TL;DR: DeBERTa introduces a disentangled attention mechanism in which each word is represented by two vectors that encode its content and position, respectively, and attention weights among words are computed using disentangled matrices over their contents and relative positions.
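A simplified, single-head PyTorch sketch of what such disentangled attention scores look like, summing content-to-content, content-to-position, and position-to-content terms. Shapes and weight names are assumptions for illustration, not the authors' implementation.

```python
import torch

# Simplified sketch of disentangled attention scores for one head:
# content and relative-position embeddings are projected separately and combined.
def disentangled_scores(Hc, Pr, Wq_c, Wk_c, Wq_r, Wk_r, rel_idx):
    """Hc: (seq, d) content states; Pr: (2k, d) relative-position embeddings;
    rel_idx: (seq, seq) long tensor mapping each (i, j) pair to a bucket in Pr."""
    Qc, Kc = Hc @ Wq_c, Hc @ Wk_c                   # content projections
    Qr, Kr = Pr @ Wq_r, Pr @ Wk_r                   # relative-position projections
    c2c = Qc @ Kc.T                                  # content -> content
    c2p = torch.gather(Qc @ Kr.T, 1, rel_idx)        # content -> position
    p2c = torch.gather(Kc @ Qr.T, 1, rel_idx).T      # position -> content (reversed index)
    d = Hc.size(-1)
    return (c2c + c2p + p2c) / (3 * d) ** 0.5        # scale over the three terms

# Tiny illustrative example with relative distances clamped to [-k, k).
seq, d, k = 6, 16, 4
rel_idx = (torch.arange(seq)[:, None] - torch.arange(seq)[None, :]).clamp(-k, k - 1) + k
Hc, Pr = torch.randn(seq, d), torch.randn(2 * k, d)
W = [torch.randn(d, d) for _ in range(4)]
scores = disentangled_scores(Hc, Pr, *W, rel_idx)    # (seq, seq) attention logits
```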
Posted Content
Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
TL;DR: This paper releases a Longformer-based pretrained language model, named Lawformer, for understanding long Chinese legal documents, and demonstrates that the model achieves promising improvements on tasks that take long documents as inputs.