Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
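The unified format described in the abstract casts every task as mapping one string to another, distinguished only by a task prefix. The prefixes below (e.g. "summarize:", "translate English to German:") follow the convention described in the paper; the helper function itself is a minimal illustrative sketch, not the released codebase.

```python
# Sketch of the text-to-text format: every task becomes an
# (input string, target string) pair, distinguished by a task prefix.
def to_text_to_text(task: str, text: str, target: str) -> tuple[str, str]:
    """Cast a labeled example into the text-to-text format."""
    prefixes = {
        "summarization": "summarize: ",
        "translation_en_de": "translate English to German: ",
        "classification_cola": "cola sentence: ",
    }
    return prefixes[task] + text, target

# Even a classification label is emitted as literal text, not a class index:
inp, tgt = to_text_to_text("classification_cola",
                           "The course is jumping well.", "unacceptable")
print(inp)  # cola sentence: The course is jumping well.
print(tgt)  # unacceptable
```

Because inputs and targets are both plain text, one encoder-decoder model with one loss can be trained on every task in the mixture.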
Citations
More filters
Posted Content
Building Compact and Robust Deep Neural Networks with Toeplitz Matrices
TL;DR: In this article, the authors leverage the properties of structured matrices from the Toeplitz family to build neural networks that are not only accurate but also compact, easy to train, reliable, and robust to adversarial examples.
Posted Content
Conditional Generation of Temporally-ordered Event Sequences
TL;DR: The authors use a denoising autoencoder to predict new events that fit into an existing temporally-ordered sequence, capturing both temporality and common event co-occurrence.
Posted Content
Pretrained Language Models are Symbolic Mathematics Solvers too
TL;DR: This article proposes a sample-efficient way of solving symbolic tasks by first pretraining a transformer model on language translation and then fine-tuning it on the downstream task of symbolic mathematics.
Posted Content
Cross-Domain Reasoning via Template Filling.
TL;DR: This article explores the ability of sequence-to-sequence models to perform cross-domain reasoning and presents a prompt-template-filling approach that enables them to do so.
Proceedings Article
PermuteFormer: Efficient Relative Position Encoding for Long Sequences.
TL;DR: PermuteFormer applies a position-dependent transformation to queries and keys to encode positional information into the attention module; the transformation is carefully crafted so that the final output of self-attention is not affected by the absolute positions of tokens.
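A toy sketch of the idea behind such position-dependent transformations (not the paper's actual implementation): if each position applies its own permutation to the feature dimension, here a cyclic roll by the position index, the query-key dot product depends only on the relative offset between positions, not on their absolute values.

```python
import numpy as np

# Toy illustration: position-dependent cyclic permutation of features.
rng = np.random.default_rng(0)
d = 8
q, k = rng.normal(size=d), rng.normal(size=d)

def score(i: int, j: int) -> float:
    """Attention score between a query at position i and a key at position j,
    each permuted (rolled) by its own position index."""
    return float(np.roll(q, i) @ np.roll(k, j))

# Shifting both positions by the same amount leaves the score unchanged,
# i.e. the score depends only on the relative offset j - i:
print(np.isclose(score(2, 5), score(4, 7)))  # True
```

This shift-invariance is what lets a per-position transformation act as a relative position encoding without adding explicit bias terms to the attention matrix.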