Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
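The text-to-text framing means every task, from translation to classification to summarization, is cast as feeding text in and reading text out. A minimal sketch using the released T5 checkpoints via the Hugging Face transformers library (the task prefixes follow the paper's conventions; the availability of the library and the "t5-small" weights is an assumption of this sketch):

```python
# Minimal sketch: casting different NLP tasks as text-to-text with a released T5 checkpoint.
# Assumes the `transformers` library (with sentencepiece) and the public "t5-small" weights.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is just a text prefix plus the input text; the answer is decoded text.
prompts = [
    "translate English to German: The house is wonderful.",                          # translation
    "cola sentence: The course is jumping well.",                                     # acceptability classification
    "summarize: Transfer learning has emerged as a powerful technique in NLP ...",    # summarization
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```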



Citations
Proceedings Article

NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

TL;DR: This work proposes NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models, supervised or not, to generate fluent text while satisfying complex lexical constraints, and suggests both the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
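The constraints NeuroLogic Decoding handles are predicate-logic formulas over lexical items: a conjunction of clauses, each a disjunction of (possibly negated) "phrase appears in the output" predicates. A toy, library-free sketch of checking candidate outputs against such CNF constraints (the decoding algorithm itself, which tracks constraint states while pruning beam candidates, is not reproduced here):

```python
# Toy sketch of NeuroLogic-style lexical constraints in conjunctive normal form (CNF):
# each clause is a disjunction of literals, and a literal asserts that a phrase does
# (or, if negated, does not) occur in the generated text. Not the full decoding algorithm.

def literal_satisfied(text: str, phrase: str, negated: bool) -> bool:
    present = phrase in text
    return (not present) if negated else present

def cnf_satisfied(text: str, clauses) -> bool:
    # Every clause must contain at least one satisfied literal.
    return all(
        any(literal_satisfied(text, phrase, negated) for phrase, negated in clause)
        for clause in clauses
    )

# ("eat" OR "consume") AND ("vegetables") AND NOT("meat")
constraints = [
    [("eat", False), ("consume", False)],
    [("vegetables", False)],
    [("meat", True)],
]

candidates = [
    "The children eat fresh vegetables every day.",
    "The children eat meat and vegetables.",
]
for cand in candidates:
    print(cand, "->", cnf_satisfied(cand, constraints))
```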
Posted Content

Carbon Emissions and Large Neural Network Training.

TL;DR: In this article, the authors calculate the energy use and carbon footprint of several recent large models, including T5, Meena, GShard, Switch Transformer, and GPT-3, and refine earlier estimates for the neural architecture search that found the Evolved Transformer.
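The underlying bookkeeping is simple arithmetic: energy is accelerator power draw times accelerator-hours, scaled by the datacenter's power usage effectiveness (PUE), and emissions are that energy times the grid's carbon intensity. A sketch of the calculation with purely illustrative numbers (all figures below are placeholders, not values from the cited study):

```python
# Back-of-the-envelope CO2e estimate for a training run.
# All numbers here are illustrative placeholders, not figures from the cited paper.

accelerator_count = 512          # number of chips used
avg_power_per_chip_kw = 0.3      # average measured power draw per chip, in kW
training_hours = 24 * 20         # wall-clock training time, in hours
pue = 1.1                        # datacenter power usage effectiveness (overhead factor)
grid_co2e_kg_per_kwh = 0.4       # carbon intensity of the local grid, kg CO2e per kWh

energy_kwh = accelerator_count * avg_power_per_chip_kw * training_hours * pue
co2e_tonnes = energy_kwh * grid_co2e_kg_per_kwh / 1000.0

print(f"Energy: {energy_kwh:,.0f} kWh; emissions: {co2e_tonnes:,.1f} t CO2e")
```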
Posted Content

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation.

TL;DR: This work proposes a principled method for generating positive and negative samples for contrastive learning of sequence-to-sequence models, and empirically shows that the method significantly improves the generalization of seq2seq models on three text generation tasks: machine translation, text summarization, and question generation.
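At its core, contrastive learning over sequence representations pulls an anchor toward its positive and pushes it away from negatives. A generic InfoNCE-style loss in PyTorch (how the positives and negatives are actually constructed via adversarial perturbations is the paper's contribution and is not reproduced here; the pooled representations are simply given as tensors):

```python
# Generic InfoNCE-style contrastive loss over pooled sequence representations (PyTorch).
# The adversarial construction of positives/negatives from the paper is not shown.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (batch, dim); negatives: (batch, num_neg, dim)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)            # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)        # (batch, num_neg)

    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature   # positive is class 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Tiny smoke test with random representations.
loss = contrastive_loss(torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 8, 16))
print(loss.item())
```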
Proceedings Article

Deberta: decoding-enhanced bert with disentangled attention

TL;DR: DeBERTa proposes a disentangled attention mechanism in which each word is represented by two vectors that encode its content and position, respectively, and the attention weights between words are computed from disentangled matrices over their contents and relative positions.
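The disentangled attention score described above decomposes into a content-to-content term plus two cross terms mixing content with relative-position embeddings. A single-head schematic in PyTorch (relative-distance bucketing, masking, multi-head handling, and DeBERTa's exact indexing are omitted; P is assumed to be a precomputed table of relative-position embeddings):

```python
# Schematic single-head disentangled attention scores (content + relative position).
# Bucketing of relative distances, masking, and multi-head details are omitted;
# P[i, j] is assumed to already hold the embedding of j's position relative to i.
import torch

def disentangled_scores(Hc, P, Wq_c, Wk_c, Wq_r, Wk_r):
    """Hc: (L, d) content states; P: (L, L, d) relative-position embeddings; W*: (d, d)."""
    Qc, Kc = Hc @ Wq_c, Hc @ Wk_c               # content queries and keys
    Kr = P @ Wk_r                               # position keys, (L, L, d)
    Qr = P @ Wq_r                               # position queries, (L, L, d)

    c2c = Qc @ Kc.T                             # content-to-content
    c2p = torch.einsum("id,ijd->ij", Qc, Kr)    # content query vs. key's relative position
    p2c = torch.einsum("jd,jid->ij", Kc, Qr)    # key content vs. query's relative position

    d = Hc.shape[-1]
    return (c2c + c2p + p2c) / (3 * d) ** 0.5   # scaled sum of the three terms

L, d = 5, 8
scores = disentangled_scores(torch.randn(L, d), torch.randn(L, L, d),
                             torch.randn(d, d), torch.randn(d, d),
                             torch.randn(d, d), torch.randn(d, d))
print(scores.shape)  # torch.Size([5, 5])
```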
Posted Content

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents

TL;DR: This paper releases a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents, and demonstrates that the model achieves promising improvements on tasks that take long documents as input.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.