Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
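The text-to-text framing described above means every task, whether translation, summarization, or classification, is posed as mapping an input string to an output string, with a short task prefix identifying the task. The following is a minimal sketch of that idea, assuming the Hugging Face transformers library and the released "t5-small" checkpoint; the prefixes follow the conventions reported in the paper.

```python
# Minimal sketch of the text-to-text framing: every task is string in, string out.
# Assumes the Hugging Face `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: task prefix names the language pair.
    "translate English to German: The house is wonderful.",
    # Summarization: same model, different prefix.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA acceptability): the label is generated as text.
    "cola sentence: The course is jumping well.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```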



Citations
Posted Content

Relative Positional Encoding for Transformers with Linear Complexity

TL;DR: The authors propose Stochastic Positional Encoding (SPE), an alternative to the classical additive (sinusoidal) PE that provably behaves like relative positional encoding (RPE) for linear variants of the Transformer.
Posted Content

Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words.

TL;DR: The authors showed that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or computed from the subwords, which implies that maximally meaningful input tokens should allow for the best generalization to new words.
Posted Content

Boosting Search Engines with Interactive Agents

TL;DR: The authors used machine reading to guide the selection of refinement terms from aggregated search results and then empowered agents with simple but effective search operators to exert fine-grained and transparent control over queries and search results.
Proceedings ArticleDOI

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

TL;DR: The Common Crawl Question Answering dataset (CCQA) is a large-scale open-domain question-answering dataset for model pre-training, containing a previously unseen scale of around 130 million multilingual question-answer pairs.
Posted Content

Large Scale Multi-Actor Generative Dialog Modeling

TL;DR: The authors introduced the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text transformer.