Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
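To make the unified text-to-text framing concrete, here is a minimal sketch using the publicly released T5 checkpoints via the Hugging Face transformers library. The task prefixes follow the paper's convention; the helper function name and example sentences are illustrative assumptions only.

```python
# A minimal sketch of the text-to-text interface using the Hugging Face
# `transformers` library and the released "t5-small" checkpoint.
# Task prefixes ("translate English to German:", "cola sentence:") follow the
# paper's convention; the helper function and inputs are illustrative only.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def text_to_text(prompt: str, max_new_tokens: int = 50) -> str:
    """Every task is phrased as text in, text out."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Translation, summarization, and classification all share one interface.
print(text_to_text("translate English to German: The house is wonderful."))
print(text_to_text("cola sentence: The course is jumping well."))
```

Because every task shares the same string-in, string-out interface, the same model, loss, and decoding procedure can be reused across tasks, which is what makes the systematic comparison in the paper possible.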



Citations
Posted Content

Exploring the Limits of Out-of-Distribution Detection

TL;DR: In this article, a few-shot outlier exposure setting, in which a few examples from outlier classes may be available, was explored, and large-scale pre-trained transformers were used to improve the state of the art on a range of near-OOD tasks across different data modalities.
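One common way to leverage pre-trained representations for OOD detection is to fit class-conditional Gaussians to in-distribution embeddings and score inputs by Mahalanobis distance. The sketch below illustrates that pattern with random stand-in features; it is a hypothetical example, not the cited paper's exact method.

```python
# Hypothetical sketch: Mahalanobis-distance OOD scoring on features from a
# pre-trained encoder. The random features below stand in for embeddings from
# any pre-trained transformer; the scoring rule is one common choice, not
# necessarily the cited paper's exact recipe.
import numpy as np

def fit_class_gaussians(features: np.ndarray, labels: np.ndarray):
    """Fit per-class means and a shared precision matrix over in-distribution features."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x: np.ndarray, means: dict, precision: np.ndarray) -> float:
    """Higher score = farther from every in-distribution class = more likely OOD."""
    return min(float((x - mu) @ precision @ (x - mu)) for mu in means.values())

# Usage with random stand-in features (in practice: pre-trained model embeddings).
rng = np.random.default_rng(0)
train_feats, train_labels = rng.normal(size=(200, 16)), rng.integers(0, 4, size=200)
means, precision = fit_class_gaussians(train_feats, train_labels)
print(ood_score(rng.normal(size=16), means, precision))
```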
Proceedings Article

TeaForN: Teacher-Forcing with N-grams

TL;DR: Teacher-Forcing with N-grams (TeaForN), as discussed by the authors, uses a stack of N decoders trained to decode along a secondary time axis, which enables model-parameter updates based on N prediction steps.
Proceedings Article

A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion.

TL;DR: The authors presented an empirical study in favor of a cascade architecture for neural text summarization, showing that a cascaded pipeline that separately identifies important content pieces and stitches them together into a coherent text performs comparably to, or better than, end-to-end systems.
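A cascade of this kind can be pictured as two stages: a selector that picks salient content, followed by a seq2seq model that fuses it into a summary. The sketch below is a deliberately simple, hypothetical instance (frequency-based sentence selection plus a t5-small fusion step), not the cited paper's actual components.

```python
# Hypothetical two-stage cascade: (1) content selection picks salient sentences,
# (2) an abstractive model fuses them into a summary. The frequency-based
# selector and the "t5-small" fusion model are illustrative assumptions.
from collections import Counter
from transformers import pipeline

def select_content(document: str, k: int = 3) -> list[str]:
    """Rank sentences by overlap with document-level word frequencies."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    scored = sorted(sentences,
                    key=lambda s: sum(freqs[w.lower()] for w in s.split()),
                    reverse=True)
    return scored[:k]

def fuse(selected: list[str]) -> str:
    """Stitch the selected pieces into a coherent text with a seq2seq model."""
    summarizer = pipeline("summarization", model="t5-small")
    return summarizer(" ".join(selected), max_length=60, min_length=10)[0]["summary_text"]

# Illustrative input document.
document = ("Transfer learning has become central to NLP. Models are pre-trained on "
            "large unlabeled corpora. They are then fine-tuned on downstream tasks. "
            "Careful study of objectives and data helps. Scale also matters.")
print(fuse(select_content(document)))
```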
Posted Content

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering.

TL;DR: In this paper, a visual retriever-reader pipeline is proposed: the visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the retrieved knowledge. The authors introduce various ways to retrieve knowledge using text and images, along with two reader styles: classification and extraction.
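The retriever-reader pattern can be illustrated, in a text-only simplification, as a similarity-based retriever over a small knowledge base followed by an extractive reader. The knowledge snippets, models, and question below are assumptions for illustration, not the cited paper's visual pipeline.

```python
# Hypothetical retriever-reader sketch: TF-IDF retrieval over a tiny knowledge
# base plus an extractive reader from the `transformers` question-answering
# pipeline. All data and model choices here are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

knowledge = [
    "The Eiffel Tower is located in Paris, France.",
    "The Great Wall of China stretches across northern China.",
    "Mount Fuji is the highest mountain in Japan.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k knowledge snippets most similar to the question."""
    vec = TfidfVectorizer().fit(knowledge + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(knowledge))[0]
    return [knowledge[i] for i in sims.argsort()[::-1][:k]]

reader = pipeline("question-answering")  # extraction-style reader
question = "Where is the Eiffel Tower?"
context = " ".join(retrieve(question))
print(reader(question=question, context=context)["answer"])
```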
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.