Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
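
As a rough illustration of what a text-to-text format can look like in practice, the sketch below casts three different kinds of task as plain (input string, target string) pairs, so that one sequence-to-sequence model could in principle be trained on all of them. The `make_example` helper, the prefix strings, and the toy sentences are assumptions made for illustration; they are not taken from the abstract or from the released code.

```python
from typing import List, Tuple


def make_example(prefix: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one task instance as an (input text, target text) pair.

    The task is identified only by a textual prefix on the input, and the
    target is always a string, even for classification labels.
    (Hypothetical helper, for illustration only.)
    """
    return f"{prefix}: {source.strip()}", target.strip()


# Three different task types, all expressed in the same string-to-string form.
examples: List[Tuple[str, str]] = [
    make_example("translate English to German", "That is good.", "Das ist gut."),
    make_example("summarize", "The storm closed roads across the county on Monday.", "Storm closes roads."),
    make_example("sentiment", "A wonderful, moving film.", "positive"),
]

for model_input, model_target in examples:
    print(f"{model_input!r} -> {model_target!r}")
```

Because every task shares the same input and output type, the same model, loss, and decoding procedure can be reused across tasks, which is what makes a controlled comparison of the factors listed in the abstract possible.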

Citations
Posted Content

A cost-benefit analysis of cross-lingual transfer methods.

TL;DR: This paper analyzed cross-lingual transfer methods in terms of their effectiveness, development and deployment costs, and inference-time latency, and concluded that the best cross-lingual method is highly task-dependent.
Proceedings Article

Explainable Unsupervised Argument Similarity Rating with Abstract Meaning Representation and Conclusion Generation.

TL;DR: This paper proposed AMR-based argument similarity metrics that make argument similarity judgements more interpretable and may even support argument quality judgements, but they do not address the problem of referenceless evaluation of argumentative conclusion generations.
Proceedings Article

On the Influence of Masking Policies in Intermediate Pre-training

TL;DR: This paper performed a large-scale empirical study of the effect of various masking policies in intermediate pre-training, using nine selected tasks across three categories, and found that the success of intermediate pre-training depends on an appropriate pre-training corpus, the selection of output format (i.e., masked spans or full sentence), and a clear understanding of the role that MLM plays for the downstream task.
Posted Content

Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

TL;DR: Cryptonite is a large-scale dataset based on cryptic crosswords that is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue: a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplay.
Posted Content

Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models

TL;DR: This article used error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation, and compared several models that can produce an ungrammatical sentence given a clean sentence and an error type tag.