Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article (Open Access)

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020

Journal of Machine Learning Research

Vol. 21, Iss. 140, pp. 1-67

TL;DR

This article introduced a unified framework that converts all text-based language problems into a text-to-text format, and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
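To make the text-to-text idea concrete, the Python sketch below casts four task types into (input string, target string) pairs. The task prefixes and some example sentences are loosely modeled on the examples the paper uses in its overview figure, but the helper names (`TextPair`, `translation`, `acceptability`, `similarity`, `summarization`), the label words, and the raw similarity scores are illustrative assumptions, not the released preprocessing code. The regression case follows the paper's described approach of rounding similarity scores to the nearest increment of 0.2 and emitting the number as a literal string (so 2.57 becomes "2.6").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPair:
    """One example in text-to-text form: both input and target are strings."""
    inputs: str
    targets: str


def translation(english: str, german: str) -> TextPair:
    # The task is named in a plain-text prefix; the target is the translation.
    return TextPair(f"translate English to German: {english}", german)


def summarization(document: str, summary: str) -> TextPair:
    return TextPair(f"summarize: {document}", summary)


def acceptability(sentence: str, label: int) -> TextPair:
    # Classification: emit a label word instead of a class index.
    # These label words are an assumption for illustration.
    label_words = {0: "not acceptable", 1: "acceptable"}
    return TextPair(f"cola sentence: {sentence}", label_words[label])


def similarity(sentence1: str, sentence2: str, score: float) -> TextPair:
    # Regression: round to the nearest multiple of 0.2 and emit it as text,
    # so the model only ever has to generate strings.
    rounded = round(score * 5) / 5
    return TextPair(
        f"stsb sentence1: {sentence1} sentence2: {sentence2}", f"{rounded:.1f}"
    )


if __name__ == "__main__":
    examples = [
        translation("That is good.", "Das ist gut."),
        acceptability("The course is jumping well.", 0),
        similarity("The rhino grazed on the grass.",
                   "A rhino is grazing in a field.", 3.77),
        summarization("The storm damaged homes across the county on Tuesday.",
                      "Storm damages homes."),
    ]
    for ex in examples:
        print(f"{ex.inputs!r} -> {ex.targets!r}")
```

Because every target is a string, one model, one loss, and one decoding procedure can serve all four tasks; the prefix is the only signal telling the model which task to perform.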

Citations

Posted Content

A cost-benefit analysis of cross-lingual transfer methods

Guilherme Moraes Rosa et al.

14 May 2021

arXiv: Computation and Language

TL;DR: This paper analyzed cross-lingual transfer methods in terms of their effectiveness, development and deployment costs, and inference-time latency, and concluded that the best cross-lingual method is highly task-dependent.

Proceedings Article

Explainable Unsupervised Argument Similarity Rating with Abstract Meaning Representation and Conclusion Generation

Juri Opitz et al.

TL;DR: This paper proposed AMR-based argument similarity metrics that make argument similarity judgements more interpretable and may even support argument quality judgements, but the metrics do not address the problem of referenceless evaluation of generated argumentative conclusions.

Proceedings Article

On the Influence of Masking Policies in Intermediate Pre-training

Qinyuan Ye et al.

TL;DR: This paper performed a large-scale empirical study of how various masking policies in intermediate pre-training affect nine selected tasks across three categories, and found that the success of intermediate pre-training depends on an appropriate pre-training corpus, the choice of output format (i.e., masked spans or the full sentence), and a clear understanding of the role that masked language modeling (MLM) plays for the downstream task.
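The two output formats named in the TL;DR can be illustrated with a small Python sketch. Masked spans are replaced by sentinel tokens in the input; the "masked spans" target lists only the dropped text, each piece introduced by its sentinel, while the "full sentence" target reproduces the whole original sequence. The `<extra_id_N>` sentinel naming follows the convention used for T5's sentinel tokens (for example in the Hugging Face T5 tokenizer); the function names are hypothetical. How spans are chosen, which is the masking policy the study actually varies, is left out, so spans are passed in by hand.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) token indices; spans must not overlap


def sentinel(i: int) -> str:
    # T5-style sentinel naming, used here as an illustrative convention.
    return f"<extra_id_{i}>"


def corrupt(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """Input side: replace each masked span with a single sentinel token."""
    out: List[str] = []
    pos = 0
    for i, (start, end) in enumerate(sorted(spans)):
        out.extend(tokens[pos:start])
        out.append(sentinel(i))
        pos = end
    out.extend(tokens[pos:])
    return out


def masked_span_target(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """'Masked spans' output: only the dropped text, each piece after its sentinel."""
    out: List[str] = []
    for i, (start, end) in enumerate(sorted(spans)):
        out.append(sentinel(i))
        out.extend(tokens[start:end])
    out.append(sentinel(len(spans)))  # a final sentinel closes the target
    return out


def full_sentence_target(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """'Full sentence' output: reconstruct the entire original sequence."""
    return list(tokens)


if __name__ == "__main__":
    toks = "Thank you for inviting me to your party last week .".split()
    spans = [(2, 4), (8, 9)]  # masks "for inviting" and "last"
    print(" ".join(corrupt(toks, spans)))
    print(" ".join(masked_span_target(toks, spans)))
    print(" ".join(full_sentence_target(toks, spans)))
```

The masked-spans target is much shorter than the input, which makes it cheaper to train on; the full-sentence target additionally makes the model copy the unmasked context.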

Posted Content

Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

Avia Efrat et al.

01 Mar 2021

arXiv: Computation and Language

TL;DR: Cryptonite is a large-scale dataset based on cryptic crosswords that is both linguistically complex and naturally sourced. Each example is a cryptic clue: a short phrase or sentence with a misleading surface reading, whose solution requires disambiguating semantic, syntactic, and phonetic wordplay.

Posted Content

Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models

Felix Stahlberg and Shankar Kumar

27 May 2021

arXiv: Computation and Language

TL;DR: This article used error type tags from automatic annotation tools such as ERRANT to guide synthetic data generation, and compared several models that can produce an ungrammatical sentence given a clean sentence and an error type tag.
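The TL;DR does not say how the error type tag reaches the corruption model. The Python sketch below assumes, purely for illustration, that a sequence-to-sequence corruption model is trained on reversed grammatical error correction (GEC) pairs with the tag prepended to the clean sentence as a plain-text token, and that tags are sampled at generation time in proportion to their frequency in the annotated data. The records, the `<TAG>` input format, and all function names are hypothetical; only the tag strings follow ERRANT's operation:category style (e.g. `R:PREP` for a replaced preposition). The corruption model itself is not shown.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# (ungrammatical sentence, corrected sentence, ERRANT-style error tag).
# Hypothetical records, as an annotation tool might label a GEC corpus.
GecExample = Tuple[str, str, str]


def corruption_training_pairs(data: Sequence[GecExample]) -> List[Tuple[str, str]]:
    """Reverse each GEC pair: (tag + clean sentence) -> ungrammatical sentence."""
    return [(f"<{tag}> {clean}", noisy) for noisy, clean, tag in data]


def tag_distribution(data: Sequence[GecExample]) -> Counter:
    """Count how often each error tag occurs in the annotated data."""
    return Counter(tag for _, _, tag in data)


def generation_inputs(
    clean_sentences: Iterable[str], tag_counts: Dict[str, int], seed: int = 0
) -> List[str]:
    """Prefix each clean sentence with a tag sampled in proportion to its count."""
    rng = random.Random(seed)  # seeded so the synthetic corpus is reproducible
    tags = list(tag_counts)
    weights = [tag_counts[t] for t in tags]
    return [f"<{rng.choices(tags, weights=weights)[0]}> {s}" for s in clean_sentences]


if __name__ == "__main__":
    data = [
        ("She go to school.", "She goes to school.", "R:VERB:SVA"),
        ("I am interested for art.", "I am interested in art.", "R:PREP"),
    ]
    for src, tgt in corruption_training_pairs(data):
        print(f"{src!r} -> {tgt!r}")
    # Each generated input would then be fed to a trained corruption model.
    print(generation_inputs(["It is raining today."], tag_distribution(data)))
```

Choosing the tag distribution explicitly, rather than leaving the error type to the model, is one way tags could steer which error types appear in the synthetic data.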