Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
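
The text-to-text format can be made concrete with a short sketch. The snippet below feeds three different tasks (translation, summarization, and acceptability classification) to the same model as plain text, distinguished only by a task prefix. It is an illustration, not the authors' released code: the Hugging Face transformers library and the "t5-small" checkpoint name are assumptions for this example; the abstract only states that the data set, pre-trained models, and code are released.

# Minimal sketch of the text-to-text format (assumes Hugging Face
# `transformers` and the public "t5-small" checkpoint; not the authors' code).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in, text out; only the prefix changes.
examples = [
    "translate English to German: The house is wonderful.",          # translation
    "summarize: Transfer learning, where a model is first pre-trained "
    "on a data-rich task before being fine-tuned on a downstream task, "
    "has emerged as a powerful technique in NLP.",                    # summarization
    "cola sentence: The books was on the table.",                     # acceptability
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The point of the format is that the model, training objective, and decoding procedure are identical across tasks; only the input and target text change.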



Citations
Posted Content

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

TL;DR: This article presented the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluated lexical-normalization systems on 12 social media datasets in 11 languages.
Proceedings Article

Graph-Based Decoding for Task Oriented Semantic Parsing

TL;DR: The authors formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing, and compare several such decoders given the same pre-trained Transformer encoder on the TOP dataset, including settings where training data is limited or only partially annotated.
Posted Content

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

TL;DR: This paper proposed an adaptive learning-rate principle in which Adam's running average of the squared gradient is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each gradient coordinate.
Proceedings Article

Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus.

TL;DR: This paper developed the ItaCoLA corpus, which contains almost 10,000 sentences with acceptability judgments and was created following the same approach and steps as the English CoLA corpus.
Patent

Retrieval-augmented language model pre-training and fine-tuning

TL;DR: In this patent, a neural-network-based textual knowledge retriever is trained jointly with a language model to retrieve helpful information from a large unlabeled corpus, rather than requiring all potentially relevant information to be stored implicitly in the parameters of the neural network.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.