Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
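
The text-to-text format can be made concrete with a short sketch. The snippet below feeds three different tasks (translation, summarization, and acceptability classification) to the same model as plain text, distinguished only by a task prefix. It is an illustration, not the authors' released code: the Hugging Face transformers library and the "t5-small" checkpoint name are assumptions for this example; the abstract only states that the data set, pre-trained models, and code are released.

# Minimal sketch of the text-to-text format (assumes Hugging Face
# `transformers` and the public "t5-small" checkpoint; not the authors' code).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in, text out; only the prefix changes.
examples = [
    "translate English to German: The house is wonderful.",          # translation
    "summarize: Transfer learning, where a model is first pre-trained "
    "on a data-rich task before being fine-tuned on a downstream task, "
    "has emerged as a powerful technique in NLP.",                    # summarization
    "cola sentence: The books was on the table.",                     # acceptability
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The point of the format is that the model, training objective, and decoding procedure are identical across tasks; only the input and target text change.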



Citations
Posted Content

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

TL;DR: This article presented the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluated lexical-normalization systems on 12 social media datasets in 11 languages.
Proceedings Article

Graph-Based Decoding for Task Oriented Semantic Parsing

TL;DR: The authors formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing, and compare several such decoders given the same pre-trained Transformer encoder on the TOP dataset, including settings where training data is limited or only partially annotated.
Posted Content

MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients

TL;DR: This paper proposed an adaptive learning-rate principle in which Adam's running average of the squared gradient is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each gradient coordinate.
Proceedings Article

Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus.

TL;DR: This paper developed the ItaCoLA corpus, which contains almost 10,000 sentences with acceptability judgments and was created following the same approach and steps as the English CoLA corpus.
Patent

Retrieval-augmented language model pre-training and fine-tuning

TL;DR: In this patent, a neural-network-based textual knowledge retriever is trained jointly with a language model to retrieve helpful information from a large unlabeled corpus, rather than requiring all potentially relevant information to be stored implicitly in the parameters of the neural network.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.