Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
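To make the text-to-text framing concrete, the sketch below (not taken from the paper or this page) runs a few tasks through a publicly released T5 checkpoint via the Hugging Face Transformers library; the library calls, the "t5-small" model name, and the task prefixes are assumptions about the public release rather than content of this abstract. The point is only that translation, summarization, and classification all share one interface: a string goes in and a string comes out.

    # Minimal, illustrative sketch of the text-to-text idea, assuming the
    # Hugging Face "transformers" library and the public "t5-small" checkpoint.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Every task is cast as "text in, text out" by prepending a task prefix.
    examples = [
        "translate English to German: The house is wonderful.",
        "summarize: Transfer learning, where a model is first pre-trained on a "
        "data-rich task before being fine-tuned on a downstream task, has "
        "emerged as a powerful technique in natural language processing.",
        "cola sentence: The books was on the table.",  # acceptability judgment
    ]

    for text in examples:
        inputs = tokenizer(text, return_tensors="pt")
        output_ids = model.generate(**inputs, max_length=60)
        print(tokenizer.decode(output_ids[0], skip_special_tokens=True))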



Citations
Posted Content

Error Detection in Large-Scale Natural Language Understanding Systems Using Transformer Models

TL;DR: In this article, the authors combine utterance encodings from a RoBERTa model with the N-best hypotheses produced by the production system and fine-tune end-to-end in a multi-task setting, using a small dataset of human-annotated utterances with domain classification errors.
Proceedings ArticleDOI

RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation

TL;DR: This article proposes two reward functions for abstractive summarization: the first, referred to as RwB-Hinge, dynamically selects the samples for the gradient update, while the second, nicknamed RISK, leverages a small pool of strong candidates to inform the reward.
Proceedings ArticleDOI

MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers

TL;DR: This article used transformer-based language models to detect, as a sentence-pair classification task, whether a given word common to both sentences evokes the same meaning, achieving the best performance in SemEval-2021 Task 2.
Posted Content

Unsupervised and Distributional Detection of Machine-Generated Text.

TL;DR: The authors proposed a method to detect machine-generated documents by leveraging repeated higher-order n-grams, which they show appear more often in machine-generated text than in human-written text.
Posted Content

How not to Lie with a Benchmark: Rearranging NLP Leaderboards.

TL;DR: In this paper, the authors examine the overall scoring methods of popular NLP benchmarks and re-rank the models by the geometric and harmonic means (appropriate for averaging rates) of their reported results.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.