Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
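As a concrete illustration of the text-to-text framing described above, here is a minimal sketch that runs a few task-prefixed inputs through a released T5 checkpoint. It assumes the Hugging Face transformers library and the public t5-small checkpoint; the task prefixes follow the conventions reported in the paper, but the snippet itself is illustrative rather than the authors' released code.

```python
# Minimal sketch of the text-to-text framing, assuming the Hugging Face
# `transformers` library and the public `t5-small` checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task -- translation, summarization, classification -- is cast as
# "text in, text out" by prepending a task-specific prefix to the input.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning has emerged as a powerful technique in NLP ...",
    "cola sentence: The course is jumping well.",  # acceptability classification
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```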



Citations
Journal Article

WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models

TL;DR: WuDaoCorpora, a super large-scale Chinese corpus containing about 3 TB of training data and 1.08 trillion Chinese characters, is introduced, and experiments show that models trained on this corpus achieve excellent performance on Chinese tasks.

Data Augmentation using Pre-trained Transformer Models

TL;DR: This article showed that prepending class labels to text sequences provides a simple yet effective way to condition pre-trained models for data augmentation, and that a pre-trained Seq2Seq model outperforms other data augmentation methods in a low-resource setting.
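A rough sketch of the label-prepending idea summarized above is given below. It assumes a causal LM (gpt2) from the Hugging Face transformers library; the separator formatting and the prompt-only generation step are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of label-conditioned data augmentation: prepend the class
# label to each training text so a causal LM learns p(text | label), then sample
# new examples by prompting with the label alone. Checkpoint and separator
# formatting are assumptions; the fine-tuning loop is omitted for brevity.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

labeled_data = [
    ("positive", "the film is a delight from start to finish"),
    ("negative", "a tedious plot and wooden performances"),
]

# 1) Build label-conditioned fine-tuning sequences.
train_texts = [f"{label} {tokenizer.eos_token} {text}" for label, text in labeled_data]

# 2) After fine-tuning on `train_texts`, generate synthetic examples for a class
#    by prompting with its label.
prompt = tokenizer("positive", return_tensors="pt")
sample = model.generate(**prompt, do_sample=True, top_k=50, max_new_tokens=20,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```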
Proceedings Article

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

TL;DR: Introduces Sentence-level Language Modeling, a new pre-training objective for learning a discourse-level language representation in a fully self-supervised manner: the input sentences are shuffled and a hierarchical transformer model is trained to reconstruct their original ordering.
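A minimal, framework-free sketch of how such unshuffling examples might be constructed is shown below; the hierarchical transformer and training loop are omitted, and the naive period-based sentence splitting is purely illustrative.

```python
# Illustrative construction of a sentence-unshuffling example: shuffle the
# sentences of a paragraph and keep the permutation that restores the original
# order as the reconstruction target. Model and training loop are assumed.
import random

def make_unshuffling_example(paragraph, rng=random):
    sentences = [s.strip() + "." for s in paragraph.split(".") if s.strip()]
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # target[i] gives the position in `shuffled` of the i-th original sentence.
    target = [order.index(i) for i in range(len(sentences))]
    return shuffled, target

shuffled, target = make_unshuffling_example(
    "Transfer learning is powerful. It reuses pre-trained weights. Fine-tuning adapts them."
)
print(shuffled)  # sentences in a random order
print(target)    # indices that recover the original ordering
```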
Posted Content

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic.

TL;DR: The authors introduced two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, for multi-dialectal Arabic language understanding, achieving state-of-the-art results on the majority of tasks (37 of 48 classification tasks across 42 datasets).
Posted Content

Retrieval Augmentation Reduces Hallucination in Conversation

TL;DR: This paper explores the use of neural retrieval-in-the-loop architectures for knowledge-grounded dialogue, a task that is arguably more challenging because it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.
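To make the retrieval-in-the-loop idea concrete, here is a toy sketch that pairs a TF-IDF retriever (scikit-learn) with a string-level conditioning step; the neural retriever and generator studied in the paper are not reproduced, and the [SEP]/[KNOWLEDGE] markers are illustrative assumptions.

```python
# Toy sketch of retrieval-in-the-loop, knowledge-grounded dialogue: retrieve a
# passage conditioned on the full multi-turn context, then build the input that
# a response generator (omitted) would be conditioned on.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge = [
    "T5 casts every NLP task as text-to-text.",
    "The Eiffel Tower is located in Paris and opened in 1889.",
    "Retrieval augmentation grounds responses in retrieved documents.",
]
dialogue_context = ["Hi! I love visiting landmarks.", "When did the Eiffel Tower open?"]

# Rank knowledge passages against the concatenated dialogue context.
vectorizer = TfidfVectorizer().fit(knowledge + dialogue_context)
scores = cosine_similarity(
    vectorizer.transform([" ".join(dialogue_context)]),
    vectorizer.transform(knowledge),
)[0]
best_passage = knowledge[scores.argmax()]

# The generator would be conditioned on both the context and the evidence.
generator_input = " [SEP] ".join(dialogue_context) + " [KNOWLEDGE] " + best_passage
print(generator_input)
```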
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.