Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
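To make the text-to-text framing concrete, the sketch below casts translation, summarization, and classification as the same string-in, string-out problem using T5 checkpoints via the Hugging Face transformers library. The library calls and the "t5-small" checkpoint name are assumptions for illustration; the paper's own release is a separate codebase. The task prefixes shown are the ones described in the paper.

```python
# Minimal sketch (assumption: Hugging Face `transformers` port of the released
# T5 checkpoints). Every task is expressed as input text -> output text, with a
# task prefix selecting the problem.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",         # translation
    "summarize: state authorities dispatched emergency crews ...",  # summarization
    "cola sentence: The course is jumping well.",                   # acceptability classification
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the labels are also plain text (for example "acceptable" / "not acceptable" for CoLA), the same model, loss, and decoding procedure serve every task.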



Citations
Posted Content

A Flexible Multi-Task Model for BERT Serving

TL;DR: In this paper, a BERT-based multi-task (MT) framework was proposed for iterative and incremental development of tasks. It is based on the idea of partial fine-tuning, i.e., fine-tuning only some top layers of BERT while keeping the other layers frozen.
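As a rough illustration of the partial fine-tuning idea summarized above, the snippet below freezes the embeddings and lower encoder layers of a BERT model and leaves only the top k layers trainable. The Hugging Face BertModel API and the choice k = 2 are assumptions for illustration, not the cited paper's implementation.

```python
# Hedged sketch of partial fine-tuning: freeze everything except the top k
# encoder layers of BERT (assumption: Hugging Face `transformers` BertModel).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
k = 2  # hypothetical number of top layers left trainable

for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:-k]:
    for param in layer.parameters():
        param.requires_grad = False
# Only model.encoder.layer[-k:] (plus any task head added on top) receives
# gradient updates during fine-tuning; the frozen layers can be shared across tasks.
```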
Posted Content

Leveraging redundancy in attention with Reuse Transformers.

TL;DR: In this article, the authors propose a novel architecture that reuses attention scores computed in one layer in multiple subsequent layers, and demonstrate that reusing attention delivers performance equivalent to or better than standard transformers, while reducing both compute and memory usage.
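The following single-head sketch illustrates what reusing attention scores can look like: one layer computes the softmax-normalized attention probabilities, and a later layer applies those cached probabilities to its own value projection instead of recomputing them. This is an illustration of the general idea only, with hypothetical weights and shapes, not the cited paper's architecture.

```python
# Illustrative attention-score reuse (single head, random weights).
import math
import torch

d_model, seq_len = 64, 10
x = torch.randn(1, seq_len, d_model)
wq, wk, wv, wo = (torch.randn(d_model, d_model) for _ in range(4))

def compute_attention_probs(x):
    # Standard scaled dot-product attention probabilities.
    q, k = x @ wq, x @ wk
    return torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_model), dim=-1)

def layer_with_reuse(x, probs):
    # No Q/K projections and no score computation: reuse the cached probs,
    # paying only for the value and output projections.
    return (probs @ (x @ wv)) @ wo

probs = compute_attention_probs(x)   # computed once in an early layer
y = layer_with_reuse(x, probs)       # reused by a subsequent layer
```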
Posted Content

Can Transformer Language Models Predict Psychometric Properties?

TL;DR: This article found that transformer-based LMs can predict psychometric properties consistently well in certain categories but consistently poorly in others, thus providing new insights into fundamental similarities and differences between human and LM reasoning.
Posted Content

Sparse is Enough in Scaling Transformers.

TL;DR: Scaling Transformers, as discussed by the authors, use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as model size grows, and they achieve state-of-the-art performance on long-text summarization.
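One very simple form of the activation sparsity this summary alludes to is keeping only the top-k hidden units per token in the feed-forward block. The sketch below shows that simplified variant; the cited Scaling Transformers use a learned, block-structured sparsity scheme that this does not reproduce.

```python
# Simplified top-k sparsity in a Transformer feed-forward block (illustration
# only; not the cited paper's learned block-sparse scheme).
import torch

def sparse_feedforward(x, w_in, w_out, k):
    h = torch.relu(x @ w_in)                                    # (batch, seq, d_ff)
    topk = torch.topk(h, k, dim=-1)
    mask = torch.zeros_like(h).scatter_(-1, topk.indices, 1.0)
    return (h * mask) @ w_out                                   # only k units per token contribute

d_model, d_ff = 64, 256
x = torch.randn(2, 10, d_model)
y = sparse_feedforward(x, torch.randn(d_model, d_ff), torch.randn(d_ff, d_model), k=32)
```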
Proceedings ArticleDOI

Constructing a Relevance-oriented Dataset for Training Transformer Rankers for Medical Search

TL;DR: In this article, the authors introduce an effective approach to generating data from existing medical corpora for training transformer-based re-ranking models. Inspired by the fact that the title of a scientific article is usually closely related to its abstract, they create a dataset by treating titles as queries and the corresponding abstracts as relevant documents, and use an unsupervised IR model (e.g., BM25) to sample negative examples.
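The dataset-construction recipe summarized above can be sketched as follows: each title becomes a query, its own abstract is the positive document, and BM25 retrieves a hard negative from the remaining abstracts. The rank_bm25 package and the toy records are assumptions for illustration; the cited paper may use a different BM25 implementation and sampling policy.

```python
# Hedged sketch: build (query, positive, negative) training triples from
# (title, abstract) pairs, sampling negatives with BM25 (assumption: rank_bm25).
from rank_bm25 import BM25Okapi

articles = [  # toy records standing in for a medical corpus
    {"title": "Aspirin for primary prevention", "abstract": "We study low-dose aspirin ..."},
    {"title": "Statins and cardiovascular risk", "abstract": "Randomized trials of statins ..."},
]

abstracts = [a["abstract"] for a in articles]
bm25 = BM25Okapi([abstract.lower().split() for abstract in abstracts])

triples = []
for i, article in enumerate(articles):
    query = article["title"]
    scores = bm25.get_scores(query.lower().split())
    scores[i] = float("-inf")                    # never pick the true positive as a negative
    negative = abstracts[int(scores.argmax())]   # highest-scoring non-matching abstract
    triples.append((query, article["abstract"], negative))
```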
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.