Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
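To make the text-to-text framing concrete, the sketch below casts translation, summarization, and classification as the same string-in, string-out problem using T5 checkpoints via the Hugging Face transformers library. The library calls and the "t5-small" checkpoint name are assumptions for illustration; the paper's own release is a separate codebase. The task prefixes shown are the ones described in the paper.

```python
# Minimal sketch (assumption: Hugging Face `transformers` port of the released
# T5 checkpoints). Every task is expressed as input text -> output text, with a
# task prefix selecting the problem.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",         # translation
    "summarize: state authorities dispatched emergency crews ...",  # summarization
    "cola sentence: The course is jumping well.",                   # acceptability classification
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the labels are also plain text (for example "acceptable" / "not acceptable" for CoLA), the same model, loss, and decoding procedure serve every task.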



Citations
Posted Content

A Flexible Multi-Task Model for BERT Serving

TL;DR: In this paper, a BERT-based multi-task (MT) framework was proposed for iterative and incremental development of tasks. It is based on the idea of partial fine-tuning, i.e., fine-tuning only some top layers of BERT while keeping the other layers frozen.
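As a rough illustration of the partial fine-tuning idea summarized above, the snippet below freezes the embeddings and lower encoder layers of a BERT model and leaves only the top k layers trainable. The Hugging Face BertModel API and the choice k = 2 are assumptions for illustration, not the cited paper's implementation.

```python
# Hedged sketch of partial fine-tuning: freeze everything except the top k
# encoder layers of BERT (assumption: Hugging Face `transformers` BertModel).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
k = 2  # hypothetical number of top layers left trainable

for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:-k]:
    for param in layer.parameters():
        param.requires_grad = False
# Only model.encoder.layer[-k:] (plus any task head added on top) receives
# gradient updates during fine-tuning; the frozen layers can be shared across tasks.
```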
Posted Content

Leveraging redundancy in attention with Reuse Transformers.

TL;DR: In this article, the authors propose a novel architecture that reuses attention scores computed in one layer in multiple subsequent layers, and demonstrate that reusing attention delivers performance equivalent to or better than standard transformers, while reducing both compute and memory usage.
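The following single-head sketch illustrates what reusing attention scores can look like: one layer computes the softmax-normalized attention probabilities, and a later layer applies those cached probabilities to its own value projection instead of recomputing them. This is an illustration of the general idea only, with hypothetical weights and shapes, not the cited paper's architecture.

```python
# Illustrative attention-score reuse (single head, random weights).
import math
import torch

d_model, seq_len = 64, 10
x = torch.randn(1, seq_len, d_model)
wq, wk, wv, wo = (torch.randn(d_model, d_model) for _ in range(4))

def compute_attention_probs(x):
    # Standard scaled dot-product attention probabilities.
    q, k = x @ wq, x @ wk
    return torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_model), dim=-1)

def layer_with_reuse(x, probs):
    # No Q/K projections and no score computation: reuse the cached probs,
    # paying only for the value and output projections.
    return (probs @ (x @ wv)) @ wo

probs = compute_attention_probs(x)   # computed once in an early layer
y = layer_with_reuse(x, probs)       # reused by a subsequent layer
```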
Posted Content

Can Transformer Language Models Predict Psychometric Properties?

TL;DR: This article found that transformer-based LMs can predict psychometric properties consistently well in certain categories but consistently poorly in others, thus providing new insights into fundamental similarities and differences between human and LM reasoning.
Posted Content

Sparse is Enough in Scaling Transformers.

TL;DR: Scaling Transformers, as discussed by the authors, use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as model size grows, and they achieve state-of-the-art performance on long-text summarization.
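One very simple form of the activation sparsity this summary alludes to is keeping only the top-k hidden units per token in the feed-forward block. The sketch below shows that simplified variant; the cited Scaling Transformers use a learned, block-structured sparsity scheme that this does not reproduce.

```python
# Simplified top-k sparsity in a Transformer feed-forward block (illustration
# only; not the cited paper's learned block-sparse scheme).
import torch

def sparse_feedforward(x, w_in, w_out, k):
    h = torch.relu(x @ w_in)                                    # (batch, seq, d_ff)
    topk = torch.topk(h, k, dim=-1)
    mask = torch.zeros_like(h).scatter_(-1, topk.indices, 1.0)
    return (h * mask) @ w_out                                   # only k units per token contribute

d_model, d_ff = 64, 256
x = torch.randn(2, 10, d_model)
y = sparse_feedforward(x, torch.randn(d_model, d_ff), torch.randn(d_ff, d_model), k=32)
```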
Proceedings ArticleDOI

Constructing a Relevance-oriented Dataset for Training Transformer Rankers for Medical Search

TL;DR: In this article, the authors introduce an effective approach to generating data from existing medical corpora for training transformer-based re-ranking models. Inspired by the fact that the title of a scientific article is usually closely related to its abstract, they create a dataset by treating titles as queries and the corresponding abstracts as relevant documents, and use an unsupervised IR model (e.g., BM25) to sample negative examples.
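The dataset-construction recipe summarized above can be sketched as follows: each title becomes a query, its own abstract is the positive document, and BM25 retrieves a hard negative from the remaining abstracts. The rank_bm25 package and the toy records are assumptions for illustration; the cited paper may use a different BM25 implementation and sampling policy.

```python
# Hedged sketch: build (query, positive, negative) training triples from
# (title, abstract) pairs, sampling negatives with BM25 (assumption: rank_bm25).
from rank_bm25 import BM25Okapi

articles = [  # toy records standing in for a medical corpus
    {"title": "Aspirin for primary prevention", "abstract": "We study low-dose aspirin ..."},
    {"title": "Statins and cardiovascular risk", "abstract": "Randomized trials of statins ..."},
]

abstracts = [a["abstract"] for a in articles]
bm25 = BM25Okapi([abstract.lower().split() for abstract in abstracts])

triples = []
for i, article in enumerate(articles):
    query = article["title"]
    scores = bm25.get_scores(query.lower().split())
    scores[i] = float("-inf")                    # never pick the true positive as a negative
    negative = abstracts[int(scores.argmax())]   # highest-scoring non-matching abstract
    triples.append((query, article["abstract"], negative))
```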
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.