Open Access · Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
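As a concrete illustration of the text-to-text format described in the abstract, the sketch below loads a publicly released T5 checkpoint through the Hugging Face transformers library (an interface assumed here for illustration; the paper's own code release is separate). Every task is expressed as an input string with a task prefix, and the answer is read off as a decoded output string.

```python
# Minimal sketch of the text-to-text format: one model, one decoding path,
# with a task prefix selecting the task. Uses the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    # Translation: the output is decoded as a German sentence.
    "translate English to German: The house is wonderful.",
    # Summarization: the output is decoded as a shorter English string.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA grammaticality): the label itself is generated
    # as the string "acceptable" or "unacceptable".
    "cola sentence: The course is jumping well.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because even classification labels are produced as text, the same generate-and-decode loop covers every task in the study; no task-specific output heads are needed.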



Citations
Journal Article (DOI)

AutoBERT-Zero: Evolving BERT Backbone from Scratch

TL;DR: This paper proposes an operation-priority neural architecture search (OP-NAS) algorithm that optimizes both the search process and the evaluation of candidate models, and designs a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation.
Posted Content

Generate & Rank: A Multi-task Framework for Math Word Problems

TL;DR: This paper proposes Generate & Rank, a multi-task framework built on a generative pre-trained language model, in which the model learns from its own mistakes and is trained to distinguish correct from incorrect expressions.
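To make the two-stage idea concrete, here is a hedged sketch of generate-then-rank inference: a sequence-to-sequence generator proposes several candidate expressions via beam search, and a ranker scores each one to pick the best. The scoring function below is a dummy placeholder, not the cited paper's trained ranker, and the "solve:" prefix is purely illustrative.

```python
# Hedged sketch of generate-then-rank inference; the ranker is a placeholder
# and the task prefix is illustrative, not taken from the cited paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
generator = T5ForConditionalGeneration.from_pretrained("t5-small")

def rank_score(problem: str, expression: str) -> float:
    # In the cited framework a trained ranker scores (problem, expression)
    # pairs; a trivial heuristic stands in so the sketch runs end to end.
    return -abs(len(expression) - 5)

problem = "solve: John has 3 apples and buys 4 more. How many apples in total?"
inputs = tokenizer(problem, return_tensors="pt")
candidate_ids = generator.generate(
    **inputs, num_beams=8, num_return_sequences=8, max_new_tokens=16
)
candidates = [tokenizer.decode(ids, skip_special_tokens=True) for ids in candidate_ids]
print(max(candidates, key=lambda expr: rank_score(problem, expr)))
```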
Proceedings Article

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

TL;DR: The authors propose two new metrics, label loyalty and probability loyalty, to measure how closely a compressed model (the student) mimics the original model (the teacher), and explore the effect of compression on robustness under adversarial attacks.
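As a rough illustration (not the cited paper's exact definitions), label loyalty can be read as the rate at which the student predicts the same label as the teacher, and probability loyalty as a score derived from the distance between their output distributions; the sketch below assumes a Jensen-Shannon-based variant for the latter.

```python
# Hedged sketch of the loyalty idea; the precise formulas in the cited paper
# may differ from these illustrative definitions.
import numpy as np
from scipy.spatial.distance import jensenshannon

def label_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """Fraction of examples where student and teacher agree on the argmax label."""
    return float(np.mean(teacher_probs.argmax(-1) == student_probs.argmax(-1)))

def probability_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """1 minus the mean Jensen-Shannon distance between output distributions."""
    dists = [jensenshannon(t, s, base=2) for t, s in zip(teacher_probs, student_probs)]
    return float(1.0 - np.mean(dists))

# Dummy softmax outputs for 3 examples and 2 classes.
teacher = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
student = np.array([[0.8, 0.2], [0.4, 0.6], [0.45, 0.55]])
print(label_loyalty(teacher, student), probability_loyalty(teacher, student))
```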
Posted Content

NER-BERT: A Pre-trained Model for Low-Resource Entity Tagging.

TL;DR: The authors construct a massive NER corpus of relatively high quality and pre-train a NER-BERT model on the created dataset.
Posted Content

Open-Domain Conversational Search Assistant with Transformers

TL;DR: The authors propose an open-domain abstractive conversational search agent pipeline that addresses two major challenges: conversation context-aware search and abstractive generation of search answers.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.