Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and systematically compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors across dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
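The abstract's central idea is that every task, whether translation, summarization, or classification, is cast as feeding the model text and training it to produce text. As a minimal sketch of what that looks like in practice, assuming the Hugging Face transformers library and the publicly released t5-small checkpoint (neither is part of this page), the same model handles different tasks purely through task prefixes in the input string:

```python
# Minimal sketch, not from the paper itself: the text-to-text framing with
# the Hugging Face `transformers` library and the released t5-small checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text with a task prefix; the model always emits text.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a data-rich "
    "task before being fine-tuned on a downstream task, has emerged as a powerful "
    "technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # linguistic acceptability, also answered as text
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The prefixes shown ("translate English to German:", "summarize:", "cola sentence:") follow the convention described in the paper; a new downstream task is added simply by defining a textual input and output format for it.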

Citations
Proceedings Article

Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity

TL;DR: This paper discusses a framework for computing the similarity between a given input query and a set of predefined questions in order to retrieve the question that best matches the query; the approach can be generalized to any domain-specific search engine and applied in other domains as well.
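As a rough illustration of the query-to-question retrieval idea summarized above (not the cited paper's actual method), one common approach is to embed the query and the predefined questions with a sentence encoder and return the question with the highest cosine similarity. The sentence-transformers library and the all-MiniLM-L6-v2 checkpoint used here are assumptions for the sketch:

```python
# Illustrative sketch only; the cited paper's actual models and similarity
# function may differ. Assumes the sentence-transformers library and the
# all-MiniLM-L6-v2 checkpoint.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical predefined questions for a domain-specific search engine.
predefined_questions = [
    "How do I reset my account password?",
    "What is the refund policy for annual plans?",
    "How can I export my data to CSV?",
]

def best_matching_question(query: str) -> str:
    """Return the predefined question with the highest cosine similarity to the query."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    question_embs = encoder.encode(predefined_questions, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, question_embs)[0]  # shape: (len(predefined_questions),)
    return predefined_questions[int(scores.argmax())]

print(best_matching_question("forgot my password"))
```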
Posted Content

Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

TL;DR: This paper designs a battery of approaches intended to recover Personal Health Information (PHI) from a BERT model trained on clinical notes, attempting to recover patient names and the conditions associated with them.
Proceedings Article

$Q^2$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering.

TL;DR: This paper proposes an automatic evaluation metric for factual consistency in knowledge-grounded dialogue based on automatic question generation and question answering; the metric compares answer spans using natural language inference (NLI) rather than the token-based matching used in previous work.
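A rough sketch of the distinctive comparison step only, not the full Q^2 pipeline: a generated question is answered once against the dialogue response and once against the grounding knowledge, and the two answer spans are compared with NLI rather than token overlap. The model choices here (distilbert-base-cased-distilled-squad for QA, roberta-large-mnli for NLI) are illustrative assumptions, not the cited paper's:

```python
# Rough sketch of the answer-comparison step only, not the full Q^2 pipeline.
# The question would come from a question-generation model in the real metric;
# model names here are illustrative assumptions, not the cited paper's choices.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

question = "Where was the restaurant opened?"  # assumed to be produced by a QG model
response = "The restaurant opened in Paris last year."    # dialogue system output
knowledge = "The restaurant was opened in Paris in 2022."  # grounding document

# Answer the same question against the response and against the knowledge.
span_from_response = qa(question=question, context=response)["answer"]
span_from_knowledge = qa(question=question, context=knowledge)["answer"]

# Compare the two answer spans with NLI instead of token-based matching:
# the knowledge-based span is the premise, the response-based span the hypothesis.
inputs = nli_tokenizer(span_from_knowledge, span_from_response, return_tensors="pt")
with torch.no_grad():
    logits = nli_model(**inputs).logits
label = nli_model.config.id2label[logits.argmax(dim=-1).item()]
print(span_from_response, span_from_knowledge, label)  # ENTAILMENT counts as consistent
```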
Posted Content

Aligning the Pretraining and Finetuning Objectives of Language Models

TL;DR: It is demonstrated that explicitly aligning the pretraining objectives with the finetuning objectives during language model training significantly improves finetuning task performance and reduces the minimum number of finetuning examples required, allowing language models of smaller sizes to be built for tasks with less available training data.
Proceedings Article

Doing more with less: training large DNN models on commodity servers for the masses

TL;DR: In this article, the authors advocate rethinking how DNN frameworks schedule computation and move data to push the boundaries of training large models efficiently on modest multi-GPU deployments, and propose an approach to train large DNN models on commodity servers.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.