Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
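The text-to-text framing described above means every task, whether translation, summarization, or classification, is cast as mapping an input string to a target string, with a task prefix identifying the problem. The sketch below illustrates this casting; the prefixes follow the conventions described in the paper, but the helper function and its name are illustrative, not the authors' released code.

```python
def to_text_to_text(task: str, **fields) -> tuple[str, str]:
    """Cast a task example to an (input text, target text) pair."""
    if task == "translate_en_de":
        # Translation: prefix names the language pair; target is the translation.
        return (f"translate English to German: {fields['source']}",
                fields["target"])
    if task == "summarize":
        # Summarization: input is the document, target is the summary.
        return (f"summarize: {fields['document']}", fields["summary"])
    if task == "cola":
        # Classification: even the label is emitted as literal text.
        return (f"cola sentence: {fields['sentence']}",
                "acceptable" if fields["label"] == 1 else "unacceptable")
    raise ValueError(f"unknown task: {task}")

# Two tasks, one uniform interface: string in, string out.
inp, tgt = to_text_to_text("translate_en_de",
                           source="That is good.", target="Das ist gut.")
# inp == "translate English to German: That is good."
# tgt == "Das ist gut."
```

Because every task shares this single string-to-string interface, one model with one objective and one decoding procedure can be trained and evaluated across all of them.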
Citations
Proceedings Article
SB_NITK at MEDIQA 2021: Leveraging Transfer Learning for Question Summarization in Medical Domain
TL;DR: In this article, the authors used transfer learning to improve the summarization of consumer health questions in the MEDIQA 2021 Question Summarization shared task, achieving a ROUGE-2 F1 score of 0.139.
Book Chapter
Information Extraction/Entailment of Common Law and Civil Code
John Hudzina, Kanika Madan, Dhivya Chinnappa, Jinane Harmouche, Hiroko Bretz, Andrew Vold, Frank Schilder
TL;DR: This paper evaluated different approaches to entailment tasks on the small domain-specific data sets provided by the Competition on Legal Information Extraction/Entailment (COLIEE), which focuses on legal information processing and textual entailment over legal data.
Proceedings Article
DuoRAT: Towards Simpler Text-to-SQL Models
TL;DR: This paper begins by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as building blocks, and then performs several ablation experiments using DuoRAT as the baseline model.
Posted Content
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language.
Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang
TL;DR: A new model architecture is proposed for learning multi-modal neuro-symbolic representations for video captioning, using a dictionary-learning-based method that incorporates modality-specific inductive biases into the captioning task.
Proceedings Article
SUPER: SUb-Graph Parallelism for TransformERs
Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Ade Jacobs, Dhabaleswar K. Panda, Brian Van Essen
TL;DR: In this article, the authors proposed sub-graph parallelism to accelerate the training of Transformer models and generalized the concept to any neural network with multiple branches.