Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
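As a concrete illustration of the text-to-text format, the sketch below runs a few task-prefixed inputs through a released T5 checkpoint. It assumes the HuggingFace transformers library and the t5-small checkpoint; the paper's own code release is TensorFlow-based, so this is an illustrative reimplementation rather than the authors' pipeline. The task prefixes ("translate English to German:", "summarize:", "cola sentence:") follow the conventions described in the paper.

```python
# Minimal sketch of the text-to-text framework, assuming the HuggingFace
# transformers library and the public t5-small checkpoint (not the authors'
# original TensorFlow code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in, text out: a task prefix tells the model
# what to do, and the answer is decoded back out as plain text.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification is also text-to-text: the model emits the label as a string.
    "cola sentence: The course is jumping well.",
]
for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares the same input/output interface, the same model, loss, and decoding procedure apply unchanged across translation, summarization, and classification.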
Citations
Proceedings Article
Multi-Task Retrieval for Knowledge-Intensive Tasks
Jean Maillard, Vladimir Karpukhin, Fabio Petroni, Wen-tau Yih, Barlas Oguz, Veselin Stoyanov, Gargi Ghosh
TL;DR: This paper proposed a single neural retrieval model, trained jointly on multiple knowledge-intensive tasks such as open-domain question answering and fact checking, that retrieves relevant contexts from a large corpus and outperforms traditional term-matching methods such as tf-idf and BM25.
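For context, the sketch below shows the dense-retrieval scoring this line of work builds on: queries and passages are encoded into vectors and ranked by inner product instead of term overlap. It uses the public single-task DPR checkpoints available in transformers for illustration; the multi-task retriever described in the paper itself is not assumed.

```python
# Illustrative sketch of DPR-style dense retrieval (not the paper's multi-task
# model): encode the question and the passages, then rank by inner product.
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "BM25 is a bag-of-words ranking function based on term frequencies.",
]
question = "Where is the Eiffel Tower?"

with torch.no_grad():
    q_vec = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
    p_vecs = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output

# Inner-product relevance scores; unlike BM25, matching is in embedding space,
# so lexically different but semantically related passages can still score high.
scores = (q_vec @ p_vecs.T).squeeze(0)
print(passages[int(scores.argmax())], scores.tolist())
```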
Posted Content
AWS CORD-19 Search: A neural search engine for COVID-19 literature
Parminder Bhatia, Lan Liu, Kristjan Arumae, Nima Pourdamghani, Suyog Deshpande, Ben Snively, Mona Mona, Colby Wise, George Price, Shyam Ramaswamy, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang, Taha A. Kass-Hout
TL;DR: AWS CORD-19 Search (ACS) is presented, a public, COVID-19-specific neural search engine powered by several machine learning systems; it supports natural-language searches and yields high-quality results.
Proceedings Article
Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature
Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon
TL;DR: In this paper, the authors propose a general approach to vertical search based on domain-specific pretraining and present a case study in the biomedical domain, where their system performs comparably to or better than the best systems in the official TREC-COVID evaluation.
Proceedings Article
Hierarchical Speaker-Aware Sequence-to-Sequence Model for Dialogue Summarization
TL;DR: The authors proposed a hierarchical transformer-based model for dialogue summarization that encodes dialogues from the word level up to the utterance level and explicitly distinguishes the relationships between speakers and their corresponding personal pronouns.
Proceedings Article
Of Non-Linearity and Commutativity in BERT
TL;DR: In this paper, the authors proposed a method to measure the degree of non-linearity of different elements of transformers, finding that skip connections are an inefficient yet important architectural element that cannot simply be replaced by attention blocks without degrading performance.
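To make the architectural element under discussion concrete, the sketch below shows the skip (residual) connection in a transformer block: the sublayer's output is added back to its input, x + sublayer(x), rather than replacing it. This is a generic PyTorch illustration under assumed names, not the paper's measurement method or code.

```python
# Illustrative sketch (not the paper's code) of a transformer block's skip
# connection: with use_skip=True the block computes x + attention(x); with
# use_skip=False only the attention output is kept.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=64, n_heads=4, use_skip=True):
        super().__init__()
        self.use_skip = use_skip
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        h, _ = self.attn(h, h, h)          # self-attention sublayer
        return x + h if self.use_skip else h  # residual add vs. no skip

x = torch.randn(1, 8, 64)
print(Block(use_skip=True)(x).shape, Block(use_skip=False)(x).shape)
```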