Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
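To make the text-to-text format concrete, the sketch below casts a few tasks as plain string-to-string pairs and runs them through one of the released checkpoints via the Hugging Face transformers library. The "translate English to German:", "summarize:", and "cola sentence:" task prefixes follow the paper; the checkpoint name "t5-small" and the generation settings are illustrative choices, not a prescribed setup.

# Minimal sketch: every task is text in, text out, so one model,
# one loss, and one decoding procedure cover translation,
# summarization, and classification alike.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in NLP.",
    "cola sentence: The course is jumping well.",  # acceptability judgment
]

for text in inputs:
    batch = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**batch, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))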
Citations
Proceedings Article
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev
TL;DR: This paper introduces an advanced Russian general language understanding evaluation benchmark, RussianSuperGLUE, presents the first results of comparing multilingual models on the translated diagnostic test set, and offers first steps toward further expanding the benchmark or assessing state-of-the-art models independently of language.
Proceedings Article
Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction
TL;DR: The authors proposed Text2Event, a sequence-to-structure generation paradigm that can directly extract events from text in an end-to-end manner and achieves competitive performance using only record-level annotations in both supervised learning and transfer learning settings.
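The sequence-to-structure idea can be illustrated by linearizing an event record into a bracketed string that a seq2seq model learns to emit. The sketch below is an assumed simplification: the role names and bracket format are illustrative, and the schema-constrained decoding the paper pairs with generation is omitted.

# Hedged sketch: encode one event record as a nested bracket string
# usable as the target side of a sequence-to-sequence model.
def linearize_event(event_type, trigger, args):
    parts = [f"({event_type} {trigger}"]
    for role, span in args.items():
        parts.append(f"({role} {span})")
    return " ".join(parts) + ")"

target = linearize_event(
    "Transport", "returned",
    {"Artifact": "The man", "Destination": "Los Angeles", "Origin": "Mexico"},
)
print(target)
# (Transport returned (Artifact The man) (Destination Los Angeles) (Origin Mexico))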
Proceedings Article
Improving Neural Topic Models using Knowledge Distillation
TL;DR: This work uses knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers to improve topic quality, and shows that the adaptable framework improves performance not only in the aggregate over all estimated topics but also in head-to-head comparisons of aligned topics.
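One way to read "knowledge distillation" here is as an extra loss term that pulls the topic model's reconstructed word distribution toward soft targets produced by a pretrained-transformer teacher. The sketch below is only a plausible shape for such an objective; the variable names, the KL formulation, and the fixed weight alpha are assumptions, not the paper's exact loss.

import torch.nn.functional as F

def distilled_ntm_loss(student_logits, bow_counts, teacher_probs, alpha=0.5):
    # student_logits: (batch, vocab) reconstruction scores from the topic model
    # bow_counts:     (batch, vocab) observed bag-of-words counts per document
    # teacher_probs:  (batch, vocab) soft word distribution from the teacher
    log_probs = F.log_softmax(student_logits, dim=-1)
    recon = -(bow_counts * log_probs).sum(dim=-1).mean()             # standard NTM term
    distill = F.kl_div(log_probs, teacher_probs, reduction="batchmean")
    return recon + alpha * distill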
Posted Content
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Fatema Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
TL;DR: Robustness Gym is a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks.
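Of the four paradigms, a subpopulation evaluation is the easiest to picture: slice the test set with predicates and report a metric per slice. The sketch below is generic Python that illustrates the idea; it is not the Robustness Gym API, and the slice predicates are invented examples.

def evaluate_slices(examples, predict, slices):
    # examples: list of dicts with "text" and "label" keys
    # predict:  callable mapping text -> predicted label
    # slices:   dict mapping slice name -> predicate over an example
    results = {}
    for name, keep in slices.items():
        subset = [ex for ex in examples if keep(ex)]
        if subset:
            correct = sum(predict(ex["text"]) == ex["label"] for ex in subset)
            results[name] = (correct / len(subset), len(subset))
    return results

slices = {
    "all": lambda ex: True,
    "short (<10 tokens)": lambda ex: len(ex["text"].split()) < 10,
    "contains negation": lambda ex: " not " in f' {ex["text"].lower()} ',
}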
Proceedings Article
Rethinking Positional Encoding in Language Pre-training
Guolin Ke, Di He, Tie-Yan Liu
TL;DR: The authors proposed a new positional encoding method, Transformer with Untied Positional Encoding (TUPE), which unties the [CLS] symbol from other positions, making it easier to capture information from all positions.
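The "untied" part can be sketched directly: instead of adding position embeddings to word embeddings at the input, word-to-word and position-to-position attention terms are computed with separate projections and summed. The single-head module below follows that scoring scheme with a sqrt(2d) rescaling of the summed terms; the layout is an assumed simplification, and the paper's [CLS]-untying and relative-position terms are omitted.

import math
import torch
import torch.nn as nn

class UntiedAttentionScores(nn.Module):
    # Hedged single-head sketch of TUPE-style untied attention scores.
    def __init__(self, d_model, max_len):
        super().__init__()
        self.wq, self.wk = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.uq, self.uk = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.scale = math.sqrt(2 * d_model)  # rescale the two summed terms

    def forward(self, x):  # x: (batch, seq, d_model), word embeddings only
        p = self.pos(torch.arange(x.size(1), device=x.device))
        content = self.wq(x) @ self.wk(x).transpose(-2, -1)    # word-to-word
        position = self.uq(p) @ self.uk(p).transpose(-2, -1)   # position-to-position
        return (content + position) / self.scale               # (batch, seq, seq)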