Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Home
/
Papers
/
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

read less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Posted Content•

Mathematical Reasoning via Self-supervised Skip-tree Training

[...]

Markus N. Rabe¹, Dennis Lee¹, Kshitij Bansal¹, Christian Szegedy¹•Institutions (1)

Google¹

08 Jun 2020-arXiv: Learning

TL;DR: It is found that models trained on the skip-tree task show surprisingly strong mathematical reasoning abilities, and outperform modelstrained on standard skip-sequence tasks.

...read moreread less

Abstract: We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal mathematics, we propose a novel skip-tree task. We find that models trained on the skip-tree task show surprisingly strong mathematical reasoning abilities, and outperform models trained on standard skip-sequence tasks. We also analyze the models' ability to formulate new conjectures by measuring how often the predictions are provable and useful in other proofs.

...read moreread less

27 citations

Book Chapter•DOI•

Exploring Text-Transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English

[...]

Xiangyang Li¹, Yu Xia¹, Xiang Long², Zheng Li¹, Sujian Li¹ - Show less +1 more•Institutions (2)

Peking University¹, Beijing University of Posts and Telecommunications²

08 Feb 2021

TL;DR: This paper proposed an ensemble method of different pre-trained language models such as BERT, Roberta, Ernie, etc with various training strategies including warm-up, learning rate schedule and k-fold cross-validation for COVID-19 Fake News Detection in English.

...read moreread less

Abstract: In this paper, we describe our system for the AAAI 2021 shared task of COVID-19 Fake News Detection in English, where we achieved the 3rd position with the weighted \(F_1\) score of 0.9859 on the test set. Specifically, we proposed an ensemble method of different pre-trained language models such as BERT, Roberta, Ernie, etc. with various training strategies including warm-up, learning rate schedule and k-fold cross-validation. We also conduct an extensive analysis of the samples that are not correctly classified. The code is available at: https://github.com/archersama/3rd-solution-COVID19-Fake-News-Detection-in-English.

...read moreread less

27 citations

Proceedings Article•DOI•

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

[...]

18 Sep 2022

TL;DR: XLS-R as discussed by the authors is a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0, which is trained with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages.

...read moreread less

Abstract: This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 14-34% relative on average. XLS-R also sets a new state of the art on VoxLingua107 language identification. Moreover, we show that with sufficient model size, cross-lingual pretraining can outperform English-only pretraining when translating English speech into other languages, a setting which favors monolingual pretraining. We hope XLS-R can help to improve speech processing tasks for many more languages of the world.

...read moreread less

27 citations

Posted Content•

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

[...]

Shengding Hu, Ning Ding, Huadong Wang, Zhiyuan Liu, Juanzi Li, Maosong Sun - Show less +2 more

04 Aug 2021-arXiv: Computation and Language

TL;DR: In this article, the authors focus on incorporating external knowledge into the verbalizer, forming a knowledgeable prompt-tuning (KPT) to improve and stabilize the performance of pre-trained language models with task-specific prompts.

...read moreread less

Abstract: Tuning pre-trained language models (PLMs) with task-specific prompts has been a promising approach for text classification. Particularly, previous studies suggest that prompt-tuning has remarkable superiority in the low-data scenario over the generic fine-tuning methods with extra classifiers. The core idea of prompt-tuning is to insert text pieces, i.e., template, to the input and transform a classification problem into a masked language modeling problem, where a crucial step is to construct a projection, i.e., verbalizer, between a label space and a label word space. A verbalizer is usually handcrafted or searched by gradient descent, which may lack coverage and bring considerable bias and high variances to the results. In this work, we focus on incorporating external knowledge into the verbalizer, forming a knowledgeable prompt-tuning (KPT), to improve and stabilize prompt-tuning. Specifically, we expand the label word space of the verbalizer using external knowledge bases (KBs) and refine the expanded label word space with the PLM itself before predicting with the expanded label word space. Extensive experiments on zero and few-shot text classification tasks demonstrate the effectiveness of knowledgeable prompt-tuning.

...read moreread less

27 citations

Proceedings Article•DOI•

Differentiable Open-Ended Commonsense Reasoning

[...]

Bill Yuchen Lin¹, Haitian Sun², Bhuwan Dhingra², Manzil Zaheer², Xiang Ren¹, William W. Cohen² - Show less +2 more•Institutions (2)

University of Southern California¹, Google²

01 Jun 2021

TL;DR: DrFact is proposed, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts, which outperforms strong baseline methods by a large margin and is evaluated to evaluate OpenCSR methods.

...read moreread less

Abstract: Current commonsense reasoning research focuses on developing models that use commonsense knowledge to answer multiple-choice questions. However, systems designed to answer multiple-choice questions may not be useful in applications that do not provide a small list of candidate answers to choose from. As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR) — the task of answering a commonsense question without any pre-defined choices — using as a resource only a corpus of commonsense facts written in natural language. OpenCSR is challenging due to a large decision space, and because many questions require implicit multi-hop reasoning. As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts. To evaluate OpenCSR methods, we adapt several popular commonsense reasoning benchmarks, and collect multiple new answers for each test question via crowd-sourcing. Experiments show that DrFact outperforms strong baseline methods by a large margin.

...read moreread less

27 citations