Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
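To make the text-to-text format concrete, here is a minimal sketch using the Hugging Face transformers library (the library choice is an assumption for illustration; it is not the paper's released code) showing how different tasks are cast as plain text in, plain text out via task prefixes:

```python
# Minimal sketch of T5's unified text-to-text interface, using the
# Hugging Face transformers library for illustration only.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as text input -> text output via a prefix.
tasks = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained "
    "on a data-rich task before being fine-tuned on a downstream task, "
    "has emerged as a powerful technique in NLP.",
    "cola sentence: The course is jumping well.",  # grammaticality judgment
]

for prompt in tasks:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Classification tasks like CoLA are handled the same way as generation tasks: the model simply emits the label ("acceptable" or "unacceptable") as text.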



Citations
Posted Content

LAnoBERT: System Log Anomaly Detection Based on BERT Masked Language Model

TL;DR: This article proposes LAnoBERT, a parser-free system log anomaly detection method built on BERT, a model with strong natural language processing performance. The model is trained with masked language modeling, BERT's pre-training objective, and performs unsupervised anomaly detection at inference time using the masked-language-modeling loss computed for each key word in a log.
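As a rough illustration of the per-token masked-language-model scoring described in the TL;DR, a minimal sketch (not LAnoBERT's actual code; the base model, the averaging of per-token losses, and the example log line are assumptions) could look like:

```python
# Sketch: score a log line by masking each token in turn and averaging
# the masked-LM loss; a high score suggests an anomalous line.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def anomaly_score(log_line: str) -> float:
    input_ids = tokenizer(log_line, return_tensors="pt")["input_ids"][0]
    losses = []
    # Skip [CLS] and [SEP] at the first and last positions.
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        labels = torch.full_like(input_ids, -100)  # ignore all but the masked slot
        labels[pos] = input_ids[pos]
        with torch.no_grad():
            out = model(masked.unsqueeze(0), labels=labels.unsqueeze(0))
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

print(anomaly_score("kernel: usb 1-1: device descriptor read error"))
```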
Posted Content

ContraQA: Question Answering under Contradicting Contexts

TL;DR: This paper studies the risk misinformation poses to QA models by investigating model behavior under contradicting contexts that mix real and fake information, and builds a misinformation-aware QA system as a countermeasure that integrates question answering and misinformation detection in a joint fashion.
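One hypothetical reading of "integrates question answering and misinformation detection in a joint fashion" is a shared encoder with two heads trained together. The sketch below is an assumption for illustration, not ContraQA's actual architecture:

```python
# Hypothetical joint QA + misinformation-detection model: a shared
# encoder feeds a span-extraction head and a context-credibility head.
import torch.nn as nn
from transformers import AutoModel

class JointQAMisinfo(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.span_head = nn.Linear(hidden, 2)     # start/end logits per token
        self.misinfo_head = nn.Linear(hidden, 2)  # real vs. fake context

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        start_logits, end_logits = self.span_head(h).split(1, dim=-1)
        misinfo_logits = self.misinfo_head(h[:, 0])  # [CLS] representation
        return start_logits.squeeze(-1), end_logits.squeeze(-1), misinfo_logits
```

During training, the two objectives would typically be combined into a single loss, e.g. the span-extraction loss plus a weighted misinformation-classification loss.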
Proceedings Article

kFolden: k-Fold Ensemble for Out-Of-Distribution Detection.

TL;DR: This paper proposes kFolden, a simple yet effective framework that mimics the behavior of OOD detection during training without using any external data. It induces k sub-models, each trained on a subset covering k-1 categories, with the remaining category masked as unknown to that sub-model.
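A minimal sketch of the k-fold ensemble idea follows; the bag-of-words classifier and the score-aggregation rule are simplifying assumptions for illustration, not kFolden's implementation:

```python
# Sketch: k sub-models, each trained with one of the k labels held out,
# then averaged at test time; low maximum probability suggests OOD input.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_kfolden(texts, labels, k):
    vec = TfidfVectorizer().fit(texts)
    X = vec.transform(texts)
    y = np.array(labels)
    models = []
    for held_out in range(k):
        keep = y != held_out  # drop one category per sub-model
        models.append(LogisticRegression(max_iter=1000).fit(X[keep], y[keep]))
    return vec, models

def ood_score(vec, models, text, k):
    x = vec.transform([text])
    probs = np.zeros(k)
    for clf in models:
        for cls, pr in zip(clf.classes_, clf.predict_proba(x)[0]):
            probs[cls] += pr
    probs /= len(models)
    return 1.0 - probs.max()  # higher score = more likely out-of-distribution
```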
Posted Content

Constrained Text Generation with Global Guidance - Case Study on CommonGen.

TL;DR: This paper uses reinforcement learning to enforce global constraints on generation: fluency, common sense, and concept coverage are measured with a comprehensive score that serves as the reward, and a guided decoding method is designed at the word, fragment, and sentence levels.
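To illustrate what a comprehensive score serving as an RL reward might look like, here is a hedged sketch combining a fluency proxy with concept coverage; the GPT-2 fluency proxy, the weighting, and the coverage heuristic are assumptions, not the paper's actual score:

```python
# Sketch of a composite "global" reward: fluency (negative LM loss)
# blended with the fraction of required concepts that appear verbatim.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def fluency(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean negative log-likelihood per token
    return float(-loss)                  # higher = more fluent

def coverage(sentence: str, concepts: list[str]) -> float:
    words = sentence.lower().split()
    return sum(c in words for c in concepts) / len(concepts)

def reward(sentence: str, concepts: list[str], alpha: float = 0.5) -> float:
    # This scalar would serve as the return in a policy-gradient update.
    return alpha * fluency(sentence) + (1 - alpha) * coverage(sentence, concepts)

print(reward("A dog catches a frisbee in the park.", ["dog", "frisbee", "park"]))
```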
Posted Content

Domain-robust VQA with diverse datasets and methods but no target labels

TL;DR: In this article, the authors quantify domain shifts between popular VQA datasets in both the visual and textual spaces, and test the robustness of different families of visual question answering methods (two-stream, transformer, and neuro-symbolic) to these shifts.
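As an illustration of quantifying a shift between two datasets' feature distributions, the sketch below uses maximum mean discrepancy (MMD); the choice of MMD and the RBF kernel are assumptions, and the article's actual shift measures may differ:

```python
# Sketch: estimate the discrepancy between two sets of embeddings
# (e.g. image or question features from two VQA datasets) with MMD.
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    # Biased estimator of squared MMD between samples x and y.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

# e.g. x = embeddings from dataset A, y = embeddings from dataset B
x = np.random.randn(100, 64)
y = np.random.randn(100, 64) + 0.5  # shifted distribution
print(mmd(x, y))
```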
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.