Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
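As a concrete illustration of the text-to-text formulation described in the abstract, the minimal sketch below casts two different tasks as "text in, text out" using the publicly released T5 checkpoints via the Hugging Face transformers library. This is an assumption of the example, not the authors' own released codebase; the task prefixes ("translate English to German:", "summarize:") follow the conventions used for the pre-trained T5 models.

```python
# Minimal sketch of the text-to-text framework, assuming the Hugging Face
# `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed the same way: feed the model text, generate text.
# A task-specific prefix tells the model which task to perform.
examples = [
    "translate English to German: The house is wonderful.",          # translation
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",        # summarization
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares this input/output interface, the same model, loss, and decoding procedure can be reused across translation, summarization, classification, and question answering.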



Citations
Book Chapter

Towards an Italian Healthcare Knowledge Graph

TL;DR: In this article, a transformer language model and a few-shot approach are used to construct a knowledge graph (KG), and similarity-based deep learning techniques are then applied to the constructed KG for downstream applications.
Posted Content

Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting

TL;DR: In this article, two conversational query reformulation methods, namely term importance estimation and neural query rewriting, are proposed to address query ambiguities in a multi-stage ad-hoc IR system.
Posted Content

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

TL;DR: The authors focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction.
Proceedings Article

Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

TL;DR: This article investigated ICD coding using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding, and found that the difficulty of fine-tuning the model on long pieces of text is the main limitation of BERT-based models for ICD coding.
Proceedings Article

IR From Bag-of-words to BERT and Beyond through Practical Experiments

TL;DR: In this article, the authors present a full-day tutorial on neural ranking techniques for ad hoc search using BERT and other pre-trained contextualized language models, such as T5 and BERT-PRF.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.