Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
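As a concrete illustration of the text-to-text formulation described in the abstract, the minimal sketch below casts two different tasks as "text in, text out" using the publicly released T5 checkpoints via the Hugging Face transformers library. This is an assumption of the example, not the authors' own released codebase; the task prefixes ("translate English to German:", "summarize:") follow the conventions used for the pre-trained T5 models.

```python
# Minimal sketch of the text-to-text framework, assuming the Hugging Face
# `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed the same way: feed the model text, generate text.
# A task-specific prefix tells the model which task to perform.
examples = [
    "translate English to German: The house is wonderful.",          # translation
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",        # summarization
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares this input/output interface, the same model, loss, and decoding procedure can be reused across translation, summarization, classification, and question answering.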



Citations
Book Chapter

Towards an Italian Healthcare Knowledge Graph

TL;DR: In this article, a transformer language model and a few-shot approach are used to construct a knowledge graph (KG), and similarity-based deep learning techniques are then applied to the constructed KG for downstream applications.
Posted Content

Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting

TL;DR: In this article, two conversational query reformulation methods, namely term importance estimation and neural query rewriting, are proposed to address query ambiguities in a multi-stage ad-hoc IR system.
Posted Content

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

TL;DR: The authors focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction.
Proceedings Article

Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

TL;DR: This article investigated ICD coding using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding, and found that the difficulty of fine-tuning the model on long pieces of text is the main limitation of BERT-based models for ICD coding.
Proceedings Article

IR From Bag-of-words to BERT and Beyond through Practical Experiments

TL;DR: In this article, the authors present a full-day tutorial on neural ranking techniques for ad hoc search using BERT and other pre-trained contextualized language models, such as T5 and BERT-PRF.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.