Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
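
The text-to-text format described in the abstract can be pictured with a minimal sketch: every task, whether generation or classification, becomes a pair of input and target strings. The prefixes, the function name `to_text_pair`, and the example sentences below are assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import Dict, Tuple

# Hypothetical task prefixes, chosen for illustration only.
PREFIXES: Dict[str, str] = {
    "translation": "translate English to German: ",
    "summarization": "summarize: ",
    "classification": "classify sentiment: ",
}


def to_text_pair(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # The task is identified only by the text prepended to the input,
    # and the label is itself emitted as text.
    return PREFIXES[task] + source, target


examples = [
    to_text_pair("translation", "That is good.", "Das ist gut."),
    to_text_pair("classification", "A delightful film.", "positive"),
]

for model_input, model_target in examples:
    print(repr(model_input), "->", repr(model_target))
```

Because both sides are plain strings, a single model and a single training loss can in principle serve every task in the list.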

Citations

Posted Content•

Language Models are Few-shot Multilingual Learners

Genta Indra Winata, Andrea Madotto, Zhaojiang Lin¹, Rosanne Liu, Jason Yosinski, Pascale Fung

Hong Kong University of Science and Technology¹

16 Sep 2021-arXiv: Computation and Language

TL;DR: The authors showed that given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and they are significantly better than random prediction.

Abstract: General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages without any parameter updates. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones. Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art cross-lingual models.
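
The in-context setup this abstract describes, a few English examples followed by a non-English test sample with no parameter updates, can be sketched as simple prompt construction. The template (`text => label`), the function name `build_few_shot_prompt`, and the sample utterances are hypothetical; the abstract does not give the prompt format the authors used.

```python
from typing import List, Tuple


def build_few_shot_prompt(shots: List[Tuple[str, str]], query: str) -> str:
    """Join labeled examples and an unlabeled query into one prompt string."""
    lines = [f"{text} => {label}" for text, label in shots]
    # The query is left without a label; the model is expected to complete it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# English examples serve as context; the query is in another language.
english_shots = [
    ("turn on the lights", "smart_home"),
    ("play some jazz", "music"),
]
prompt = build_few_shot_prompt(english_shots, "pon algo de música")
print(prompt)
```

The model's weights stay fixed throughout; only the prompt text changes between test samples.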

1 citation

Posted Content•

Supporting search engines with knowledge and context.

Nikos Voskarides

12 Feb 2021-arXiv: Information Retrieval

TL;DR: In this paper, the authors focus on multi-turn passage retrieval as an instance of conversational search, where the user interacts with the search engine to gather knowledge over large unstructured knowledge repositories.

Abstract: Search engines leverage knowledge to improve information access. In order to effectively leverage knowledge, search engines should account for context, i.e., information about the user and query. In this thesis, we aim to support search engines in leveraging knowledge while accounting for context. In the first part of this thesis, we study how to make structured knowledge more accessible to the user when the search engine proactively provides such knowledge as context to enrich search results. As a first task, we study how to retrieve descriptions of knowledge facts from a text corpus. Next, we study how to automatically generate knowledge fact descriptions. And finally, we study how to contextualize knowledge facts, that is, to automatically find facts related to a query fact. In the second part of this thesis, we study how to improve interactive knowledge gathering. We focus on conversational search, where the user interacts with the search engine to gather knowledge over large unstructured knowledge repositories. We focus on multi-turn passage retrieval as an instance of conversational search. We propose to model query resolution as a term classification task and propose a method to address it. In the final part of this thesis, we focus on search engine support for professional writers in the news domain. We study how to support such writers create event-narratives by exploring knowledge from a corpus of news articles. We propose a dataset construction procedure for this task that relies on existing news articles to simulate incomplete narratives and relevant articles. We study the performance of multiple rankers, lexical and semantic, and provide insights into the characteristics of this task.

1 citation

Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models

Megha Sharma, Gaurav Arora

01 Apr 2021

TL;DR: The authors pre-trained a transformer-based model, RoBERTa, on synthetically generated code-mixed data and used it in an ensemble with a pre-trained ULMFiT model available from iNLTK.

Abstract: We describe our system that ranked first in Hope Speech Detection (HSD) shared task and fourth in Offensive Language Identification (OLI) shared task, both in Tamil language. The goal of HSD and OLI is to identify if a code-mixed comment or post contains hope speech or offensive content respectively. We pre-train a transformer-based model RoBERTa using synthetically generated code-mixed data and use it in an ensemble along with their pre-trained ULMFiT model available from iNLTK.

1 citation

Proceedings Article•DOI•

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

Tomáš Nekvinda¹, Ondřej Dušek¹

Charles University in Prague¹

01 Aug 2021

TL;DR: In this article, the authors identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on the MultiWOZ dataset, i.e., BLEU score and Inform & Success rates.

Abstract: The MultiWOZ dataset (Budzianowski et al., 2018) is frequently used for benchmarking context-to-response abilities of task-oriented dialogue systems. In this work, we identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on this dataset, i.e., BLEU score and Inform & Success rates. We point out a few problems of the MultiWOZ benchmark such as unsatisfactory preprocessing, insufficient or underspecified evaluation metrics, or rigid database. We re-evaluate 7 end-to-end and 6 policy optimization models in as-fair-as-possible setups, and we show that their reported scores cannot be directly compared. To facilitate comparison of future systems, we release our standalone standardized evaluation scripts. We also give basic recommendations for corpus-based benchmarking in future works.

1 citation

Proceedings Article•DOI•

Learning to Generate Task-Specific Adapters from Task Description

Qinyuan Ye¹, Xiang Ren¹

University of Southern California¹

01 Aug 2021

TL;DR: This paper introduces Hypter, a framework that improves a text-to-text transformer's generalization ability to unseen tasks by training a hypernetwork to generate task-specific, light-weight adapters from task descriptions.

Abstract: Pre-trained text-to-text transformers such as BART have achieved impressive performance across a range of NLP tasks. Recent study further shows that they can learn to generalize to novel tasks, by including task descriptions as part of the source sequence and training the model with (source, target) examples. At test time, these fine-tuned models can make inferences on new tasks using the new task descriptions as part of the input. However, this approach has potential limitations, as the model learns to solve individual (source, target) examples (i.e., at the instance level), instead of learning to solve tasks by taking all examples within a task as a whole (i.e., at the task level). To this end, we introduce Hypter, a framework that improves text-to-text transformer’s generalization ability to unseen tasks by training a hypernetwork to generate task-specific, light-weight adapters from task descriptions. Experiments on ZEST dataset and a synthetic SQuAD dataset demonstrate that Hypter improves upon fine-tuning baselines. Notably, when using BART-Large as the main network, Hypter brings 11.3% comparative improvement on ZEST dataset.

1 citation