Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
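
The text-to-text framing means every task, whether translation, classification, or summarization, is handled by feeding the model an input string and training it to emit an output string. Below is a minimal sketch of that usage with the released T5 checkpoints via the Hugging Face transformers library; the checkpoint name and task prefixes follow the conventions used with the released models, and the exact generations depend on the checkpoint size, so treat this as illustrative rather than a reproduction of the paper's setup.

# Sketch: casting different NLP tasks into the text-to-text format
# using a released T5 checkpoint through Hugging Face transformers.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",   # translation
    "cola sentence: The books fell of the shelfs.",            # acceptability classification
    "summarize: Transfer learning, where a model is first pre-trained "
    "on a data-rich task before being fine-tuned on a downstream task, "
    "has emerged as a powerful technique in NLP.",             # summarization
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))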


Citations
Journal ArticleDOI
12 Apr 2021
TL;DR: CO-Search, a semantic, multi-stage search engine designed to handle complex queries over the COVID-19 literature, is presented, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis.
Abstract: The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications were collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question-answering module which gauges how much each document answers the query, and (2) an abstractive summarization module which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique which splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system ( http://einstein.ai/covid ) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
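
The retriever described above blends a dense semantic score with sparse keyword scores before re-ranking. The sketch below illustrates that hybrid-scoring idea using the sentence-transformers and rank_bm25 libraries; the encoder name, the linear mixing weight, and the score normalization are illustrative assumptions, not the CO-Search configuration.

# Sketch of a hybrid semantic + keyword retriever: dense cosine similarity
# from a sentence encoder blended with a BM25 score. Library and model
# choices are illustrative, not the CO-Search implementation.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "SARS-CoV-2 spike protein structure and vaccine design",
    "Hospital capacity planning during the COVID-19 pandemic",
    "BM25 ranking functions for ad-hoc information retrieval",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not the paper's Siamese-BERT
doc_emb = encoder.encode(docs, normalize_embeddings=True)
bm25 = BM25Okapi([d.lower().split() for d in docs])

def hybrid_scores(query, alpha=0.5):
    """Blend dense cosine and normalized BM25 scores; alpha is an assumed weight."""
    q_emb = encoder.encode([query], normalize_embeddings=True)[0]
    dense = doc_emb @ q_emb                               # cosine similarity (unit vectors)
    sparse = np.array(bm25.get_scores(query.lower().split()))
    if sparse.max() > 0:
        sparse = sparse / sparse.max()                    # crude normalization for mixing
    return alpha * dense + (1 - alpha) * sparse

query = "how does the coronavirus spike protein bind to cells"
ranking = np.argsort(-hybrid_scores(query))
print([docs[i] for i in ranking])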

69 citations

Proceedings Article
03 May 2021
TL;DR: The Long Range Arena benchmark as discussed by the authors is a suite of tasks consisting of sequences ranging from 1K to 16K tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning.
Abstract: Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality amongst many models. This paper proposes a systematic and unified benchmark, Long Range Arena, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from 1K to 16K tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. Long Range Arena paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle.
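
The quadratic self-attention complexity that motivates the benchmark comes from the n-by-n score matrix computed between every pair of tokens. A short NumPy sketch of vanilla scaled dot-product attention (single head, no learned projections) makes that cost explicit.

# Vanilla scaled dot-product attention: the (n x n) score matrix is what makes
# memory and compute grow quadratically with sequence length n.
import numpy as np

def self_attention(q, k, v):
    """q, k, v: arrays of shape (n, d). Returns an array of shape (n, d)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                          # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ v                                     # (n, d)

n, d = 4096, 64
x = np.random.randn(n, d).astype(np.float32)
out = self_attention(x, x, x)
print(out.shape)  # (4096, 64); the intermediate score matrix alone held 4096 * 4096 floats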

68 citations

Posted Content
TL;DR: This work proposes to use multi-task learning (MTL) to improve generalization in the case of extremely scarce minority examples, and shows that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting in-distribution performance.
Abstract: Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that the key to their success is generalization from a small amount of counterexamples where the spurious correlations do not hold. When such minority examples are scarce, pre-trained models perform as poorly as models trained from scratch. In the case of extreme minority, we propose to use multi-task learning (MTL) to improve generalization. Our experiments on natural language inference and paraphrase identification show that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance. Further, we show that the gain from MTL mainly comes from improved generalization from the minority examples. Our results highlight the importance of data diversity for overcoming spurious correlations.
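
Multi-task learning in this setting amounts to a shared encoder with one head per task, trained on a combined loss so the auxiliary tasks regularize the main one. Below is a minimal PyTorch sketch of that loss structure; the toy encoder, dimensions, tasks, and equal loss weighting are assumptions for illustration, whereas the paper fine-tunes a pre-trained language model.

# Toy multi-task learning setup: one shared encoder, one classification head
# per task, and a combined loss. Dimensions, tasks, and weights are assumed.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, task_classes, input_dim=128, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleDict({t: nn.Linear(hidden_dim, c) for t, c in task_classes.items()})

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

model = MultiTaskModel({"nli": 3, "paraphrase": 2})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step over a batch from each task; the per-task losses are summed.
batches = {
    "nli": (torch.randn(16, 128), torch.randint(0, 3, (16,))),
    "paraphrase": (torch.randn(16, 128), torch.randint(0, 2, (16,))),
}
loss = sum(loss_fn(model(x, task), y) for task, (x, y) in batches.items())
optimizer.zero_grad()
loss.backward()
optimizer.step()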

67 citations

Proceedings ArticleDOI
Kurt Shuster1, Da Ju1, Stephen Roller1, Emily Dinan1, Y-Lan Boureau1, Jason Weston1 
01 Jul 2020
TL;DR: dodecaDialogue, a set of 12 tasks that measures whether a conversational agent can communicate engagingly with personality and empathy, is introduced; multi-tasking on this set provides gains to both text- and image-based tasks on several metrics in both the fine-tune and task-transfer settings.
Abstract: We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producing a single unified agent that can perceive, reason and converse with humans in an open-domain setting. We show that such multi-tasking improves over a BERT pre-trained baseline, largely due to multi-tasking with very large dialogue datasets in a similar domain, and that the multi-tasking in general provides gains to both text and image-based tasks using several metrics in both the fine-tune and task transfer settings. We obtain state-of-the-art results on many of the tasks, providing a strong baseline for this challenge.
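
Multi-tasking over a dozen dialogue datasets of very different sizes requires some policy for how often each task contributes a training batch. The sketch below shows one common policy, sampling tasks in proportion to dataset size; the task names are examples drawn from the dodecaDialogue suite, while the sizes and the proportional scheme itself are illustrative assumptions rather than the paper's exact training recipe.

# Sketch of multi-task batch scheduling: at each step a task is drawn with
# probability proportional to its dataset size, so large dialogue corpora
# dominate without starving the small ones. Sizes below are made up.
import random
from collections import Counter

task_sizes = {
    "convai2": 140_000,
    "wizard_of_wikipedia": 80_000,
    "empathetic_dialogues": 60_000,
    "image_chat": 200_000,
}
tasks, sizes = zip(*task_sizes.items())

def sample_task(rng=random):
    return rng.choices(tasks, weights=sizes, k=1)[0]

# Count which task each of 10,000 simulated training steps would draw from.
print(Counter(sample_task() for _ in range(10_000)))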

65 citations

Posted Content
TL;DR: This work steps outside the computer vision domain by leveraging the language modeling task, the core of natural language processing (NLP), and argues that the benchmark will provide more reliable empirical findings for the community and stimulate progress in developing new NAS methods well suited to recurrent architectures.
Abstract: Neural Architecture Search (NAS) is a promising and rapidly evolving research area. Training a large number of neural networks requires an exceptional amount of computational power, which makes NAS unreachable for those researchers who have limited or no access to high-performance clusters and supercomputers. A few benchmarks with precomputed neural architecture performances have been recently introduced to overcome this problem and ensure more reproducible experiments. However, these benchmarks are only for the computer vision domain and, thus, are built from image datasets and convolution-derived architectures. In this work, we step outside the computer vision domain by leveraging the language modeling task, which is the core of natural language processing (NLP). Our main contribution is as follows: we have provided a search space of recurrent neural networks on text datasets and trained 14k architectures within it; we have conducted both intrinsic and extrinsic evaluation of the trained models using datasets for semantic relatedness and language understanding evaluation; finally, we have tested several NAS algorithms to demonstrate how the precomputed results can be utilized. We believe that our results have high potential for use by both the NAS and NLP communities.
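
A benchmark of precomputed architecture performances turns the expensive inner loop of NAS into a table lookup: a candidate architecture is encoded as a key and its already-measured metric is returned instantly, so different search algorithms can be compared cheaply. The toy sketch below runs a random search over such a lookup table; the table contents, the architecture encoding, and the use of validation perplexity are invented for illustration and do not reflect the released benchmark's format.

# Toy use of a precomputed NAS benchmark: architectures map to pre-measured
# validation perplexities, so "evaluating" a candidate is a dictionary lookup.
# The table below is invented; a real benchmark ships thousands of entries.
import random

benchmark = {
    ("lstm", 1, 256): 91.3,
    ("lstm", 2, 256): 88.7,
    ("lstm", 2, 512): 86.1,
    ("rnn", 1, 256): 104.5,
    ("rnn", 2, 512): 99.8,
}  # key: (cell type, num layers, hidden size) -> validation perplexity

def random_search(num_trials=3, seed=0):
    """Sample candidate architectures and return the one with lowest perplexity."""
    rng = random.Random(seed)
    candidates = rng.sample(list(benchmark), k=num_trials)
    return min(candidates, key=benchmark.get)

best = random_search()
print(best, benchmark[best])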

65 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.