Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Home
/
Papers
/
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

read less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Posted Content•

TruthBot: An Automated Conversational Tool for Intent Learning, Curated Information Presenting, and Fake News Alerting

[...]

Ankur Gupta¹, Yash Varun¹, Prarthana Das¹, Nithya Muttineni¹, Parth Srivastava¹, Hamim Zafar¹, Tanmoy Chakraborty¹, Swaprava Nath² - Show less +4 more•Institutions (2)

Indian Institute of Technology Kanpur¹, Indraprastha Institute of Information Technology²

31 Jan 2021-arXiv: Social and Information Networks

TL;DR: The TruthBot as discussed by the authors is an all-in-one multilingual conversational chatbot designed for seeking truth (trustworthy and verified information) on specific topics, which helps users to obtain information specific to certain topics, fact-check information, and get recent news.

...read moreread less

Abstract: We present TruthBot, an all-in-one multilingual conversational chatbot designed for seeking truth (trustworthy and verified information) on specific topics. It helps users to obtain information specific to certain topics, fact-check information, and get recent news. The chatbot learns the intent of a query by training a deep neural network from the data of the previous intents and responds appropriately when it classifies the intent in one of the classes above. Each class is implemented as a separate module that uses either its own curated knowledge-base or searches the web to obtain the correct information. The topic of the chatbot is currently set to COVID-19. However, the bot can be easily customized to any topic-specific responses. Our experimental results show that each module performs significantly better than its closest competitor, which is verified both quantitatively and through several user-based surveys in multiple languages. TruthBot has been deployed in June 2020 and is currently running.

...read moreread less

Posted Content•

Unsupervised Source Separation By Steering Pretrained Music Models

[...]

Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

25 Oct 2021-arXiv: Sound

TL;DR: In this paper, an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining, is presented, where an audio generation model is conditioned on an input mixture, producing a latent encoding of the audio used to generate audio.

...read moreread less

Abstract: We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a latent encoding of the audio used to generate audio. This generated audio is fed to a pretrained music tagger that creates source labels. The cross-entropy loss between the tag distribution for the generated audio and a predefined distribution for an isolated source is used to guide gradient ascent in the (unchanging) latent space of the generative model. This system does not update the weights of the generative model or the tagger, and only relies on moving through the generative model's latent space to produce separated sources. We use OpenAI's Jukebox as the pretrained generative model, and we couple it with four kinds of pretrained music taggers (two architectures and two tagging datasets). Experimental results on two source separation datasets, show this approach can produce separation estimates for a wider variety of sources than any tested supervised or unsupervised system. This work points to the vast and heretofore untapped potential of large pretrained music models for audio-to-audio tasks like source separation.

...read moreread less

Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings

[...]

Nicole Beckage, Shachi H. Kumar, Saurav Sahay, Ramesh R. Manuvinakurike

01 Nov 2021

TL;DR: This paper examined the extent to which human abstractive summaries of the preceding increments (context) can be combined with extractive meeting dialogue to generate abstractive summarization, and found that previous context improves ROUGE scores.

...read moreread less

Abstract: Incremental meeting temporal summarization, summarizing relevant information of partial multi-party meeting dialogue, is emerging as the next challenge in summarization research. Here we examine the extent to which human abstractive summaries of the preceding increments (context) can be combined with extractive meeting dialogue to generate abstractive summaries. We find that previous context improves ROUGE scores. Our findings further suggest that contexts begin to outweigh the dialogue. Using keyphrase extraction and semantic role labeling (SRL), we find that SRL captures relevant information without overwhelming the the model architecture. By compressing the previous contexts by ~70%, we achieve better ROUGE scores over our baseline models. Collectively, these results suggest that context matters, as does the way in which context is presented to the model.

...read moreread less

Posted Content•

Tackling Multi-Answer Open-Domain Questions via a Recall-then-Verify Framework.

[...]

Zhihong Shao¹, Minlie Huang¹•Institutions (1)

Tsinghua University¹

16 Oct 2021-arXiv: Computation and Language

TL;DR: This paper propose a recall-then-verify framework, which separates the reasoning process of each answer so that they can make better use of retrieved evidence while also leveraging the power of large models under the same memory constraint.

...read moreread less

Abstract: Open domain questions are likely to be open-ended and ambiguous, leading to multiple valid answers. Existing approaches typically adopt the rerank-then-read framework, where a reader reads top-ranking evidence to predict answers. According to our empirical analyses, this framework is faced with three problems: to leverage the power of a large reader, the reranker is forced to select only a few relevant passages that cover diverse answers, which is non-trivial due to unknown effect on the reader's performance; the small reading budget also prevents the reader from making use of valuable retrieved evidence filtered out by the reranker; besides, as the reader generates predictions all at once based on all selected evidence, it may learn pathological dependencies among answers, i.e., whether to predict an answer may also depend on evidence of the other answers. To avoid these problems, we propose to tackle multi-answer open-domain questions with a recall-then-verify framework, which separates the reasoning process of each answer so that we can make better use of retrieved evidence while also leveraging the power of large models under the same memory constraint. Our framework achieves new state-of-the-art results on two multi-answer datasets, and predicts significantly more gold answers than a rerank-then-read system with an oracle reranker.

...read moreread less

Posted Content•

Flexible Generation of Natural Language Deductions

[...]

Kaj Bostrom¹, Xinyu Zhao¹, Swarat Chaudhuri², Greg Durrett¹•Institutions (2)

University of Texas at Austin¹, Rice University²

18 Apr 2021-arXiv: Computation and Language

TL;DR: The authors proposed ParaPattern, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision, which can cope with variation in how meaning is expressed while remaining precise.

...read moreread less

Abstract: An interpretable system for open-domain reasoning needs to express its reasoning process in a transparent form. Natural language is an attractive representation for this purpose -- it is both highly expressive and easy for humans to understand. However, manipulating natural language statements in logically consistent ways is hard: models must cope with variation in how meaning is expressed while remaining precise. In this paper, we describe ParaPattern, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision. We train BART-based models (Lewis et al., 2020) to generate the result of applying a particular logical operation to one or more premise statements. Crucially, we develop a largely automated pipeline for constructing suitable training examples from Wikipedia. We evaluate our models using out-of-domain sentence compositions from the QASC (Khot et al., 2020) and EntailmentBank (Dalvi et al., 2021) datasets as well as targeted perturbation sets. Our results show that our models are substantially more accurate and flexible than baseline systems. ParaPattern achieves 85% validity on examples of the 'substitution' operation from EntailmentBank without the use of any in-domain training data, matching the performance of a model fine-tuned for EntailmentBank. The full source code for our method is publicly available.

...read moreread less