Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
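The core idea of the abstract above — casting every task as mapping an input string to a target string — can be sketched as a small, illustrative helper. The task prefixes below follow the conventions described in the paper (e.g. "translate English to German:", "summarize:"); the helper function itself is a hypothetical sketch, not the authors' code.

```python
# Sketch of the text-to-text casting: every task becomes
# "task prefix + serialized input" -> "target text", so one
# encoder-decoder model with one objective can handle them all.

def to_text_to_text(task: str, **fields) -> str:
    """Serialize a task example into a single input string."""
    if task == "translation":
        return f"translate English to German: {fields['sentence']}"
    if task == "summarization":
        return f"summarize: {fields['document']}"
    if task == "cola":  # grammatical acceptability (GLUE CoLA)
        return f"cola sentence: {fields['sentence']}"
    if task == "stsb":  # similarity regression; target is a number as text
        return f"stsb sentence1: {fields['s1']} sentence2: {fields['s2']}"
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translation", sentence="That is good."))
# translate English to German: That is good.
```

Because classification labels, regression scores, and free-form answers are all emitted as text, the same maximum-likelihood training loop serves every task; only the prefix and target serialization change.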


Citations
Proceedings Article
01 Nov 2021
TL;DR: The authors propose Grounded Graph Decoding, a method that improves compositional generalization of language representations by grounding structured predictions with an attention mechanism, enabling the model to retain syntax information from the input and significantly improving generalization to complex inputs.
Abstract: Question answering models struggle to generalize to novel compositions of training patterns. Current end-to-end models learn a flat input embedding which can lose input syntax context. Prior approaches improve generalization by learning permutation invariant models, but these methods do not scale to more complex train-test splits. We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism. Grounding enables the model to retain syntax information from the input that significantly improves generalization to complex inputs. By predicting a structured graph containing conjunctions of query clauses, we learn a group invariant representation without making assumptions on the target domain. Our model performs competitively on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering. Notably, our model effectively solves the MCD1 split with 98% accuracy. All source code is available at https://github.com/gaiyu0/cfq.
Proceedings Article
Oana Ignat, Santiago Castro, Hanwen Miao, Weiji Li, Rada Mihalcea
01 Nov 2021
TL;DR: In this article, a multimodal model was proposed to automatically infer human action reasons in online videos, focusing on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them.
Abstract: We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.
Posted Content
TL;DR: Doc2Dict uses a transformer language model trained on existing database records to directly generate structured JSON, removing the workload associated with producing token-level annotations and taking advantage of a data source which is generally quite plentiful (e.g. database records).
Abstract: Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling model is trained on manually annotated documents to extract relevant spans; then, when a new document arrives, a model predicts spans which are then post-processed and standardized to convert the information into a database entry. We replace this labor-intensive workflow with a transformer language model trained on existing database records to directly generate structured JSON. Our solution removes the workload associated with producing token-level annotations and takes advantage of a data source which is generally quite plentiful (e.g. database records). As long documents are common in information extraction tasks, we use gradient checkpointing and chunked encoding to apply our method to sequences of up to 32,000 tokens on a single GPU. Our Doc2Dict approach is competitive with more complex, hand-engineered pipelines and offers a simple but effective baseline for document-level information extraction. We release our Doc2Dict model and code to reproduce our experiments and facilitate future work.
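The chunked-encoding step described above — splitting a document of up to 32,000 tokens into pieces that fit in memory, encoding each piece, and concatenating the results — can be sketched in a few lines. This is an illustrative sketch under assumed interfaces, not the released Doc2Dict code: `encode_chunk` stands in for whatever per-chunk encoder the model uses.

```python
# Illustrative sketch of chunked encoding for long documents:
# split the token sequence into fixed-size chunks so each chunk
# fits on the GPU, encode each chunk, and concatenate the states
# before decoding the structured output.

def chunk_tokens(token_ids, chunk_size=512):
    """Split a long token sequence into fixed-size chunks."""
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

def encode_document(token_ids, encode_chunk, chunk_size=512):
    """Encode each chunk independently and concatenate the results."""
    states = []
    for chunk in chunk_tokens(token_ids, chunk_size):
        states.extend(encode_chunk(chunk))
    return states

# Toy usage: an identity "encoder" over a 1,200-token document.
doc = list(range(1200))
states = encode_document(doc, encode_chunk=lambda c: c, chunk_size=512)
assert len(states) == 1200                # all tokens encoded
assert len(chunk_tokens(doc, 512)) == 3   # chunks of 512 + 512 + 176
```

Combined with gradient checkpointing (recomputing activations during the backward pass instead of storing them), this trades compute for memory and is what lets a single GPU handle very long inputs.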
Proceedings Article
25 Jun 2021
TL;DR: This paper introduces a virtual assistant that answers any questions pertinent to a given context without requiring users to read the internal document(s). With the help of the Twilio API, a cloud-based architecture capable of generating question-answer pairs is brought into the WhatsApp interface, offering a user-friendly experience.
Abstract: Question Generation and Answering, being a complex undertaking, has gained significant attention in earlier years. Although significant leaps forward have been accomplished, imperative optimization is needed when these methods are used in a real-time system. The paper proposes an approach to enhance the orthodox methods to support collegiate programs by introducing a virtual assistant that answers any questions pertinent to the context, without letting the users read the internal document(s). With the help of the Twilio API, a cloud-based architecture capable of generating question-answer pairs is brought into the WhatsApp interface, offering a user-friendly experience.
Posted Content
TL;DR: In this paper, the authors propose a new compositional tool that generates a musical outline of speech recorded or provided by the user, for use as a musical building block in their compositions. The tool allows any user to turn their own speech into musical material while still hearing the direct connection between the recorded speech and the resulting music.
Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our proposed pipeline. This pipeline begins with speech-based signal processing, after which some simple musical heuristics are applied, and finally these pre-processed signals are passed through Transformer models trained on new musical tasks. We illustrate the effectiveness of our pipeline -- which does not require a paired dataset for training -- through examples of music created by musicians making use of our tool.
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.