Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
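The core idea of the abstract above — casting every task as mapping an input string to a target string — can be sketched as a small, illustrative helper. The task prefixes below follow the conventions described in the paper (e.g. "translate English to German:", "summarize:"); the helper function itself is a hypothetical sketch, not the authors' code.

```python
# Sketch of the text-to-text casting: every task becomes
# "task prefix + serialized input" -> "target text", so one
# encoder-decoder model with one objective can handle them all.

def to_text_to_text(task: str, **fields) -> str:
    """Serialize a task example into a single input string."""
    if task == "translation":
        return f"translate English to German: {fields['sentence']}"
    if task == "summarization":
        return f"summarize: {fields['document']}"
    if task == "cola":  # grammatical acceptability (GLUE CoLA)
        return f"cola sentence: {fields['sentence']}"
    if task == "stsb":  # similarity regression; target is a number as text
        return f"stsb sentence1: {fields['s1']} sentence2: {fields['s2']}"
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translation", sentence="That is good."))
# translate English to German: That is good.
```

Because classification labels, regression scores, and free-form answers are all emitted as text, the same maximum-likelihood training loop serves every task; only the prefix and target serialization change.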


Citations
Proceedings Article
01 Nov 2021
TL;DR: The authors propose Grounded Graph Decoding, a method that improves compositional generalization of language representations by grounding structured predictions with an attention mechanism, enabling the model to retain syntax information from the input and significantly improving generalization to complex inputs.
Abstract: Question answering models struggle to generalize to novel compositions of training patterns. Current end-to-end models learn a flat input embedding which can lose input syntax context. Prior approaches improve generalization by learning permutation invariant models, but these methods do not scale to more complex train-test splits. We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism. Grounding enables the model to retain syntax information from the input that significantly improves generalization to complex inputs. By predicting a structured graph containing conjunctions of query clauses, we learn a group invariant representation without making assumptions on the target domain. Our model performs competitively on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering. Notably, our model effectively solves the MCD1 split with 98% accuracy. All source code is available at https://github.com/gaiyu0/cfq.
Proceedings Article
Oana Ignat, Santiago Castro, Hanwen Miao, Weiji Li, Rada Mihalcea
01 Nov 2021
TL;DR: In this article, a multimodal model was proposed to automatically infer human action reasons in online videos, focusing on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them.
Abstract: We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information to automatically infer the reasons corresponding to an action presented in the video.
Posted Content
TL;DR: Doc2Dict uses a transformer language model trained on existing database records to directly generate structured JSON, removing the workload associated with producing token-level annotations and taking advantage of a data source which is generally quite plentiful (e.g. database records).
Abstract: Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling model is trained on manually annotated documents to extract relevant spans; then, when a new document arrives, a model predicts spans which are then post-processed and standardized to convert the information into a database entry. We replace this labor-intensive workflow with a transformer language model trained on existing database records to directly generate structured JSON. Our solution removes the workload associated with producing token-level annotations and takes advantage of a data source which is generally quite plentiful (e.g. database records). As long documents are common in information extraction tasks, we use gradient checkpointing and chunked encoding to apply our method to sequences of up to 32,000 tokens on a single GPU. Our Doc2Dict approach is competitive with more complex, hand-engineered pipelines and offers a simple but effective baseline for document-level information extraction. We release our Doc2Dict model and code to reproduce our experiments and facilitate future work.
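The chunked-encoding step described above — splitting a document of up to 32,000 tokens into pieces that fit in memory, encoding each piece, and concatenating the results — can be sketched in a few lines. This is an illustrative sketch under assumed interfaces, not the released Doc2Dict code: `encode_chunk` stands in for whatever per-chunk encoder the model uses.

```python
# Illustrative sketch of chunked encoding for long documents:
# split the token sequence into fixed-size chunks so each chunk
# fits on the GPU, encode each chunk, and concatenate the states
# before decoding the structured output.

def chunk_tokens(token_ids, chunk_size=512):
    """Split a long token sequence into fixed-size chunks."""
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

def encode_document(token_ids, encode_chunk, chunk_size=512):
    """Encode each chunk independently and concatenate the results."""
    states = []
    for chunk in chunk_tokens(token_ids, chunk_size):
        states.extend(encode_chunk(chunk))
    return states

# Toy usage: an identity "encoder" over a 1,200-token document.
doc = list(range(1200))
states = encode_document(doc, encode_chunk=lambda c: c, chunk_size=512)
assert len(states) == 1200                # all tokens encoded
assert len(chunk_tokens(doc, 512)) == 3   # chunks of 512 + 512 + 176
```

Combined with gradient checkpointing (recomputing activations during the backward pass instead of storing them), this trades compute for memory and is what lets a single GPU handle very long inputs.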
Proceedings Article
25 Jun 2021
TL;DR: This paper introduces a virtual assistant that answers any questions pertinent to a given context without requiring users to read the internal document(s). With the help of the Twilio API, a cloud-based architecture capable of generating question-answer pairs is brought into the WhatsApp interface, offering a user-friendly experience.
Abstract: Question Generation and Answering, being a complex undertaking, has gained significant attention in earlier years. Although significant leaps forward have been accomplished, imperative optimization is needed when these methods are used in a real-time system. The paper proposes an approach to enhance the orthodox methods to support collegiate programs by introducing a virtual assistant that answers any questions pertinent to the context, without letting the users read the internal document(s). With the help of the Twilio API, a cloud-based architecture capable of generating question-answer pairs is brought into the WhatsApp interface, offering a user-friendly experience.
Posted Content
TL;DR: In this paper, the authors propose a new compositional tool that generates a musical outline of speech recorded or provided by the user, for use as a musical building block in their compositions. The tool allows any user to turn their own speech into musical material while still hearing the direct connection between the recorded speech and the resulting music.
Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our proposed pipeline. This pipeline begins with speech-based signal processing, after which some simple musical heuristics are applied, and finally these pre-processed signals are passed through Transformer models trained on new musical tasks. We illustrate the effectiveness of our pipeline -- which does not require a paired dataset for training -- through examples of music created by musicians making use of our tool.
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.