Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
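
As a rough illustration of the text-to-text idea described in the abstract, the sketch below casts three kinds of task into plain (input text, target text) pairs. The function name, the dictionary keys, and the exact prefix strings are assumptions made for this example and are not taken from the abstract or from the released code.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Return an (input text, target text) pair for one labeled example.

    Hypothetical helper: the task names, keys, and prefixes are illustrative.
    """
    if task == "translation":
        return f"translate English to German: {example['source']}", example["target"]
    if task == "summarization":
        return f"summarize: {example['document']}", example["summary"]
    if task == "classification":
        # The class label is emitted as a word rather than a class index,
        # so one model with one output format can serve every task.
        return f"classify: {example['sentence']}", example["label"]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    examples = [
        ("translation", {"source": "That is good.", "target": "Das ist gut."}),
        ("summarization", {"document": "A long news story.", "summary": "A short story."}),
        ("classification", {"sentence": "The movie was great.", "label": "positive"}),
    ]
    for task, example in examples:
        model_input, model_target = to_text_to_text(task, example)
        print(repr(model_input), "->", repr(model_target))
```

Because every task reduces to string in, string out, the same model, loss, and decoding procedure can in principle be reused across all of them.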

Citations

Posted Content

Mitigating Media Bias through Neutral Article Generation

Nayeon Lee¹, Yejin Bang, Andrea Madotto, Pascale Fung

Hong Kong University of Science and Technology¹

01 Apr 2021-arXiv: Computation and Language

TL;DR: This article proposes a new task, generating a single neutralized article from multiple biased articles, to give readers more efficient access to balanced and unbiased information, and provides baselines and analyses as a starting point for the task.

Abstract: Media bias can lead to increased political polarization, and thus, the need for automatic mitigation methods is growing. Existing mitigation work displays articles from multiple news outlets to provide diverse news coverage, but without neutralizing the bias inherent in each of the displayed articles. Therefore, we propose a new task, a single neutralized article generation out of multiple biased articles, to facilitate more efficient access to balanced and unbiased information. In this paper, we compile a new dataset NeuWS, define an automatic evaluation metric, and provide baselines and multiple analyses to serve as a solid starting point for the proposed task. Lastly, we obtain a human evaluation to demonstrate the alignment between our metric and human judgment.

1 citation

Posted Content

Joint Multimedia Event Extraction from Video and Article

Brian Chen¹, Xudong Lin¹, Christopher Thomas², Manling Li³, Shoya Yoshida, Lovish Chum, Heng Ji⁴, Shih-Fu Chang¹

Columbia University¹, University of Pittsburgh², Rensselaer Polytechnic Institute³, University of Illinois at Urbana–Champaign⁴

27 Sep 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes the first approach to jointly extract events from video and text articles, combining a self-supervised multimodal event coreference model that needs no manually annotated pairs with a multimodal transformer, and reports 6.0% and 5.8% absolute F-score gains on multimodal event coreference resolution and multimedia event extraction.

Abstract: Visual and textual modalities contribute complementary information about events described in multimedia documents. Videos contain rich dynamics and detailed unfoldings of events, while text describes more high-level and abstract concepts. However, existing event extraction methods either do not handle video or solely target video while ignoring other modalities. In contrast, we propose the first approach to jointly extract events from video and text articles. We introduce the new task of Video MultiMedia Event Extraction (Video M2E2) and propose two novel components to build the first system towards this task. First, we propose the first self-supervised multimodal event coreference model that can determine coreference between video events and text events without any manually annotated pairs. Second, we introduce the first multimodal transformer which extracts structured event information jointly from both videos and text documents. We also construct and will publicly release a new benchmark of video-article pairs, consisting of 860 video-article pairs with extensive annotations for evaluating methods on this task. Our experimental results demonstrate the effectiveness of our proposed method on our new benchmark dataset. We achieve 6.0% and 5.8% absolute F-score gain on multimodal event coreference resolution and multimedia event extraction.

1 citation

Proceedings Article

Exploring a Unified Sequence-To-Sequence Transformer for Medical Product Safety Monitoring in Social Media

Shivam Raval¹, Hooman Sedghamiz¹, Enrico Santus², Tuka Alhanai³, Mohammad M. Ghassemi⁴, Emmanuele Chersoni

Bayer¹, Massachusetts Institute of Technology², New York University Abu Dhabi³, Michigan State University⁴

01 Nov 2021

TL;DR: The authors frame adverse event (AE) detection and extraction in social media text as a sequence-to-sequence problem using the T5 architecture, achieving strong improvements over baselines on several English benchmarks (F1 = 0.71, 12.7% relative improvement for AE detection; strict F1 = 0.713, 12.4% relative improvement for AE extraction).

Abstract: Adverse Events (AE) are harmful events resulting from the use of medical products. Although social media may be crucial for early AE detection, the sheer scale of this data makes it logistically intractable to analyze using human agents, with NLP representing the only low-cost and scalable alternative. In this paper, we frame AE Detection and Extraction as a sequence-to-sequence problem using the T5 model architecture and achieve strong performance improvements over the baselines on several English benchmarks (F1 = 0.71, 12.7% relative improvement for AE Detection; Strict F1 = 0.713, 12.4% relative improvement for AE Extraction). Motivated by the strong commonalities between AE tasks, the class imbalance in AE benchmarks, and the linguistic and structural variety typical of social media texts, we propose a new strategy for multi-task training that accounts, at the same time, for task and dataset characteristics. Our approach increases model robustness, leading to further performance gains. Finally, our framework shows some language transfer capabilities, obtaining higher performance than Multilingual BERT in zero-shot learning on French data.

1 citation

Posted Content

Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Shane Storks¹, Joyce Y. Chai¹

University of Michigan¹

10 Sep 2021-arXiv: Computation and Language

TL;DR: This article proposes a measure of prediction coherence that gives a more informative evaluation of pre-trained language models than accuracy on text classification tasks, and applies the framework to two existing language understanding benchmarks with different properties.

Abstract: As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in ideas and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines' predictions.

1 citation

Proceedings Article

FastSeq: Make Sequence Generation Faster

Yu Yan¹, Fei Hu, Jiusheng Chen¹, Nikhil Bhendawade¹, Ting Ye, Yeyun Gong¹, Nan Duan¹, Desheng Cui, Bingyu Chi, Ruofei Zhang¹

Microsoft¹

01 Aug 2021

TL;DR: FastSeq accelerates sequence generation without accuracy loss by combining an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O.

Abstract: Transformer-based models have made tremendous impacts in natural language generation. However the inference speed is a bottleneck due to large model size and intensive computing involved in auto-regressive decoding process. We develop FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
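
To make "detecting repeated n-grams" concrete, the sketch below shows the straightforward baseline check that decoders commonly run at each step: find every token that would complete an n-gram already present in the generated sequence. This is not FastSeq's optimized algorithm, which the abstract does not describe; the function name and integer token representation are assumptions for illustration.

```python
from typing import Sequence, Set


def banned_next_tokens(tokens: Sequence[int], n: int) -> Set[int]:
    """Return tokens that, if generated next, would repeat an n-gram in `tokens`.

    Naive baseline (hypothetical helper): rescans the whole sequence each call,
    so the cost per decoding step grows with the sequence length.
    """
    if n < 1 or len(tokens) < n:
        # No complete n-gram exists yet, so nothing can be repeated.
        return set()
    # The last n-1 generated tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned: Set[int] = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same prefix, its final token
        # would complete a repeat.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


if __name__ == "__main__":
    generated = [1, 2, 3, 1, 2]
    print(banned_next_tokens(generated, n=3))
```

The full rescan on every step is the kind of repeated work that makes this check a candidate for optimization in auto-regressive decoding.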

1 citation