Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
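The text-to-text framing described above means every task, whether translation, summarization, or classification, is cast as mapping an input string to a target string, with a task prefix identifying the problem. The sketch below illustrates this casting; the prefixes follow the conventions described in the paper, but the helper function and its name are illustrative, not the authors' released code.

```python
def to_text_to_text(task: str, **fields) -> tuple[str, str]:
    """Cast a task example to an (input text, target text) pair."""
    if task == "translate_en_de":
        # Translation: prefix names the language pair; target is the translation.
        return (f"translate English to German: {fields['source']}",
                fields["target"])
    if task == "summarize":
        # Summarization: input is the document, target is the summary.
        return (f"summarize: {fields['document']}", fields["summary"])
    if task == "cola":
        # Classification: even the label is emitted as literal text.
        return (f"cola sentence: {fields['sentence']}",
                "acceptable" if fields["label"] == 1 else "unacceptable")
    raise ValueError(f"unknown task: {task}")

# Two tasks, one uniform interface: string in, string out.
inp, tgt = to_text_to_text("translate_en_de",
                           source="That is good.", target="Das ist gut.")
# inp == "translate English to German: That is good."
# tgt == "Das ist gut."
```

Because every task shares this single string-to-string interface, one model with one objective and one decoding procedure can be trained and evaluated across all of them.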
Citations
Proceedings Article
SB_NITK at MEDIQA 2021: Leveraging Transfer Learning for Question Summarization in Medical Domain
TL;DR: In this article, the authors used transfer learning to improve the summarization of consumer health questions in the MEDIQA 2021 Question Summarization shared task, achieving a ROUGE-2 F1 score of 0.139.
Book Chapter
Information Extraction/Entailment of Common Law and Civil Code
John Hudzina, Kanika Madan, Dhivya Chinnappa, Jinane Harmouche, Hiroko Bretz, Andrew Vold, Frank Schilder
TL;DR: This paper evaluated different approaches to entailment tasks on the small domain-specific data sets provided by the Competition on Legal Information Extraction/Entailment (COLIEE), which focuses on legal information processing and textual entailment over legal data.
Proceedings Article
DuoRAT: Towards Simpler Text-to-SQL Models
TL;DR: This paper begins by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as building blocks, and then performs several ablation experiments using DuoRAT as the baseline model.
Posted Content
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language.
Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang
TL;DR: A new model architecture is proposed for learning multi-modal neuro-symbolic representations for video captioning, using a dictionary-learning-based method that incorporates modality-specific inductive biases into the captioning task.
Proceedings Article
SUPER: SUb-Graph Parallelism for TransformERs
Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Ade Jacobs, Dhabaleswar K. Panda, Brian Van Essen
TL;DR: In this article, the authors proposed sub-graph parallelism to accelerate the training of Transformer models and generalized the concept to any neural network with multiple branches.