Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
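The text-to-text framing the abstract describes can be sketched in a few lines: every task is cast as a pair of strings, with a task prefix prepended to the input. The task prefixes follow the convention described in the paper; the example dictionary keys and label strings below are illustrative stand-ins, not the paper's exact preprocessing code.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example to an (input_text, target_text) pair.

    Hypothetical sketch: even classification targets are emitted as
    literal text labels, so one seq2seq model handles every task.
    """
    if task == "summarization":
        return "summarize: " + example["document"], example["summary"]
    if task == "translation":
        return "translate English to German: " + example["en"], example["de"]
    if task == "classification":
        # The target is the label word itself, decoded as text.
        return "cola sentence: " + example["sentence"], example["label"]
    raise ValueError(f"unknown task: {task}")
```

Because inputs and targets are both plain text, the same model, loss, and decoding procedure apply unchanged across all tasks.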
Citations
Proceedings Article
Identifying inherent disagreement in natural language inference
TL;DR: This paper investigates how to tease systematic inferences apart from disagreement items, and proposes Artificial Annotators (AAs) to simulate the uncertainty in the annotation process by capturing the modes in annotations.
Posted Content
Dealing with Typos for BERT-based Passage Retrieval and Ranking
Shengyao Zhuang, Guido Zuccon +1 more
TL;DR: This article proposed a simple typos-aware training framework for dense retrievers (DR) and BERT re-rankers for passage retrieval and ranking.
Journal Article
Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models
TL;DR: Quantitative results from intrinsic and extrinsic evaluations show that the novel cross-lingual post-training approach outperforms several massively multilingual and monolingual pretrained language models in most settings and improves data efficiency by a factor of up to 32 compared to monolingual training.
Posted Content
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier J. Hénaff, Matthew Botvinick, Andrew Zisserman, Oriol Vinyals, Joao Carreira
TL;DR: Perceiver IO learns to flexibly query the model's latent space to produce outputs of arbitrary size and semantics, achieving state-of-the-art results on tasks with highly structured output spaces.
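The query mechanism the blurb describes can be sketched as two cross-attention steps: a fixed-size latent array attends over the inputs (encode), then task-specific output queries attend over the latents (decode). This is a heavily simplified single-head sketch under stated assumptions; the real architecture adds learned projections, MLP blocks, and repeated latent self-attention.

```python
import numpy as np

def attend(q, k, v):
    # Scaled dot-product attention (single head, no masking, no projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def perceiver_io(inputs, latents, output_queries):
    """Toy Perceiver-IO-style pass: compress arbitrary-length inputs into a
    fixed-size latent array, then decode with queries of arbitrary length."""
    latents = attend(latents, inputs, inputs)        # encode: latents query inputs
    return attend(output_queries, latents, latents)  # decode: outputs query latents
```

Because the output shape is set entirely by `output_queries`, the same backbone can produce outputs of any size or semantics by swapping the query array.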
Proceedings Article
Shortformer: Better Language Modeling using Shorter Inputs
TL;DR: This article showed that adding absolute position embeddings to queries and keys instead of to word embeddings improves perplexity and speeds up the training of a language model with short input lengths.
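The position-infused idea in this blurb can be sketched directly: position embeddings are added to the queries and keys inside the attention computation, while the values (and hence the residual stream) stay position-free. A minimal single-head sketch, assuming precomputed weight matrices `wq`, `wk`, `wv` and a position-embedding matrix `pos` of the same shape as the input `x`:

```python
import numpy as np

def position_infused_attention(x, pos, wq, wk, wv):
    """Shortformer-style attention sketch: `pos` is injected into queries
    and keys only, not into the word embeddings or the values."""
    q = (x + pos) @ wq
    k = (x + pos) @ wk
    v = x @ wv  # values carry no position information
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Keeping positions out of the residual stream is what allows previously computed representations to be reused across overlapping windows, which is where the speedup at short input lengths comes from.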