Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
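The unified framework described in the abstract feeds the model a single input string with a task prefix and trains it to emit target text, so every task, classification included, shares one sequence-to-sequence interface. A minimal sketch of that input formatting (the prefixes follow the conventions reported in the paper; the helper function itself is ours):

```python
def to_text_to_text(task: str, **fields) -> str:
    """Format a task instance as one input string with a task prefix,
    mirroring the text-to-text convention described in the abstract."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['text']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "cola":  # single-sentence acceptability classification
        return f"cola sentence: {fields['text']}"
    raise ValueError(f"unknown task: {task}")

# Every task is reduced to the same (input text -> output text) shape:
print(to_text_to_text("summarize", text="Transfer learning works well."))
# -> summarize: Transfer learning works well.
```

Targets are likewise plain text (a translation, a summary, or a label word such as "acceptable"), which is what lets one model and one objective cover all of the tasks.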
Citations
Posted Content
How Optimal is Greedy Decoding for Extractive Question Answering?
TL;DR: The authors showed that greedy decoding quickly converges towards the performance of exact extractive decoding with the introduction of a few training examples, becoming more extractive and increasingly likely to generate the most probable span as the training set grows.
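Greedy decoding, compared above against exact extractive decoding, simply picks the highest-scoring token at each step and stops at end-of-sequence. A toy sketch (the per-step scores here are a stand-in, not the cited paper's model):

```python
def greedy_decode(step_logits, eos=0):
    """Pick the argmax token at every step until EOS is produced.
    step_logits is a list of per-step next-token score vectors,
    standing in for a real model's conditional distributions."""
    out = []
    for logits in step_logits:
        tok = max(range(len(logits)), key=lambda i: logits[i])
        if tok == eos:
            break
        out.append(tok)
    return out

# Toy scores over a 4-token vocabulary (token 0 = EOS):
scores = [[0.1, 2.0, 0.3, 0.4],
          [0.2, 0.1, 3.0, 0.5],
          [5.0, 0.1, 0.2, 0.3]]
print(greedy_decode(scores))  # -> [1, 2]
```

Exact extractive decoding, by contrast, would score every candidate span of the input and return the most probable one; the finding above is that the cheap greedy loop approaches that behavior as training data grows.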
Proceedings Article
BERT meets Cranfield: Uncovering the Properties of Full Ranking on Fully Labeled Data
Negin Ghasemi, Djoerd Hiemstra +1 more
TL;DR: In this article, the authors investigate the performance of BERT-based full rankers on the Cranfield collection, which comes with full relevance judgments on all documents in the collection, as opposed to BERT-based re-rankers and BM25.
Posted Content
Sequence-to-Sequence Piano Transcription with Transformers
TL;DR: In this paper, a generic encoder-decoder Transformer with standard decoding methods is used to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks.
Journal Article
gobbli: A uniform interface to deep learning for text in Python
Jason Nance, Peter Baumgartner +1 more
Proceedings Article
Disentangled Sequence to Sequence Learning for Compositional Generalization
TL;DR: The authors proposed an extension to sequence-to-sequence models which encourages disentanglement by adaptively re-encoding (at each time step) the source input, which makes it easier for the encoder to exploit specialized information for each prediction rather than capturing it all in a single forward pass.