Open AccessJournal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel,Noam Shazeer,Adam Roberts,Katherine Lee,Sharan Narang,Michael Matena,Yanqi Zhou,Wei Li,Peter J. Liu +8 more
Reads0
Chats0
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.read more
Citations
More filters
Posted Content
\'UFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5
David Samuel,Milan Straka +1 more
TL;DR: This article presented the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluated lexical-normalization systems on 12 social media datasets in 11 languages.
Proceedings Article
Graph-Based Decoding for Task Oriented Semantic Parsing
TL;DR: The authors formulate semantic parsing as a dependency parsing task, applying graph-based decoding techniques developed for syntactic parsing, and compare various decoding techniques given the same pre-trained Transformer encoder on the TOP dataset, including settings where training data is limited or contains only partially annotated examples.
Posted Content
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
TL;DR: This paper proposed an adaptive learning rate principle, in which the running mean of squared gradient in Adam is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each coordinate.
Proceedings Article
Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus.
TL;DR: This paper developed the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments, which has been created following the same approach and the same steps as the English CoLA corpus.
Patent
Retrieval-augmented language model pre-training and fine-tuning
TL;DR: In this paper, a neural-network-based textual knowledge retriever is trained along with a language model to retrieve helpful information from a large unlabeled corpus, rather than requiring all potentially relevant information to be stored implicitly in the parameters of the neural network.