Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
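To make the text-to-text format concrete, the sketch below casts a few tasks as plain string-to-string pairs and runs them through one of the released checkpoints via the Hugging Face transformers library. The "translate English to German:", "summarize:", and "cola sentence:" task prefixes follow the paper; the checkpoint name "t5-small" and the generation settings are illustrative choices, not a prescribed setup.

# Minimal sketch: every task is text in, text out, so one model,
# one loss, and one decoding procedure cover translation,
# summarization, and classification alike.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in NLP.",
    "cola sentence: The course is jumping well.",  # acceptability judgment
]

for text in inputs:
    batch = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**batch, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))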
Citations
Proceedings Article
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev
TL;DR: This paper introduces an advanced Russian general language understanding evaluation benchmark, RussianSuperGLUE, presents the first results of comparing multilingual models on the translated diagnostic test set, and offers first steps toward further expanding the benchmark or assessing state-of-the-art models independently of language.
Proceedings Article
Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction
TL;DR: The authors proposed Text2Event, a sequence-to-structure generation paradigm that can directly extract events from text in an end-to-end manner and achieves competitive performance using only record-level annotations in both supervised learning and transfer learning settings.
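The sequence-to-structure idea can be illustrated by linearizing an event record into a bracketed string that a seq2seq model learns to emit. The sketch below is an assumed simplification: the role names and bracket format are illustrative, and the schema-constrained decoding the paper pairs with generation is omitted.

# Hedged sketch: encode one event record as a nested bracket string
# usable as the target side of a sequence-to-sequence model.
def linearize_event(event_type, trigger, args):
    parts = [f"({event_type} {trigger}"]
    for role, span in args.items():
        parts.append(f"({role} {span})")
    return " ".join(parts) + ")"

target = linearize_event(
    "Transport", "returned",
    {"Artifact": "The man", "Destination": "Los Angeles", "Origin": "Mexico"},
)
print(target)
# (Transport returned (Artifact The man) (Destination Los Angeles) (Origin Mexico))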
Proceedings Article
Improving Neural Topic Models using Knowledge Distillation
TL;DR: This work uses knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers to improve topic quality, and shows that the adaptable framework improves performance not only in the aggregate over all estimated topics but also in head-to-head comparisons of aligned topics.
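One way to read "knowledge distillation" here is as an extra loss term that pulls the topic model's reconstructed word distribution toward soft targets produced by a pretrained-transformer teacher. The sketch below is only a plausible shape for such an objective; the variable names, the KL formulation, and the fixed weight alpha are assumptions, not the paper's exact loss.

import torch.nn.functional as F

def distilled_ntm_loss(student_logits, bow_counts, teacher_probs, alpha=0.5):
    # student_logits: (batch, vocab) reconstruction scores from the topic model
    # bow_counts:     (batch, vocab) observed bag-of-words counts per document
    # teacher_probs:  (batch, vocab) soft word distribution from the teacher
    log_probs = F.log_softmax(student_logits, dim=-1)
    recon = -(bow_counts * log_probs).sum(dim=-1).mean()             # standard NTM term
    distill = F.kl_div(log_probs, teacher_probs, reduction="batchmean")
    return recon + alpha * distill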
Posted Content
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Fatema Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
TL;DR: Robustness Gym is a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks.
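Of the four paradigms, a subpopulation evaluation is the easiest to picture: slice the test set with predicates and report a metric per slice. The sketch below is generic Python that illustrates the idea; it is not the Robustness Gym API, and the slice predicates are invented examples.

def evaluate_slices(examples, predict, slices):
    # examples: list of dicts with "text" and "label" keys
    # predict:  callable mapping text -> predicted label
    # slices:   dict mapping slice name -> predicate over an example
    results = {}
    for name, keep in slices.items():
        subset = [ex for ex in examples if keep(ex)]
        if subset:
            correct = sum(predict(ex["text"]) == ex["label"] for ex in subset)
            results[name] = (correct / len(subset), len(subset))
    return results

slices = {
    "all": lambda ex: True,
    "short (<10 tokens)": lambda ex: len(ex["text"].split()) < 10,
    "contains negation": lambda ex: " not " in f' {ex["text"].lower()} ',
}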
Proceedings Article
Rethinking Positional Encoding in Language Pre-training
Guolin Ke, Di He, Tie-Yan Liu
TL;DR: The authors proposed a new positional encoding method, Transformer with Untied Positional Encoding (TUPE), which unties the [CLS] symbol from other positions, making it easier to capture information from all positions.
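The "untied" part can be sketched directly: instead of adding position embeddings to word embeddings at the input, word-to-word and position-to-position attention terms are computed with separate projections and summed. The single-head module below follows that scoring scheme with a sqrt(2d) rescaling of the summed terms; the layout is an assumed simplification, and the paper's [CLS]-untying and relative-position terms are omitted.

import math
import torch
import torch.nn as nn

class UntiedAttentionScores(nn.Module):
    # Hedged single-head sketch of TUPE-style untied attention scores.
    def __init__(self, d_model, max_len):
        super().__init__()
        self.wq, self.wk = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.uq, self.uk = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.scale = math.sqrt(2 * d_model)  # rescale the two summed terms

    def forward(self, x):  # x: (batch, seq, d_model), word embeddings only
        p = self.pos(torch.arange(x.size(1), device=x.device))
        content = self.wq(x) @ self.wk(x).transpose(-2, -1)    # word-to-word
        position = self.uq(p) @ self.uk(p).transpose(-2, -1)   # position-to-position
        return (content + position) / self.scale               # (batch, seq, seq)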