Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
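
The text-to-text framing is concrete enough to show in a few lines. Below is a minimal sketch using the released T5 checkpoints through the Hugging Face transformers library (the library and the t5-small checkpoint name are external to this page; the task prefixes are the ones the paper uses):

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Every task is plain text in, plain text out: a prefix names the task
    # and the answer is decoded as a string, so one model handles them all.
    prompts = [
        "translate English to German: The house is wonderful.",
        "summarize: Transfer learning, where a model is first pre-trained on "
        "a data-rich task before being fine-tuned on a downstream task, has "
        "emerged as a powerful technique in NLP.",
        "cola sentence: The books was on the table.",  # acceptability as text
    ]
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=40)
        print(tokenizer.decode(output_ids[0], skip_special_tokens=True))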


Citations
Journal ArticleDOI
TL;DR: This paper introduces the basic concepts in AI, especially those most suitable to this application, and discusses the main approaches as well as the pros and cons of each method.
Abstract: Recently, there have been many studies attempting to take advantage of advancements in Artificial Intelligence (AI) in Analog and Mixed-Signal (AMS) circuit design. Automated circuit sizing optimization and improving the accuracy of performance models are the two predominant uses of AI in AMS circuit design. This paper first introduces and explains the basic concepts in AI, especially the ones that are more suitable to this application. Next, it surveys some recent studies of various AI techniques for AMS circuit design. Then, it discusses the main approaches as well as the pros and cons of each method. Finally, it gives meaningful insights into the current challenges and open issues and recommends approaches for specific applications.

22 citations

Proceedings ArticleDOI
26 Apr 2021
TL;DR: This work introduces two novel adaptations of large-scale pre-trained encoder-decoder models, focusing on building a context-driven representation of the document and enabling specific attention to the information in the document.
Abstract: Document grounded generation is the task of using the information provided in a document to improve text generation. This work focuses on two different document grounded generation tasks: the Wikipedia Update Generation task and dialogue response generation. Our work introduces two novel adaptations of large-scale pre-trained encoder-decoder models, focusing on building a context-driven representation of the document and enabling specific attention to the information in the document. Additionally, we provide a stronger BART baseline for these tasks. Our proposed techniques outperform existing methods on both automated (at least a 48% increase in BLEU-4 points) and human evaluation for closeness to reference and relevance to the document. Furthermore, we perform a comprehensive manual inspection of the generated output and categorize errors to provide insights into future directions in modeling these tasks.
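
As a point of reference for what the paper improves on, here is a minimal sketch of a plain "concatenate document and dialogue context" BART baseline via Hugging Face transformers; the separator and truncation choices are my assumptions rather than the authors' exact setup, and the untuned facebook/bart-base checkpoint stands in for their fine-tuned models:

    from transformers import BartTokenizer, BartForConditionalGeneration

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    document = "The Eiffel Tower is 330 metres tall and was completed in 1889."
    context = "User: How tall is the tower?"

    # Ground the generation by prepending the document to the dialogue context;
    # the paper's adaptations replace this flat concatenation with context-driven
    # document representations and document-specific attention.
    source = document + " </s> " + context
    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=60)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))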

22 citations

Posted Content
TL;DR: This article studies the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identifies key aspects for designing effective representations.
Abstract: Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identify key aspects for designing effective representations. Instead of training to directly map natural language to an executable form, we map to a reversible or lossy intermediate representation that has stronger structural correspondence with natural language. The combination of our proposed intermediate representations and pre-trained models is surprisingly effective, where the best combinations obtain a new state-of-the-art on CFQ (+14.8 accuracy points) and on the template-splits of three text-to-SQL datasets (+15.0 to +19.4 accuracy points). This work highlights that intermediate representations provide an important and potentially overlooked degree of freedom for improving the compositional generalization abilities of pre-trained seq2seq models.
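
To make the idea of a reversible intermediate representation concrete, here is a toy example of my own (not the paper's actual scheme): a text-to-SQL target rewritten so its clause order tracks the English question more closely, with a deterministic inverse back to executable SQL:

    # Toy reversible intermediate representation (IR) for text-to-SQL.
    def sql_to_ir(sql: str) -> str:
        select, _, rest = sql.partition(" FROM ")
        table, _, cond = rest.partition(" WHERE ")
        ir = f"from {table} take {select[len('SELECT '):]}"
        return ir + (f" where {cond}" if cond else "")

    def ir_to_sql(ir: str) -> str:
        body, _, cond = ir.partition(" where ")
        table, _, cols = body[len("from "):].partition(" take ")
        sql = f"SELECT {cols} FROM {table}"
        return sql + (f" WHERE {cond}" if cond else "")

    query = "SELECT name FROM singers WHERE age > 30"
    ir = sql_to_ir(query)          # "from singers take name where age > 30"
    assert ir_to_sql(ir) == query  # reversibility: no information is lost

A seq2seq model is then trained to map the question to the IR instead of directly to SQL, and the inverse mapping recovers the executable form at inference time.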

22 citations

Proceedings ArticleDOI
20 Jun 2021
TL;DR: GAIA is a transfer learning system for object detection that provides powerful pre-trained weights, selects models that conform to downstream demands such as latency constraints and specified data domains, and collects relevant data for practitioners who have very few data points for their tasks.
Abstract: Transfer learning with pre-training on large-scale datasets has played an increasingly significant role in computer vision and natural language processing recently. However, as there exist numerous application scenarios that have distinctive demands such as certain latency constraints and specialized data distributions, it is prohibitively expensive to take advantage of large-scale pre-training for per-task requirements. In this paper, we focus on the area of object detection and present a transfer learning system named GAIA, which can automatically and efficiently give birth to customized solutions according to heterogeneous downstream needs. GAIA is capable of providing powerful pre-trained weights, selecting models that conform to downstream demands such as latency constraints and specified data domains, and collecting relevant data for practitioners who have very few data points for their tasks. With GAIA, we achieve promising results on COCO, Objects365, Open Images, Caltech, CityPersons, and UODB, which is a collection of datasets including KITTI, VOC, WiderFace, DOTA, Clipart, Comic, and more. Taking COCO as an example, GAIA is able to efficiently produce models covering a wide range of latency from 16ms to 53ms, and yields AP from 38.2 to 46.5 without bells and whistles. To benefit every practitioner in the community of object detection, GAIA is released at https://github.com/GAIA-vision.
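
A hedged sketch of the selection step such a system automates: given a latency budget and a target domain, filter a model zoo and keep the most accurate feasible model. The zoo entries below are illustrative, not GAIA's actual catalogue, though the latency and AP endpoints echo the COCO range quoted above:

    # Hypothetical model zoo; names and domain tags are invented for illustration.
    candidates = [
        {"name": "det-tiny",  "latency_ms": 16, "ap": 38.2, "domains": {"coco"}},
        {"name": "det-base",  "latency_ms": 31, "ap": 43.0, "domains": {"coco", "kitti"}},
        {"name": "det-large", "latency_ms": 53, "ap": 46.5, "domains": {"coco"}},
    ]

    def select_model(budget_ms: float, domain: str):
        # Keep only models that satisfy the latency budget and cover the domain.
        feasible = [m for m in candidates
                    if m["latency_ms"] <= budget_ms and domain in m["domains"]]
        # Among feasible models, return the most accurate one (None if empty).
        return max(feasible, key=lambda m: m["ap"], default=None)

    print(select_model(budget_ms=35, domain="coco"))  # -> the det-base entry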

22 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper presents FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game; game-play leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher-quality data for the entailment and evidence retrieval tasks.
Abstract: We release FoolMeTwice (FM2 for short), a large dataset of challenging entailment pairs collected through a fun multi-player game. Gamification encourages adversarial examples, drastically lowering the number of examples that can be solved using “shortcuts” compared to other popular entailment datasets. Players are presented with two tasks. The first task asks the player to write a plausible claim based on the evidence from a Wikipedia page. The second one shows two plausible claims written by other players, one of which is false, and the goal is to identify it before the time runs out. Players “pay” to see clues retrieved from the evidence pool: the more evidence the player needs, the harder the claim. Game-play between motivated players leads to diverse strategies for crafting claims, such as temporal inference and diverting to unrelated evidence, and results in higher quality data for the entailment and evidence retrieval tasks. We open source the dataset and the game code.
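
As a hedged sketch of the entailment task FM2 poses, the snippet below scores a claim against Wikipedia evidence with an off-the-shelf MNLI model from Hugging Face; the checkpoint is a generic NLI model, not one trained on FM2, and the evidence/claim pair is invented:

    from transformers import pipeline

    nli = pipeline("text-classification", model="roberta-large-mnli")

    evidence = "The Eiffel Tower was completed in 1889 for the World's Fair."
    claim = "The Eiffel Tower was finished before the twentieth century."

    # The pipeline accepts premise/hypothesis pairs via text / text_pair.
    print(nli({"text": evidence, "text_pair": claim}))
    # e.g. [{'label': 'ENTAILMENT', 'score': ...}]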

22 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.