Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
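The abstract's central idea — casting every text-based task into a single text-to-text format — can be illustrated with a minimal sketch. The helper function and example fields below are assumptions for illustration, not the authors' code; the task prefixes echo the style used in the paper (e.g. "translate English to German:", "summarize:").

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example into an (input_text, target_text) pair,
    so that one sequence-to-sequence model can handle every task."""
    if task == "translation":
        return (f"translate English to German: {example['en']}", example["de"])
    if task == "summarization":
        return (f"summarize: {example['document']}", example["summary"])
    if task == "classification":
        # Even a label becomes text: the model emits the class name as a string.
        return (f"classify sentiment: {example['text']}", example["label"])
    raise ValueError(f"unknown task: {task}")

src, tgt = to_text_to_text("summarization",
                           {"document": "A long news article ...",
                            "summary": "A short summary."})
```

Because inputs and targets are both plain strings, the same model, loss, and decoding procedure apply across translation, summarization, and classification alike.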


Citations
Posted Content
TL;DR: In this article, ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search, is proposed. However, building such systems from scratch faces real-world challenges from both imperfect product schema/knowledge and a lack of training dialog data.
Abstract: Successful conversational search systems can present a natural, adaptive and interactive shopping experience for online shopping customers. However, building such systems from scratch faces real-world challenges from both imperfect product schema/knowledge and a lack of training dialog data. In this work we first propose ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search. It leverages the text profile to retrieve products, which is more robust against imperfect product schema/knowledge compared with using product attributes alone. We then address the lack-of-data challenge by proposing an utterance transfer approach that generates dialogue utterances by using existing dialogs from other domains and leveraging search behavior data from an e-commerce retailer. With utterance transfer, we introduce a new conversational search dataset for online shopping. Experiments show that our utterance transfer method can significantly improve the availability of training dialogue data without crowd-sourcing, and that the conversational search system significantly outperforms the best tested baseline.
Posted Content
TL;DR: In this paper, the authors introduce the basic methodologies used for developing general-domain QA systems, followed by a thorough investigation of different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches, both those using structured databases and those using collections of texts.
Abstract: The objective of automated Question Answering (QA) systems is to provide answers to user queries in a time-efficient manner. The answers are usually found either in databases (or knowledge bases) or in a collection of documents commonly referred to as the corpus. In the past few decades there has been a proliferation of knowledge acquisition and, consequently, an exponential growth in new scientific articles in the field of biomedicine. It has therefore become difficult to keep track of all the information in the domain, even for domain experts. With the improvements in commercial search engines, users can type in their queries and get a small set of documents most relevant for answering their query, as well as relevant snippets from the documents in some cases. However, it may still be tedious and time-consuming to manually look for the required information or answers. This has necessitated the development of efficient QA systems that aim to find exact and precise answers to user-provided natural language questions in the domain of biomedicine. In this paper, we introduce the basic methodologies used for developing general-domain QA systems, followed by a thorough investigation of different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches, both those using structured databases and those using collections of texts. We also explore the limitations of current systems and potential avenues for further advancement.
Posted Content
TL;DR: In this article, a neural machine that creates molecules that meet some desired conditions based on a deep understanding of chemical language (generative chemical Transformer, GCT) is proposed.
Abstract: A chemical formula is an artificial language that expresses molecules as text. Neural machines that have learned this chemical language can be used as a tool for inverse molecular design. Here, we propose a neural machine that creates molecules meeting desired conditions based on a deep understanding of chemical language (the generative chemical Transformer, GCT). The attention mechanism in GCT allows a deeper understanding of molecular structures, beyond the limitations of the chemical language itself that cause semantic discontinuity, by attending to characters sparsely. We investigate the significance of language models for inverse molecular design problems by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical rules and the grammar of the language. Molecules parsed from the generated strings simultaneously satisfy multiple target properties and are diverse for a single condition set. GCT generates de novo molecules in a short time, something human experts cannot do. These advances will contribute to improving the quality of human life by accelerating the discovery of desired materials.
Posted Content
TL;DR: Disfl-QA as discussed by the authors is a dataset where humans introduce contextual disfluencies in previously fluent questions, which require a more comprehensive understanding of the text than what was necessary in prior datasets.
Abstract: Disfluency is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies into previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting. We show that data augmentation methods partially recover the loss in performance, and we also demonstrate the efficacy of using gold data for fine-tuning. We argue that large-scale disfluency datasets are needed for NLP models to be robust to them. The dataset is publicly available at: this https URL.
Posted Content
TL;DR: In this article, the authors present a survey of the research done in this area, aiming to show the landscape of the field and draw a road-map of future research directions.
Abstract: Since 2004, researchers have been using the mathematical framework of Quantum Theory (QT) in Information Retrieval (IR). QT offers a generalized probability and logic framework. Such a framework has been shown capable of unifying the representation, ranking and user cognitive aspects of IR, and helpful in developing more dynamic, adaptive and context-aware IR systems. Although quantum-inspired IR is still a growing area, a wide array of work across different aspects of IR has been done and has produced promising results. This paper presents a survey of the research done in this area, aiming to show the landscape of the field and draw a road-map of future research directions.
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.