Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020 - Journal of Machine Learning Research - Vol. 21, Iss. 140, pp. 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
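
As a rough illustration of the text-to-text idea described in the abstract, the sketch below casts a few differently shaped tasks into plain (input text, target text) pairs. The helper name `to_text_to_text`, the task prefixes, and the sample sentences are assumptions made for illustration; they are not taken from the abstract or from the released code.

```python
from typing import List, Tuple, Union


def to_text_to_text(
    task_prefix: str, text: str, target: Union[str, int, float]
) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair.

    Hypothetical helper: the task is signalled by a plain-text prefix, and
    the label (class name, number, or free text) is rendered as a string.
    """
    source = f"{task_prefix}: {text.strip()}"
    return source, str(target)


# Illustrative examples only; prefixes and sentences are made up for this sketch.
pairs: List[Tuple[str, str]] = [
    # Translation: the target is already free text.
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    # Classification: the class label becomes a target string.
    to_text_to_text("sentiment", "A delightful film.", "positive"),
    # Regression-style task: the numeric score is rendered as text.
    to_text_to_text("similarity", "A man is singing. | A man sings.", 4.6),
]

for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

Because every task ends up as string in, string out, one model and one training objective can in principle be shared across all of them, which is what makes a like-for-like comparison of the factors listed in the abstract feasible.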

Citations

Proceedings Article

On Compositional Generalization of Neural Machine Translation

Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang¹

Westlake University¹

01 Aug 2021

TL;DR: The authors quantitatively analyze effects of various factors using compound translation error rate, then demonstrate that the NMT model fails badly on compositional generalization, although it performs remarkably well under traditional metrics.

Abstract: Modern neural machine translation (NMT) models have achieved competitive performance in standard benchmarks such as WMT. However, there still exist significant issues such as robustness, domain generalization, etc. In this paper, we study NMT models from the perspective of compositional generalization by building a benchmark dataset, CoGnition, consisting of 216k clean and consistent sentence pairs. We quantitatively analyze effects of various factors using compound translation error rate, then demonstrate that the NMT model fails badly on compositional generalization, although it performs remarkably well under traditional metrics.

7 citations

Proceedings Article

CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering.

Junru Lu, Gabriele Pergola¹, Lin Gui¹, Binyang Li², Yulan He¹

University of Warwick¹, University of International Relations²

01 Dec 2020

TL;DR: The efficacy of the proposed architecture in the multi-passage generative QA is shown, outperforming the state-of-the-art baselines with better syntactically well-formed answers and increased precision in addressing the questions of the AmazonQA review dataset.

Abstract: We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet introducing an auxiliary memory module consisting of two components: the context memory collecting cross-passage evidences, and the answer memory working as a buffer continually refining the generated answers. Empirically, we show the efficacy of the proposed architecture in the multi-passage generative QA, outperforming the state-of-the-art baselines with better syntactically well-formed answers and increased precision in addressing the questions of the AmazonQA review dataset. An additional qualitative analysis revealed the interpretability introduced by the memory module.

7 citations

Posted Content

Improving BERT with Syntax-aware Local Attention

Zhongli Li¹, Qingyu Zhou¹, Chao Li², Ke Xu, Yunbo Cao²

Beijing Institute of Technology¹, Tencent²

30 Dec 2020 - arXiv: Computation and Language

TL;DR: The authors propose a syntax-aware local attention in which attention scopes are restricted based on distances in the syntactic structure; it can be integrated with pre-trained language models, such as BERT, so that the model focuses on syntactically relevant words.

Abstract: Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions. Most of them restrict the attention scope within a linear span, or confine to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restrained based on the distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pretrained language models, such as BERT, to render the model to focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. The extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.

7 citations

Book Chapter

Comparison of Czech Transformers on Text Classification Tasks.

Jan Lehečka¹, Jan Švec¹

University of West Bohemia¹

22 Nov 2021

TL;DR: The authors pre-train and publicly release two monolingual Czech Transformers and compare them with relevant public models trained (at least partially) for Czech on text classification tasks from various domains.

Abstract: In this paper, we present our progress in pre-training monolingual Transformers for Czech and contribute to the research community by releasing our models for public. The need for such models emerged from our effort to employ Transformers in our language-specific tasks, but we found the performance of the published multilingual models to be very limited. Since the multilingual models are usually pre-trained from 100+ languages, most of low-resourced languages (including Czech) are under-represented in these models. At the same time, there is a huge amount of monolingual training data available in web archives like Common Crawl. We have pre-trained and publicly released two monolingual Czech Transformers and compared them with relevant public models, trained (at least partially) for Czech. The paper presents the Transformers pre-training procedure as well as a comparison of pre-trained models on text classification task from various domains.

7 citations

Proceedings Article

SituatedQA: Incorporating Extra-Linguistic Contexts into QA

Michael J. Q. Zhang, Eunsol Choi¹

University of Texas at Austin¹

13 Sep 2021

TL;DR: SituatedQA is an open-retrieval QA dataset where systems must produce the correct answer to a question given its temporal or geographical context, since the answer may change depending on the extra-linguistic context (when and where the question was asked).

Abstract: Answers to the same question may change depending on the extra-linguistic contexts (when and where the question was asked). To study this challenge, we introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given the temporal or geographical context. To construct SituatedQA, we first identify such questions in existing QA datasets. We find that a significant proportion of information seeking questions have context-dependent answers (e.g. roughly 16.5% of NQ-Open). For such context-dependent questions, we then crowdsource alternative contexts and their corresponding answers. Our study shows that existing models struggle with producing answers that are frequently updated or from uncommon locations. We further quantify how existing models, which are trained on data collected in the past, fail to generalize to answering questions asked in the present, even when provided with an updated evidence corpus (a roughly 15 point drop in accuracy). Our analysis suggests that open-retrieval QA benchmarks should incorporate extra-linguistic context to stay relevant globally and in the future. Our data, code, and datasheet are available at https://situatedqa.github.io/.

7 citations