Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
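
The text-to-text framing described in the abstract can be illustrated with a minimal sketch. It assumes that each task is identified by a short text prefix added to the input and that every target, including a class label, is written out as a plain string. The prefix strings, the `PREFIXES` table, and the `to_text_to_text` helper are hypothetical names chosen for illustration, not the authors' released code.

```python
from typing import Tuple

# Hypothetical task prefixes, used only for illustration.
PREFIXES = {
    "translation": "translate English to German: ",
    "summarization": "summarize: ",
    "classification": "classify sentiment: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one labeled example as an (input text, target text) pair."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # The model only ever sees strings: the prefix tells it which task to do,
    # and the target (even a class label) is emitted as text.
    return PREFIXES[task] + source, target


examples = [
    ("translation", "That is good.", "Das ist gut."),
    ("summarization", "A long article about transfer learning.", "Transfer learning helps."),
    ("classification", "I loved this film.", "positive"),
]

pairs = [to_text_to_text(*example) for example in examples]
for input_text, target_text in pairs:
    print(f"{input_text!r} -> {target_text!r}")
```

Because every task shares the same string-in, string-out interface, one model, one loss, and one decoding procedure can in principle serve all of them, which is what makes a like-for-like comparison across tasks possible.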

Citations

Proceedings Article•DOI•

Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction

[...]

01 Jan 2022

TL;DR: This paper proposes an interactive semantic parsing framework that explains the predicted logical forms step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps, aiming to increase the transparency of the parsing process and help the user trust the final answer.

Abstract: Existing studies on semantic parsing focus on mapping a natural-language utterance to a logical form (LF) in one turn. However, because natural language may contain ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted LF step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user trust the final answer. We construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset. Our experiments show that this framework has the potential to greatly improve overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to evaluate our framework w.r.t. a variety of state-of-the-art KBQA models without further crowdsourcing effort. The results demonstrate that our framework promises to be effective across such models.

1 citation

Proceedings Article•DOI•

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

[...]

01 Jan 2022

TL;DR: The authors propose to leverage semi-structured tables to automatically generate question-paragraph pairs at scale, where answering the question requires reasoning over multiple facts in the paragraph. They add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison, conjunction, and fact composition.

Abstract: Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph. We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills such as number comparison, conjunction, and fact composition. To improve data efficiency, we sample examples from reasoning skills where the model currently errs. We evaluate our approach on three reasoning-focused reading comprehension datasets, and show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model. Moreover, sampling examples based on model errors leads to faster training and higher performance.

1 citation

Posted Content•

Relative Positional Encoding for Transformers with Linear Complexity

[...]

Antoine Liutkus¹, Ondřej Cífka², Shih-Lun Wu³, Umut Simsekli¹, Yi-Hsuan Yang⁴, Gael Richard²

French Institute for Research in Computer Science and Automation¹, Télécom ParisTech², National Taiwan University³, Academia Sinica⁴

18 May 2021-arXiv: Learning

TL;DR: This paper presents Stochastic Positional Encoding (SPE), a replacement for the classical additive (sinusoidal) PE that provably behaves like RPE and can be used with the linear variants of the Transformer.

Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.

1 citation

Posted Content•

R2-D2: A Modular Baseline for Open-Domain Question Answering

[...]

Martin Fajcik¹, Martin Docekal¹, Karel Ondrej¹, Pavel Smrz¹

Brno University of Technology¹

08 Sep 2021-arXiv: Computation and Language

TL;DR: This paper proposed R2-D2 (Rank twice, reaD twice), a pipeline composed of a retriever, a passage re-ranker, an extractive reader, a generative reader, and a mechanism that aggregates the final prediction from all of the system's components.

Abstract: This work presents a novel four-stage open-domain QA pipeline R2-D2 (Rank twice, reaD twice). The pipeline is composed of a retriever, passage reranker, extractive reader, generative reader and a mechanism that aggregates the final prediction from all system's components. We demonstrate its strength across three open-domain QA datasets: NaturalQuestions, TriviaQA and EfficientQA, surpassing state-of-the-art on the first two. Our analysis demonstrates that: (i) combining extractive and generative reader yields absolute improvements up to 5 exact match and it is at least twice as effective as the posterior averaging ensemble of the same models with different parameters, (ii) the extractive reader with fewer parameters can match the performance of the generative reader on extractive QA datasets.

1 citation

Proceedings Article•DOI•

LU-BZU at SemEval-2021 Task 2: Word2Vec and Lemma2Vec performance in Arabic Word-in-Context disambiguation

[...]

Moustafa Al-Hajj¹, Mustafa Jarrar²

Lebanese University¹, Birzeit University²

01 Aug 2021

TL;DR: This paper presented a set of experiments to evaluate and compare CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings.

Abstract: This paper presents a set of experiments to evaluate and compare the performance of CBOW Word2Vec and Lemma2Vec models for Arabic Word-in-Context (WiC) disambiguation without using sense inventories or sense embeddings. As part of the SemEval-2021 Shared Task 2 on WiC disambiguation, we used the dev.ar-ar dataset (2k sentence pairs) to decide whether two words in a given sentence pair carry the same meaning. We used two Word2Vec models: Wiki-CBOW, a pre-trained model on Arabic Wikipedia, and another model we trained on large Arabic corpora of about 3 billion tokens. Two Lemma2Vec models were also constructed based on the two Word2Vec models. Each of the four models was then used in the WiC disambiguation task, and then evaluated on the SemEval-2021 test.ar-ar dataset. Finally, we reported the performance of the different models and compared lemma-based and word-based models.

1 citation