Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
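The text-to-text framework lends itself to a short illustration. Below is a minimal sketch, assuming the released T5 checkpoints accessed through the HuggingFace transformers library (an assumed interface, not something the abstract specifies): translation, summarization, and even classification share a single string-in, string-out interface distinguished only by a task prefix.

```python
# Minimal sketch of the text-to-text format: every task is cast as
# feeding the model a text prompt and decoding a text answer.
# Uses the publicly released "t5-small" checkpoint via HuggingFace
# transformers (an assumption of this sketch).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Translation, summarization, and classification all share one interface:
# a task prefix on the input, and the answer/label decoded as text.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained "
    "on a data-rich task before being fine-tuned on a downstream task, "
    "has emerged as a powerful technique in NLP.",
    "cola sentence: The book fell off of the table.",  # decoded as "acceptable"/"unacceptable"
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```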


Citations
Proceedings ArticleDOI
01 Aug 2021
TL;DR: APE as mentioned in this paper is an adaptive passage encoder that can be applied to an existing Open-Domain Question Answering (ODQA) model and can be trained efficiently on a single GPU.
Abstract: Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, the current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can be applied to an existing ODQA model and can be trained efficiently on a single GPU. It keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model. Our experimental results show that our method improves upon a state-of-the-art model on two datasets, and is also more accurate than previous AC methods due to the stronger base ODQA model. All source code and datasets are available at https://github.com/uclnlp/APE.
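The mechanism the abstract describes — a frozen base encoder whose default layer-by-layer computation is overridden by a trained halting policy — can be sketched schematically. The PyTorch module below is a hypothetical illustration only; the class name, mean-pooling, and threshold are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the adaptive-computation idea: the base ODQA
# encoder's parameters stay frozen, and a small trainable policy decides
# after each layer whether a passage needs further computation.
import torch
import torch.nn as nn

class AdaptivePassageEncoder(nn.Module):
    def __init__(self, base_encoder_layers, hidden_size):
        super().__init__()
        self.layers = base_encoder_layers      # layers of the frozen base model
        for p in self.layers.parameters():
            p.requires_grad = False            # only the policy head is trained
        self.halt_policy = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states, halt_threshold=0.5):
        for layer in self.layers:
            hidden_states = layer(hidden_states)
            # Pool the passage representation and ask the policy whether
            # the remaining layers are worth their computational cost.
            halt_prob = torch.sigmoid(self.halt_policy(hidden_states.mean(dim=1)))
            if bool((halt_prob > halt_threshold).all()):
                break                          # early exit: skip remaining layers
        return hidden_states

# Toy usage with generic transformer layers standing in for the base model.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(6)
)
ape = AdaptivePassageEncoder(layers, hidden_size=64)
out = ape(torch.randn(2, 10, 64))  # (batch, seq_len, hidden)
```

Because only the policy head receives gradients, training touches a tiny fraction of the parameters, which is consistent with the abstract's claim of efficient single-GPU training.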
Posted Content
TL;DR: This article developed the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments, which was created following the same approach and steps as the English CoLA corpus.
Abstract: The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which has also been included in the widely used GLUE benchmark. However, this kind of research for languages other than English, as well as the analysis of cross-lingual approaches, has been hindered by the lack of resources of comparable size in other languages. We have therefore developed the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments, which has been created following the same approach and the same steps as the English one. In this paper we describe the corpus creation, we detail its content, and we present the first experiments on this new resource. We compare in-domain and out-of-domain classification, and perform a specific evaluation of nine linguistic phenomena. We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.
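The task format ItaCoLA shares with the English CoLA — binary acceptability judgments — maps onto standard sequence-classification fine-tuning. A minimal sketch follows, assuming a multilingual BERT checkpoint as the transformer-based model; the checkpoint choice and training details are assumptions, not the authors' setup.

```python
# Illustrative sketch (not the authors' code): fine-tuning a multilingual
# transformer for binary acceptability judgments, the task format shared
# by CoLA and ItaCoLA.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2  # 0 = unacceptable, 1 = acceptable
)

# A cross-lingual fine-tuning batch can mix English and Italian sentences.
sentences = ["The book fell off of the table.", "Il libro è caduto dal tavolo."]
labels = torch.tensor([1, 1])

batch = tokenizer(sentences, padding=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step of standard fine-tuning
```

Acceptability classification on CoLA is conventionally scored with Matthews correlation rather than accuracy, since the label distribution is skewed.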
Proceedings Article
01 Nov 2021
TL;DR: In this paper, ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search, is proposed; it retrieves products via text profiles, which is more robust against imperfect product schema/knowledge than using product attributes alone.
Abstract: Successful conversational search systems can present a natural, adaptive and interactive shopping experience for online shopping customers. However, building such systems from scratch faces real-world challenges from both imperfect product schema/knowledge and a lack of training dialog data. In this work we first propose ConvSearch, an end-to-end conversational search system that deeply combines the dialog system with search. It leverages text profiles to retrieve products, which is more robust against imperfect product schema/knowledge than using product attributes alone. We then address the data-scarcity challenge by proposing an utterance transfer approach that generates dialogue utterances by reusing existing dialogs from other domains and leveraging search behavior data from an e-commerce retailer. With utterance transfer, we introduce a new conversational search dataset for online shopping. Experiments show that our utterance transfer method can significantly improve the availability of training dialogue data without crowd-sourcing, and that the conversational search system significantly outperforms the best tested baseline.
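The retrieval idea, keying on a free-text product profile rather than individual schema attributes, can be sketched with off-the-shelf sentence embeddings. Everything below (the sentence-transformers library, the encoder checkpoint, the toy catalog) is an illustrative assumption, not the ConvSearch implementation.

```python
# Schematic sketch of retrieving products by free-text profile rather than
# structured attributes; encoder and catalog are placeholders.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

# A text profile tolerates missing or incorrect schema fields: retrieval
# keys off the whole description instead of individual attributes.
product_profiles = [
    "lightweight waterproof hiking jacket with hood, available in green",
    "wireless noise-cancelling over-ear headphones, 30-hour battery",
]
query = "I need a rain jacket for trekking"

profile_emb = encoder.encode(product_profiles, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, profile_emb)[0]
print(product_profiles[int(scores.argmax())])
```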
Posted Content
TL;DR: The authors proposed a transformer-based encoder-decoder initialized with AraBERT parameters to enable empathetic behavior in Arabic dialogue agents, which achieved a low perplexity of 17.0 and an increase of 5 BLEU points over the previous state-of-the-art model.
Abstract: Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents. To overcome this issue, we propose a transformer-based encoder-decoder initialized with AraBERT parameters. By initializing the weights of the encoder and decoder with AraBERT pre-trained weights, our model was able to leverage knowledge transfer and boost performance in response generation. To enable empathy in our conversational model, we train it using the ArabicEmpatheticDialogues dataset and achieve high performance in empathetic response generation. Specifically, our model achieved a low perplexity value of 17.0 and an increase in 5 BLEU points compared to the previous state-of-the-art model. Also, our proposed model was rated highly by 85 human evaluators, validating its high capability in exhibiting empathy while generating relevant and fluent responses in open-domain settings.
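The initialization strategy the abstract describes — warm-starting both the encoder and the decoder of a seq2seq model from BERT-style pretrained weights — is sketched below using HuggingFace's EncoderDecoderModel; the API choice and the AraBERT checkpoint id are assumptions of this sketch, not taken from the paper.

```python
# Sketch of warm-starting a transformer encoder-decoder from BERT-style
# pretrained weights, the initialization strategy described above.
from transformers import AutoTokenizer, EncoderDecoderModel

checkpoint = "aubmindlab/bert-base-arabert"  # assumed AraBERT checkpoint id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Both encoder and decoder start from AraBERT weights; the decoder
# additionally receives randomly initialized cross-attention layers.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# The warm-started model can then be fine-tuned on a response-generation
# dataset such as ArabicEmpatheticDialogues with a standard seq2seq loss.
```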
Posted Content
TL;DR: JointGT as discussed by the authors is a graph-text joint representation learning model for knowledge-graph-to-text (KG-to-text) generation, which uses a structure-aware semantic aggregation module to preserve the graph structure.
Abstract: Existing pre-trained models for knowledge-graph-to-text (KG-to-text) generation simply fine-tune text-to-text pre-trained models such as BART or T5 on KG-to-text datasets, which largely ignore the graph structure during encoding and lack elaborate pre-training tasks to explicitly model graph-text alignments. To tackle these problems, we propose a graph-text joint representation learning model called JointGT. During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer to preserve the graph structure. Furthermore, we propose three new pre-training tasks to explicitly enhance the graph-text alignment including respective text / graph reconstruction, and graph-text alignment in the embedding space via Optimal Transport. Experiments show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.
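For contrast, the fine-tuning baseline the abstract criticizes — flattening the graph into a token sequence and fine-tuning a text-to-text model such as T5 — can be sketched as follows. The <H>/<R>/<T> linearization convention, the checkpoint, and the toy triples are illustrative assumptions.

```python
# Sketch of the plain fine-tuning baseline for KG-to-text: linearize the
# triples into a flat string and train a text-to-text model on it.
from transformers import T5Tokenizer, T5ForConditionalGeneration

def linearize(triples):
    # e.g. [("Alan Turing", "field", "computer science")] ->
    # "<H> Alan Turing <R> field <T> computer science"
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

triples = [("Alan Turing", "field", "computer science"),
           ("Alan Turing", "birth place", "London")]
inputs = tokenizer(linearize(triples), return_tensors="pt")
targets = tokenizer("Alan Turing, born in London, worked in computer science.",
                    return_tensors="pt")
loss = model(**inputs, labels=targets.input_ids).loss  # standard seq2seq loss
```

Linearization of this kind discards the graph topology during encoding, which is precisely the gap JointGT's structure-aware aggregation module and graph-text alignment objectives are designed to close.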
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.