Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Home
/
Papers
/
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

read less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Does reusing pre-trained NLP model propagate bugs?

[...]

Mohna Chakraborty¹•Institutions (1)

Iowa State University¹

20 Aug 2021

TL;DR: This paper mined 80 posts from Stack Overflow related to BERT and found four types of bugs observed in clients' code: fairness, fairness, 28.75% are parameter, 15% are token, and 16.25% are version related bugs.

...read moreread less

Abstract: In this digital era, the textual content has become a seemingly ubiquitous part of our life. Natural Language Processing (NLP) empowers machines to comprehend the intricacies of textual data and eases human-computer interaction. Advancement in language modeling, continual learning, availability of a large amount of linguistic data, and large-scale computational power have made it feasible to train models for downstream tasks related to text analysis, including safety-critical ones, e.g., medical, airlines, etc. Compared to other deep learning (DL) models, NLP-based models are widely reused for various tasks. However, the reuse of pre-trained models in a new setting is still a complex task due to the limitations of the training dataset, model structure, specification, usage, etc. With this motivation, we study BERT, a vastly used language model (LM), from the direction of reusing in the code. We mined 80 posts from Stack Overflow related to BERT and found 4 types of bugs observed in clients’ code. Our results show that 13.75% are fairness, 28.75% are parameter, 15% are token, and 16.25% are version-related bugs.

...read moreread less

1 citations

Proceedings Article•

A Finer-grain Universal Dialogue Semantic Structures based Model For Abstractive Dialogue Summarization

[...]

Yuejie Lei¹, Fujia Zheng, Yuanmeng Yan¹, Keqing He¹, Weiran Xu¹ - Show less +1 more•Institutions (1)

Beijing University of Posts and Telecommunications¹

01 Nov 2021

TL;DR: The authors propose an end-to-end Transformer-based model FinDS for abstractive dialogue summarization that leverages Finer-grain universal Dialogue semantic Structures to model dialogue and generate better summaries.

...read moreread less

Abstract: Although abstractive summarization models have achieved impressive results on document summarization tasks, their performance on dialogue modeling is much less satisfactory due to the crude and straight methods for dialogue encoding. To address this question, we propose a novel end-to-end Transformer-based model FinDS for abstractive dialogue summarization that leverages Finer-grain universal Dialogue semantic Structures to model dialogue and generates better summaries. Experiments on the SAMsum dataset show that FinDS outperforms various dialogue summarization approaches and achieves new state-of-the-art (SOTA) ROUGE results. Finally, we apply FinDS to a more complex scenario, showing the robustness of our model. We also release our source code.

...read moreread less

1 citations

Proceedings Article•DOI•

Making Transformers Solve Compositional Tasks

[...]

01 Jan 2022

TL;DR: The authors explored the design space of Transformer models and found that the inductive biases given to the model by several design decisions significantly impact compositional generalization, and identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in many compositional tasks.

...read moreread less

Abstract: Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. We identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in many compositional tasks. We achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).

...read moreread less

1 citations

CIVIC-UPM at CheckThat! 2021: Integration of Transformers in Misinformation Detection and Topic Classification.

[...]

Álvaro Huertas-García, Javier Huertas-Tato, Alejandro Martín, David Camacho

01 Jan 2021

1 citations

Posted Content•

RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice from the perspective of video generation.

[...]

Qiang Zheng, Dongxiao Zhang

05 Aug 2021-arXiv: Image and Video Processing

TL;DR: Wang et al. as discussed by the authors proposed a new framework, named RockGPT, which is composed of VQ-VAE and conditional GPT, to synthesize 3D samples based on a single 2D slice from the perspective of video generation.

...read moreread less

Abstract: Random reconstruction of three-dimensional (3D) digital rocks from two-dimensional (2D) slices is crucial for elucidating the microstructure of rocks and its effects on pore-scale flow in terms of numerical modeling, since massive samples are usually required to handle intrinsic uncertainties. Despite remarkable advances achieved by traditional process-based methods, statistical approaches and recently famous deep learning-based models, few works have focused on producing several kinds of rocks with one trained model and allowing the reconstructed samples to satisfy certain given properties, such as porosity. To fill this gap, we propose a new framework, named RockGPT, which is composed of VQ-VAE and conditional GPT, to synthesize 3D samples based on a single 2D slice from the perspective of video generation. The VQ-VAE is utilized to compress high-dimensional input video, i.e., the sequence of continuous rock slices, to discrete latent codes and reconstruct them. In order to obtain diverse reconstructions, the discrete latent codes are modeled using conditional GPT in an autoregressive manner, while incorporating conditional information from a given slice, rock type, and porosity. We conduct two experiments on five kinds of rocks, and the results demonstrate that RockGPT can produce different kinds of rocks with the same model, and the reconstructed samples can successfully meet certain specified porosities. In a broader sense, through leveraging the proposed conditioning scheme, RockGPT constitutes an effective way to build a general model to produce multiple kinds of rocks simultaneously that also satisfy user-defined properties.

...read moreread less

1 citations