Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
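
The text-to-text framing means one model with a plain string-in, string-out interface covers translation, summarization, classification, and more, with the task signaled by a short prefix on the input. A minimal sketch of that interface, assuming the public "t5-small" checkpoint and the Hugging Face transformers library (neither is part of this page):

```python
# Every task is cast as text in -> text out; a prefix names the task.
# Assumes: pip install transformers sentencepiece torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The same weights handle different tasks, switched only by the prefix.
for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
]:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```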


Citations
Proceedings Article
03 May 2021
TL;DR: This paper showed that models with limited capacity primarily learn to exploit biases in the dataset, and leveraged the errors of such limited-capacity models to train a more robust model in a product of experts, bypassing the need to hand-craft a biased model.
Abstract: State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues may not be explicitly identified, and show a method for training models that learn to ignore these problematic correlations. Our approach relies on the observation that models with limited capacity primarily learn to exploit biases in the dataset. We can leverage the errors of such limited capacity models to train a more robust model in a product of experts, thus bypassing the need to hand-craft a biased model. We show the effectiveness of this method to retain improvements in out-of-distribution settings even if no particular bias is targeted by the biased model.

21 citations
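
The training recipe behind this abstract can be made concrete: the weak, limited-capacity model's log-probabilities are added to the main model's in a product of experts, so examples the weak model already classifies correctly (by exploiting a bias) contribute little gradient to the main model. A minimal sketch of that loss, with illustrative tensor shapes and no claim to match the authors' code:

```python
import torch.nn.functional as F

def product_of_experts_loss(main_logits, weak_logits, labels):
    """Cross-entropy over the renormalized product of two experts.

    main_logits, weak_logits: [batch, num_classes]; labels: [batch].
    The weak (biased) model is frozen, hence the detach().
    """
    combined = (F.log_softmax(main_logits, dim=-1)
                + F.log_softmax(weak_logits.detach(), dim=-1))
    return F.nll_loss(F.log_softmax(combined, dim=-1), labels)
```

At test time only the main model is used; the weak expert exists solely to reshape the training signal.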

Posted Content
TL;DR: This paper surveyed two decades of tremendous development in Biomedical Question Answering (BQA), classifying existing work into five distinctive approaches: classic, information retrieval, machine reading comprehension, knowledge base, and question entailment.
Abstract: Automatic Question Answering (QA) has been successfully applied in various domains such as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables innovative applications to effectively perceive, access and understand complex biomedical knowledge. There have been tremendous developments of BQA in the past two decades, which we classify into 5 distinctive approaches: classic, information retrieval, machine reading comprehension, knowledge base and question entailment approaches. In this survey, we introduce available datasets and representative methods of each BQA approach in detail. Despite the developments, BQA systems are still immature and rarely used in real-life settings. We identify and characterize several key challenges in BQA that might lead to this issue, and discuss some potential future directions to explore.

21 citations

Posted Content · DOI
25 May 2021 · bioRxiv
TL;DR: This article proposed ProteinBERT, a deep language model specifically designed for proteins, whose pretraining combines masked language modeling with a novel Gene Ontology (GO) annotation prediction task.
Abstract: Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme consists of masked language modeling combined with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to very large sequence lengths. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains state-of-the-art performance on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.

21 citations
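
The two-part pretraining objective described in the abstract pairs a local, per-residue masked-token loss with a global, per-protein multi-label GO prediction loss. A minimal sketch of such an objective, where every module name and dimension is an illustrative assumption (the real architecture is in the repository linked above):

```python
import torch.nn as nn
import torch.nn.functional as F

class DualTaskHead(nn.Module):
    def __init__(self, hidden=128, vocab=26, n_go_terms=8000):
        super().__init__()
        self.token_head = nn.Linear(hidden, vocab)    # local: per residue
        self.go_head = nn.Linear(hidden, n_go_terms)  # global: per protein

    def forward(self, local_repr, global_repr, masked_targets, go_targets):
        # local_repr: [batch, seq_len, hidden]; global_repr: [batch, hidden]
        mlm_loss = F.cross_entropy(
            self.token_head(local_repr).transpose(1, 2),  # [B, vocab, seq]
            masked_targets, ignore_index=-100)  # -100 marks unmasked positions
        go_loss = F.binary_cross_entropy_with_logits(
            self.go_head(global_repr), go_targets.float())
        return mlm_loss + go_loss
```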

Posted Content
TL;DR: This paper investigates multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and trains generative models capable of composing explanatory rationales for unseen instances.
Abstract: The black-box nature of neural models has motivated a line of research that aims to generate natural language rationales to explain why a model made certain predictions. Such rationale generation models, to date, have been trained on dataset-specific crowdsourced rationales, but this approach is costly and is not generalizable to new tasks and domains. In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales. We investigate multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and train generative models capable of composing explanatory rationales for unseen instances. We demonstrate our approach on the defeasible inference task, a nonmonotonic reasoning task in which an inference may be strengthened or weakened when new information (an update) is introduced. Our model shows promise at generating post-hoc rationales explaining why an inference is more or less likely given the additional information; however, it mostly generates trivial rationales, reflecting the fundamental limitations of neural language models. Conversely, the more realistic setup of jointly predicting the update or its type and generating the rationale is more challenging, suggesting an important future direction.

21 citations
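
One of the rationale sources named in the abstract is an off-the-shelf pre-trained language model. A minimal sketch of that idea for a defeasible inference example, where the prompt template and the choice of GPT-2 are illustrative assumptions rather than the authors' setup:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A defeasible inference instance: an update that weakens the hypothesis.
prompt = (
    "Premise: A man is eating at a restaurant.\n"
    "Hypothesis: He is having dinner.\n"
    "Update: It is 9 a.m.\n"
    "The update weakens the hypothesis because"
)
result = generator(prompt, max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])
```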

Proceedings Article · DOI
Yi Liao, Xin Jiang, Qun Liu
24 Apr 2020
TL;DR: This article proposed a probabilistic masking scheme for the masked language model, called the probabilistically masked language model (PMLM), and showed that its uniform-prior variant (u-PMLM) is equivalent to an autoregressive permutated language model.
Abstract: Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications beyond traditional unidirectional generation. In addition, the pretrained u-PMLM outperforms BERT on a set of downstream NLU tasks.

20 citations
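
The key move in u-PMLM is that the masking ratio itself is random: each training sequence draws a ratio from a uniform prior and masks that fraction of its tokens, instead of BERT's fixed 15%. A minimal sketch of that masking step (the mask token id and the -100 ignore-label convention are assumptions borrowed from common masked-LM tooling):

```python
import torch

def u_pmlm_mask(input_ids, mask_token_id):
    """Mask a Uniform(0, 1)-distributed fraction of each sequence."""
    batch, seq_len = input_ids.shape
    ratios = torch.rand(batch, 1)               # r ~ U(0, 1), one per sequence
    mask = torch.rand(batch, seq_len) < ratios  # ~r * seq_len positions masked
    labels = torch.where(mask, input_ids,
                         torch.full_like(input_ids, -100))
    masked = torch.where(mask, torch.full_like(input_ids, mask_token_id),
                         input_ids)
    return masked, labels  # train to predict labels at masked positions
```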

Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.