Open Access · Posted Content

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

TLDR
This article proposes an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text, and analyzes head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting.
Abstract
Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
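To make the setting concrete, the sketch below (a toy illustration, not code from the paper; the HMM parameters, sequence length, and parity-based downstream label are assumptions) samples sequences from a small HMM, treats the posterior over the final hidden state as the frozen pretrained representation, and fits a linear head on top of it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
K, V, T = 4, 8, 12  # hidden states, vocabulary size, sequence length (toy values)

# Toy HMM parameters (assumed for illustration): transitions A, emissions B, initial pi.
A = rng.dirichlet(np.ones(K), size=K)   # A[i, j] = p(z_{t+1}=j | z_t=i)
B = rng.dirichlet(np.ones(V), size=K)   # B[i, x] = p(x_t=x | z_t=i)
pi = rng.dirichlet(np.ones(K))

def sample_sequence():
    """Sample a token sequence and return it with the hidden state that emitted the last token."""
    z = rng.choice(K, p=pi)
    tokens = []
    for _ in range(T):
        tokens.append(rng.choice(V, p=B[z]))
        last_z = z
        z = rng.choice(K, p=A[z])
    return np.array(tokens), last_z

def last_state_posterior(tokens):
    """Forward recursion: p(z_T | x_1..T), the feature an ideal pretrained model would expose."""
    alpha = pi * B[:, tokens[0]]
    alpha /= alpha.sum()
    for x in tokens[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()
    return alpha

# Downstream label: an assumed function of the latent state (here, its parity).
data = [sample_sequence() for _ in range(2000)]
X = np.stack([last_state_posterior(toks) for toks, _ in data])
y = np.array([z % 2 for _, z in data])

head = LogisticRegression().fit(X[:1500], y[:1500])
print("held-out accuracy of the linear head:", head.score(X[1500:], y[1500:]))
```

Because the label is a function of the latent state, a linear threshold on the posterior vector is the Bayes-optimal predictor here, which mirrors the paper's point that a simple classification head can suffice once the posterior over the latent variables is recoverable from the pretrained representation.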


Citations
Posted Content

On the Opportunities and Risks of Foundation Models.

Rishi Bommasani, +113 more
16 Aug 2021
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content

OpenPrompt: An Open-source Framework for Prompt-learning

TL;DR: OpenPrompt is a toolkit for prompt-learning over pre-trained language models (PLMs) that lets users combine different PLMs, task formats, and prompting modules in a unified paradigm.
Proceedings ArticleDOI

OpenPrompt: An Open-source Framework for Prompt-learning

TL;DR: Ding et al. presented OpenPrompt as a system demonstration at the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL 2022).
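The core prompt-learning pattern that such toolkits package (a manual template plus a verbalizer that maps label words to classes) can be sketched in a few lines. The sketch below uses HuggingFace transformers directly rather than OpenPrompt's own API; the checkpoint, template, and label words are illustrative assumptions:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative choices (assumptions): an MLM backbone and a two-class sentiment verbalizer.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
label_words = {"negative": "terrible", "positive": "great"}

def classify(text: str) -> str:
    # Manual template: wrap the input and let the model fill the [MASK] slot.
    prompt = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    # Verbalizer: compare the logits of the label words at the mask position.
    scores = {label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in label_words.items()}
    return max(scores, key=scores.get)

print(classify("The movie was a complete waste of time."))
```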
Posted Content

Self-supervised Learning is More Robust to Dataset Imbalance.

TL;DR: In this article, the authors investigate self-supervised learning under dataset imbalance and propose a re-weighted regularization technique that consistently improves SSL representation quality on imbalanced datasets under several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
Posted Content

Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

TL;DR: In this paper, the authors try to uncover how much of masked language modeling's (MLM's) success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length in MRC datasets.
References
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
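The "one additional output layer" recipe looks roughly like the following with HuggingFace transformers (the checkpoint name and label count are placeholders, not choices made in the BERT paper):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint and label count; the classification head on top of the
# pooled representation is initialized fresh and trained during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
outputs = model(**batch)      # logits over the 2 labels, one row per sentence
print(outputs.logits.shape)   # torch.Size([2, 2])
```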
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how those uses vary across linguistic contexts (i.e., polysemy).
Book

Probabilistic graphical models : principles and techniques

TL;DR: The framework of probabilistic graphical models, presented in this book, provides a general approach for causal reasoning and decision making under uncertainty, allowing interpretable models to be constructed and then manipulated by reasoning algorithms.
Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
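In practice the text-to-text interface reduces every task to string-in, string-out generation; a minimal sketch with a public T5 checkpoint follows (the checkpoint and task prefix are illustrative, not tied to the experiments in the paper):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is cast as text in, text out; the task is named by a prefix in the input string.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```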
Posted Content

End-To-End Memory Networks

TL;DR: This paper introduces a neural network with a recurrent attention model over a possibly large external memory; it is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
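A single memory hop of this architecture reduces to soft attention over memory slots followed by a weighted read-out; the toy numpy sketch below (random embeddings standing in for the learned ones) shows the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_slots = 16, 5   # embedding size and number of memory slots (toy values)

# Random vectors standing in for learned embeddings of the stored sentences and the question.
memory_in  = rng.normal(size=(n_slots, d))   # input memory  m_i
memory_out = rng.normal(size=(n_slots, d))   # output memory c_i
query      = rng.normal(size=d)              # question embedding u

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# One memory hop: attend over slots with the query, then read out a weighted sum.
p = softmax(memory_in @ query)   # attention weights p_i = softmax(u . m_i)
o = p @ memory_out               # response vector o = sum_i p_i * c_i
readout = o + query              # the paper feeds W(o + u) into a final softmax to predict the answer
print(p.round(3), readout.shape)
```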