Open Access · Posted Content

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

TLDR
This article proposes an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text, and analyzes head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting.
Abstract
Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
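To make the setting concrete, the sketch below (a toy illustration, not code from the paper; the HMM parameters, sequence length, and parity-based downstream label are assumptions) samples sequences from a small HMM, treats the posterior over the final hidden state as the frozen pretrained representation, and fits a linear head on top of it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
K, V, T = 4, 8, 12  # hidden states, vocabulary size, sequence length (toy values)

# Toy HMM parameters (assumed for illustration): transitions A, emissions B, initial pi.
A = rng.dirichlet(np.ones(K), size=K)   # A[i, j] = p(z_{t+1}=j | z_t=i)
B = rng.dirichlet(np.ones(V), size=K)   # B[i, x] = p(x_t=x | z_t=i)
pi = rng.dirichlet(np.ones(K))

def sample_sequence():
    """Sample a token sequence and return it with the hidden state that emitted the last token."""
    z = rng.choice(K, p=pi)
    tokens = []
    for _ in range(T):
        tokens.append(rng.choice(V, p=B[z]))
        last_z = z
        z = rng.choice(K, p=A[z])
    return np.array(tokens), last_z

def last_state_posterior(tokens):
    """Forward recursion: p(z_T | x_1..T), the feature an ideal pretrained model would expose."""
    alpha = pi * B[:, tokens[0]]
    alpha /= alpha.sum()
    for x in tokens[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()
    return alpha

# Downstream label: an assumed function of the latent state (here, its parity).
data = [sample_sequence() for _ in range(2000)]
X = np.stack([last_state_posterior(toks) for toks, _ in data])
y = np.array([z % 2 for _, z in data])

head = LogisticRegression().fit(X[:1500], y[:1500])
print("held-out accuracy of the linear head:", head.score(X[1500:], y[1500:]))
```

Because the label is a function of the latent state, a linear threshold on the posterior vector is the Bayes-optimal predictor here, which mirrors the paper's point that a simple classification head can suffice once the posterior over the latent variables is recoverable from the pretrained representation.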


Citations
Posted Content

On the Opportunities and Risks of Foundation Models.

Rishi Bommasani, +113 more
16 Aug 2021
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content

OpenPrompt: An Open-source Framework for Prompt-learning

TL;DR: OpenPrompt is a toolkit for prompt-learning over pre-trained language models (PLMs) that lets users combine different PLMs, task formats, and prompting modules in a unified paradigm.
Proceedings ArticleDOI

OpenPrompt: An Open-source Framework for Prompt-learning

TL;DR: Ding et al. presented OpenPrompt as a system demonstration at the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL 2022).
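The core prompt-learning pattern that such toolkits package (a manual template plus a verbalizer that maps label words to classes) can be sketched in a few lines. The sketch below uses HuggingFace transformers directly rather than OpenPrompt's own API; the checkpoint, template, and label words are illustrative assumptions:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative choices (assumptions): an MLM backbone and a two-class sentiment verbalizer.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
label_words = {"negative": "terrible", "positive": "great"}

def classify(text: str) -> str:
    # Manual template: wrap the input and let the model fill the [MASK] slot.
    prompt = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    # Verbalizer: compare the logits of the label words at the mask position.
    scores = {label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in label_words.items()}
    return max(scores, key=scores.get)

print(classify("The movie was a complete waste of time."))
```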
Posted Content

Self-supervised Learning is More Robust to Dataset Imbalance.

TL;DR: In this article, the authors investigate self-supervised learning under dataset imbalance and propose a re-weighted regularization technique that consistently improves SSL representation quality on imbalanced datasets under several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
Posted Content

Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

TL;DR: In this paper, the authors try to uncover how much of masked language modeling's (MLM's) success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length in MRC datasets.
References
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
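The "one additional output layer" recipe looks roughly like the following with HuggingFace transformers (the checkpoint name and label count are placeholders, not choices made in the BERT paper):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint and label count; the classification head on top of the
# pooled representation is initialized fresh and trained during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
outputs = model(**batch)      # logits over the 2 labels, one row per sentence
print(outputs.logits.shape)   # torch.Size([2, 2])
```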
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how those uses vary across linguistic contexts (i.e., polysemy).
Book

Probabilistic graphical models : principles and techniques

TL;DR: The framework of probabilistic graphical models, presented in this book, provides a general approach for causal reasoning and decision making under uncertainty, allowing interpretable models to be constructed and then manipulated by reasoning algorithms.
Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
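In practice the text-to-text interface reduces every task to string-in, string-out generation; a minimal sketch with a public T5 checkpoint follows (the checkpoint and task prefix are illustrative, not tied to the experiments in the paper):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is cast as text in, text out; the task is named by a prefix in the input string.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```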
Posted Content

End-To-End Memory Networks

TL;DR: This paper introduces a neural network with a recurrent attention model over a possibly large external memory; it is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
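A single memory hop of this architecture reduces to soft attention over memory slots followed by a weighted read-out; the toy numpy sketch below (random embeddings standing in for the learned ones) shows the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_slots = 16, 5   # embedding size and number of memory slots (toy values)

# Random vectors standing in for learned embeddings of the stored sentences and the question.
memory_in  = rng.normal(size=(n_slots, d))   # input memory  m_i
memory_out = rng.normal(size=(n_slots, d))   # output memory c_i
query      = rng.normal(size=d)              # question embedding u

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# One memory hop: attend over slots with the query, then read out a weighted sum.
p = softmax(memory_in @ query)   # attention weights p_i = softmax(u . m_i)
o = p @ memory_out               # response vector o = sum_i p_i * c_i
readout = o + query              # the paper feeds W(o + u) into a final softmax to predict the answer
print(p.round(3), readout.shape)
```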