Open Access Journal Article (DOI)

A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling

TLDR
It is argued that for many common machine learning problems, although in general the authors do not know the true (objective) prior for the problem, they do have some idea of a set of possible priors to which the true prior belongs.
Abstract
A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the concept of an objective prior distribution. It is argued that for many common machine learning problems, although in general we do not know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by learning sufficiently many tasks from the environment. In addition, bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, but the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous. The theory is applied to the problem of learning a common feature set, or equivalently a low-dimensional representation (LDR), for an environment of related tasks.
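To make the LDR idea concrete, here is a minimal Python/NumPy sketch of jointly learning a shared low-dimensional feature map from several sampled tasks by alternating least squares. This is not the paper's algorithm; all names (B, W, fit_heads) and sizes (T, n, d, k) are illustrative assumptions.

```python
# Sketch: T related linear tasks share a low-dimensional representation,
# y = w_t . (B x) + noise, with B the shared k x d feature map.
import numpy as np

rng = np.random.default_rng(0)
T, n, d, k = 8, 50, 20, 3          # tasks, samples/task, input dim, feature dim

B_true = rng.standard_normal((k, d))                 # the environment's shared features
X = [rng.standard_normal((n, d)) for _ in range(T)]
W_true = [rng.standard_normal(k) for _ in range(T)]
Y = [X[t] @ B_true.T @ W_true[t] + 0.1 * rng.standard_normal(n) for t in range(T)]

def fit_heads(B):
    # Ridge regression per task in the shared feature space F = X B^T.
    heads = []
    for t in range(T):
        F = X[t] @ B.T
        heads.append(np.linalg.solve(F.T @ F + 1e-3 * np.eye(k), F.T @ Y[t]))
    return heads

B = rng.standard_normal((k, d))
for _ in range(50):
    W = fit_heads(B)
    # Refit vec(B) by least squares over the pooled data from all tasks.
    A = 1e-3 * np.eye(k * d)
    b = np.zeros(k * d)
    for t in range(T):
        # Row i of Z is outer(w_t, x_i) flattened, so Z @ vec(B) gives predictions.
        Z = np.einsum('j,im->ijm', W[t], X[t]).reshape(n, k * d)
        A += Z.T @ Z
        b += Z.T @ Y[t]
    B = np.linalg.solve(A, b).reshape(k, d)

W = fit_heads(B)
mse = np.mean([np.mean((X[t] @ B.T @ W[t] - Y[t]) ** 2) for t in range(T)])
print(f"mean per-task MSE: {mse:.4f}")
```

The point mirrors the paper's bounds: the shared map B is estimated from all T tasks at once, so each task contributes data toward learning the environment's common structure rather than bearing the cost alone.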



Citations
Book

Learning Deep Architectures for AI

TL;DR: The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Posted Content

An Overview of Multi-Task Learning in Deep Neural Networks

Sebastian Ruder
15 Jun 2017
TL;DR: This article seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks, particularly in deep neural networks.
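As a concrete illustration of hard parameter sharing, the MTL pattern this overview surveys, here is a minimal PyTorch sketch. The layer sizes, task count, and random regression targets are illustrative assumptions, not taken from the article.

```python
# Hard parameter sharing: one shared trunk feeding a small head per task.
import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, in_dim=16, hidden=32, n_tasks=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x):
        h = self.trunk(x)                        # representation shared by all tasks
        return [head(h) for head in self.heads]  # one prediction per task

net = HardSharingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(64, 16)
targets = [torch.randn(64, 1), torch.randn(64, 1)]  # one target tensor per task

for step in range(200):
    opt.zero_grad()
    preds = net(x)
    # Unweighted sum of per-task losses; how to weight this sum is itself
    # a key practical choice the overview discusses.
    loss = sum(nn.functional.mse_loss(p, t) for p, t in zip(preds, targets))
    loss.backward()
    opt.step()
```

Soft parameter sharing, the other pattern the overview covers, would instead give each task its own trunk and penalize the distance between corresponding parameters.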
Proceedings Article (DOI)

Regularized multi-task learning

TL;DR: An approach to multi-task learning is presented, based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines, that have been successfully used in the past for single-task learning.
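For reference, one common form of such a regularization functional in this line of work (notation illustrative) splits each task's weight vector into a shared component w_0 and a task-specific offset v_t:

```latex
\min_{w_0,\,\{v_t\}} \;
  \sum_{t=1}^{T} \sum_{i=1}^{n_t}
    \ell\!\left(y_{ti},\, (w_0 + v_t)^\top x_{ti}\right)
  \;+\; \lambda_1 \sum_{t=1}^{T} \lVert v_t \rVert^2
  \;+\; \lambda_2 \lVert w_0 \rVert^2
```

The ratio of the two regularization constants controls how strongly the tasks are coupled: a large penalty on the offsets v_t forces all tasks toward a single shared model.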
Posted Content

Practical recommendations for gradient-based training of deep architectures

TL;DR: Overall, this chapter describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks and closes with open questions about the training difficulties observed with deeper architectures.
Book Chapter (DOI)

Practical recommendations for gradient-based training of deep architectures

TL;DR: In this article, the authors present a practical guide with recommendations for some of the most commonly used hyperparameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization.
References
Book

Elements of information theory

TL;DR: The text examines the roles of entropy, inequalities, and randomness in the design and construction of codes.
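Two standard results from the text, stated here in their textbook form, underpin information-theoretic analyses like the one in this paper: the entropy of a discrete source and the source-coding lower bound on the expected length L of any uniquely decodable code.

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x),
\qquad
\mathbb{E}[L] \;\ge\; H(X)
```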
Journal Article (DOI)

Approximation capabilities of multilayer feedforward networks

TL;DR: It is shown that standard multilayer feedforward networks with as few as a single hidden layer and an arbitrary bounded and nonconstant activation function are universal approximators with respect to L^p(μ) performance criteria, for arbitrary finite input environment measures μ.
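The shape of the result, in illustrative notation: single-hidden-layer networks g with a bounded, nonconstant activation σ are dense in L^p(μ).

```latex
g(x) = \sum_{j=1}^{m} \beta_j \,\sigma\!\left(w_j^\top x + b_j\right),
\qquad
\forall f \in L^p(\mu),\ \forall \varepsilon > 0:\;
\exists g \text{ with } \lVert f - g \rVert_{L^p(\mu)} < \varepsilon
```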
Book

Statistical Decision Theory and Bayesian Analysis

TL;DR: An overview of statistical decision theory, emphasizing the use and application of its philosophical ideas and mathematical structure.
Journal Article (DOI)

Bayesian interpolation

TL;DR: The Bayesian approach to regularization and model-comparison is demonstrated by studying the inference problem of interpolating noisy data by examining the posterior probability distribution of regularizing constants and noise levels.
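In outline, the evidence framework this TL;DR refers to has the standard form below (notation illustrative): the regularization constant α and noise level β are inferred from their posterior, and candidate models H_i are compared by their evidence P(D | H_i).

```latex
P(\alpha, \beta \mid D, \mathcal{H}_i)
  \propto P(D \mid \alpha, \beta, \mathcal{H}_i)\, P(\alpha, \beta \mid \mathcal{H}_i),
\qquad
P(D \mid \mathcal{H}_i)
  = \int P(D \mid \alpha, \beta, \mathcal{H}_i)\,
    P(\alpha, \beta \mid \mathcal{H}_i)\, d\alpha\, d\beta
```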