Open Access · Posted Content
Diffusion of Context and Credit Information in Markovian Models
Yoshua Bengio, Paolo Frasconi, et al.
Abstract
This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation for long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
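A minimal numerical sketch of the abstract's central claim (not code from the paper; the transition matrices and the value of eps are illustrative assumptions): propagate two state distributions that differ only in their initial state, and watch how quickly that difference, i.e., the context, diffuses away under an ergodic transition matrix versus a sparse, nearly deterministic one.

```python
import numpy as np

def context_distance(T, steps):
    """Total variation distance between state distributions started in
    state 0 vs. state 1, after `steps` applications of p_{t+1} = p_t T."""
    p = np.array([1.0, 0.0, 0.0])  # context: chain started in state 0
    q = np.array([0.0, 1.0, 0.0])  # context: chain started in state 1
    for _ in range(steps):
        p, q = p @ T, q @ T        # forward propagation of context
    return 0.5 * np.abs(p - q).sum()

# Dense, ergodic transition matrix: all entries well inside (0, 1).
T_ergodic = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])

# Sparse, nearly deterministic matrix: entries close to 0 or 1.
eps = 1e-3
T_near_det = np.array([[1 - 2 * eps, eps,         eps        ],
                       [eps,         1 - 2 * eps, eps        ],
                       [eps,         eps,         1 - 2 * eps]])

for t in (1, 10, 100):
    print(f"t={t:3d}  ergodic: {context_distance(T_ergodic, t):.2e}  "
          f"near-deterministic: {context_distance(T_near_det, t):.2e}")
```

With the ergodic matrix the distance contracts geometrically, so information about the initial state is lost within a few steps, while the near-deterministic matrix keeps it near 1 for hundreds of steps. Credit propagation backwards in time multiplies by the transpose of the transition matrix and contracts the same way, which is why both forward context and backward credit diffuse under ergodic transition matrices.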
Citations
Book
Machine Learning: A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Dynamic Bayesian Networks: Representation, Inference and Learning
Kevin Murphy, Stuart Russell, et al.
TL;DR: This thesis discusses how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
Sepp Hochreiter, Yoshua Bengio, et al.
Proceedings Article
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
Salah El Hihi, Yoshua Bengio, et al.
TL;DR: This paper proposes to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Journal Article
Input-output HMMs for sequence processing
Yoshua Bengio, Paolo Frasconi, et al.
TL;DR: It is demonstrated that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem and are able to map input sequences to output sequences, using the same processing style as recurrent neural networks.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Book Chapter
Learning internal representations by error propagation
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Journal Article
Learning long-term dependencies with gradient descent is difficult
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Journal Article
A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains
Journal Article
A learning algorithm for continually running fully recurrent neural networks
Ronald J. Williams, David Zipser, et al.
TL;DR: The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
Related Papers (5)
Learning control of finite Markov chains with unknown transition probabilities
Mitsuo Sato, K. Abe, H. Takeda, et al.