Open Access · Posted Content
Diffusion of Context and Credit Information in Markovian Models
Yoshua Bengio, Paolo Frasconi, et al.
Abstract
This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as the learning of a hidden state representation for long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., when the transition probability matrices are sparse and the model is essentially deterministic. The results in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
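A minimal numerical sketch of the abstract's central claim (not code from the paper; the transition matrices and the value of eps are illustrative assumptions): propagate two state distributions that differ only in their initial state, and watch how quickly that difference, i.e., the context, diffuses away under an ergodic transition matrix versus a sparse, nearly deterministic one.

```python
import numpy as np

def context_distance(T, steps):
    """Total variation distance between state distributions started in
    state 0 vs. state 1, after `steps` applications of p_{t+1} = p_t T."""
    p = np.array([1.0, 0.0, 0.0])  # context: chain started in state 0
    q = np.array([0.0, 1.0, 0.0])  # context: chain started in state 1
    for _ in range(steps):
        p, q = p @ T, q @ T        # forward propagation of context
    return 0.5 * np.abs(p - q).sum()

# Dense, ergodic transition matrix: all entries well inside (0, 1).
T_ergodic = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])

# Sparse, nearly deterministic matrix: entries close to 0 or 1.
eps = 1e-3
T_near_det = np.array([[1 - 2 * eps, eps,         eps        ],
                       [eps,         1 - 2 * eps, eps        ],
                       [eps,         eps,         1 - 2 * eps]])

for t in (1, 10, 100):
    print(f"t={t:3d}  ergodic: {context_distance(T_ergodic, t):.2e}  "
          f"near-deterministic: {context_distance(T_near_det, t):.2e}")
```

With the ergodic matrix the distance contracts geometrically, so information about the initial state is lost within a few steps, while the near-deterministic matrix keeps it near 1 for hundreds of steps. Credit propagation backwards in time multiplies by the transpose of the transition matrix and contracts the same way, which is why both forward context and backward credit diffuse under ergodic transition matrices.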
Citations
Book
Machine Learning: A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Dynamic Bayesian Networks: Representation, Inference and Learning
Kevin Murphy, Stuart Russell, et al.
TL;DR: This thesis discusses how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
Sepp Hochreiter, Yoshua Bengio, et al.
Proceedings Article
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
Salah El Hihi, Yoshua Bengio, et al.
TL;DR: This paper proposes to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Journal Article
Input-output HMMs for sequence processing
Yoshua Bengio, Paolo Frasconi, et al.
TL;DR: It is demonstrated that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem and are able to map input sequences to output sequences, using the same processing style as recurrent neural networks.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Book Chapter
Learning internal representations by error propagation
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Journal Article
Learning long-term dependencies with gradient descent is difficult
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Journal Article
A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains
Journal Article
A learning algorithm for continually running fully recurrent neural networks
Ronald J. Williams, David Zipser, et al.
TL;DR: The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
Related Papers (5)
Learning control of finite Markov chains with unknown transition probabilities
Mitsuo Sato, K. Abe, H. Takeda, et al.