Open Access Proceedings Article
Diffusion of Credit in Markovian Models
Yoshua Bengio, Paolo Frasconi, et al.
Vol. 7, pp. 553–560
TLDR
Using results from Markov chain theory, it is shown that the problem of diffusion is reduced if the transition probabilities approach 0 or 1; under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
Abstract
This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs), and how it makes learning long-term dependencies in sequences very difficult. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
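The diffusion phenomenon described in the abstract can be illustrated with a minimal sketch (not from the paper; the matrices and threshold are illustrative): repeatedly multiplying by a stochastic transition matrix whose entries are far from 0 and 1 makes the rows converge to a common stationary distribution, so the chain forgets its initial state and credit cannot propagate back over long spans; a near-deterministic matrix (entries near 0 or 1) preserves that information far longer.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(a, t):
    """Raise a square matrix to the t-th power by repeated multiplication."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, a)
    return result

def row_spread(m):
    """Max column-wise gap between the two rows of a 2x2 matrix.

    Near 0 means the rows are identical: after t steps the state
    distribution no longer depends on the starting state (diffusion).
    """
    return max(abs(m[0][j] - m[1][j]) for j in range(len(m)))

soft = [[0.6, 0.4], [0.4, 0.6]]      # transition probabilities far from 0/1
hard = [[0.99, 0.01], [0.01, 0.99]]  # transition probabilities near 0/1

T = 50
print(row_spread(matpow(soft, T)))  # ~0: initial state forgotten, credit diffuses
print(row_spread(matpow(hard, T)))  # ~0.36: initial state still recoverable
```

For a symmetric 2×2 matrix with diagonal p, the spread after T steps is exactly (2p − 1)^T, so the "soft" chain decays as 0.2^T while the "hard" chain decays only as 0.98^T, matching the paper's condition on transition probabilities approaching 0 or 1.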
Citations
Proceedings Article
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
Salah El Hihi, Yoshua Bengio, et al.
TL;DR: This paper proposes to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Journal Article
Discovery and segmentation of activities in video
M. Brand, V. Kettnaker, et al.
TL;DR: In this article, Hidden Markov Models (HMMs) are used to organize observed activity into meaningful states by minimizing the entropy of the joint distribution of the HMMs' internal state machine.
Journal Article
Input-output HMMs for sequence processing
Yoshua Bengio, Paolo Frasconi, et al.
TL;DR: It is demonstrated that IOHMMs can map input sequences to output sequences, using the same processing style as recurrent neural networks, and that they are well suited for solving grammatical inference problems on a benchmark.
Journal Article
Structure learning in conditional probability models via an entropic prior and parameter extinction
TL;DR: An entropic prior is introduced for multinomial parameter estimation problems, with a guarantee that deleting parameters driven to extinction increases the posterior probability of the model; the resulting models show superior generalization to held-out test data.
Journal Article
Clifford Support Vector Machines for Classification, Regression, and Recurrence
TL;DR: It is shown that one can apply CSVM for classification and regression and also to build a recurrent CSVM, which is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities.
References
Journal Article
Learning long-term dependencies with gradient descent is difficult
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on to information for long periods.
Book
Non-negative Matrices and Markov Chains
TL;DR: Finite non-negative matrices, which generalize finite stochastic matrices, have been studied extensively in the literature.
Proceedings Article
An Input Output HMM Architecture
Yoshua Bengio, Paolo Frasconi, et al.
TL;DR: A recurrent architecture having a modular structure is introduced that has similarities to hidden Markov models, but supports a recurrent-network processing style and allows one to exploit the supervised learning paradigm while using maximum likelihood estimation.
Journal Article
Unified integration of explicit knowledge and learning by example in recurrent networks
TL;DR: A novel unified approach for integrating explicit knowledge and learning by example in recurrent networks is proposed, which is accomplished by using a technique based on linear programming, instead of learning from random initial weights.
Proceedings Article
Credit Assignment through Time: Alternatives to Backpropagation
Yoshua Bengio, Paolo Frasconi, et al.
TL;DR: This work considers and compares alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled, and shows performance qualitatively superior to that obtained with backpropagation.