
Showing papers by "Yoshua Bengio" published in 1995


01 Jan 1995
Abstract: Related titles indexed for this entry: Pattern Recognition with Neural Networks in C++ (PDF); Pattern Recognition and Neural Networks (PDF); Neural Networks for Pattern Recognition (Advanced Texts in Econometrics) (PDF); Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition (PDF); An Introduction to Biological and Artificial Neural Networks for Pattern Recognition (SPIE Tutorial Texts in Optical Engineering, Vol. TT04) (PDF).

3,328 citations


Proceedings Article
27 Nov 1995
TL;DR: This paper proposes to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Abstract: We have already shown that extracting long-term dependencies from sequential data is difficult, both for deterministic dynamical systems such as recurrent networks and for probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain-specific a-priori knowledge to give meaning to the hidden or state variables representing past context. In this paper, we propose to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically. This implies that long-term dependencies are represented by variables with a long time scale. This principle is applied to a recurrent network which includes delays and multiple time scales. Experiments confirm the advantages of such structures. A similar approach is proposed for HMMs and IOHMMs.
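A rough sketch of the multiple-time-scales idea in NumPy (the layer sizes, the update stride, and all weight names below are illustrative assumptions, not the paper's exact architecture): a "slow" state that is updated only every few steps retains information over longer spans than a "fast" state updated at every step.

    import numpy as np

    # Minimal sketch, not the paper's model: a two-level recurrent net where the
    # "slow" state is updated only every `stride` steps, so it carries context
    # over a longer time scale than the "fast" state.
    rng = np.random.default_rng(0)

    n_in, n_fast, n_slow, stride = 4, 8, 8, 5        # sizes are illustrative
    W_in   = rng.normal(0, 0.1, (n_fast, n_in))      # input -> fast state
    W_fast = rng.normal(0, 0.1, (n_fast, n_fast))    # fast recurrence
    W_sf   = rng.normal(0, 0.1, (n_fast, n_slow))    # slow -> fast coupling
    W_slow = rng.normal(0, 0.1, (n_slow, n_slow))    # slow recurrence
    W_fs   = rng.normal(0, 0.1, (n_slow, n_fast))    # fast -> slow coupling

    def run(inputs):
        """Roll the two-level hierarchy over a sequence of input vectors."""
        h_fast, h_slow = np.zeros(n_fast), np.zeros(n_slow)
        states = []
        for t, x in enumerate(inputs):
            # fast state: updated at every time step
            h_fast = np.tanh(W_in @ x + W_fast @ h_fast + W_sf @ h_slow)
            # slow state: updated every `stride` steps -> longer time scale
            if t % stride == 0:
                h_slow = np.tanh(W_slow @ h_slow + W_fs @ h_fast)
            states.append((h_fast.copy(), h_slow.copy()))
        return states

    states = run(rng.normal(size=(20, n_in)))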

363 citations


Journal ArticleDOI
TL;DR: A new approach for on-line recognition of handwritten words written in unconstrained mixed style: words are normalized by fitting a model of the word structure with the EM algorithm, scored by a spatially replicated convolution network feeding a hidden Markov model, and the entire system is globally trained to minimize word-level errors.
Abstract: We introduce a new approach for on-line recognition of handwritten words written in unconstrained mixed style. The preprocessor performs a word-level normalization by fitting a model of the word structure using the EM algorithm. Words are then coded into low resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolution network that can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.
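A minimal sketch of the "annotated image" encoding in NumPy (the grid size, the feature planes, and the annotate function are assumptions for illustration; the EM-based word normalization and the curvature feature described in the abstract are omitted):

    import numpy as np

    def annotate(trajectory, grid=(8, 16)):
        """Render a normalized (x, y) pen trajectory into a low-resolution image
        whose cells store occupancy and local writing direction."""
        traj = np.asarray(trajectory, dtype=float)
        rows, cols = grid
        img = np.zeros((3, rows, cols))              # planes: occupancy, sin, cos
        for p, q in zip(traj[:-1], traj[1:]):
            dx, dy = q - p
            theta = np.arctan2(dy, dx)               # local trajectory direction
            r = min(int(p[1] * rows), rows - 1)
            c = min(int(p[0] * cols), cols - 1)
            img[0, r, c] = 1.0                       # pen passed through this cell
            img[1, r, c] = np.sin(theta)
            img[2, r, c] = np.cos(theta)
        return img

    # toy input: a single diagonal stroke with coordinates in [0, 1]
    features = annotate(np.linspace([0.1, 0.1], [0.9, 0.8], 50))
    print(features.shape)                            # (3, 8, 16)

Feature planes of this kind could then be fed to a spatially replicated convolution network, as the abstract describes.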

152 citations


Proceedings Article
27 Nov 1995
TL;DR: Recurrent neural networks with feedback into the input units are proposed for handling two types of data analysis problems: static data in which some input variables are missing, and sequential data in which some input variables are missing or are available at different frequencies.
Abstract: In this paper we propose recurrent neural networks with feedback into the input units for handling two types of data analysis problems. On the one hand, this scheme can be used for static data when some of the input variables are missing. On the other hand, it can also be used for sequential data, when some of the input variables are missing or are available at different frequencies. Unlike in the case of probabilistic models (e.g. Gaussian) of the missing variables, the network does not attempt to model the distribution of the missing variables given the observed variables. Instead it is a more "discriminant" approach that fills in the missing variables for the sole purpose of minimizing a learning criterion (e.g., to minimize an output error).
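A toy sketch of the "discriminant" filling-in idea (the linear model, the step size, and the gradient-descent relaxation below are assumptions; the paper uses a recurrent network with feedback into the input units rather than explicit gradient steps on the inputs): a missing input is adjusted purely to reduce the output error, without modelling its distribution given the observed variables.

    import numpy as np

    rng = np.random.default_rng(1)

    W = rng.normal(size=(1, 3))           # a tiny fixed linear "network"
    x = np.array([0.5, np.nan, -0.2])     # the second input is missing
    target = np.array([1.0])

    miss = np.isnan(x)
    x_filled = np.where(miss, 0.0, x)     # initialize the missing entry at 0

    for _ in range(200):
        err = W @ x_filled - target       # forward pass and output error
        grad_x = W.T @ err                # gradient of 0.5*||err||^2 w.r.t. inputs
        x_filled[miss] -= 0.1 * grad_x[miss]   # update only the missing entries

    print(x_filled)                       # missing value moved to reduce output error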

84 citations


Posted Content
TL;DR: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and shows how it makes it very difficult to learn to represent long-term context for sequential data.
Abstract: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.

38 citations


Journal ArticleDOI
TL;DR: This paper shows that the problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic.
Abstract: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
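A small numeric illustration of the diffusion effect (the 3-state matrices and the number of steps are made-up values, not taken from the paper): forward propagation through a dense ergodic transition matrix quickly forgets the initial state, whereas a near-deterministic matrix preserves it over many steps.

    import numpy as np

    def propagate(T, p0, steps=50):
        """Propagate a state distribution through `steps` transitions of a
        row-stochastic matrix T."""
        p = p0.copy()
        for _ in range(steps):
            p = p @ T
        return p

    dense = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])            # ergodic, well-mixed
    near_det = np.array([[0.999, 0.0005, 0.0005],
                         [0.0005, 0.999, 0.0005],
                         [0.0005, 0.0005, 0.999]]) # close to the identity

    p0 = np.array([1.0, 0.0, 0.0])
    print(propagate(dense, p0))     # ~[0.33, 0.33, 0.33]: initial state forgotten
    print(propagate(near_det, p0))  # ~[0.95, 0.02, 0.02]: initial state persists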

38 citations


Journal ArticleDOI
TL;DR: A framework in which a learning rule can be optimized within a parametric learning rule space, together with a theoretical study of the generalization properties of such rules when they are estimated from one set of learning tasks and tested on another set of tasks.
Abstract: In this paper, we present a framework where a learning rule can be optimized within a parametric learning rule space. We define what we call parametric learning rules and present a theoretical study of their generalization properties when estimated from a set of learning tasks and tested over another set of tasks. We corroborate the results of this study with practical experiments.
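A toy sketch of optimizing within a parametric learning rule space (the two-parameter rule, the synthetic linear regression tasks, and the grid search below are assumptions for illustration, not the paper's parametric rules or optimization procedure): the rule's parameters are selected on one set of tasks and then evaluated on held-out tasks, mirroring the generalization-across-tasks setting studied in the paper.

    import numpy as np

    rng = np.random.default_rng(2)

    def make_task():
        """A random linear regression task: (inputs, targets)."""
        w_true = rng.normal(size=3)
        X = rng.normal(size=(20, 3))
        return X, X @ w_true

    def apply_rule(theta, task, steps=30):
        """Train a linear model with the parametric rule; return its final loss."""
        lr, decay = theta                 # the rule's free parameters
        X, y = task
        w = np.zeros(3)
        for _ in range(steps):
            err = X @ w - y
            w -= lr * (X.T @ err) / len(y) + decay * w   # parametric update rule
        return np.mean((X @ w - y) ** 2)

    train_tasks = [make_task() for _ in range(10)]
    test_tasks  = [make_task() for _ in range(10)]

    # crude search over the learning rule space
    candidates = [(lr, d) for lr in (0.01, 0.05, 0.1, 0.3) for d in (0.0, 0.01, 0.1)]
    best = min(candidates,
               key=lambda th: np.mean([apply_rule(th, t) for t in train_tasks]))

    print("selected rule parameters:", best)
    print("mean loss on unseen tasks:",
          np.mean([apply_rule(best, t) for t in test_tasks]))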

35 citations