
Showing papers by "Yoshua Bengio" published in 1994


Journal ArticleDOI
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.
Abstract: Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production, or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching onto information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
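
As a minimal NumPy sketch of the difficulty (an illustration, not the paper's experiments): the gradient of a loss at time T with respect to an earlier hidden state involves a product of Jacobians of the recurrent map, and for a tanh network with modest weights the norm of this product typically shrinks exponentially with the time span — the vanishing-gradient effect the paper analyzes.

# Sketch: track the norm of the accumulated Jacobian product in a tanh RNN.
import numpy as np

rng = np.random.default_rng(0)
n = 20                                               # hidden units (illustrative)
W = rng.normal(scale=0.5 / np.sqrt(n), size=(n, n))  # recurrent weights

h = rng.normal(size=n)   # hidden state
J = np.eye(n)            # product of Jacobians dh_t/dh_0
for t in range(1, 51):
    h = np.tanh(W @ h)
    # Jacobian of h_t w.r.t. h_{t-1} is diag(1 - h_t^2) @ W.
    J = np.diag(1.0 - h**2) @ W @ J
    if t % 10 == 0:
        print(f"t = {t:3d}   ||dh_t/dh_0|| = {np.linalg.norm(J, 2):.3e}")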

7,309 citations


Proceedings Article
01 Jan 1994
TL;DR: It is shown that the K-Means algorithm actually minimizes the quantization error using the very fast Newton algorithm.
Abstract: This paper studies the convergence properties of the well-known K-Means clustering algorithm. The K-Means algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the K-Means algorithm actually minimizes the quantization error using the very fast Newton algorithm.
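
A minimal sketch of batch K-Means, annotated with the observation behind the result: for the quantization error E = 1/2 * sum_i ||x_i - w_{k(i)}||^2, the gradient with respect to centroid w_k is the sum over its assigned points of (w_k - x_i) and the Hessian is n_k times the identity, so the familiar "move each centroid to the mean of its points" update is exactly a Newton step. (The batch formulation here is an assumption for brevity.)

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # init centroids
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Update step = Newton step: with Hessian n_k * I, the step
        # w_k - H^{-1} g is exactly the mean of the assigned points.
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                W[j] = pts.mean(axis=0)
    return W, assign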

476 citations


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture with a modular structure is introduced; it has similarities to hidden Markov models but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure, and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.
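
The paper's exact architecture isn't reproduced here, but a minimal sketch of the general idea (with an assumed linear-softmax transition network per state) shows how an input/output HMM-style forward pass propagates a state distribution whose transition probabilities are conditioned on the input at each step:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def iohmm_forward(x_seq, A_w, out_w, alpha0):
    """x_seq: (T, d) input sequence; A_w: (n, n, d) one linear transition
    net per origin state; out_w: (n, d) per-state linear output nets;
    alpha0: (n,) initial state distribution. Returns per-step outputs."""
    n = len(alpha0)
    alpha, outputs = alpha0.copy(), []
    for x in x_seq:
        # Transition probabilities conditioned on the current input:
        # row i of the matrix is softmax(A_w[i] @ x).
        A = np.stack([softmax(A_w[i] @ x) for i in range(n)])
        alpha = alpha @ A                      # HMM-style forward recursion
        outputs.append(alpha @ (out_w @ x))    # expected output over states
    return np.array(outputs)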

344 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: A new approach is introduced for online recognition of handwritten words written in unconstrained mixed style, in which words are represented by low-resolution "annotated images" where each pixel contains information about trajectory direction and curvature.
Abstract: We introduce a new approach for online recognition of handwritten words written in unconstrained mixed style. Words are represented by low-resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolutional network which can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.
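
As a rough sketch of the representation (an interpretation of the abstract, not the paper's exact feature extraction), one can rasterize a pen trajectory into a low-resolution grid where each visited cell stores the local stroke direction and a simple curvature estimate:

import numpy as np

def annotated_image(traj, grid=16):
    """traj: (T, 2) array of pen (x, y) samples, assumed normalized to [0, 1]."""
    img = np.zeros((grid, grid, 3))          # channels: cos, sin, curvature
    d = np.gradient(traj, axis=0)            # tangent vectors along the stroke
    angle = np.arctan2(d[:, 1], d[:, 0])
    curv = np.gradient(np.unwrap(angle))     # change of direction per step
    for (x, y), a, c in zip(traj, angle, curv):
        i = min(int(y * grid), grid - 1)
        j = min(int(x * grid), grid - 1)
        img[i, j] = [np.cos(a), np.sin(a), c]  # overwrite: last stroke wins
    return img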

70 citations


Proceedings ArticleDOI
27 Jun 1994
TL;DR: Experiments on classification tasks suggest that genetic programming finds better learning rules than other optimization methods, and the best rule found with genetic programming outperformed the well-known backpropagation algorithm for a given set of tasks.
Abstract: In previous work we explained how to use standard optimization methods such as simulated annealing, gradient descent, and genetic algorithms to optimize a parametric function which could be used as a learning rule for neural networks. To use these methods, we had to choose a fixed number of parameters and a rigid form for the learning rule. In this article, we propose to use genetic programming to find not only the values of rule parameters but also the optimal number of parameters and the form of the rule. Experiments on classification tasks suggest that genetic programming finds better learning rules than other optimization methods. Furthermore, the best rule found with genetic programming outperformed the well-known backpropagation algorithm for a given set of tasks.
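
To make the search space concrete, here is a minimal sketch (the terminal set and toy setup are assumptions, not the paper's) of representing a synaptic learning rule as an expression tree over local quantities — the kind of structure genetic programming can mutate and recombine. The classic delta rule is one point in this space:

import random, operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['pre', 'post', 'w', 'err']   # local synaptic quantities (assumed)

def random_rule(depth=3):
    """Grow a random expression tree for the weight update dw."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(OPS)), random_rule(depth - 1), random_rule(depth - 1))

def evaluate(rule, env):
    if isinstance(rule, str):
        return env[rule]
    op, a, b = rule
    return OPS[op](evaluate(a, env), evaluate(b, env))

# The classic delta rule, dw = err * pre, expressed as such a tree:
delta = ('*', 'err', 'pre')
print(evaluate(delta, {'pre': 0.5, 'post': 0.1, 'w': 0.2, 'err': -0.3}))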

53 citations


Proceedings Article
01 Jan 1994
TL;DR: A geometrical model of the word spatial structure is fitted to the pen trajectory using the EM algorithm; the fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters.
Abstract: We introduce a new approach to normalizing words written with an electronic stylus that applies to all styles of handwriting (upper case, lower case, printed, cursive, or mixed). A geometrical model of the word spatial structure is fitted to the pen trajectory using the EM algorithm. The fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters. The method was evaluated and integrated into a recognition system that combines neural networks and hidden Markov models.
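
A heavily simplified sketch of the fitting idea (the paper's geometrical model and priors are richer than this): treat a handful of horizontal reference lines — roughly descender, baseline, core line, and ascender — as components of a 1-D Gaussian mixture over the trajectory's y-coordinates and fit them with EM:

import numpy as np

def fit_reference_lines(y, iters=30):
    """y: 1-D array of pen-sample y-coordinates. Returns 4 fitted line heights."""
    mu = np.quantile(y, [0.05, 0.3, 0.7, 0.95])   # initial line heights
    sigma = y.std() / 4 + 1e-6
    pi = np.full(4, 0.25)
    for _ in range(iters):
        # E-step: responsibility of each line for each trajectory point.
        r = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate line heights and mixing weights.
        mu = (r * y[:, None]).sum(0) / r.sum(0)
        pi = r.mean(axis=0)
    return mu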

34 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: A geometrical model of the word spatial structure is fitted to the pen trajectory using the expectation-maximization algorithm; the fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters.
Abstract: We introduce a new approach to normalizing words written with an electronic stylus that applies to all styles of handwriting (upper case, lower case, printed, cursive, or mixed). A geometrical model of the word spatial structure is fitted to the pen trajectory using the expectation-maximization algorithm. The fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters. The method was evaluated and integrated into a recognition system that combines neural networks and hidden Markov models.

25 citations


Proceedings Article
01 Jan 1994
TL;DR: Using results from Markov chain theory, it is shown that the problem of diffusion is reduced if the transition probabilities approach 0 or 1, and under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
Abstract: This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning long-term dependencies in sequences very difficult. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
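
A small numerical illustration of diffusion (not from the paper): repeatedly applying a stochastic transition matrix contracts any two state distributions toward each other at a rate set by the second-largest eigenvalue modulus, so information about the initial state fades quickly unless the transition probabilities are close to 0 or 1:

import numpy as np

def contrast_after(T_soft, T_hard, steps=30):
    p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # two initial states
    for name, A in [("soft", T_soft), ("near-deterministic", T_hard)]:
        d = np.abs(p @ np.linalg.matrix_power(A, steps)
                   - q @ np.linalg.matrix_power(A, steps)).sum()
        print(f"{name}: distance between the two chains after {steps} steps = {d:.3e}")

T_soft = np.array([[0.6, 0.4], [0.4, 0.6]])      # |lambda_2| = 0.2: fast diffusion
T_hard = np.array([[0.99, 0.01], [0.01, 0.99]])  # |lambda_2| = 0.98: state retained
contrast_after(T_soft, T_hard)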

19 citations


Proceedings ArticleDOI
31 Oct 1994
TL;DR: A new electronic pen-based visitor registration system (PENGUIN) is described; its goal is to expand and modernize the visitor sign-in procedure at Bell Laboratories using a pen interface coupled with a powerful and accurate on-line handwriting recognition module.
Abstract: We describe a new electronic pen-based visitor registration system (PENGUIN) whose goal is to expand and modernize the visitor sign-in procedure at Bell Laboratories. The system uses a pen interface (i.e., a tablet display) in what is essentially a form-filling application. Our pen interface is coupled with a powerful and accurate on-line handwriting recognition module. A database of AT&T employees (the visitors' hosts) and country names is used to check the recognition module's outputs in order to find the best match. The system provides assistance to the guard at one of the guard stations in routing visitors to their hosts. All the entered data are stored electronically. Initial testing shows that the PENGUIN system performs reliably and with high accuracy. It retrieves the correct host name with 97% accuracy and the correct visitor's citizenship with 99% accuracy. The system is robust and easy to use for both visitors and guards.
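
As a minimal sketch of the database-checking step (an assumption about the general approach; the names below are hypothetical), a noisy recognizer output can be corrected by retrieving the closest database entry under edit distance:

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(recognized, database):
    return min(database, key=lambda entry: edit_distance(recognized, entry))

hosts = ["Y. Bengio", "Y. LeCun", "L. Bottou"]   # hypothetical host list
print(best_match("Y. Bengia", hosts))            # -> "Y. Bengio"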

8 citations