
Showing papers by "Yoshua Bengio" published in 1994


Journal ArticleDOI
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.
Abstract: Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production, or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching onto information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
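
As a minimal NumPy sketch of the difficulty (an illustration, not the paper's experiments): the gradient of a loss at time T with respect to an earlier hidden state involves a product of Jacobians of the recurrent map, and for a tanh network with modest weights the norm of this product typically shrinks exponentially with the time span — the vanishing-gradient effect the paper analyzes.

# Sketch: track the norm of the accumulated Jacobian product in a tanh RNN.
import numpy as np

rng = np.random.default_rng(0)
n = 20                                               # hidden units (illustrative)
W = rng.normal(scale=0.5 / np.sqrt(n), size=(n, n))  # recurrent weights

h = rng.normal(size=n)   # hidden state
J = np.eye(n)            # product of Jacobians dh_t/dh_0
for t in range(1, 51):
    h = np.tanh(W @ h)
    # Jacobian of h_t w.r.t. h_{t-1} is diag(1 - h_t^2) @ W.
    J = np.diag(1.0 - h**2) @ W @ J
    if t % 10 == 0:
        print(f"t = {t:3d}   ||dh_t/dh_0|| = {np.linalg.norm(J, 2):.3e}")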

7,309 citations


Proceedings Article
01 Jan 1994
TL;DR: It is shown that the K-Means algorithm actually minimizes the quantization error using the very fast Newton algorithm.
Abstract: This paper studies the convergence properties of the well-known K-Means clustering algorithm. The K-Means algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the K-Means algorithm actually minimizes the quantization error using the very fast Newton algorithm.
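
A minimal sketch of batch K-Means, annotated with the observation behind the result: for the quantization error E = 1/2 * sum_i ||x_i - w_{k(i)}||^2, the gradient with respect to centroid w_k is the sum over its assigned points of (w_k - x_i) and the Hessian is n_k times the identity, so the familiar "move each centroid to the mean of its points" update is exactly a Newton step. (The batch formulation here is an assumption for brevity.)

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # init centroids
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Update step = Newton step: with Hessian n_k * I, the step
        # w_k - H^{-1} g is exactly the mean of the assigned points.
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                W[j] = pts.mean(axis=0)
    return W, assign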

476 citations


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture with a modular structure is introduced; it has similarities to hidden Markov models but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure, and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.
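
The paper's exact architecture isn't reproduced here, but a minimal sketch of the general idea (with an assumed linear-softmax transition network per state) shows how an input/output HMM-style forward pass propagates a state distribution whose transition probabilities are conditioned on the input at each step:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def iohmm_forward(x_seq, A_w, out_w, alpha0):
    """x_seq: (T, d) input sequence; A_w: (n, n, d) one linear transition
    net per origin state; out_w: (n, d) per-state linear output nets;
    alpha0: (n,) initial state distribution. Returns per-step outputs."""
    n = len(alpha0)
    alpha, outputs = alpha0.copy(), []
    for x in x_seq:
        # Transition probabilities conditioned on the current input:
        # row i of the matrix is softmax(A_w[i] @ x).
        A = np.stack([softmax(A_w[i] @ x) for i in range(n)])
        alpha = alpha @ A                      # HMM-style forward recursion
        outputs.append(alpha @ (out_w @ x))    # expected output over states
    return np.array(outputs)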

344 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: A new approach is introduced for online recognition of handwritten words written in unconstrained mixed style, in which words are represented by low-resolution "annotated images" where each pixel contains information about trajectory direction and curvature.
Abstract: We introduce a new approach for online recognition of handwritten words written in unconstrained mixed style. Words are represented by low-resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolutional network which can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.
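
As a rough sketch of the representation (an interpretation of the abstract, not the paper's exact feature extraction), one can rasterize a pen trajectory into a low-resolution grid where each visited cell stores the local stroke direction and a simple curvature estimate:

import numpy as np

def annotated_image(traj, grid=16):
    """traj: (T, 2) array of pen (x, y) samples, assumed normalized to [0, 1]."""
    img = np.zeros((grid, grid, 3))          # channels: cos, sin, curvature
    d = np.gradient(traj, axis=0)            # tangent vectors along the stroke
    angle = np.arctan2(d[:, 1], d[:, 0])
    curv = np.gradient(np.unwrap(angle))     # change of direction per step
    for (x, y), a, c in zip(traj, angle, curv):
        i = min(int(y * grid), grid - 1)
        j = min(int(x * grid), grid - 1)
        img[i, j] = [np.cos(a), np.sin(a), c]  # overwrite: last stroke wins
    return img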

70 citations


Proceedings ArticleDOI
27 Jun 1994
TL;DR: Experiments on classification tasks suggest that genetic programming finds better learning rules than other optimization methods, and the best rule found with genetic programming outperformed the well-known backpropagation algorithm for a given set of tasks.
Abstract: In previous work we explained how to use standard optimization methods such as simulated annealing, gradient descent, and genetic algorithms to optimize a parametric function which could be used as a learning rule for neural networks. To use these methods, we had to choose a fixed number of parameters and a rigid form for the learning rule. In this article, we propose to use genetic programming to find not only the values of rule parameters but also the optimal number of parameters and the form of the rule. Experiments on classification tasks suggest that genetic programming finds better learning rules than other optimization methods. Furthermore, the best rule found with genetic programming outperformed the well-known backpropagation algorithm for a given set of tasks.
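
To make the search space concrete, here is a minimal sketch (the terminal set and toy setup are assumptions, not the paper's) of representing a synaptic learning rule as an expression tree over local quantities — the kind of structure genetic programming can mutate and recombine. The classic delta rule is one point in this space:

import random, operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['pre', 'post', 'w', 'err']   # local synaptic quantities (assumed)

def random_rule(depth=3):
    """Grow a random expression tree for the weight update dw."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(list(OPS)), random_rule(depth - 1), random_rule(depth - 1))

def evaluate(rule, env):
    if isinstance(rule, str):
        return env[rule]
    op, a, b = rule
    return OPS[op](evaluate(a, env), evaluate(b, env))

# The classic delta rule, dw = err * pre, expressed as such a tree:
delta = ('*', 'err', 'pre')
print(evaluate(delta, {'pre': 0.5, 'post': 0.1, 'w': 0.2, 'err': -0.3}))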

53 citations


Proceedings Article
01 Jan 1994
TL;DR: A geometrical model of the word spatial structure is fitted to the pen trajectory using the EM algorithm; the fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters.
Abstract: We introduce a new approach to normalizing words written with an electronic stylus that applies to all styles of handwriting (upper case, lower case, printed, cursive, or mixed). A geometrical model of the word spatial structure is fitted to the pen trajectory using the EM algorithm. The fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters. The method was evaluated and integrated into a recognition system that combines neural networks and hidden Markov models.
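
A heavily simplified sketch of the fitting idea (the paper's geometrical model and priors are richer than this): treat a handful of horizontal reference lines — roughly descender, baseline, core line, and ascender — as components of a 1-D Gaussian mixture over the trajectory's y-coordinates and fit them with EM:

import numpy as np

def fit_reference_lines(y, iters=30):
    """y: 1-D array of pen-sample y-coordinates. Returns 4 fitted line heights."""
    mu = np.quantile(y, [0.05, 0.3, 0.7, 0.95])   # initial line heights
    sigma = y.std() / 4 + 1e-6
    pi = np.full(4, 0.25)
    for _ in range(iters):
        # E-step: responsibility of each line for each trajectory point.
        r = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate line heights and mixing weights.
        mu = (r * y[:, None]).sum(0) / r.sum(0)
        pi = r.mean(axis=0)
    return mu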

34 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: A geometrical model of the word spatial structure is fitted to the pen trajectory using the expectation-maximization algorithm; the fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters.
Abstract: We introduce a new approach to normalizing words written with an electronic stylus that applies to all styles of handwriting (upper case, lower case, printed, cursive, or mixed). A geometrical model of the word spatial structure is fitted to the pen trajectory using the expectation-maximization algorithm. The fitting process maximizes the likelihood of the trajectory given the model and a set of priors on its parameters. The method was evaluated and integrated into a recognition system that combines neural networks and hidden Markov models.

25 citations


Proceedings Article
01 Jan 1994
TL;DR: Using results from Markov chain theory, it is shown that the problem of diffusion is reduced if the transition probabilities approach 0 or 1, and under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
Abstract: This paper studies the problem of diffusion in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning long-term dependencies in sequences very difficult. Using results from Markov chain theory, we show that the problem of diffusion is reduced if the transition probabilities approach 0 or 1. Under this condition, standard HMMs have very limited modeling capabilities, but input/output HMMs can still perform interesting computations.
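
A small numerical illustration of diffusion (not from the paper): repeatedly applying a stochastic transition matrix contracts any two state distributions toward each other at a rate set by the second-largest eigenvalue modulus, so information about the initial state fades quickly unless the transition probabilities are close to 0 or 1:

import numpy as np

def contrast_after(T_soft, T_hard, steps=30):
    p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # two initial states
    for name, A in [("soft", T_soft), ("near-deterministic", T_hard)]:
        d = np.abs(p @ np.linalg.matrix_power(A, steps)
                   - q @ np.linalg.matrix_power(A, steps)).sum()
        print(f"{name}: distance between the two chains after {steps} steps = {d:.3e}")

T_soft = np.array([[0.6, 0.4], [0.4, 0.6]])      # |lambda_2| = 0.2: fast diffusion
T_hard = np.array([[0.99, 0.01], [0.01, 0.99]])  # |lambda_2| = 0.98: state retained
contrast_after(T_soft, T_hard)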

19 citations


Proceedings ArticleDOI
31 Oct 1994
TL;DR: A new electronic pen-based visitor registration system (PENGUIN) is described; its goal is to expand and modernize the visitor sign-in procedure at Bell Laboratories using a pen interface coupled with a powerful and accurate on-line handwriting recognition module.
Abstract: We describe a new electronic pen-based visitor registration system (PENGUIN) whose goal is to expand and modernize the visitor sign-in procedure at Bell Laboratories. The system uses a pen interface (i.e., a tablet display) in what is essentially a form-filling application. Our pen interface is coupled with a powerful and accurate on-line handwriting recognition module. A database of AT&T employees (the visitors' hosts) and country names is used to check the recognition module's outputs in order to find the best match. The system provides assistance to the guard at one of the guard stations in routing visitors to their hosts. All the entered data are stored electronically. Initial testing shows that the PENGUIN system performs reliably and with high accuracy. It retrieves the correct host name with 97% accuracy and the correct visitor's citizenship with 99% accuracy. The system is robust and easy to use for both visitors and guards.
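
As a minimal sketch of the database-checking step (an assumption about the general approach; the names below are hypothetical), a noisy recognizer output can be corrected by retrieving the closest database entry under edit distance:

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(recognized, database):
    return min(database, key=lambda entry: edit_distance(recognized, entry))

hosts = ["Y. Bengio", "Y. LeCun", "L. Bottou"]   # hypothetical host list
print(best_match("Y. Bengia", hosts))            # -> "Y. Bengio"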

8 citations