
Showing papers by "Yoshua Bengio" published in 1995


01 Jan 1995
Abstract: Related titles indexed for this entry: Pattern Recognition with Neural Networks in C++ (PDF); Pattern Recognition and Neural Networks (PDF); Neural Networks for Pattern Recognition (Advanced Texts in Econometrics) (PDF); Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition (PDF); An Introduction to Biological and Artificial Neural Networks for Pattern Recognition (SPIE Tutorial Texts in Optical Engineering, Vol. TT04) (PDF).

3,328 citations


Proceedings Article
27 Nov 1995
TL;DR: This paper proposes to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Abstract: We have already shown that extracting long-term dependencies from sequential data is difficult, both for deterministic dynamical systems such as recurrent networks and for probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain-specific a-priori knowledge to give meaning to the hidden or state variables representing past context. In this paper, we propose to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically. This implies that long-term dependencies are represented by variables with a long time scale. This principle is applied to a recurrent network which includes delays and multiple time scales. Experiments confirm the advantages of such structures. A similar approach is proposed for HMMs and IOHMMs.
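A rough sketch of the multiple-time-scales idea in NumPy (the layer sizes, the update stride, and all weight names below are illustrative assumptions, not the paper's exact architecture): a "slow" state that is updated only every few steps retains information over longer spans than a "fast" state updated at every step.

    import numpy as np

    # Minimal sketch, not the paper's model: a two-level recurrent net where the
    # "slow" state is updated only every `stride` steps, so it carries context
    # over a longer time scale than the "fast" state.
    rng = np.random.default_rng(0)

    n_in, n_fast, n_slow, stride = 4, 8, 8, 5        # sizes are illustrative
    W_in   = rng.normal(0, 0.1, (n_fast, n_in))      # input -> fast state
    W_fast = rng.normal(0, 0.1, (n_fast, n_fast))    # fast recurrence
    W_sf   = rng.normal(0, 0.1, (n_fast, n_slow))    # slow -> fast coupling
    W_slow = rng.normal(0, 0.1, (n_slow, n_slow))    # slow recurrence
    W_fs   = rng.normal(0, 0.1, (n_slow, n_fast))    # fast -> slow coupling

    def run(inputs):
        """Roll the two-level hierarchy over a sequence of input vectors."""
        h_fast, h_slow = np.zeros(n_fast), np.zeros(n_slow)
        states = []
        for t, x in enumerate(inputs):
            # fast state: updated at every time step
            h_fast = np.tanh(W_in @ x + W_fast @ h_fast + W_sf @ h_slow)
            # slow state: updated every `stride` steps -> longer time scale
            if t % stride == 0:
                h_slow = np.tanh(W_slow @ h_slow + W_fs @ h_fast)
            states.append((h_fast.copy(), h_slow.copy()))
        return states

    states = run(rng.normal(size=(20, n_in)))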

363 citations


Journal ArticleDOI
TL;DR: A new approach for on-line recognition of handwritten words written in unconstrained mixed style: words are normalized by fitting a model of the word structure with the EM algorithm, scored by a spatially replicated convolution network feeding a hidden Markov model, and the entire system is globally trained to minimize word-level errors.
Abstract: We introduce a new approach for on-line recognition of handwritten words written in unconstrained mixed style. The preprocessor performs a word-level normalization by fitting a model of the word structure using the EM algorithm. Words are then coded into low resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolution network that can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.
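A minimal sketch of the "annotated image" encoding in NumPy (the grid size, the feature planes, and the annotate function are assumptions for illustration; the EM-based word normalization and the curvature feature described in the abstract are omitted):

    import numpy as np

    def annotate(trajectory, grid=(8, 16)):
        """Render a normalized (x, y) pen trajectory into a low-resolution image
        whose cells store occupancy and local writing direction."""
        traj = np.asarray(trajectory, dtype=float)
        rows, cols = grid
        img = np.zeros((3, rows, cols))              # planes: occupancy, sin, cos
        for p, q in zip(traj[:-1], traj[1:]):
            dx, dy = q - p
            theta = np.arctan2(dy, dx)               # local trajectory direction
            r = min(int(p[1] * rows), rows - 1)
            c = min(int(p[0] * cols), cols - 1)
            img[0, r, c] = 1.0                       # pen passed through this cell
            img[1, r, c] = np.sin(theta)
            img[2, r, c] = np.cos(theta)
        return img

    # toy input: a single diagonal stroke with coordinates in [0, 1]
    features = annotate(np.linspace([0.1, 0.1], [0.9, 0.8], 50))
    print(features.shape)                            # (3, 8, 16)

Feature planes of this kind could then be fed to a spatially replicated convolution network, as the abstract describes.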

152 citations


Proceedings Article
27 Nov 1995
TL;DR: Recurrent neural networks with feedback into the input units are proposed for handling two types of data analysis problems: static data in which some input variables are missing, and sequential data in which some input variables are missing or are available at different frequencies.
Abstract: In this paper we propose recurrent neural networks with feedback into the input units for handling two types of data analysis problems. On the one hand, this scheme can be used for static data when some of the input variables are missing. On the other hand, it can also be used for sequential data, when some of the input variables are missing or are available at different frequencies. Unlike in the case of probabilistic models (e.g. Gaussian) of the missing variables, the network does not attempt to model the distribution of the missing variables given the observed variables. Instead it is a more "discriminant" approach that fills in the missing variables for the sole purpose of minimizing a learning criterion (e.g., to minimize an output error).
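A toy sketch of the "discriminant" filling-in idea (the linear model, the step size, and the gradient-descent relaxation below are assumptions; the paper uses a recurrent network with feedback into the input units rather than explicit gradient steps on the inputs): a missing input is adjusted purely to reduce the output error, without modelling its distribution given the observed variables.

    import numpy as np

    rng = np.random.default_rng(1)

    W = rng.normal(size=(1, 3))           # a tiny fixed linear "network"
    x = np.array([0.5, np.nan, -0.2])     # the second input is missing
    target = np.array([1.0])

    miss = np.isnan(x)
    x_filled = np.where(miss, 0.0, x)     # initialize the missing entry at 0

    for _ in range(200):
        err = W @ x_filled - target       # forward pass and output error
        grad_x = W.T @ err                # gradient of 0.5*||err||^2 w.r.t. inputs
        x_filled[miss] -= 0.1 * grad_x[miss]   # update only the missing entries

    print(x_filled)                       # missing value moved to reduce output error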

84 citations


Posted Content
TL;DR: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and shows how it makes it very difficult to learn to represent long-term context for sequential data.
Abstract: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.

38 citations


Journal ArticleDOI
TL;DR: This paper shows that the problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic.
Abstract: This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes the task of learning to represent long-term context for sequential data very difficult. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm.
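A small numeric illustration of the diffusion effect (the 3-state matrices and the number of steps are made-up values, not taken from the paper): forward propagation through a dense ergodic transition matrix quickly forgets the initial state, whereas a near-deterministic matrix preserves it over many steps.

    import numpy as np

    def propagate(T, p0, steps=50):
        """Propagate a state distribution through `steps` transitions of a
        row-stochastic matrix T."""
        p = p0.copy()
        for _ in range(steps):
            p = p @ T
        return p

    dense = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])            # ergodic, well-mixed
    near_det = np.array([[0.999, 0.0005, 0.0005],
                         [0.0005, 0.999, 0.0005],
                         [0.0005, 0.0005, 0.999]]) # close to the identity

    p0 = np.array([1.0, 0.0, 0.0])
    print(propagate(dense, p0))     # ~[0.33, 0.33, 0.33]: initial state forgotten
    print(propagate(near_det, p0))  # ~[0.95, 0.02, 0.02]: initial state persists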

38 citations


Journal ArticleDOI
TL;DR: A framework in which a learning rule can be optimized within a parametric learning rule space, together with a theoretical study of the generalization properties of such rules when they are estimated from one set of learning tasks and tested on another set of tasks.
Abstract: In this paper, we present a framework where a learning rule can be optimized within a parametric learning rule space. We define what we call parametric learning rules and present a theoretical study of their generalization properties when estimated from a set of learning tasks and tested over another set of tasks. We corroborate the results of this study with practical experiments.
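A toy sketch of optimizing within a parametric learning rule space (the two-parameter rule, the synthetic linear regression tasks, and the grid search below are assumptions for illustration, not the paper's parametric rules or optimization procedure): the rule's parameters are selected on one set of tasks and then evaluated on held-out tasks, mirroring the generalization-across-tasks setting studied in the paper.

    import numpy as np

    rng = np.random.default_rng(2)

    def make_task():
        """A random linear regression task: (inputs, targets)."""
        w_true = rng.normal(size=3)
        X = rng.normal(size=(20, 3))
        return X, X @ w_true

    def apply_rule(theta, task, steps=30):
        """Train a linear model with the parametric rule; return its final loss."""
        lr, decay = theta                 # the rule's free parameters
        X, y = task
        w = np.zeros(3)
        for _ in range(steps):
            err = X @ w - y
            w -= lr * (X.T @ err) / len(y) + decay * w   # parametric update rule
        return np.mean((X @ w - y) ** 2)

    train_tasks = [make_task() for _ in range(10)]
    test_tasks  = [make_task() for _ in range(10)]

    # crude search over the learning rule space
    candidates = [(lr, d) for lr in (0.01, 0.05, 0.1, 0.3) for d in (0.0, 0.01, 0.1)]
    best = min(candidates,
               key=lambda th: np.mean([apply_rule(th, t) for t in train_tasks]))

    print("selected rule parameters:", best)
    print("mean loss on unseen tasks:",
          np.mean([apply_rule(best, t) for t in test_tasks]))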

35 citations