
Showing papers by "Yoshua Bengio published in 1997"


Proceedings ArticleDOI
17 Jun 1997
TL;DR: A new machine learning paradigm called Graph Transformer Networks is proposed that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as output.
Abstract: We propose a new machine learning paradigm called Graph Transformer Networks that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as outputs. Training is performed by computing gradients of a global objective function with respect to all the parameters in the system using a kind of back-propagation procedure. A complete check reading system based on these concepts is described. The system uses convolutional neural network character recognizers, combined with global training techniques, to provide record accuracy on business and personal checks. It is presently deployed commercially and reads millions of checks per month.

125 citations
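The training principle described above, one global objective whose gradients flow through every module's parameters, can be illustrated with a minimal sketch. The real Graph Transformer Network passes graphs between modules; here two toy vector-valued modules stand in, which is an assumption made purely for brevity:

```python
import numpy as np

# Two differentiable "modules" composed end to end. In a Graph
# Transformer Network the modules exchange graphs rather than vectors,
# but the training idea is the same: compute gradients of one global
# objective with respect to ALL parameters via the chain rule.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # parameters of module 1
W2 = rng.normal(size=(1, 4))   # parameters of module 2

def forward(x):
    h = np.tanh(W1 @ x)        # module 1: feature extractor
    y = W2 @ h                 # module 2: scorer
    return h, y

def global_grads(x, target):
    h, y = forward(x)
    dy = 2.0 * (y - target)             # d(loss)/dy for squared loss
    dW2 = np.outer(dy, h)               # gradient w.r.t. module 2
    dh = W2.T @ dy                      # back-propagate into module 1
    dW1 = np.outer(dh * (1 - h**2), x)  # chain rule through tanh
    return dW1, dW2

x, target = np.ones(3), np.array([0.5])
for _ in range(200):                    # descend the global objective
    dW1, dW2 = global_grads(x, target)
    W1 -= 0.05 * dW1
    W2 -= 0.05 * dW2
```

Both modules' parameters move under the same loss; neither is trained in isolation, which is the point of the paradigm.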


Journal ArticleDOI
TL;DR: It is found with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses incurred during trading.
Abstract: The application of this work is to decision making with financial time series, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or maximizing the likelihood of a conditional model of the dependent variable. We find here with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading. Experiments were performed on portfolio selection with 35 Canadian stocks.

66 citations
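The paper's central move is to optimize the financial criterion itself rather than a prediction proxy. A hedged sketch of that idea follows; the returns, the cost rate, and the softmax weight parameterization are illustrative assumptions, not the paper's actual data or model:

```python
import numpy as np

# Instead of minimizing squared prediction error, ascend the trading
# criterion directly: total return minus transaction costs from
# turnover. Toy numbers throughout.

returns = np.array([[0.01, -0.02, 0.015, 0.01],    # asset A per period
                    [0.005, 0.01, -0.01, 0.02]])   # asset B per period
cost = 0.001                                        # per-unit turnover cost

def profit(theta):                     # theta: (assets, periods)
    w = np.exp(theta) / np.exp(theta).sum(0)        # weights per period
    gains = (w * returns).sum()                     # gross return
    turnover = np.abs(np.diff(w, axis=1)).sum()     # trading volume
    return gains - cost * turnover                  # the criterion itself

def num_grad(f, theta, eps=1e-5):      # finite differences, for brevity
    g = np.zeros_like(theta)
    for i in np.ndindex(*theta.shape):
        t1, t2 = theta.copy(), theta.copy()
        t1[i] += eps; t2[i] -= eps
        g[i] = (f(t1) - f(t2)) / (2 * eps)
    return g

theta = np.zeros((2, 4))               # start from equal weights
for _ in range(300):
    theta += 0.5 * num_grad(profit, theta)   # gradient ASCENT on profit
```

The transaction-cost term couples consecutive periods, so the criterion cannot be decomposed into per-period prediction errors, which is why training on it directly can beat a prediction criterion.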


Proceedings Article
01 Dec 1997
TL;DR: This paper uses AdaBoost to improve the performance of neural networks and compares training methods based on sampling the training set and weighting the cost function.
Abstract: "Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performance of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a database of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters data set and 8.1% error on the UCI satellite data set.

57 citations
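The reweighting mechanism the abstract refers to can be sketched compactly. The paper boosts neural networks; decision stumps stand in here as the base learner purely to keep the example short:

```python
import numpy as np

# AdaBoost's weighting of the cost function: after each round,
# examples the current classifier gets wrong receive larger weights,
# so the next base learner concentrates on them.

X = np.array([0., 1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1])   # not separable by one stump

def fit_stump(X, y, w):
    """Best threshold/sign stump under the current example weights."""
    best = None
    for thr in X + 0.5:
        for sign in (1, -1):
            pred = sign * np.where(X < thr, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

w = np.ones(len(X)) / len(X)            # start uniform
alphas, stumps = [], []
for _ in range(5):
    err, thr, sign = fit_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    pred = sign * np.where(X < thr, 1, -1)
    w *= np.exp(-alpha * y * pred)      # upweight mistakes
    w /= w.sum()
    alphas.append(alpha); stumps.append((thr, sign))

def ensemble(X):
    s = sum(a * sg * np.where(X < t, 1, -1)
            for a, (t, sg) in zip(alphas, stumps))
    return np.sign(s)
```

The alternative the paper compares, sampling the training set, draws examples in proportion to `w` instead of weighting the cost directly; both bias the learner toward the hard cases.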


Proceedings ArticleDOI
21 Apr 1997
TL;DR: A new machine learning paradigm called multilayer graph transformer network is proposed that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as input and produce graphs as output.
Abstract: We propose a new machine learning paradigm called multilayer graph transformer network that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as input and produce graphs as output. A complete check reading system based on this concept is described. The system combines convolutional neural network character recognizers with graph-based stochastic models trained cooperatively at the document level. It is deployed commercially and reads millions of business and personal checks per month with record accuracy.

49 citations


Book ChapterDOI
08 Oct 1997
TL;DR: AdaBoost is used to improve the performance of a strong learning algorithm: a neural network based on-line character recognition system. It is shown that it can be used to learn automatically a great variety of writing styles even when the amount of training data for each style varies a lot.
Abstract: "Boosting" is a general method for improving the performance of any weak learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [4]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [3], in particular decision trees [1,2,5]. In this paper we use AdaBoost to improve the performance of a strong learning algorithm: a neural network based on-line character recognition system. In particular we will show that it can be used to learn automatically a great variety of writing styles even when the amount of training data for each style varies a lot. Our system achieves about 1.4% error on a handwritten digit database of more than 200 writers.

44 citations


Patent
Yoshua Bengio, Léon Bottou, Yann LeCun
11 Mar 1997
TL;DR: In this paper, a check reading system based on graph transformer networks is described, which uses convolutional neural network character recognizers, combined with global training techniques to provide record accuracy on business and personal checks.
Abstract: A machine learning paradigm called Graph Transformer Networks extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as output. Training is performed by computing gradients of a global objective function with respect to all the parameters in the system using a kind of back-propagation procedure. A complete check reading system based on these concepts is described. The system uses convolutional neural network character recognizers, combined with global training techniques, to provide record accuracy on business and personal checks.

42 citations


Proceedings Article
01 Jan 1997
TL;DR: This paper compares the use of linear and non-linear functional transformations when applied to conventional recognition features, such as spectrum or cepstrum, and provides a framework for integrated feature and model training when using class-specific transformations.
Mazin Rahim, Yoshua Bengio and Yann LeCun (AT&T Labs Research, Murray Hill, New Jersey)
Abstract: A system for discriminative feature and model design is presented for automatic speech recognition. Training based on minimum classification error with a single objective function is applied for designing a set of parallel networks performing feature transformation and a set of hidden Markov models performing speech recognition. This paper compares the use of linear and non-linear functional transformations when applied to conventional recognition features, such as spectrum or cepstrum. It also provides a framework for integrated feature and model training when using class-specific transformations. Experimental results on telephone-based connected digit recognition are presented.

17 citations
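The key idea above, a single objective that trains the feature transformation and the recognizer together, can be sketched in miniature. The paper uses parallel networks, HMMs, and a minimum classification error criterion; here a linear transform, a softmax classifier, and cross-entropy stand in for all three, which are simplifying assumptions:

```python
import numpy as np

# One objective, two sets of parameters: the feature transform A and
# the classifier W are updated from gradients of the SAME loss, so the
# features are shaped for the recognizer rather than fixed in advance.

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))              # toy acoustic feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy class labels

A = np.eye(3, 5)                          # feature transform (trained)
W = rng.normal(size=(2, 3)) * 0.1         # classifier (trained)

def forward(X):
    F = X @ A.T                           # transformed features
    logits = F @ W.T
    p = np.exp(logits - logits.max(1, keepdims=True))
    return F, p / p.sum(1, keepdims=True)

def loss(X, y):
    _, p = forward(X)
    return -np.log(p[np.arange(len(y)), y]).mean()

init_loss = loss(X, y)
for _ in range(200):
    F, p = forward(X)
    d = p.copy(); d[np.arange(len(y)), y] -= 1
    d /= len(y)
    W -= 0.5 * (d.T @ F)                  # gradient into the classifier
    A -= 0.5 * ((d @ W).T @ X)            # same loss reaches the transform
```

Because the transform's gradient comes through the classifier, the learned features are discriminative by construction, the property the paper's integrated training is after.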


Proceedings Article
01 Dec 1997
TL;DR: A new, more compact transducer model is proposed in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions.
Abstract: Recently, a model for supervised learning of probabilistic transducers represented by suffix trees was introduced. However, this algorithm tends to build very large trees, requiring very large amounts of computer memory. In this paper, we propose a new, more compact transducer model in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions. We illustrate the advantages of the proposed algorithm with comparative experiments on inducing a noun phrase recognizer.

4 citations
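The sharing idea can be sketched as a simple clustering of contexts by the closeness of their conditional output distributions. The L1 distance and threshold below are illustrative assumptions; the paper's actual similarity criterion and tree machinery are more involved:

```python
# Contexts whose conditional output distributions are close point to
# one shared parameter vector instead of each storing its own, which
# is what shrinks the suffix-tree transducer's memory footprint.

def l1(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def share(context_dists, threshold=0.1):
    """Map each context to a representative whose distribution it shares."""
    reps = []                        # (representative context, distribution)
    assignment = {}
    for ctx, dist in context_dists.items():
        for rep_ctx, rep_dist in reps:
            if l1(dist, rep_dist) <= threshold:
                assignment[ctx] = rep_ctx     # reuse existing parameters
                break
        else:
            reps.append((ctx, dist))          # new parameter vector
            assignment[ctx] = ctx
    return assignment

# Hypothetical context distributions for a noun-phrase-style task:
dists = {
    "the": {"N": 0.90, "V": 0.10},
    "a":   {"N": 0.88, "V": 0.12},   # close to "the" -> shared
    "to":  {"N": 0.20, "V": 0.80},   # distinct -> its own parameters
}
assignment = share(dists)
```

Only two parameter vectors survive for the three contexts, illustrating how sharing trades a small approximation for a much smaller model.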


01 Jan 1997
TL;DR: In this article, a modification of the M algorithm that allows negative updates is presented, where negative updates are used to maintain a window over the source, where symbols enter the window at its right and leave it at its left, after w steps (the window width).
Abstract: While algorithm M (presented in A Memory-Efficient Huffman Adaptive Coding Algorithm for Very Large Sets of Symbols, by Steven Pigeon & Yoshua Bengio, Université de Montréal technical report #1081 [1]) converges to the entropy of the signal, it also assumes that the characteristics of the signal are stationary, that is, that they do not change over time, and that successive adjustments, ever decreasing in magnitude, will lead to a reasonable approximation of the entropy. While this is true for some data, it is clearly not true for others. We present here a modification of the M algorithm that allows negative updates. Negative updates are used to maintain a window over the source: symbols enter the window at its right and leave it at its left after w steps (the window width). The algorithm presented here allows us to correctly update the weights of the symbols in the symbol tree. Here we will also have negative migration, or demotion, whereas before we had only positive migration, or promotion.

3 citations
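The windowing mechanism in the last abstract, a positive update when a symbol enters the window and a matching negative update w steps later when it leaves, can be sketched as follows. Algorithm M maintains these counts inside a Huffman-like tree; a plain Counter stands in for that structure here, which is a deliberate simplification:

```python
from collections import Counter, deque

# Each incoming symbol gets a positive update (promotion); after w
# steps the symbol sliding out of the window gets a matching negative
# update (demotion), so the counts always describe the last w symbols
# and the model tracks non-stationary sources.

def window_counts(stream, w):
    counts, window = Counter(), deque()
    for sym in stream:
        counts[sym] += 1               # positive update: symbol enters
        window.append(sym)
        if len(window) > w:
            old = window.popleft()
            counts[old] -= 1           # negative update: symbol leaves
            if counts[old] == 0:
                del counts[old]        # forgotten entirely
        yield dict(counts)

snapshots = list(window_counts("aaabbbccc", w=3))
```

After the full stream, only the last three symbols contribute to the counts; an 'a' seen six steps ago no longer biases the code lengths, which is exactly what the stationary version of algorithm M cannot do.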