
Showing papers by "Yoshua Bengio published in 1997"


Proceedings ArticleDOI
17 Jun 1997
TL;DR: A new machine learning paradigm called Graph Transformer Networks is proposed that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as output.
Abstract: We propose a new machine learning paradigm called Graph Transformer Networks that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as outputs. Training is performed by computing gradients of a global objective function with respect to all the parameters in the system using a kind of back-propagation procedure. A complete check reading system based on these concepts is described. The system uses convolutional neural network character recognizers, combined with global training techniques, to provide record accuracy on business and personal checks. It is presently deployed commercially and reads millions of checks per month.

125 citations
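The training principle described above, one global objective whose gradients flow through every module's parameters, can be illustrated with a minimal sketch. The real Graph Transformer Network passes graphs between modules; here two toy vector-valued modules stand in, which is an assumption made purely for brevity:

```python
import numpy as np

# Two differentiable "modules" composed end to end. In a Graph
# Transformer Network the modules exchange graphs rather than vectors,
# but the training idea is the same: compute gradients of one global
# objective with respect to ALL parameters via the chain rule.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # parameters of module 1
W2 = rng.normal(size=(1, 4))   # parameters of module 2

def forward(x):
    h = np.tanh(W1 @ x)        # module 1: feature extractor
    y = W2 @ h                 # module 2: scorer
    return h, y

def global_grads(x, target):
    h, y = forward(x)
    dy = 2.0 * (y - target)             # d(loss)/dy for squared loss
    dW2 = np.outer(dy, h)               # gradient w.r.t. module 2
    dh = W2.T @ dy                      # back-propagate into module 1
    dW1 = np.outer(dh * (1 - h**2), x)  # chain rule through tanh
    return dW1, dW2

x, target = np.ones(3), np.array([0.5])
for _ in range(200):                    # descend the global objective
    dW1, dW2 = global_grads(x, target)
    W1 -= 0.05 * dW1
    W2 -= 0.05 * dW2
```

Both modules' parameters move under the same loss; neither is trained in isolation, which is the point of the paradigm.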


Journal ArticleDOI
TL;DR: It is found with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses incurred during trading.
Abstract: The application of this work is to decision making with financial time series, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or maximizing the likelihood of a conditional model of the dependent variable. We find here with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading. Experiments were performed on portfolio selection with 35 Canadian stocks.

66 citations
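The paper's central move is to optimize the financial criterion itself rather than a prediction proxy. A hedged sketch of that idea follows; the returns, the cost rate, and the softmax weight parameterization are illustrative assumptions, not the paper's actual data or model:

```python
import numpy as np

# Instead of minimizing squared prediction error, ascend the trading
# criterion directly: total return minus transaction costs from
# turnover. Toy numbers throughout.

returns = np.array([[0.01, -0.02, 0.015, 0.01],    # asset A per period
                    [0.005, 0.01, -0.01, 0.02]])   # asset B per period
cost = 0.001                                        # per-unit turnover cost

def profit(theta):                     # theta: (assets, periods)
    w = np.exp(theta) / np.exp(theta).sum(0)        # weights per period
    gains = (w * returns).sum()                     # gross return
    turnover = np.abs(np.diff(w, axis=1)).sum()     # trading volume
    return gains - cost * turnover                  # the criterion itself

def num_grad(f, theta, eps=1e-5):      # finite differences, for brevity
    g = np.zeros_like(theta)
    for i in np.ndindex(*theta.shape):
        t1, t2 = theta.copy(), theta.copy()
        t1[i] += eps; t2[i] -= eps
        g[i] = (f(t1) - f(t2)) / (2 * eps)
    return g

theta = np.zeros((2, 4))               # start from equal weights
for _ in range(300):
    theta += 0.5 * num_grad(profit, theta)   # gradient ASCENT on profit
```

The transaction-cost term couples consecutive periods, so the criterion cannot be decomposed into per-period prediction errors, which is why training on it directly can beat a prediction criterion.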


Proceedings Article
01 Dec 1997
TL;DR: This paper uses AdaBoost to improve the performance of neural networks and compares training methods based on sampling the training set and weighting the cost function.
Abstract: "Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performance of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a database of online handwritten digits from more than 200 writers. Adaptive boosting of a multi-layer network achieved 1.5% error on the UCI Letters data set and 8.1% error on the UCI satellite data set.

57 citations
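The reweighting mechanism the abstract refers to can be sketched compactly. The paper boosts neural networks; decision stumps stand in here as the base learner purely to keep the example short:

```python
import numpy as np

# AdaBoost's weighting of the cost function: after each round,
# examples the current classifier gets wrong receive larger weights,
# so the next base learner concentrates on them.

X = np.array([0., 1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1])   # not separable by one stump

def fit_stump(X, y, w):
    """Best threshold/sign stump under the current example weights."""
    best = None
    for thr in X + 0.5:
        for sign in (1, -1):
            pred = sign * np.where(X < thr, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

w = np.ones(len(X)) / len(X)            # start uniform
alphas, stumps = [], []
for _ in range(5):
    err, thr, sign = fit_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    pred = sign * np.where(X < thr, 1, -1)
    w *= np.exp(-alpha * y * pred)      # upweight mistakes
    w /= w.sum()
    alphas.append(alpha); stumps.append((thr, sign))

def ensemble(X):
    s = sum(a * sg * np.where(X < t, 1, -1)
            for a, (t, sg) in zip(alphas, stumps))
    return np.sign(s)
```

The alternative the paper compares, sampling the training set, draws examples in proportion to `w` instead of weighting the cost directly; both bias the learner toward the hard cases.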


Proceedings ArticleDOI
21 Apr 1997
TL;DR: A new machine learning paradigm called multilayer graph transformer network is proposed that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as input and produce graphs as output.
Abstract: We propose a new machine learning paradigm called multilayer graph transformer network that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as input and produce graphs as output. A complete check reading system based on this concept is described. The system combines convolutional neural network character recognizers with graph-based stochastic models trained cooperatively at the document level. It is deployed commercially and reads millions of business and personal checks per month with record accuracy.

49 citations


Book ChapterDOI
08 Oct 1997
TL;DR: AdaBoost is used to improve the performance of a strong learning algorithm: a neural network based on-line character recognition system. It is shown that it can be used to learn automatically a great variety of writing styles even when the amount of training data for each style varies a lot.
Abstract: "Boosting" is a general method for improving the performance of any weak learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [4]. It has been applied with great success to several benchmark machine learning problems using rather simple learning algorithms [3], in particular decision trees [1,2,5]. In this paper we use AdaBoost to improve the performance of a strong learning algorithm: a neural network based on-line character recognition system. In particular we will show that it can be used to learn automatically a great variety of writing styles even when the amount of training data for each style varies a lot. Our system achieves about 1.4% error on a handwritten digit database of more than 200 writers.

44 citations


Patent
Yoshua Bengio, Léon Bottou, Yann LeCun
11 Mar 1997
TL;DR: In this paper, a check reading system based on graph transformer networks is described, which uses convolutional neural network character recognizers, combined with global training techniques to provide record accuracy on business and personal checks.
Abstract: A machine learning paradigm called Graph Transformer Networks extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as output. Training is performed by computing gradients of a global objective function with respect to all the parameters in the system using a kind of back-propagation procedure. A complete check reading system based on these concepts is described. The system uses convolutional neural network character recognizers, combined with global training techniques, to provide record accuracy on business and personal checks.

42 citations


Proceedings Article
01 Jan 1997
TL;DR: This paper compares the use of linear and non-linear functional transformations when applied to conventional recognition features, such as spectrum or cepstrum, and provides a framework for integrated feature and model training when using class-specific transformations.
Mazin Rahim, Yoshua Bengio and Yann LeCun (AT&T Labs Research, Murray Hill, New Jersey)
Abstract: A system for discriminative feature and model design is presented for automatic speech recognition. Training based on minimum classification error with a single objective function is applied for designing a set of parallel networks performing feature transformation and a set of hidden Markov models performing speech recognition. This paper compares the use of linear and non-linear functional transformations when applied to conventional recognition features, such as spectrum or cepstrum. It also provides a framework for integrated feature and model training when using class-specific transformations. Experimental results on telephone-based connected digit recognition are presented.

17 citations
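The key idea above, a single objective that trains the feature transformation and the recognizer together, can be sketched in miniature. The paper uses parallel networks, HMMs, and a minimum classification error criterion; here a linear transform, a softmax classifier, and cross-entropy stand in for all three, which are simplifying assumptions:

```python
import numpy as np

# One objective, two sets of parameters: the feature transform A and
# the classifier W are updated from gradients of the SAME loss, so the
# features are shaped for the recognizer rather than fixed in advance.

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))              # toy acoustic feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy class labels

A = np.eye(3, 5)                          # feature transform (trained)
W = rng.normal(size=(2, 3)) * 0.1         # classifier (trained)

def forward(X):
    F = X @ A.T                           # transformed features
    logits = F @ W.T
    p = np.exp(logits - logits.max(1, keepdims=True))
    return F, p / p.sum(1, keepdims=True)

def loss(X, y):
    _, p = forward(X)
    return -np.log(p[np.arange(len(y)), y]).mean()

init_loss = loss(X, y)
for _ in range(200):
    F, p = forward(X)
    d = p.copy(); d[np.arange(len(y)), y] -= 1
    d /= len(y)
    W -= 0.5 * (d.T @ F)                  # gradient into the classifier
    A -= 0.5 * ((d @ W).T @ X)            # same loss reaches the transform
```

Because the transform's gradient comes through the classifier, the learned features are discriminative by construction, the property the paper's integrated training is after.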


Proceedings Article
01 Dec 1997
TL;DR: A new, more compact transducer model is proposed in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions.
Abstract: Recently, a model for supervised learning of probabilistic transducers represented by suffix trees was introduced. However, this algorithm tends to build very large trees, requiring very large amounts of computer memory. In this paper, we propose a new, more compact transducer model in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions. We illustrate the advantages of the proposed algorithm with comparative experiments on inducing a noun phrase recognizer.

4 citations
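The sharing idea can be sketched as a simple clustering of contexts by the closeness of their conditional output distributions. The L1 distance and threshold below are illustrative assumptions; the paper's actual similarity criterion and tree machinery are more involved:

```python
# Contexts whose conditional output distributions are close point to
# one shared parameter vector instead of each storing its own, which
# is what shrinks the suffix-tree transducer's memory footprint.

def l1(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def share(context_dists, threshold=0.1):
    """Map each context to a representative whose distribution it shares."""
    reps = []                        # (representative context, distribution)
    assignment = {}
    for ctx, dist in context_dists.items():
        for rep_ctx, rep_dist in reps:
            if l1(dist, rep_dist) <= threshold:
                assignment[ctx] = rep_ctx     # reuse existing parameters
                break
        else:
            reps.append((ctx, dist))          # new parameter vector
            assignment[ctx] = ctx
    return assignment

# Hypothetical context distributions for a noun-phrase-style task:
dists = {
    "the": {"N": 0.90, "V": 0.10},
    "a":   {"N": 0.88, "V": 0.12},   # close to "the" -> shared
    "to":  {"N": 0.20, "V": 0.80},   # distinct -> its own parameters
}
assignment = share(dists)
```

Only two parameter vectors survive for the three contexts, illustrating how sharing trades a small approximation for a much smaller model.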


01 Jan 1997
TL;DR: In this article, a modification of the M algorithm that allows negative updates is presented, where negative updates are used to maintain a window over the source, where symbols enter the window at its right and leave it at its left, after w steps (the window width).
Abstract: While algorithm M (presented in A Memory-Efficient Huffman Adaptive Coding Algorithm for Very Large Sets of Symbols, by Steven Pigeon & Yoshua Bengio, Université de Montréal technical report #1081 [1]) converges to the entropy of the signal, it also assumes that the characteristics of the signal are stationary, that is, that they do not change over time, and that successive adjustments, ever decreasing in magnitude, will lead to a reasonable approximation of the entropy. While this is true for some data, it is clearly not true for others. We present here a modification of the M algorithm that allows negative updates. Negative updates are used to maintain a window over the source: symbols enter the window at its right and leave it at its left after w steps (the window width). The algorithm presented here allows us to correctly update the weights of the symbols in the symbol tree. Here we will also have negative migration, or demotion, whereas before we had only positive migration, or promotion.

3 citations
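The windowing mechanism in the last abstract, a positive update when a symbol enters the window and a matching negative update w steps later when it leaves, can be sketched as follows. Algorithm M maintains these counts inside a Huffman-like tree; a plain Counter stands in for that structure here, which is a deliberate simplification:

```python
from collections import Counter, deque

# Each incoming symbol gets a positive update (promotion); after w
# steps the symbol sliding out of the window gets a matching negative
# update (demotion), so the counts always describe the last w symbols
# and the model tracks non-stationary sources.

def window_counts(stream, w):
    counts, window = Counter(), deque()
    for sym in stream:
        counts[sym] += 1               # positive update: symbol enters
        window.append(sym)
        if len(window) > w:
            old = window.popleft()
            counts[old] -= 1           # negative update: symbol leaves
            if counts[old] == 0:
                del counts[old]        # forgotten entirely
        yield dict(counts)

snapshots = list(window_counts("aaabbbccc", w=3))
```

After the full stream, only the last three symbols contribute to the counts; an 'a' seen six steps ago no longer biases the code lengths, which is exactly what the stationary version of algorithm M cannot do.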