
Showing papers by "Yoshua Bengio published in 1998"


Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, convolutional neural networks are shown to outperform other techniques on handwritten character recognition, and a new learning paradigm, graph transformer networks (GTNs), is proposed to train multi-module document recognition systems globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
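The convolution operation these networks are built on can be sketched in a few lines. Below is an illustrative numpy implementation of a single valid-mode convolutional filter, not the paper's LeNet-style architecture; the edge-detecting kernel and toy stroke image are invented for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a
    convolutional layer (no padding, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel applied to a toy "digit stroke".
img = np.zeros((5, 5))
img[:, 2] = 1.0                       # vertical stroke in the middle column
k = np.array([[1.0, 0.0, -1.0]] * 3)  # simple edge-detecting kernel
resp = conv2d_valid(img, k)
```

The filter responds with opposite signs on the two sides of the stroke, which is the kind of local, shift-invariant feature a convolutional layer learns from data rather than having hand-designed.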

42,067 citations



Journal ArticleDOI
TL;DR: A new image compression technique called DjVu is presented that enables fast transmission of document images over low-speed connections, while faithfully reproducing the visual aspect of the document, including color, fonts, pictures, and paper texture.

312 citations


Proceedings ArticleDOI
30 Mar 1998
TL;DR: The Z-coder is a new adaptive data compression coder for coding binary data, derived from the Golomb (1966) run-length coder, and retains most of the speed and simplicity of the earlier coder.
Abstract: We present the Z-coder, a new adaptive data compression coder for coding binary data. The Z-coder is derived from the Golomb (1966) run-length coder, and retains most of the speed and simplicity of the earlier coder. The Z-coder can also be thought of as a multiplication-free approximate arithmetic coder, showing the close relationship between run-length coding and arithmetic coding. The Z-coder improves upon existing arithmetic coders by its speed and its principled design. We present a derivation of the Z-coder as well as details of the construction of its adaptive probability estimation table.
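The Golomb run-length coder that the Z-coder is derived from can be sketched for the power-of-two parameter case (Rice coding): each zero-run length is sent as a unary quotient followed by a fixed-width remainder. This toy encoder only illustrates that starting point; it is not the Z-coder's adaptive scheme.

```python
def rice_encode(n, k):
    """Rice code (Golomb code with m = 2**k): unary-coded quotient,
    a terminating 0 bit, then the k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0{}b".format(k))

def runs_of_zeros(bits):
    """Lengths of the zero-runs between 1s (final run included),
    the quantities a run-length coder transmits."""
    runs, count = [], 0
    for b in bits:
        if b == 0:
            count += 1
        else:
            runs.append(count)
            count = 0
    runs.append(count)
    return runs

bits = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
runs = runs_of_zeros(bits)
code = "".join(rice_encode(r, 1) for r in runs)
```

Short runs get short codewords, which is what makes run-length coding effective on skewed binary sources; the Z-coder keeps this speed while adapting its probability estimates.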

59 citations


Proceedings ArticleDOI
22 Apr 1998
TL;DR: Presents a new image compression technique called DjVu that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color; a real-time decoder is available as a plug-in for popular Web browsers.
Abstract: Presents a new image compression technique called "DjVu" that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color. With DjVu, any screen connected to the Internet can access and display images of scanned pages while faithfully reproducing the font, color, drawings, pictures and paper texture. A typical magazine page in color at 300 dpi can be compressed down to between 40 and 60 KBytes, approximately 5 to 10 times better than JPEG for a similar level of subjective quality. Black-and-white documents are typically 15 to 30 KBytes at 300 dpi, or 4 to 8 times better than CCITT-G4. A real-time, memory-efficient version of the decoder was implemented, and is available as a plug-in for popular Web browsers.
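The quoted sizes imply a very large ratio over the raw scan. A back-of-the-envelope check, assuming a standard 8.5 × 11 inch page (the abstract does not state page dimensions):

```python
dpi = 300
# Page dimensions are assumed (8.5 x 11 inches, US letter).
width_px, height_px = int(8.5 * dpi), 11 * dpi    # 2550 x 3300 pixels
raw_bytes = width_px * height_px * 3              # 24-bit RGB, uncompressed
djvu_bytes = 50 * 1024                            # mid-range of the 40-60 KB claim
ratio = raw_bytes / djvu_bytes                    # roughly 500:1 overall
```

An uncompressed color page at 300 dpi is about 25 MB, so reaching 40 to 60 KB is a compression ratio on the order of 500:1, which is why a 5 to 10 times improvement over JPEG at similar subjective quality mattered for low-speed connections.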

32 citations



Proceedings ArticleDOI
30 Mar 1998
TL;DR: A new algorithm for adaptive Huffman coding, called algorithm M, uses space proportional to the number of frequency classes: the leaves of its tree represent sets of symbols with the same frequency rather than individual symbols.
Abstract: Summary form only given. The problem of computing minimum-redundancy codes as we observe symbols one by one has received a lot of attention. However, existing algorithms implicitly assume either that we have a small alphabet or that we have an arbitrary amount of memory at our disposal for the creation of a coding tree. In real-life applications one may need to encode symbols coming from a much larger alphabet, e.g., when coding integers. We introduce a new algorithm for adaptive Huffman coding, called algorithm M, that uses space proportional to the number of frequency classes. The algorithm uses a tree with leaves that represent sets of symbols with the same frequency, rather than individual symbols. The code for each symbol is therefore composed of a prefix (specifying the set, or the leaf of the tree) and a suffix (specifying the symbol within the set of same-frequency symbols). The algorithm uses only two operations to remain as close as possible to the optimal code: set migration and rebalancing. We analyze the computational complexity of algorithm M, and point to its advantages in terms of low memory complexity and fast decoding. Comparative experiments were performed with algorithm M on the Calgary corpus, against static Huffman coding as well as another adaptive Huffman coding algorithm, Vitter's algorithm Λ. Experiments show that M performs comparably to or better than the other algorithms but requires much less memory. Finally, we present an improved algorithm, M+, for non-stationary data, which models the distribution of the data in a fixed-size window of the data sequence.
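The prefix/suffix code shape described above can be illustrated statically: build a Huffman prefix over the frequency classes (weighted by total class frequency), then append a fixed-length suffix selecting the symbol within its class. This sketch is not the adaptive algorithm M itself, which maintains such a structure incrementally via set migration and rebalancing; the example classes are invented.

```python
import heapq, itertools

def class_codes(classes):
    """Toy prefix/suffix coder: Huffman prefix per frequency class,
    fixed-length binary suffix per symbol inside the class.
    classes: list of (per-symbol frequency, list of symbols)."""
    tiebreak = itertools.count()
    heap = [(freq * len(syms), next(tiebreak), [(cid, "")])
            for cid, (freq, syms) in enumerate(classes)]
    heapq.heapify(heap)
    while len(heap) > 1:                       # standard Huffman merging
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = ([(cid, "0" + code) for cid, code in c1]
                  + [(cid, "1" + code) for cid, code in c2])
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    prefix = dict(heap[0][2])
    codes = {}
    for cid, (freq, syms) in enumerate(classes):
        bits = (len(syms) - 1).bit_length()    # suffix width for this class
        for i, sym in enumerate(syms):
            suffix = format(i, "0{}b".format(bits)) if bits else ""
            codes[sym] = prefix[cid] + suffix
    return codes

# Three frequency classes: one common symbol, two medium, four rare.
codes = class_codes([(10, ["e"]), (4, ["a", "t"]), (1, ["q", "x", "z", "j"])])
```

The tree has one leaf per frequency class regardless of how many symbols share that frequency, which is exactly the source of the memory savings over a leaf-per-symbol Huffman tree.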

9 citations


Posted Content
TL;DR: In this article, the authors show that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading.
Abstract: The application of this work is to decision making with financial time series, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or maximizing the likelihood of a conditional model of the dependent variable. We find here, with noisy time series, that better results can be obtained when the model is directly trained to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading. Experiments were performed on portfolio selection with 35 Canadian stocks.
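The contrast between the two training criteria can be sketched on synthetic data: instead of fitting return predictions by squared error, run gradient ascent directly on trading profit net of transaction costs. Everything below (features, cost level, the tanh position rule, numerical gradients) is invented for illustration and is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=(T, 3))                           # synthetic features
true_w = np.array([0.5, -0.3, 0.2])                   # invented generator
returns = x @ true_w + rng.normal(scale=0.5, size=T)  # noisy asset returns
cost = 0.01                                           # transaction cost (assumed)

def profit(w):
    """Financial criterion: trading gains minus proportional transaction
    costs, for positions tanh(x . w) in a single asset."""
    pos = np.tanh(x @ w)
    trades = np.abs(np.diff(pos, prepend=0.0))
    return float(np.sum(pos * returns) - cost * np.sum(trades))

def num_grad(f, w, eps=1e-5):
    """Central-difference gradient; the paper backpropagates analytically."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Gradient ascent on the financial criterion itself.
w = np.zeros(3)
for _ in range(200):
    w += 0.01 * num_grad(profit, w)
```

Because transaction costs enter the criterion directly, the optimizer is discouraged from churning positions, something a pure prediction-error criterion never sees.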

8 citations


Journal Article
TL;DR: Experimental results on nuclear plant data demonstrate the advantages of the proposed approach for both the classification of signals and their interpretation in nuclear plant monitoring.
Abstract: In this paper we are concerned with the application of learning algorithms to the classification of reactor states in nuclear plants. Two aspects must be considered: (1) some types of events (e.g., abnormal or rare ones) will not appear in the data set, but the system should be able to detect them; (2) not only the classification of signals but also their interpretation is important for nuclear plant monitoring. We address both issues with a mixture of mixtures of Gaussians in which some parameters are shared to reflect the similar signals observed in different states of the reactor. An EM algorithm for these shared Gaussian mixtures is presented. Experimental results on nuclear plant data demonstrate the advantages of the proposed approach with respect to the above two points.
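The EM iteration underlying such mixture models can be sketched for the plain case. This is standard EM for a two-component, one-dimensional Gaussian mixture on invented data; the paper's variant additionally ties ("shares") some parameters across reactor-state models, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D signal samples from two regimes (stand-ins for reactor states).
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])

mu = np.array([-1.0, 1.0])      # initial component means
sigma = np.array([1.0, 1.0])    # initial standard deviations
pi = np.array([0.5, 0.5])       # initial mixing weights

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = pi * np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2) / sigma
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from responsibilities.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
```

In the shared-parameter version, the M-step re-estimates a tied parameter from the pooled responsibilities of every mixture that uses it, which is what lets similar signals in different reactor states reinforce a common component.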

1 citation