Gradient-Based Learning Applied to Document Recognition

doi:10.1109/9780470544976.CH9

Book Chapter

- pp 306-351

TLDR

Various methods applied to handwritten character recognition are reviewed and compared. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques.

Abstract:

Multilayer Neural Networks trained with the backpropagation algorithm constitute the best example of a successful Gradient-Based Learning technique. Given an appropriate network architecture, Gradient-Based Learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional Neural Networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), allows such multi-module systems to be trained globally using Gradient-Based methods so as to minimize an overall performance measure. Two systems for on-line handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of Graph Transformer Networks. A Graph Transformer Network for reading bank checks is also described. It uses Convolutional Neural Network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.

Citations

Proceedings Article

A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images

Alberto Crivellaro, +5 more

TL;DR: This work presents a method that estimates, in real time and under challenging conditions, the 3D pose of a known object in the form of the 2D projections of a few control points. The method is suitable for practical Augmented Reality applications, even in industrial environments.

Posted Content

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

Hongsheng Li, +2 more

- 15 Dec 2014 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The proposed algorithms eliminate all redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels, and they produce exactly the same results as patch-by-patch scanning.

Proceedings Article

Ontological supervision for fine grained classification of Street View storefronts

Yair Movshovitz-Attias, +5 more

TL;DR: This work uses an ontology of geographical concepts to automatically propagate business category information and create a large, multi-label training dataset for fine-grained storefront classification, and it achieves human-level accuracy.

Journal Article

Exploring Deep Learning for View-Based 3D Model Retrieval

Zan Gao, +2 more

- 17 Feb 2020 -

ACM Transactions on Multimedia Computing...

TL;DR: This work systematically evaluates the performance of deep learning features in view-based 3D model retrieval on four popular datasets (ETH, NTU60, PSB, and MVRED) using different similarity measures. The deep learning features consistently outperform all of the hand-crafted features, and they are also more robust than the hand-crafted features when different degrees of noise are added to the image.

Posted Content

Unsupervised Learning of Spatiotemporally Coherent Metrics

Ross Goroshin, +4 more

- 18 Dec 2014 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work learns features from unlabeled video data, exploiting the assumption that adjacent video frames contain semantically similar information to train a convolutional pooling auto-encoder regularized by slowness and sparsity.

Related Papers (5)

Gradient-based learning applied to document recognition

Deep Residual Learning for Image Recognition

ImageNet Classification with Deep Convolutional Neural Networks

Very Deep Convolutional Networks for Large-Scale Image Recognition

ImageNet: A large-scale hierarchical image database