DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents

doi:10.1007/978-981-15-8697-2_27


Book Chapter


Ridhi Aggarwal¹, Hiteshi Jain¹, Gaurav Harit¹, Anil Kumar Tiwari¹

Indian Institute of Technology, Jodhpur¹

22 Dec 2019, Vol. 1249, pp. 292-301

TL;DR: This paper proposes a Siamese-CNN that decides whether the two images in a pair contain similar or dissimilar characters; characters are then recognized by matching test images against sample images of any target class.


Abstract: This paper presents an Optical Character Recognition (OCR) system for documents containing English text and mathematical expressions. Neural network architectures built from CNN layers and/or dense layers achieve high accuracy on character recognition tasks. However, these models require a large amount of training data, with a balanced number of samples for each class. Recognition of mathematical symbols is challenging because the available training data are scarce and imbalanced. To address this issue, we pose character recognition as a Distance Metric Learning problem. We propose a Siamese-CNN that learns discriminative features to identify whether the two images in a pair contain similar or dissimilar characters. The network is then used to recognize characters by matching: test images are compared to sample images of any target class, which may or may not have been included during training. Thus our model can scale to new symbols easily. The proposed approach is invariant to the author’s handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts that we collected. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data.
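The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a loss function or a matching rule, so the contrastive loss, the Euclidean distance, and the average-distance rule below are assumptions. `toy_embed` is a hypothetical stand-in for one trained branch of the Siamese network, and the two-dimensional "images" are made-up values.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, same_class: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one image pair (assumed; the abstract names no loss).

    Similar pairs are pulled together; dissimilar pairs are pushed apart
    until they are at least `margin` away from each other.
    """
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def classify_by_matching(
    query: Vector,
    references: Dict[str, List[Vector]],
    embed: Callable[[Vector], Vector],
) -> Optional[str]:
    """Return the label whose reference samples are closest to the query on average."""
    q = embed(query)
    best_label, best_score = None, math.inf
    for label, samples in references.items():
        if not samples:  # skip classes that have no reference image
            continue
        score = sum(euclidean(q, embed(s)) for s in samples) / len(samples)
        if score < best_score:
            best_label, best_score = label, score
    return best_label


def toy_embed(image: Vector) -> Vector:
    """Hypothetical stand-in for a trained Siamese branch (identity mapping)."""
    return image


# Reference samples per class; the values are illustrative only.
references = {"x": [[0.0, 1.0], [0.1, 0.9]], "+": [[1.0, 0.0]]}
print(classify_by_matching([0.05, 0.95], references, toy_embed))

# A new symbol is supported by adding reference samples, without retraining.
references["sum"] = [[1.0, 1.0]]
print(classify_by_matching([0.9, 1.1], references, toy_embed))
```

Because classification reduces to comparing distances against stored samples, adding a class only requires adding entries to `references`, which is the property the abstract refers to when it says the model can scale to new symbols.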


References


Journal Article

Gradient-based learning applied to document recognition


Yann LeCun¹, Léon Bottou²,³, Yoshua Bengio³,⁴,⁵, Patrick Haffner³

Bell Labs¹, École Normale Supérieure², AT&T³, Alcatel-Lucent⁴, École Polytechnique de Montréal⁵

01 Jan 1998

TL;DR: This article reviews gradient-based learning methods for handwritten character recognition, shows that convolutional neural networks outperform the other techniques on a standard digit recognition task, and introduces graph transformer networks (GTN) for globally training multimodule document recognition systems.


Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.


42,067 citations

Proceedings Article

Best practices for convolutional neural networks applied to visual document analysis


Patrice Y. Simard¹, David W. Steinkraus¹, John Platt¹

Microsoft¹

03 Aug 2003

TL;DR: A set of concrete best practices that document analysis researchers can use to get good results with neural networks, including a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.


Abstract: Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.


2,783 citations

Posted Content

High-Performance Neural Networks for Visual Object Classification


Dan Claudio Ciresan, Ueli Meier, Jonatan Masci, Luca Maria Gambardella, Jürgen Schmidhuber

01 Feb 2011, arXiv: Artificial Intelligence

TL;DR: A fast, fully parameterizable GPU implementation of Convolutional Neural Network variants whose feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.


Abstract: We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple back-propagation perform better than more shallow ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.


275 citations

Proceedings Article

A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks


Erik Marchi¹, Fabio Vesperini², Florian Eyben¹, Stefano Squartini², Björn Schuller³

Technische Universität München¹, Marche Polytechnic University², University of Passau³

19 Apr 2015

TL;DR: This paper presents a novel unsupervised approach based on a denoising autoencoder which significantly outperforms existing methods by achieving up to 93.4% F-Measure.


Abstract: Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper we present a novel unsupervised approach based on a denoising autoencoder. In our approach auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We use the reconstruction error between the input and the output of the autoencoder as an activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-the-art methods and we conclude that our novel approach significantly outperforms existing methods by achieving up to 93.4% F-Measure.


210 citations

Proceedings Article

INFTY: an integrated OCR system for mathematical documents


Masakazu Suzuki¹, Fumikazu Tamari², Ryoji Fukuda³, Seiichi Uchida¹, Toshihiro Kanahori

Kyushu University¹, Fukuoka University of Education², Oita University³

20 Nov 2003

TL;DR: An integrated OCR system for mathematical documents, called INFTY, is presented, which shows high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.


Abstract: An integrated OCR system for mathematical documents, called INFTY, is presented. INFTY consists of four procedures, i.e., layout analysis, character recognition, structure analysis of mathematical expressions, and manual error correction. In those procedures, several novel techniques are utilized for better recognition performance. Experimental results on about 500 pages of mathematical documents showed high character recognition rates on both mathematical expressions and ordinary texts, and sufficient performance on the structure analysis of the mathematical expressions.


182 citations