Book Chapter

DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents

22 Dec 2019, Vol. 1249, pp. 292-301
TL;DR: In this paper, a Siamese-CNN network is proposed to identify whether the two images in a pair contain similar or dissimilar characters; the network is then used to recognize different characters by character matching, where test images are compared to sample images of any target class.
Abstract: This paper presents an Optical Character Recognition (OCR) system for documents with English text and mathematical expressions. Neural network architectures using CNN layers and/or dense layers achieve high accuracy in character recognition. However, these models require a large amount of data to train the network, with a balanced number of samples per class. Recognition of mathematical symbols poses the challenge that the available training data are scarce and imbalanced. To address this issue, we pose character recognition as a Distance Metric Learning problem. We propose a Siamese-CNN Network that learns discriminative features to identify whether the two images in a pair contain similar or dissimilar characters. The network is then used to recognize different characters by character matching, where test images are compared to sample images of any target class, which may or may not have been included during training. Thus our model can scale to new symbols easily. The proposed approach is invariant to the author's handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts collected by us. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data.
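As a reading aid, a minimal PyTorch sketch of the pair-matching idea follows. It is not the authors' implementation: the encoder layout, the 28x28 input assumption, the Euclidean distance, and the one-exemplar-per-class `support` dictionary are all illustrative choices, and the pair loss used for training (e.g. a contrastive loss) is omitted.

```python
import torch
import torch.nn as nn

class SiameseCNN(nn.Module):
    """Twin CNN encoder; both branches share the same weights."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, embed_dim),  # assumes 28x28 grayscale inputs
        )

    def forward(self, a, b):
        # Embed both images with the shared encoder.
        za, zb = self.encoder(a), self.encoder(b)
        # Small distance => "similar" pair, large => "dissimilar".
        return torch.norm(za - zb, dim=1)

def classify(model, test_img, support):
    """Match a test image against one exemplar per candidate class.

    `support` is a dict {label: exemplar_tensor}; the predicted label is
    the class whose exemplar is nearest in the learned metric space.
    """
    model.eval()
    with torch.no_grad():
        dists = {lbl: model(test_img.unsqueeze(0), ex.unsqueeze(0)).item()
                 for lbl, ex in support.items()}
    return min(dists, key=dists.get)
```

Because classification reduces to nearest-exemplar matching in the learned metric space, adding a new symbol class only requires adding an exemplar to `support`, which is what lets such a model scale to symbols unseen during training.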
References
Journal Article
TL;DR: The proposed system is able to recognize handwritten English digits and letters with high accuracy; performance comparison with other neural network structures revealed weighted average recognition rates of 80.3%, 68.3%, and 90.4% for patternnet, feedforwardnet, and the proposed DNN, respectively.
Abstract: Owing to advances in GPUs and CPUs, Deep Neural Networks (DNNs) have in recent years become popular both for feature extraction and for classification. This paper aims to develop an offline handwritten recognition system using a DNN. First, two popular English digit and letter databases, MNIST and EMNIST, were selected to provide the datasets for the training and testing phases of the DNN. Altogether, there are 10 digits [0-9] and 52 letters [a-z, A-Z]. The proposed DNN uses two stacked autoencoder layers and one softmax layer. Recognition accuracy for English digits and letters is 97.7% and 88.8%, respectively. Performance comparison with other neural network structures revealed that the weighted average recognition rates for patternnet, feedforwardnet, and the proposed DNN were 80.3%, 68.3%, and 90.4%, respectively. This shows that our proposed system is able to recognize handwritten English digits and letters with high accuracy.
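The "two stacked autoencoder layers plus one softmax layer" recipe corresponds to greedy layer-wise pretraining followed by supervised fine-tuning. A rough Keras sketch under assumed hyperparameters (the hidden sizes, activations, and training schedule below are illustrative guesses, not values from the paper) might look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

def train_autoencoder(x, hidden):
    """Greedy step: fit one autoencoder on x, return its encoder and codes."""
    inp = keras.Input(shape=(x.shape[1],))
    code = layers.Dense(hidden, activation="sigmoid")(inp)
    out = layers.Dense(x.shape[1], activation="sigmoid")(code)
    ae = keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x, x, epochs=10, batch_size=128, verbose=0)
    encoder = keras.Model(inp, code)
    return encoder, encoder.predict(x, verbose=0)

# MNIST digits, flattened to 784-dim vectors scaled to [0, 1].
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Layer-wise pretraining of the two autoencoder layers.
enc1, h1 = train_autoencoder(x_train, 100)
enc2, _ = train_autoencoder(h1, 50)

# Stack both encoders with a softmax layer, then fine-tune end to end.
model = keras.Sequential([enc1, enc2, layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
```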

25 citations

Proceedings Article
Wenhao He, Yuxuan Luo, Fei Yin, Han Hu, Junyu Han, Errui Ding, Cheng-Lin Liu
01 Dec 2016
TL;DR: A novel end-to-end framework for mathematical expression (ME) recognition that uses a convolutional neural network to perform mathematical symbol detection and recognition simultaneously, incorporates spatial context, and can handle multi-part and touching symbols effectively.
Abstract: In this paper we propose a novel end-to-end framework for mathematical expression (ME) recognition. The method uses a convolutional neural network (CNN) to perform mathematical symbol detection and recognition simultaneously, incorporating spatial context, and can handle multi-part and touching symbols effectively. To evaluate the performance, we provide a benchmark that contains MEs from both real-life and synthetic data. Images in our dataset undergo multiple variations, such as viewpoint, illumination, and background. For training, we use purely synthetic data to save human labeling effort. The proposed method achieved a total-correct accuracy of 87% on clear images and 45% on cluttered ones.

18 citations

Proceedings Article
01 Nov 2018
TL;DR: A generic optical character recognition (OCR) system based on deep Siamese convolutional neural networks and support vector machines that, without any retraining, achieves recognition accuracy close to that of CNNs and recognition systems trained for the specific target classes.
Abstract: This paper presents a generic optical character recognition (OCR) system based on deep Siamese convolutional neural networks (CNNs) and support vector machines (SVMs). Supervised deep CNNs achieve a high level of accuracy in classification tasks. However, fine-tuning a trained model for a new set of classes requires a large amount of data to overcome the problem of dataset bias. The classification accuracy of deep neural networks (DNNs) degrades when the available dataset is insufficient. Moreover, using a trained deep neural network to classify a new class requires tuning the network architecture and retraining the model. All these limitations are handled by our proposed system. The deep Siamese CNN is trained to extract discriminative features. The training is performed once, using a group of classes. The OCR system is then used for recognizing different classes without retraining or fine-tuning the deep Siamese CNN model. Only a few samples are needed from any target class for classification. The proposed OCR system is evaluated on different domains: Arabic letters, Eastern-Arabic numerals, Hindu-Arabic numerals, and Farsi numerals, using test sets that contain printed and handwritten letters and numerals. The proposed system achieves very promising recognition accuracy, close to the results achieved by CNNs trained for the specific target classes, without the need for retraining. The system outperforms the state-of-the-art method that uses a Siamese CNN for one-shot classification by around 12%.
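The division of labour this abstract describes (discriminative features from a Siamese CNN trained once, classification by an SVM over a handful of samples from the new classes) could be wired up as in the sketch below; the `embed` callable and the linear kernel are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def fit_fewshot_svm(embed, support_imgs, support_labels):
    """Fit an SVM on embeddings of a few labelled samples per target class.

    `embed` stands for the frozen Siamese-CNN feature extractor, trained once
    on a disjoint set of classes; only the lightweight SVM ever sees the new
    classes, so the deep model needs no retraining.
    """
    feats = np.stack([embed(img) for img in support_imgs])
    clf = SVC(kernel="linear")
    clf.fit(feats, support_labels)
    return clf

def predict_classes(clf, embed, test_imgs):
    """Classify test images in the same embedding space."""
    feats = np.stack([embed(img) for img in test_imgs])
    return clf.predict(feats)
```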

13 citations