Open Access Book Chapter

Script identification from Indian documents

TLDR
This paper presents a scheme for identifying different Indian scripts in a document image. The scheme employs hierarchical classification with features consistent with human perception and achieves an overall classification accuracy of 97.11% on a large testing data set.
Abstract
Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting script-specific OCR in a multilingual environment. In this paper, we present a scheme to identify different Indian scripts from a document image. The scheme employs hierarchical classification with features consistent with human perception. These features are extracted from the responses of a multi-channel log-Gabor filter bank, designed at an optimal scale and multiple orientations. In the first stage, the classifier groups the scripts into five major classes using global features. In the next stage, a sub-classification is performed based on script-specific features. All features are extracted globally from a given text block, so the scheme does not require any complex and reliable segmentation of the document image into lines and characters. The proposed scheme is therefore efficient and can be used in many practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to skew introduced during scanning and relatively insensitive to changes in font size. The proposed system achieves an overall classification accuracy of 97.11% on a large testing data set. These results serve to establish the utility of a global approach to the classification of scripts.
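As an illustration of the kind of global feature the abstract describes, the following is a minimal sketch of a single-scale, multi-orientation log-Gabor filter bank applied to a whole text block, with no line or character segmentation. It is not the authors' implementation: the function names and the parameters `n_orient`, `f0`, `sigma_ratio`, and `sigma_theta` are assumptions chosen for illustration, and the abstract does not state the scale, the number of orientations, or the exact features used.

```python
import numpy as np

def log_gabor_bank(size, n_orient=8, f0=0.125, sigma_ratio=0.65,
                   sigma_theta=np.pi / 16):
    """Frequency-domain log-Gabor filters: one scale, n_orient orientations.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    fy = np.fft.fftfreq(size)[:, None]   # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(size)[None, :]   # horizontal frequencies
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                   # placeholder to avoid log(0) at DC
    theta = np.arctan2(fy, fx)

    # Radial part: Gaussian on a log-frequency axis, centred on f0.
    radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                   # a log-Gabor filter has no DC response

    filters = []
    for k in range(n_orient):
        angle = k * np.pi / n_orient
        # Angular distance wrapped to [-pi, pi].
        d = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        filters.append(radial * np.exp(-(d ** 2) / (2 * sigma_theta ** 2)))
    return np.stack(filters)

def global_energy_features(block, bank):
    """Normalised response energy per orientation for a square text block."""
    spectrum = np.fft.fft2(block - block.mean())
    responses = np.fft.ifft2(spectrum[None, :, :] * bank)  # one response per filter
    energy = (np.abs(responses) ** 2).sum(axis=(1, 2))
    return energy / energy.sum()

# Synthetic stand-in for a text block: horizontal stripes, 8 cycles over 64 rows.
rows = np.arange(64)[:, None]
block = np.sin(2 * np.pi * 0.125 * rows) * np.ones((1, 64))
features = global_energy_features(block, log_gabor_bank(64))
```

In a hierarchical scheme like the one described, a vector such as `features` could feed the first-stage classifier that separates the major script classes, with further script-specific features used in the second stage. Which features and classifiers the authors actually use is not specified in the abstract.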
