Proceedings Article

Curvature feature distribution based classification of Indian scripts from document images

TLDR
A framework for classifying text document images by script, using edge-direction-based features to capture the distribution of curvature and a recently proposed feature selection algorithm to obtain the most discriminating curvature features.
Abstract
We present a framework for classification of text document images based on their script. We deal with the domain of Indian scripts, which has high inter-script similarities. Indian scripts have characteristic curvature distributions that help in visual discrimination of scripts. We use edge-direction-based features to capture the distribution of curvature. We also use a recently proposed feature selection algorithm to obtain the most discriminating curvature features. We automatically form a hierarchy based on statistical distances between the script models. The hierarchy allows us to group similar scripts at one level and then focus on the classification between the similar scripts at the next level, leading to an improvement in accuracy. We show experiments and results on a large set of about 3400 images.
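As a rough illustration of what an edge-direction-based feature can look like, the sketch below computes a normalized histogram of gradient orientations over the strong-edge pixels of a grayscale image. It is a generic edge direction histogram, not the feature set used in the paper; the function name `edge_direction_histogram`, the bin count, and the magnitude threshold are all assumptions chosen for the example.

```python
import numpy as np


def edge_direction_histogram(image, n_bins=8, magnitude_threshold=0.1):
    """Normalized histogram of edge directions for a 2-D grayscale array.

    n_bins and magnitude_threshold are illustrative assumptions,
    not values taken from the paper.
    """
    image = np.asarray(image, dtype=float)
    # Central-difference gradients along rows (y) and columns (x).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)

    peak = magnitude.max()
    if peak == 0:
        # A flat image has no edges, so return an all-zero histogram.
        return np.zeros(n_bins)

    # Keep only pixels whose gradient is strong relative to the peak.
    mask = magnitude > magnitude_threshold * peak
    # Fold orientations into [0, pi) so opposite gradients share a bin.
    angles = np.mod(np.arctan2(gy[mask], gx[mask]), np.pi)

    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, np.pi))
    return hist / hist.sum()


if __name__ == "__main__":
    # A vertical step edge: the gradient points along x only.
    step = np.zeros((16, 16))
    step[:, 8:] = 1.0
    print(edge_direction_histogram(step))
```

A script dominated by straight strokes would concentrate its mass in a few bins, while a script with many curved strokes would spread it more evenly, which is the intuition behind using such distributions to discriminate scripts.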


Citations
Proceedings ArticleDOI

A CRF Based Scheme for Overlapping Multi-colored Text Graphics Separation

TL;DR: A novel framework for segmenting documents with complex layouts by combining clustering with conditional random field (CRF) based modeling; it has been extensively tested on multi-colored document images with text overlapping graphics/images.
Proceedings ArticleDOI

Script based text identification: a multi-level architecture

TL;DR: The proposed framework presents a top-down approach by performing page, block/paragraph and word level script identification in multiple stages by utilizing texture and shape based information embedded in the documents at different levels for feature extraction.
Proceedings ArticleDOI

Content directed enhancement of degraded document images

TL;DR: This paper presents a novel framework that uses the EM algorithm to learn optimal parameters for binarization and text/graphics segmentation, depending on the nature of the document image content.
Proceedings ArticleDOI

Text graphic separation in Indian newspapers

TL;DR: A novel framework is proposed for learning optimal parameters for text graphic separation in the presence of the complex layouts of Indian newspapers.
References
Journal ArticleDOI

On image classification: city images vs. landscapes

TL;DR: A procedure to qualitatively measure the saliency of a feature for a classification problem, based on plots of the intra-class and inter-class distance distributions; it determines that edge direction-based features have the most discriminative power for the classification problem of interest.
Journal ArticleDOI

Rotation invariant texture features and their use in automatic script identification

TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.
Journal ArticleDOI

Determination of the script and language content of document images

TL;DR: This work has developed techniques for distinguishing which language is represented in an image of text using a technique based on character shape codes, a representation of Latin text that is inexpensive to compute.
Journal ArticleDOI

Automatic script identification from document images using cluster-based templates

TL;DR: An automated script identification system for typeset document images that processes thirteen scripts with minimal preprocessing and high accuracy.
Proceedings ArticleDOI

Multi-script line identification from Indian documents

TL;DR: An automatic scheme, based on the water reservoir principle, contour tracing, profile, etc., is presented to identify text lines of different Indian scripts from a document, with an overall accuracy of about 97.52%.