Proceedings Article

Script based text identification: a multi-level architecture

TL;DR: The proposed framework takes a top-down approach, performing script identification in multiple stages at the page, block/paragraph and word levels, and extracts features from the texture and shape information embedded in the documents at each level.
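The multi-stage, top-down idea summarized above can be sketched as a recursive descent with a reject option: a region keeps the label from the coarsest level whose classifier is confident, and only rejected regions are split further. This is a minimal sketch under stated assumptions, not the authors' implementation. `Region`, `identify`, `toy_classifier` and the 0.8 confidence threshold are all hypothetical. Strings stand in for image regions, and the toy classifier counts Devanagari letters instead of using texture or shape features.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical classifier signature: region content -> (script label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Region:
    """A page, block or word; `content` is a stand-in for the region's image."""
    content: str
    children: List["Region"] = field(default_factory=list)


def identify(region: Region, classifiers: List[Classifier],
             threshold: float = 0.8, level: int = 0) -> List[Tuple[str, str]]:
    """Top-down script identification with rejection.

    classifiers[level] labels the region. If its confidence is below
    `threshold` (a rejection) and finer regions exist, each child is
    processed at the next level. Returns (content, script) pairs.
    """
    label, confidence = classifiers[level](region.content)
    is_last_level = level == len(classifiers) - 1
    if confidence >= threshold or not region.children or is_last_level:
        return [(region.content, label)]
    results: List[Tuple[str, str]] = []
    for child in region.children:
        results.extend(identify(child, classifiers, threshold, level + 1))
    return results


def toy_classifier(text: str) -> Tuple[str, float]:
    """Toy stand-in: fraction of letters in the Devanagari block (U+0900-U+097F)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return ("unknown", 0.0)
    hindi = sum("\u0900" <= c <= "\u097f" for c in letters) / len(letters)
    return ("Hindi", hindi) if hindi >= 0.5 else ("English", 1.0 - hindi)


# Usage: a mixed page -> one mixed block -> two words (page, block, word levels).
page = Region("भारत India", [Region("भारत India", [Region("भारत"), Region("India")])])
print(identify(page, [toy_classifier] * 3))  # [('भारत', 'Hindi'), ('India', 'English')]
```

Because the mixed page and block are rejected (confidence 0.625 < 0.8), labelling falls through to the word level. A monolingual page would be labelled once at the top.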
Abstract
Script identification in a multilingual document environment has numerous applications in document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bilingual documents. The framework takes a top-down approach, performing page-, block/paragraph- and word-level script identification in multiple stages. We use the texture- and shape-based information embedded in the documents at different levels for feature extraction. Prediction at the different levels of the hierarchy is performed by a Support Vector Machine (SVM) and by a rejection-based classifier built with AdaBoost. Experimental evaluation of the proposed approach on document collections of Hindi/English and Bangla/English scripts has shown promising results.
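The abstract does not say how the AdaBoost-based classifier rejects samples. The sketch below assumes one common choice: discrete AdaBoost over decision stumps that abstains (returns 0) when the normalized ensemble vote is too close to zero. The function names, the exhaustive stump search and the `margin` parameter are hypothetical, and labels are +1/-1 for the two scripts.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Decision stump: outputs `polarity` if x[j] > threshold, else -polarity."""
    j, threshold, polarity = stump
    return polarity if x[j] > threshold else -polarity


def train_adaboost(X: List[Sequence[float]], y: List[int],
                   rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with an exhaustive stump search; labels are +1/-1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for threshold in sorted({x[j] for x in X}):
                for polarity in (1, -1):
                    stump = (j, threshold, polarity)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_with_reject(ensemble: List[Tuple[float, Stump]],
                        x: Sequence[float], margin: float = 0.3) -> int:
    """Return +1/-1, or 0 (reject) when the normalized vote is below `margin`."""
    total_alpha = sum(alpha for alpha, _ in ensemble)
    if total_alpha == 0:
        return 0  # no informative stump: always reject
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble) / total_alpha
    if abs(score) < margin:
        return 0  # rejected: e.g. hand the region to a finer level
    return 1 if score > 0 else -1


X = [[0.1], [0.2], [0.8], [0.9]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print(predict_with_reject(model, [0.95]), predict_with_reject(model, [0.05]))  # 1 -1
```

In a hierarchy like the one described, a 0 from this classifier would be the signal to move down from page to block, or from block to word.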


Citations
Journal Article

A Fuzzy Matching based Image Classification System for Printed and Handwritten Text Documents

TL;DR: This article proposes a system that performs better than existing systems, and shows the results of experiments on this and other proposed systems.
Proceedings Article

Text recognition in bilingual machine printed image documents — Challenges and survey: A review on principal and crucial concerns of text extraction in bilingual printed images

TL;DR: A survey is presented that focuses on the challenges and complex issues of text recognition in bilingual machine-printed image documents, together with proposed solutions and the constraints and errors found during text processing.
Book Chapter

Toward Recognition and Classification of Hindi Handwritten Document Image

TL;DR: A new approach to offline Hindi handwritten document classification is proposed: it first recognizes and classifies the character images, then classifies the document image into a predefined category, taking a step toward automatic document image classification.
Book Chapter

Identification of Devanagari Script from Bilingual Printed Text Documents

TL;DR: This work develops a methodology for identifying Devanagari (Marathi) script in printed bilingual text documents: projection profiles are used for line segmentation, followed by a twofold word segmentation.
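Line segmentation with a projection profile can be illustrated as follows. This is a minimal sketch that assumes a binarized page stored as a list of rows (1 = ink, 0 = background). `horizontal_profile`, `segment_lines` and `min_ink` are hypothetical names, and the cited work's twofold word segmentation is not reproduced here.

```python
from typing import List, Tuple


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, end_row) pairs, end exclusive, for each run of
    consecutive rows whose ink count is at least `min_ink`."""
    lines: List[Tuple[int, int]] = []
    start = None
    for r, count in enumerate(horizontal_profile(image)):
        if count >= min_ink and start is None:
            start = r                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, r))       # a blank row ends the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Raising `min_ink` above 1 makes the split tolerant of isolated noise pixels in the gaps between lines.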
References
Journal Article

Robust Real-Time Face Detection

TL;DR: In this paper, a face detection framework is described that processes images extremely rapidly while achieving high detection rates. In real-time applications, the detector runs at 15 frames per second.
Proceedings Article

Robust real-time face detection

TL;DR: A new image representation called the “Integral Image” is introduced, which allows the features used by the detector to be computed very quickly. The paper also introduces a method for combining classifiers in a “cascade”, which allows background regions of the image to be discarded quickly while more computation is spent on promising face-like regions.
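The integral-image idea can be sketched as follows: after one pass over the image, the sum over any axis-aligned rectangle costs four lookups. This is an illustrative sketch, not the authors' code. The function names are hypothetical, and the two-rectangle feature is one example of the Haar-like rectangle features such a detector evaluates.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum over rows top..bottom-1 and columns left..right-1 with 4 lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[int]], top: int, left: int,
                     height: int, width: int) -> int:
    """Haar-like feature: sum of the left half minus sum of the right half."""
    mid = left + width // 2
    return (rect_sum(ii, top, left, top + height, mid)
            - rect_sum(ii, top, mid, top + height, left + width))


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))          # 28 (5 + 6 + 8 + 9)
print(two_rect_feature(ii, 0, 0, 3, 2))  # -3 (12 - 15)
```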
Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods, covering all the concepts a reader with basic mathematical knowledge needs to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Journal Article

Rotation invariant texture features and their use in automatic script identification

TL;DR: Rotation-invariant texture features are computed using an extension of the popular multi-channel Gabor filtering technique. Their effectiveness is tested on 300 randomly rotated samples of 15 Brodatz textures, and the features are then applied to automatic script identification, a practical but hitherto mostly overlooked problem in document image processing.
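One common way to make multi-channel Gabor energies rotation invariant is to take the magnitude of a discrete Fourier transform across the orientation channels. Rotating the input roughly shifts those energies circularly, and a circular shift does not change DFT magnitudes. The sketch below assumes that construction and is not claimed to be the exact method of the cited paper. All names and parameter values (`freq`, `sigma`, `size`, `n_orient`) are hypothetical.

```python
import cmath
import math
from typing import List


def gabor_kernel(freq: float, theta: float, sigma: float, size: int):
    """Complex Gabor kernel (odd `size`): isotropic Gaussian times a plane
    wave of spatial frequency `freq` (cycles/pixel) along direction `theta`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * cmath.exp(2j * math.pi * freq * along))
        kernel.append(row)
    return kernel


def gabor_energy(img: List[List[float]], kernel) -> float:
    """Mean magnitude of the filter response over the 'valid' region."""
    h, w, s = len(img), len(img[0]), len(kernel)
    total, count = 0.0, 0
    for i in range(h - s + 1):
        for j in range(w - s + 1):
            acc = sum(img[i + u][j + v] * kernel[u][v]
                      for u in range(s) for v in range(s))
            total += abs(acc)
            count += 1
    return total / count


def rotation_invariant_features(img, freq=0.25, sigma=2.0, size=7, n_orient=8):
    """DFT magnitudes, taken across orientation, of the Gabor energies."""
    energies = [gabor_energy(img, gabor_kernel(freq, math.pi * k / n_orient, sigma, size))
                for k in range(n_orient)]
    # A circular shift of `energies` leaves these magnitudes unchanged.
    return [abs(sum(e * cmath.exp(-2j * math.pi * m * k / n_orient)
                    for k, e in enumerate(energies)))
            for m in range(n_orient // 2 + 1)]


img = [[1.0 if j % 4 < 2 or i == j else 0.0 for j in range(11)] for i in range(11)]
rotated = [list(r) for r in zip(*img)][::-1]  # 90-degree counter-clockwise rotation
f1 = rotation_invariant_features(img)
f2 = rotation_invariant_features(rotated)
print(max(abs(a - b) for a, b in zip(f1, f2)) < 1e-9)  # True
```

With 8 orientations spaced by 22.5 degrees, a 90-degree rotation shifts the channels by exactly 4 positions, so the two feature vectors agree up to floating-point error. Arbitrary angles only approximately permute the channels.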