Script based text identification: a multi-level architecture

doi:10.1145/2034617.2034630

Proceedings Article

Ehtesham Hassan, +3 more

- pp 11

TLDR

The proposed framework takes a top-down approach, performing page, block/paragraph, and word level script identification in multiple stages and using texture and shape based information embedded in the documents at different levels for feature extraction.

Abstract:

Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework takes a top-down approach, performing page, block/paragraph, and word level script identification in multiple stages. We use texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at each level of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.
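The top-down flow described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Region` type, `build_page`, `identify`, the 0.8 threshold, and the label `"mixed"` are all hypothetical names and values, and `toy_classifier` simply counts Devanagari code points in a text string as a stand-in for the SVM and AdaBoost classifiers trained on texture and shape features. The abstract also does not say how rejection triggers the next level; the sketch assumes that a low-confidence prediction at one level causes the region to be split and classified at the next level down.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Region:
    level: str                      # "page", "block", or "word"
    content: str                    # stand-in for the region's image data
    children: List["Region"] = field(default_factory=list)
    script: Optional[str] = None    # filled in by identify()


# A classifier maps a region to (label, confidence in [0, 1]).
Classifier = Callable[[Region], Tuple[str, float]]


def toy_classifier(region: Region) -> Tuple[str, float]:
    """Hypothetical stand-in: share of Devanagari code points in the text."""
    chars = [c for c in region.content if not c.isspace()]
    deva = sum(1 for c in chars if "\u0900" <= c <= "\u097f")
    frac = deva / len(chars) if chars else 0.0
    return ("Hindi", frac) if frac >= 0.5 else ("English", 1.0 - frac)


def build_page(text: str) -> Region:
    """Split text into blocks (blank-line separated) and words."""
    blocks = []
    for para in text.split("\n\n"):
        words = [Region("word", w) for w in para.split()]
        blocks.append(Region("block", para, words))
    return Region("page", text, blocks)


def identify(region: Region,
             classifiers: Dict[str, Classifier],
             threshold: float = 0.8) -> None:
    """Label a region; descend to its children only if the label is rejected."""
    label, confidence = classifiers[region.level](region)
    if confidence >= threshold or not region.children:
        # Accepted (or nothing finer to inspect): one script for the region.
        region.script = label
        return
    # Rejected: assume the region mixes scripts and resolve it one level down.
    region.script = "mixed"
    for child in region.children:
        identify(child, classifiers, threshold)


# Usage: an English block followed by a block that mixes Hindi and English.
page = build_page("hello world\n\nनमस्ते world")
levels = {"page": toy_classifier, "block": toy_classifier, "word": toy_classifier}
identify(page, levels)
```

In this sketch, a region accepted at a coarse level is never split further, so word level classification runs only inside blocks that the coarser stages could not label with confidence.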

Citations

Journal Article

A Fuzzy Matching based Image Classification System for Printed and Handwritten Text Documents

Shalini Puri, +1 more

- 01 Apr 2020 -

Journal of Information Technology Resear...

TL;DR: This article proposes a system that performs better than existing systems, and shows the results of experiments on this and other proposed systems.

Proceedings Article

Text recognition in bilingual machine printed image documents — Challenges and survey: A review on principal and crucial concerns of text extraction in bilingual printed images

Shalini Puri, +1 more

TL;DR: This survey focuses on the challenges and complex issues of text recognition in bilingual machine-printed image documents, and presents proposed solutions along with the constraints and errors found during text processing.

Book Chapter

Toward Recognition and Classification of Hindi Handwritten Document Image

Shalini Puri, +1 more

TL;DR: A new approach to offline Hindi handwritten document classification is proposed, which first recognizes and classifies the character images and then classifies the document image into a predefined category, a step toward automatic document image classification.

Book Chapter

Identification of Devanagari Script from Bilingual Printed Text Documents

Ranjana S. Zinjore, +1 more

TL;DR: This work develops a methodology that applies projection profiles for line segmentation, followed by twofold word segmentation, to identify Devanagari (Marathi) script in printed bilingual text documents.

Book Chapter

Domain Areas: Where are These Relevant?

Steven J. Simske

References

Journal Article

Robust Real-Time Face Detection

Paul A. Viola, +1 more

- 01 May 2004 -

International Journal of Computer Vision

TL;DR: This paper describes a face detection framework that processes images extremely rapidly while achieving high detection rates; the detector runs at about 15 frames per second.

Proceedings Article

Robust real-time face detection

Paul A. Viola, +1 more

TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.

Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf, +1 more

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods

John Platt

Journal Article

Rotation invariant texture features and their use in automatic script identification

Tieniu Tan

- 01 Jul 1998 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.

Related Papers (5)

SVM Based Scheme for Thai and English Script Identification

Composite Script Identification and Orientation Detection for Indian Text Images

Zone-based structural feature extraction for script identification from Indian documents

A study on word-level multi-script identification from video frames

Script recognition in images with complex backgrounds