
Proceedings ArticleDOI

Script based text identification: a multi-level architecture

17 Sep 2011 - pp. 11

TL;DR: The proposed framework takes a top-down approach, performing page-, block/paragraph- and word-level script identification in multiple stages, and extracts features from the texture and shape information embedded in the documents at each level.

Abstract: Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework presents a top-down approach, performing page, block/paragraph and word level script identification in multiple stages. We utilize texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at different levels of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection-based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.
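As a rough sketch of how such a multi-stage pipeline can be wired together, the fragment below pairs a page-level SVM with a word-level AdaBoost classifier that rejects ambiguous words rather than force-labelling them. It assumes scikit-learn and random stand-in feature vectors; the paper's actual texture/shape features, thresholds, and classifier configuration are not reproduced here.

```python
# Illustrative two-stage script identification skeleton (not the authors'
# exact method): an SVM decides at page/block level, and an AdaBoost
# classifier with a rejection band decides at word level.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Stand-in features: 16-dim vectors with binary script labels (0/1).
X_page, y_page = rng.normal(size=(200, 16)), rng.integers(0, 2, 200)
X_word, y_word = rng.normal(size=(400, 16)), rng.integers(0, 2, 400)

# Stage 1: page/block-level SVM.
page_clf = SVC(kernel="rbf", probability=True).fit(X_page, y_page)

# Stage 2: word-level AdaBoost with a rejection band.
word_clf = AdaBoostClassifier(n_estimators=50).fit(X_word, y_word)

def classify_word(x, low=0.4, high=0.6):
    """Return a script label, or None to reject an ambiguous word."""
    p = word_clf.predict_proba(x.reshape(1, -1))[0, 1]
    if low < p < high:
        return None  # ambiguous: defer instead of forcing a label
    return int(p >= high)

print(page_clf.predict(X_page[:1]), classify_word(X_word[0]))
```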



Citations
Journal ArticleDOI
TL;DR: Various feature extraction and classification techniques associated with the OSI of the Indic scripts are discussed in this survey, which is intended to serve as a compendium not only for researchers, but also for policymakers and practitioners in India.
Abstract: Offline Script Identification (OSI) facilitates many important applications such as automatic archiving of multilingual documents, searching online/offline archives of document images, and the selection of a script-specific Optical Character Recognition (OCR) engine in a multilingual environment. In a multilingual country like India, a document containing text words in more than one language is a common scenario. A state-of-the-art survey of the techniques available in the area of OSI for Indic scripts would be of great aid to researchers. Hence, a sincere attempt is made in this article to discuss the advancements reported in the literature during the last few decades. Various feature extraction and classification techniques associated with the OSI of the Indic scripts are discussed in this survey. We hope that this survey will serve as a compendium not only for researchers, but also for policymakers and practitioners in India. It should also help bring together researchers working on different Indic scripts. Taking the recent developments in OSI of Indian regional scripts into consideration, this article will provide a better platform for future research activities.

42 citations


Cites background from "Script based text identification: a..."

  • Fig. 12 – Architecture of the proposed work described in [34].

  • [34]: structural features; SVM and AdaBoost; Hindi, English, Bangla; printed; page level, text-line level, word level; 98…

  • [34] proposed a novel hierarchical framework for script identification in bi-lingual printed documents.

  • Fig. 13 – Hierarchical classifier for word level script identification [34].

Journal ArticleDOI
TL;DR: This paper addresses three key challenges: collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages; script-level annotation of the words in those pages; and development of a bi-script and tri-script word-level script identification module using a Modified log-Gabor filter as feature extractor.
Abstract: Handwritten document image datasets are among the basic necessities for conducting research on developing Optical Character Recognition (OCR) systems. In a multilingual country like India, handwritten documents often contain more than one script, leading to complex pattern analysis problems. In this paper, we highlight two such situations where Devanagari and Bangla scripts, the two most widely used scripts in the Indian sub-continent, are individually used along with Roman script in documents. We address three key challenges here: 1) collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages respectively; 2) script-level annotation of 18931 Bangla words, 15528 Devanagari words and 10331 Roman words in those 300 document pages; and 3) development of a bi-script and tri-script word-level script identification module using a Modified log-Gabor filter as feature extractor. The technique is statistically validated using multiple classifiers, and the Multi-Layer Perceptron (MLP) classifier is found to perform best. Average word-level script identification accuracies of 92.32%, 95.30% and 93.78% are achieved using 3-fold cross validation for the Bangla-Roman, Devanagari-Roman and Bangla-Devanagari-Roman databases respectively. Both mixed-script document databases, along with the script-level annotations and 44790 extracted word images of the three aforementioned scripts, are freely available at https://code.google.com/p/cmaterdb/ .

21 citations
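As a hedged illustration of the log-Gabor feature extraction the entry above mentions, the sketch below builds a single radial log-Gabor filter in the frequency domain with NumPy and summarizes the magnitude response of a word image. The paper's "Modified" variant, its full filter bank, and the MLP classifier are not reproduced; the function name and all parameters are illustrative.

```python
# One radial log-Gabor filter applied in the frequency domain; the
# mean/std of the magnitude response serve as simple word-level features.
import numpy as np

def log_gabor_features(img, f0=0.1, sigma_ratio=0.65):
    """Return [mean, std] of the log-Gabor magnitude response of img."""
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    radius[0, 0] = 1.0  # avoid log(0) at the DC term
    # Log-Gabor transfer function: a Gaussian on a log frequency axis.
    lg = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
    lg[0, 0] = 0.0      # log-Gabor filters have no DC response
    mag = np.abs(np.fft.ifft2(np.fft.fft2(img) * lg))
    return np.array([mag.mean(), mag.std()])

word = np.random.default_rng(1).random((32, 96))  # stand-in word image
print(log_gabor_features(word))
```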

Journal ArticleDOI
01 Oct 2018
TL;DR: A new Hindi printed and handwritten document classification system using a support vector machine and fuzzy logic is introduced, which first pre-processes textual image documents and then classifies them into predefined categories.
Abstract: In recent years, many information retrieval, character recognition, and feature extraction methodologies in Devanagari, and especially in Hindi, have been proposed for different domain areas. Given the enormous availability of scanned data, and to advance existing Hindi automated systems beyond optical character recognition, a new Hindi printed and handwritten document classification system using a support vector machine and fuzzy logic is introduced. The system first pre-processes textual image documents and then classifies them into predefined categories. With this concept, the article presents a feasibility study of such systems in the context of Hindi, a survey of statistical measurements of Hindi keywords obtained from different sources, and the inherent challenges found in printed and handwritten documents. Technical reviews are provided and graphically represented to compare many parameters and to summarize the contents, forms and classifiers used in various existing techniques.

11 citations
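The pre-processing stage mentioned above is not detailed in the abstract; a minimal sketch of a typical document-image front end, assuming scikit-image, normalizes the grayscale range and binarizes with Otsu's threshold:

```python
# Typical document-image pre-processing: range normalization plus Otsu
# binarization. A sketch only; the paper's exact pipeline is unspecified.
import numpy as np
from skimage.filters import threshold_otsu

def preprocess(img):
    """Normalize a grayscale page to [0, 1] and binarize it."""
    g = (img - img.min()) / (np.ptp(img) + 1e-9)
    return (g > threshold_otsu(g)).astype(np.uint8)

page = np.random.default_rng(3).random((64, 64))  # stand-in scanned page
print(preprocess(page).mean())  # fraction of pixels above the threshold
```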

Journal ArticleDOI
TL;DR: This article proposes a bi-leveled image classification system for printed and handwritten English documents and shows experimentally that it outperforms many existing systems.
Abstract: This article proposes a bi-leveled image classification system to classify printed and handwritten English documents into mutually exclusive predefined categories. The proposed system follows the steps of preprocessing, segmentation, feature extraction, and SVM-based character classification at level 1, and word association and fuzzy matching based document classification at level 2. The system architecture and its modular structure are discussed along with the various task stages and their functionalities. Further, a case study on document classification shows the internal score computations of words and keywords with fuzzy matching. Experiments on the proposed system illustrate that it achieves promising results in a time-efficient manner, with better accuracy and less computation time for printed documents than for handwritten ones. Finally, the performance of the proposed system is compared with existing systems, and it is observed that the proposed system performs better than many of them.

Keywords: Confidence Computation, Document Image, Fuzzy Matching, Handwritten Documents, Performance Analysis, Printed Documents, SVM, Text Image Classification, Word Association

5 citations
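As a rough sketch of the level-2 idea described above, fuzzy matching of noisy OCR words against per-category keyword lists yields the kind of internal scores the article mentions. Python's difflib stands in for whatever matcher the authors used; the categories, keywords, and threshold are invented for illustration.

```python
# Score a document against category keyword lists via fuzzy matching.
from difflib import SequenceMatcher

CATEGORIES = {  # hypothetical categories and keywords
    "finance": ["invoice", "payment", "account"],
    "legal":   ["agreement", "clause", "witness"],
}

def category_score(words, keywords, threshold=0.8):
    """Fraction of keywords fuzzily matched by at least one OCR'd word."""
    hits = sum(
        any(SequenceMatcher(None, w, kw).ratio() >= threshold for w in words)
        for kw in keywords
    )
    return hits / len(keywords)

ocr_words = ["lnvoice", "paymnt", "account", "total"]  # noisy OCR output
scores = {c: category_score(ocr_words, kws) for c, kws in CATEGORIES.items()}
print(max(scores, key=scores.get), scores)  # -> finance
```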

Journal ArticleDOI
TL;DR: This article surveys bilingual NLP and image-based document classification systems, providing an overview of their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs.
Abstract: Today, rapid digitization requires efficient bilingual non-image and image document classification systems. Although many bilingual NLP and image-based systems provide solutions for real-world problems, they primarily focus on text extraction, identification, and recognition tasks with limited document types. This article traces the development of these systems and provides an overview of their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs. The gaps found lead toward the idea of a generic and integrated bilingual English-Hindi document classification system, which classifies heterogeneous documents using a dual class feeder and two character corpora. Its non-image and image modules include pre- and post-processing stages and pre- and post-segmentation stages to classify documents into predefined classes. The article also discusses many real-life applications to societal and commercial issues. The analytical results show important findings of existing and proposed systems.

3 citations


References
Journal ArticleDOI
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

12,467 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

10,155 citations
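Both entries above hinge on the integral image; a minimal NumPy sketch of the trick follows: one cumulative-sum pass, after which any rectangular sum costs four lookups. Function names are illustrative.

```python
# Integral image: precompute once, then any box sum is O(1).
import numpy as np

def integral_image(img):
    """Zero-padded cumulative sums so ii[y, x] = img[:y, :x].sum()."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four corner lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()
print(box_sum(ii, 1, 1, 3, 3))  # 30.0
```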

BookDOI
01 Dec 2001
TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods, covering all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,786 citations
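The modularity the book emphasizes, one base algorithm adapted to a task by swapping the kernel function, is easy to see with scikit-learn's SVC on synthetic data. This is a generic demonstration, not an example taken from the book.

```python
# Same SVM algorithm, three kernels, swapped by a single argument.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(f"{kernel:>6}: test accuracy = {clf.score(X_te, y_te):.3f}")
```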

Journal ArticleDOI
TL;DR: Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures to solve a practical but hitherto mostly overlooked problem in document image processing.
Abstract: Concerns the extraction of rotation invariant texture features and the use of such features in script identification from document images. Rotation invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures. These features are then used in an attempt to solve a practical but hitherto mostly overlooked problem in document image processing: the identification of the script of a machine printed document. Automatic script and language recognition is an essential front-end process for the efficient and correct use of OCR and language translation products in a multilingual environment. Six languages (Chinese, English, Greek, Russian, Persian, and Malayalam) are chosen to demonstrate the potential of such a texture-based approach in script identification.

290 citations
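A minimal sketch of the principle behind rotation-invariant texture features from multi-channel Gabor filtering, assuming scikit-image: filter at several orientations, then keep statistics that do not depend on the orientation ordering. The paper's actual extension of the Gabor technique is more involved; this shows only the generic idea.

```python
# Gabor responses at several orientations, summarized by order statistics
# that stay (approximately) unchanged when the texture is rotated.
import numpy as np
from skimage.filters import gabor

def rotation_invariant_features(img, frequency=0.2, n_orient=8):
    energies = []
    for k in range(n_orient):
        real, imag = gabor(img, frequency=frequency,
                           theta=k * np.pi / n_orient)
        energies.append(np.hypot(real, imag).mean())  # channel energy
    e = np.array(energies)
    return np.array([e.mean(), e.max(), e.std()])

texture = np.random.default_rng(2).random((64, 64))  # stand-in texture
print(rotation_invariant_features(texture))
```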