Proceedings ArticleDOI
Script based text identification: a multi-level architecture
Ehtesham Hassan,Ritu Garg,Santanu Chaudhury,M. Gopal +3 more
- pp 11
TLDR
The proposed framework takes a top-down approach, performing page, block/paragraph and word level script identification in multiple stages and using texture and shape based information embedded in the documents at different levels for feature extraction.
Abstract:
Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework takes a top-down approach, performing page, block/paragraph and word level script identification in multiple stages. We utilize texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at different levels of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.
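The top-down control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` type, the `identify_scripts` function, the signed-score convention, the `threshold` value and the `toy_scorer` are all hypothetical, and the toy scorer merely stands in for the SVM and AdaBoost based classifiers trained on texture and shape features. The sketch also assumes that a rejection at one level means the decision is deferred to the next finer level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """A document region at one level of the hierarchy (hypothetical type)."""
    name: str
    level: str                      # "page", "block" or "word"
    features: List[float]           # placeholder for texture/shape features
    children: List["Region"] = field(default_factory=list)


# A scorer maps features to a signed score: >= 0 favours the first label,
# < 0 favours the second. This convention is an assumption of the sketch.
Scorer = Callable[[List[float]], float]


def identify_scripts(
    region: Region,
    scorers: Dict[str, Scorer],
    threshold: float,
    labels: Tuple[str, str] = ("Hindi", "English"),
) -> List[Tuple[str, str]]:
    """Label a region if its score is confident; otherwise descend.

    A score whose magnitude is below `threshold` is treated as a rejection,
    and the decision is passed down to the region's children. A region with
    no children is always labelled by the sign of its score.
    """
    score = scorers[region.level](region.features)
    if abs(score) >= threshold or not region.children:
        return [(region.name, labels[0] if score >= 0 else labels[1])]
    decided: List[Tuple[str, str]] = []
    for child in region.children:
        decided.extend(identify_scripts(child, scorers, threshold, labels))
    return decided


def toy_scorer(features: List[float]) -> float:
    """Stand-in for a trained classifier: the mean of the feature values."""
    return sum(features) / len(features)


# A mixed-script page: the page and one block are ambiguous, so the
# decision is deferred to blocks and then to words.
page = Region("page", "page", [0.2, 0.0], children=[
    Region("block1", "block", [0.9, 0.9]),
    Region("block2", "block", [0.1, -0.2], children=[
        Region("word1", "word", [0.8, 0.8]),
        Region("word2", "word", [-0.7, -0.7]),
    ]),
])
scorers = {"page": toy_scorer, "block": toy_scorer, "word": toy_scorer}
result = identify_scripts(page, scorers, threshold=0.5)
print(result)
```

In this toy run the page is rejected, the first block is labelled as a whole, and the second block is resolved word by word. In practice each level would use its own trained classifier and feature extractor rather than a shared mean.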
Citations
Journal ArticleDOI
Offline Script Identification from multilingual Indic-script documents: A state-of-the-art
TL;DR: This survey discusses various feature extraction and classification techniques associated with the OSI of the Indic scripts, and it is hoped that it will serve as a compendium not only for researchers but also for policymakers and practitioners in India.
Journal ArticleDOI
Benchmark databases of handwritten Bangla-Roman and Devanagari-Roman mixed-script document images
TL;DR: This paper addresses three key challenges: the collection, compilation and organization of benchmark databases of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document page images, and the development of a bi-script and tri-script word-level script identification module using a Modified log-Gabor filter as feature extractor.
Journal ArticleDOI
Hindi Text Document Classification System Using SVM and Fuzzy: A Survey
Shalini Puri,Satya Prakash Singh +1 more
TL;DR: A new Hindi printed and handwritten document classification system using support vector machine and fuzzy logic is proposed, which first pre-processes and then classifies textual imaged documents into predefined categories.
Journal ArticleDOI
A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy: An Advancement
Shalini Puri,Satya Prakash Singh +1 more
TL;DR: A new advanced tri-layered segmentation and bi-leveled-classifier-based Hindi printed document classification system, which categorizes imaged documents into pre-defined mutually exclusive categories by using SVM and Fuzzy matching at character and document classifications, respectively.
Journal ArticleDOI
Advanced Applications on Bilingual Document Analysis and Processing Systems
Shalini Puri,Satya Prakash Singh +1 more
TL;DR: A journey of bilingual NLP and image-based document classification systems is discussed and an overview of their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs is provided.
References
Proceedings ArticleDOI
Document Image Retrieval Using Feature Combination in Kernel Space
TL;DR: A novel framework to perform Multiple Kernel Learning for indexing using the Kernel based Distance Based Hashing is proposed and the Genetic Algorithm based framework is used for optimization.