Author

Ritu Garg

Other affiliations: Indian Institutes of Technology
Bio: Ritu Garg is an academic researcher from Indian Institute of Technology Delhi. The author has contributed to research in topics: Graphics & Document layout analysis. The author has an h-index of 5 and has co-authored 14 publications receiving 61 citations. Previous affiliations of Ritu Garg include Indian Institutes of Technology.

Papers
Proceedings ArticleDOI
18 Sep 2011
TL;DR: A novel framework for segmentation of documents with complex layouts, in which segmentation is performed by a combination of clustering and conditional random field (CRF) based modeling; the framework has been extensively tested on multi-colored document images with text overlapping graphics/images.
Abstract: In this paper, we propose a novel framework for segmentation of documents with complex layouts. The document segmentation is performed by a combination of clustering and conditional random fields (CRF) based modeling. The bottom-up approach for segmentation assigns each pixel to a cluster plane based on color intensity. A CRF based discriminative model is learned to extract the local neighborhood information in different cluster/color planes. The final category assignment is done by a top-level CRF based on the semantic correlation learned across clusters. The proposed framework has been extensively tested on multi-colored document images with text overlapping graphics/images.
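A minimal sketch of the bottom-up stage only: assigning each pixel to a cluster/color plane by k-means on its RGB intensity. The CRF layers described in the paper are not reproduced here; the input file name, the number of clusters and the use of scikit-learn are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

def cluster_color_planes(image_path, n_clusters=4):
    """Assign every pixel to a color/cluster plane; returns an HxW label map."""
    rgb = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    h, w, _ = rgb.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(rgb.reshape(-1, 3))
    return labels.reshape(h, w)

# planes = cluster_color_planes("page.png")  # hypothetical input document image
```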

12 citations

Proceedings ArticleDOI
17 Sep 2011
TL;DR: The proposed framework presents a top-down approach, performing page, block/paragraph and word level script identification in multiple stages and utilizing texture and shape based information embedded in the documents at different levels for feature extraction.
Abstract: Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework presents a top-down approach by performing page, block/paragraph and word level script identification in multiple stages. We utilize texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at different levels of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection-based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.
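A minimal sketch of one stage of such a pipeline: texture features extracted from a grayscale word image (plain Gabor filter responses via scikit-image) and classified with an SVM. The hierarchical page/block/word cascade and the AdaBoost-based rejection classifier from the paper are not reproduced; the filter frequencies, orientations and feature choice are illustrative assumptions.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.svm import SVC

def texture_features(gray_word_image):
    """Mean/std of Gabor responses at a few orientations and frequencies."""
    feats = []
    for frequency in (0.1, 0.2, 0.4):
        for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            real, _ = gabor(gray_word_image, frequency=frequency, theta=theta)
            feats.extend([real.mean(), real.std()])
    return np.array(feats)

# X = np.stack([texture_features(img) for img in word_images])  # hypothetical word images
# clf = SVC(kernel="rbf").fit(X, script_labels)                 # e.g. Hindi vs. English labels
```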

12 citations

Proceedings ArticleDOI
11 Apr 2016
TL;DR: A novel framework for automatic selection of optimal parameters for a pre-processing algorithm by estimating the quality of the document image; an EM formulation iteratively computes parameters that maximize the expected recognition accuracy estimated in the E-step.
Abstract: The performance of most recognition engines for document images is affected by the quality of the image being processed and the selection of parameter values for the pre-processing algorithm. Usually the choice of such parameters is made empirically. In this paper, we propose a novel framework for automatic selection of optimal parameters for the pre-processing algorithm by estimating the quality of the document image. Recognition accuracy can be used as a metric for document quality assessment. We learn filters that capture the script properties and degradation to predict recognition accuracy. An EM based framework has been formulated to iteratively learn optimal parameters for document image pre-processing. In the E-step, we estimate the expected accuracy using the current set of parameters and filters. In the M-step, we compute parameters to maximize the expected recognition accuracy found in the E-step. The experiments validate the efficacy of the proposed methodology for document image pre-processing applications.
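A minimal sketch of the underlying idea: score candidate pre-processing parameters by their predicted recognition accuracy and keep the maximizer. The paper's iterative EM formulation and learned filters are not reproduced; `predict_accuracy` is a hypothetical stand-in for that learned quality model, and the binarization-threshold search space is an illustrative assumption.

```python
import numpy as np

def select_preprocessing_parameter(gray_image, predict_accuracy,
                                   candidates=range(80, 200, 10)):
    """Pick the binarization threshold with the highest predicted accuracy."""
    best_param, best_score = None, float("-inf")
    for threshold in candidates:
        binarized = (gray_image > threshold).astype(np.uint8)  # candidate pre-processing
        score = predict_accuracy(binarized)                    # expected recognition accuracy
        if score > best_score:
            best_param, best_score = threshold, score
    return best_param, best_score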

6 citations

Proceedings ArticleDOI
15 Dec 2011
TL;DR: A Multi-Objective Genetic Algorithm is used to maximize camera coverage with optimum illumination of the sensing space; the paper outlines the camera and light source location optimization problem with multiple objective functions.
Abstract: Optimal placement of visual sensors along with good lighting conditions is indispensable for the successful execution of surveillance applications. Limited field-of-view, depth-of-field, and occlusion due to the presence of different objects in the scene form the major constraints for visual sensor placement, while over/under-exposed objects, shadowing and light rays directly incident on the camera lens are some of the constraints for light source placement. Because of the nature of the constraints and the complexity of the problem, the placement problem is considered to be a multi-objective global optimization problem. The paper outlines the camera and light source location optimization problem with multiple objective functions. A Multi-Objective Genetic Algorithm is used to maximize the camera coverage with optimum illumination of the sensing space.
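A minimal sketch of a genetic search over camera/light placement vectors. The two objective functions (coverage and illumination quality) are hypothetical stand-ins for the paper's detailed sensor and lighting models, and a simple weighted sum is used here instead of a full Pareto-based multi-objective GA.

```python
import random

def evolve_placements(coverage, illumination, n_params=6, pop_size=30, generations=50):
    def fitness(ind):
        return 0.5 * coverage(ind) + 0.5 * illumination(ind)  # weighted-sum objective

    pop = [[random.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                         # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_params)                # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_params)                     # mutate one gene
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)                               # best placement found
```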

6 citations

Proceedings ArticleDOI
05 Mar 2007
TL;DR: An integrated scheme for document image compression is presented which preserves the layout structure, and still allows the display of textual portions to adapt to the user preferences and screen area, and derives an SVG representation of the complete document image.
Abstract: We present an integrated scheme for document image compression which preserves the layout structure and still allows the display of textual portions to adapt to the user preferences and screen area. We encode the layout structure of the document images in an XML representation. The textual components and picture components are compressed separately into different representations. We derive an SVG (scalable vector graphics) representation of the complete document image. Compression is achieved since the word-images are encoded using specifications for geometric primitives that compose a word. A document rendered from its SVG representation can be adapted for display and interactive access through common browsers on desktop as well as mobile devices. We demonstrate the effectiveness of the proposed scheme for document access.
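A minimal sketch of emitting an SVG page in which each word is drawn as a vector path and picture components are referenced separately. The paper's actual layout encoding and word-to-primitive compression are not reproduced; the element layout, input tuples and function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def build_svg_page(width, height, word_paths, images):
    """word_paths: (x, y, path_data) per word; images: (x, y, w, h, href) per picture."""
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(width), height=str(height))
    for x, y, path_data in word_paths:            # each word as a geometric-primitive path
        ET.SubElement(svg, "path", d=path_data,
                      transform=f"translate({x},{y})")
    for x, y, w, h, href in images:               # picture components kept as separate references
        ET.SubElement(svg, "image", x=str(x), y=str(y),
                      width=str(w), height=str(h), href=href)
    return ET.tostring(svg, encoding="unicode")
```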

5 citations


Cited by
Journal ArticleDOI
TL;DR: This work proposes a new active learning method for classification, which handles label noise without relying on multiple oracles (i.e., crowdsourcing), and proposes a strategy that selects (for labeling) instances with a high influence on the learned model.
Abstract: We propose a new active learning method for classification, which handles label noise without relying on multiple oracles (i.e., crowdsourcing). We propose a strategy that selects (for labeling) instances with a high influence on the learned model. An instance x is said to have a high influence on the model h, if training h on x (with label y = h(x)) would result in a model that greatly disagrees with h on labeling other instances. Then, we propose another strategy that selects (for labeling) instances that are highly influenced by changes in the learned model. An instance x is said to be highly influenced, if training h with a set of instances would result in a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and we show, on different publicly available datasets, that selecting instances according to the first strategy while eliminating noisy labels according to the second strategy greatly improves the accuracy compared to several benchmarking methods, even when a significant number of instances are mislabeled.
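A minimal sketch of the first selection strategy: an unlabeled instance is scored by how much retraining on it, labeled with the current model's own prediction, makes the new model disagree with the old one on the rest of the pool. The model class and the use of scikit-learn are illustrative assumptions; the noise-filtering second strategy is not reproduced.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def most_influential_index(X_train, y_train, X_pool, model=None):
    """Return the index of the pool instance with the highest influence score."""
    model = model or LogisticRegression(max_iter=1000)
    h = clone(model).fit(X_train, y_train)
    pool_preds = h.predict(X_pool)
    scores = []
    for i, x in enumerate(X_pool):
        X_aug = np.vstack([X_train, x[None, :]])
        y_aug = np.append(y_train, pool_preds[i])            # label x with h(x)
        h_new = clone(model).fit(X_aug, y_aug)
        scores.append(np.mean(h_new.predict(X_pool) != pool_preds))  # disagreement with h
    return int(np.argmax(scores))                            # query this instance's true label
```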

86 citations

Journal ArticleDOI
TL;DR: This survey highlights the variety of the approaches that have been proposed for document image segmentation since 2008 and provides a clear typology of documents and of document image segmentation algorithms.

84 citations

Journal ArticleDOI
TL;DR: Various feature extraction and classification techniques associated with the OSI of the Indic scripts are discussed in this survey, which it is hoped will serve as a compendium not only for researchers but also for policymakers and practitioners in India.

42 citations

Journal ArticleDOI
TL;DR: A survey of past research on character-based as well as keyword-based approaches used for retrieving information from document images, providing insights into the strengths and weaknesses of current techniques and guidance in choosing the areas that future work on document image retrieval could address.
Abstract: This paper attempts to provide a survey of past research on character-based as well as keyword-based approaches used for retrieving information from document images. The survey also provides insights into the strengths and weaknesses of current techniques, the relevance between techniques, and guidance in choosing the areas that future work on document image retrieval could address.

39 citations

Journal ArticleDOI
TL;DR: This paper addresses three key challenges: collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages respectively, and development of a bi-script and tri-script word-level script identification module using a Modified log-Gabor filter as feature extractor.
Abstract: Handwritten document image datasets are among the basic necessities for conducting research on developing Optical Character Recognition (OCR) systems. In a multilingual country like India, handwritten documents often contain more than one script, leading to complex pattern analysis problems. In this paper, we highlight two such situations where Devanagari and Bangla scripts, the two most widely used scripts in the Indian sub-continent, are individually used along with Roman script in documents. We address three key challenges here: 1) collection, compilation and organization of benchmark databases of images of 150 Bangla-Roman and 150 Devanagari-Roman mixed-script handwritten document pages respectively, 2) script-level annotation of 18931 Bangla words, 15528 Devanagari words and 10331 Roman words in those 300 document pages, and 3) development of a bi-script and tri-script word-level script identification module using a Modified log-Gabor filter as feature extractor. The technique is statistically validated using multiple classifiers and it is found that the Multi-Layer Perceptron (MLP) classifier performs the best. Average word-level script identification accuracies of 92.32%, 95.30% and 93.78% are achieved using 3-fold cross validation for the Bangla-Roman, Devanagari-Roman and Bangla-Devanagari-Roman databases respectively. Both the mixed-script document databases along with the script-level annotations and 44790 extracted word images of the three aforementioned scripts are available freely at https://code.google.com/p/cmaterdb/ .
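A minimal sketch of the evaluation step only: a Multi-Layer Perceptron scored with 3-fold cross-validation on word-level feature vectors. The Modified log-Gabor feature extraction itself is not reproduced, and the feature matrix and labels are assumed to be precomputed; the hidden-layer size is an illustrative assumption.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def word_script_id_accuracy(features, script_labels):
    """Mean 3-fold accuracy of an MLP word-level script classifier."""
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    return cross_val_score(mlp, features, script_labels, cv=3).mean()
```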

27 citations