
Showing papers by "Santanu Chaudhury" published in 2009


Proceedings Article
20 Oct 2009
TL;DR: The practical problem of optimally placing multiple PTZ cameras to ensure maximum coverage of user-defined priority areas, with optimum values of parameters such as pan, tilt, zoom and the camera locations, is addressed.
Abstract: Visual sensor network design facilitates applications such as intelligent rooms, video surveillance, automatic multi-camera tracking and activity recognition. These applications require an efficient visual sensor layout that provides a minimum level of image quality or image resolution. This paper addresses the practical problem of optimally placing multiple PTZ cameras to ensure maximum coverage of user-defined priority areas, with optimum values of parameters such as pan, tilt, zoom and the camera locations. The proposed algorithm works offline and does not require camera calibration. We formulate the task as an optimization problem solved with a Genetic Algorithm, defining the coverage matrix as a set of sensor parameters and space model parameters such as priority areas, obstacles and feasible sensor locations, and modelling discrete spaces using a probabilistic framework. We minimize the probability of occlusion due to randomly moving objects by covering each priority area with multiple cameras. The proposed method is applicable to surveillance of large spaces with discrete priority areas, such as a hall with more than one entrance or a hall in which many events happen at different locations, e.g. a casino. Because we optimize pan, tilt, zoom and even the camera locations, the coverage provided by this approach assures good resolution, which improves the QoS of the visual sensor network.
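
As a rough illustration of the kind of objective a genetic algorithm could maximize for this placement problem, the sketch below scores a candidate camera layout by the priority-weighted share of points seen by at least two cameras. It is a simplified 2D stand-in and not the paper's formulation: the dictionary keys (`pos`, `pan`, `fov`, `range`), the `min_views` threshold, and the omission of tilt, zoom, obstacles and the probabilistic occlusion model are all assumptions made here.

```python
import math

def covers(cam, point):
    """Return True if `point` lies inside the camera's 2D viewing sector."""
    (cx, cy), pan, fov, rng = cam["pos"], cam["pan"], cam["fov"], cam["range"]
    dx, dy = point[0] - cx, point[1] - cy
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > rng:
        return False
    # Smallest signed angle between the viewing direction and the point.
    diff = (math.atan2(dy, dx) - pan + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2

def fitness(cameras, priority_points, min_views=2):
    """Priority-weighted fraction of points seen by at least `min_views` cameras."""
    total = sum(weight for _, weight in priority_points)
    covered = sum(weight for point, weight in priority_points
                  if sum(covers(cam, point) for cam in cameras) >= min_views)
    return covered / total

# Hypothetical layout: two cameras facing each other across a 10-unit room.
cameras = [
    {"pos": (0, 0), "pan": 0.0, "fov": math.pi / 2, "range": 10},
    {"pos": (10, 0), "pan": math.pi, "fov": math.pi / 2, "range": 10},
]
priority_points = [((5, 0), 2.0), ((5, 8), 1.0)]  # (location, priority weight)
score = fitness(cameras, priority_points)
```

A genetic algorithm would treat each camera's position and pan (and, in a fuller model, tilt and zoom) as genes and keep the layouts with the highest `fitness`.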

51 citations


Proceedings Article
01 Dec 2009
TL;DR: An efficient skin region segmentation methodology using a low-complexity fuzzy decision tree constructed over the B, G, R colour space is proposed for face and human detection applications on embedded platforms.
Abstract: We propose an efficient skin region segmentation methodology using a low-complexity fuzzy decision tree constructed over the B, G, R colour space. The skin and nonskin training dataset was generated from skin textures obtained from face images of people of diverse age, gender and race, and from nonskin pixels obtained by random sampling of thousands of arbitrary nonskin textures. The compact fuzzy model, with very few rules, makes it possible to raster-scan consumer photographs and classify each pixel as skin or nonskin for various face and human detection applications on embedded platforms.

50 citations


Patent
04 Sep 2009
TL;DR: A video compression framework based on parametric object and background compression is proposed, in which an embodiment detects objects and segments frames into regions corresponding to the foreground object and the background.
Abstract: A video compression framework based on parametric object and background compression is proposed. At the encoder, an embodiment detects objects and segments frames into regions corresponding to the foreground object and the background. The object and the background are individually encoded using separate parametric coding techniques. While the object is encoded using the projection of coefficients to the orthonormal basis of the learnt subspace (used for appearance based object tracking), the background is characterized using an auto-regressive (AR) process model. An advantage of the proposed schemes is that the decoder structure allows for simultaneous reconstruction of object and background, thus making it amenable to the new multi-thread/multi-processor architectures.
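
To make the idea of parametric background coding concrete, here is a minimal sketch of fitting and replaying a first-order auto-regressive model on a 1D signal. The abstract does not state the order or dimensionality of the AR model, so the AR(1) form, the 1D signal and the function names `fit_ar1` and `predict_ar1` are assumptions chosen for brevity; the point is only that a few coefficients plus a seed can stand in for the raw samples.

```python
def fit_ar1(samples):
    """Least-squares estimate of a in the model x[t] = a * x[t-1] + e[t]."""
    num = sum(x1 * x0 for x0, x1 in zip(samples, samples[1:]))
    den = sum(x0 * x0 for x0 in samples[:-1])
    return num / den

def predict_ar1(a, seed, steps):
    """Noise-free reconstruction from the seed value and the coefficient."""
    out = [seed]
    for _ in range(steps):
        out.append(a * out[-1])
    return out

signal = [8.0, 4.0, 2.0, 1.0]
a = fit_ar1(signal)                       # 0.5 for this exactly geometric signal
rebuilt = predict_ar1(a, signal[0], 3)    # [8.0, 4.0, 2.0, 1.0]
```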

27 citations


Book Chapter
15 Dec 2009
TL;DR: A model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields is presented; although motivated by an automated layout analyser and machine translator for technical papers, it can also be used for other applications such as search, indexing and information retrieval.
Abstract: We present a model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields (CRFs, hereafter). Common methods to classify a pixel of a document image into classes (text, background and image) are often noisy and error-prone, and frequently require post-processing through heuristic methods. The input to the system is a pixel-wise classification produced by a Fisher classifier applied to the output of a set of Globally Matched Wavelet (GMW) filters. The system extracts features which encode contextual information and spatial configurations of a given document image, and learns relations between these layout entities using hierarchical CRFs. The hierarchical CRF enables learning at various levels: 1. local features for text, background and image areas; 2. contextual features for further classifying region blocks (title, author block, heading, paragraph, etc.); and 3. a probabilistic layout model for encoding global relations between the above blocks for a particular class of documents. Although the work was motivated by an automated layout analyser and machine translator for technical papers, it can also be used for other applications such as search, indexing and information retrieval.

17 citations


Proceedings Article
26 Jul 2009
TL;DR: A novel shape descriptor based on shape context is presented; in combination with hierarchical distance-based hashing it is used for word and graphical pattern based document image indexing and retrieval, and its applicability is demonstrated for classification of characters and symbols.
Abstract: In this paper we present a novel shape descriptor based on shape context, which in combination with hierarchical distance-based hashing is used for word and graphical pattern based document image indexing and retrieval. The shape descriptor represents the relative arrangement of points sampled on the boundary of the object's shape. We also demonstrate the applicability of the novel shape descriptor for classification of characters and symbols. For indexing, we provide a new formulation for distance-based hierarchical locality sensitive hashing. Experiments have yielded promising results.
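
For readers unfamiliar with shape context, the sketch below builds the classic log-polar histogram that records where the other boundary points lie relative to one reference point. It illustrates the general idea the descriptor builds on, not the novel descriptor from the paper; the bin counts, the halving radial bins normalized by the farthest point, and the function name `shape_context` are assumptions.

```python
import math

def shape_context(points, index, n_r=3, n_theta=4, r_max=None):
    """Log-polar histogram of the other points relative to points[index]."""
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    r_max = r_max or max(dists)
    hist = [[0] * n_theta for _ in range(n_r)]
    for (dx, dy), d in zip(others, dists):
        if d == 0 or d > r_max:
            continue  # skip duplicates of the reference point and far outliers
        # Radial bins halve in width towards the centre: (r_max/2, r_max],
        # (r_max/4, r_max/2], and everything closer falls into bin 0.
        r_bin = max(0, n_r - 1 - int(math.floor(math.log2(r_max / d))))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi / n_theta)) % n_theta
        hist[r_bin][t_bin] += 1
    return hist

# Hypothetical boundary samples; the descriptor is computed for the first point.
boundary = [(0, 0), (4, 0), (0, 2), (-1, 0)]
descriptor = shape_context(boundary, 0)
```

Two shapes can then be compared by matching the histograms of their sampled points, for example with a chi-squared distance.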

16 citations


Book Chapter
15 Dec 2009
TL;DR: A scheme based on an ontological framework is proposed to recognize concepts in multimedia data, providing effective content-based access to a closed, domain-specific multimedia collection and an effective video browsing interface for the user.
Abstract: In this paper, we propose a scheme based on an ontological framework, to recognize concepts in multimedia data, in order to provide effective content-based access to a closed, domain-specific multimedia collection. The ontology for the domain is constructed from high-level knowledge of the domain lying with the domain experts, and further fine-tuned and refined by learning from multimedia data annotated by them. MOWL, a multimedia extension to OWL, is used to encode the concept to media-feature associations in the ontology as well as the uncertainties linked with observation of the perceptual multimedia data. Media feature classifiers help recognize low-level concepts in the videos, but the novelty of our work lies in discovery of high-level concepts in video content using the power of ontological relations between the concepts. This framework is used to provide rich, conceptual annotations to the video database, which can further be used to create hyperlinks in the video collection, to provide an effective video browsing interface to the user.

7 citations


Journal Article
TL;DR: The prevalence of mental disability was found to be higher among males than among females and among individuals with low socioeconomic status, and there is scope for community-based rehabilitation of the mentally disabled.
Abstract: Background: In the present era, mental disability is a major public health problem in society. Many mental disabilities are correctable if detected early. Objectives: To assess the prevalence and pattern of mental disability. Materials and Methods: Community-based cross-sectional study. Patients in the age range of 0-60 years were randomly selected from 10 blocks of 2 districts, viz., Ranchi and Hazaribagh. Thirty villages from each block were taken for the study. The study was conducted by making house-to-house visits, interviewing and examining all the individuals in the selected families using a pre-tested questionnaire. Statistical Analysis: Proportions were used. Results and Conclusion: The prevalence of mental disability was found to be higher among males (67.9%) than among females (32.1%). The prevalence rate was higher among the productive groups and among individuals with low socioeconomic status. There is scope for community-based rehabilitation of the mentally disabled.

6 citations


Proceedings Article
04 Feb 2009
TL;DR: A novel predictive statistical framework is presented to improve the performance of an Eigen Tracker, which uses fast and efficient eigenspace updates to learn new views of the tracked object on the fly using candid covariance-free incremental PCA (CCIPCA).
Abstract: We present a novel predictive statistical framework to improve the performance of an Eigen Tracker, which uses fast and efficient eigenspace updates to learn new views of the object being tracked on the fly using candid covariance-free incremental PCA (CCIPCA). The proposed system detects and tracks an object in the scene by learning the appearance model of the object online, motivated by a non-traditional uniform norm. It speeds up the tracker manyfold by avoiding the nonlinear optimization generally used in the literature.
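
The eigenspace update named in the abstract can be illustrated with the basic candid covariance-free incremental PCA rule for the leading eigenvector, which folds in one sample at a time and never forms a covariance matrix. This is a minimal sketch of that standard rule only: it assumes zero-mean samples, a nonzero first sample and no amnesic weighting, and it is not the tracker or the predictive framework described in the paper.

```python
import math

def ccipca_first_component(samples):
    """Estimate the leading eigenvector of zero-mean samples incrementally."""
    v = list(samples[0])                              # v(1) = u(1)
    for n, u in enumerate(samples[1:], start=2):
        norm = math.sqrt(sum(x * x for x in v))
        # Projection of the new sample onto the current unit-length estimate.
        proj = sum(a * b for a, b in zip(u, v)) / norm
        # v(n) = (n-1)/n * v(n-1) + 1/n * u * (u . v(n-1)) / ||v(n-1)||
        v = [((n - 1) / n) * vi + (1 / n) * proj * ui for vi, ui in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical 2D data whose variance lies almost entirely along the x axis.
data = [(2.0, 0.1), (-2.0, 0.1), (2.0, -0.1), (-2.0, -0.1)]
direction = ccipca_first_component(data)
```

Before normalization, the length of `v` tracks the corresponding eigenvalue, which is why the update divides by the norm of the previous estimate rather than renormalizing at every step.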

5 citations


Proceedings Article
25 Jul 2009
TL;DR: A framework for classification of text document images based on their script is presented; it uses edge direction based features to capture the distribution of curvature and a recently proposed feature selection algorithm to obtain the most discriminating curvature features.
Abstract: We present a framework for classification of text document images based on their script. We deal with the domain of Indian scripts, which has high inter-script similarities. Indian scripts have characteristic curvature distributions which help in visual discrimination of scripts. We use edge direction based features to capture the distribution of curvature. We also use a recently proposed feature selection algorithm to obtain the most discriminating curvature features. We automatically form a hierarchy based on statistical distances between the script models. The hierarchy allows us to group similar scripts at one level and then focus on the classification between the similar scripts at the next level, leading to an improvement in accuracy. We show experiments and results on a large set of about 3400 images.
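
One simple way to obtain an edge direction feature of the kind mentioned above is a normalized histogram of gradient orientations over the image. The sketch below uses central differences on a small grayscale grid and treats gradient orientation as a proxy for edge direction; the bin count, the magnitude threshold and the function name are assumptions, and the paper's actual features may be defined differently.

```python
import math

def edge_direction_histogram(image, n_bins=8, min_mag=1e-6):
    """Normalized histogram of gradient orientations over interior pixels."""
    hist = [0.0] * n_bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]    # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            if math.hypot(gx, gy) < min_mag:
                continue                               # flat region, no edge
            theta = math.atan2(gy, gx) % math.pi       # orientation in [0, pi)
            hist[int(theta / math.pi * n_bins) % n_bins] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical 4x4 patch containing a single vertical edge.
patch = [[0, 0, 1, 1] for _ in range(4)]
features = edge_direction_histogram(patch)
```

Scripts dominated by curved strokes would spread mass across many bins, while scripts with strong horizontal or vertical strokes would concentrate it in a few.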

4 citations


Proceedings Article
25 Jul 2009
TL;DR: This paper describes how a new XML-based tagging scheme has been exploited to achieve the objectives of a project aimed at developing OCR for 11 scripts of Indian origin for which mature OCR technology was not available.
Abstract: This paper presents an XML-based scheme for managing a large multilingual OCR project. In particular, we describe how a new XML-based tagging scheme has been exploited to achieve the objectives of the project. Managing a large multilingual OCR project involving multiple research groups, developing script-specific and script-independent technologies in a collaborative fashion, is a challenging problem. In this paper, we present some of the software and data management strategies designed for the project, which is aimed at developing OCR for 11 scripts of Indian origin for which mature OCR technology was not available.

3 citations


Book Chapter
01 Jan 2009
TL;DR: An interactive access scheme for Indian language document collection is presented using techniques for word-image-based search and retrieval and the compression and retrieval paradigm is applicable even for those Indian scripts for which reliable OCR technology is not available.
Abstract: Indexing and retrieval of Indian language documents is an important problem. We present an interactive access scheme for Indian language document collection using techniques for word-image-based search. The compression and retrieval paradigm we propose is applicable even for those Indian scripts for which reliable OCR technology is not available. Our technique for word spotting is based on exploiting the geometrical features of the word image. The word image features are represented in the form of a graph called geometric feature graph (GFG). The GFG is encoded as a string which serves as a compressed representation of the word image skeleton. We have also augmented the GFG-based word image spotting with latent semantic analysis for more effective retrieval. The query is specified as a set of word images and the documents that best match with the query representation in the latent semantic space are retrieved. The retrieval paradigm is further enhanced to the conceptual level with the use of document image content-domain knowledge specified in the form of an ontology.

Proceedings Article
04 Feb 2009
TL;DR: A document image analysis system which performs segmentation, content characterization as well as semantic labeling of components, and has obtained promising results for semantic segmentation of over 30 categories of documents in Indian scripts.
Abstract: In this paper we describe our document image analysis system which performs segmentation, content characterization as well as semantic labeling of components. Segmentation is done using white spaces and gives the segmented components arranged in a hierarchy. Semantic labeling is done using domain knowledge which is specified where possible in the form of a document model applicable to a class of documents. The novelty of the system lies in the suite of methods it employs which are capable of handling documents in Indian scripts. We have obtained promising results for semantic segmentation of over 30 categories of documents in Indian scripts.

Book Chapter
07 Jul 2009
TL;DR: A unique representation scheme for events in an area under surveillance is presented, which provides a mechanism to analyze videos from multiple perspectives for unusual activity analysis and proposes clustering in event component spaces and defines algebraic operations on these clusters to find co-occurrences of event components.
Abstract: We present a unique representation scheme for events in an area under surveillance, which provides a mechanism to analyze videos from multiple perspectives for unusual activity analysis. We propose clustering in event component spaces and define algebraic operations on these clusters to find co-occurrences of event components. A usualness measure for clusters is proposed that gives not only a measure of how usual or unusual an activity is, but also a basis for analyzing and predicting the possibly usual or unusual activities that can occur in the surveillance region.

Proceedings Article
29 Oct 2009
TL;DR: This paper proposes to use handwriting without recognition as a temporal medium of communication in synchronization with other media like audio and video to ensure that the interfaces developed are language independent and provide rich, natural and intuitive interaction.
Abstract: Handwriting has been conventionally used for input by applying handwriting recognition. In this paper we propose to use handwriting without recognition as a temporal medium of communication in synchronization with other media like audio and video. This ensures that the interfaces developed are language independent and provide rich, natural and intuitive interaction. We present multiple applications exploiting this concept.

Book Chapter
15 Dec 2009
TL;DR: A hierarchical framework is presented to perform automatic categorization and reorientation of consumer images based on their content; a recently proposed information theoretic feature selection method is used to find the most discriminant subset of features and to reduce the dimension of the feature space.
Abstract: A hierarchical framework to perform automatic categorization and reorientation of consumer images based on their content is presented. Consumers sometimes rotate the camera while taking photographs and then have to correct the orientation manually later. The present system handles such cases: it first categorizes consumer images in a rotation-invariant fashion and then detects their correct orientation. It is designed to be fast, using only low-level color and edge features. A recently proposed information theoretic feature selection method is used to find the most discriminant subset of features and also to reduce the dimension of the feature space. Learning methods are used to categorize and detect the correct orientation of consumer images. Results are presented on a collection of about 7000 consumer images, collected by an independent testing team from the internet and personal image collections.