Showing papers by "Santanu Chaudhury" published in 2004


Journal ArticleDOI
TL;DR: This paper surveys important approaches to active 3-D object recognition and reviews existing approaches to another important application of an active sensor, namely scene analysis and interpretation.

138 citations


Proceedings ArticleDOI
20 Dec 2004
TL;DR: A Gabor filter-based feature extraction scheme generates a 384-dimensional feature vector for each fingerprint image, which is then classified by a novel two-stage classifier: K nearest neighbour finds the two most frequently represented classes amongst the K nearest patterns, and an SVM chooses between them.
Abstract: Fingerprint classification is important for different practical applications. An accurate and consistent classification can greatly reduce fingerprint-matching time for large databases. We use a Gabor filter-based feature extraction scheme to generate a 384-dimensional feature vector for each fingerprint image. The classification of these patterns is done through a novel two-stage classifier in which K nearest neighbour (KNN) acts as the first step and finds the two most frequently represented classes amongst the K nearest patterns, followed by the pertinent SVM classifier choosing the more apt class of the two. Six SVMs have to be trained for a four-class problem ($^{4}C_{2} = 6$), that is, all one-against-one SVMs. Using this novel scheme and working on the FVC 2000 database (257 final images), we achieved a maximum accuracy of 98.81% with a rejection percentage of 1.95%. This is significantly higher than most reported results in contemporary literature. The SVM training time was 145 seconds, i.e., 24 seconds per SVM, on a Pentium III machine.

31 citations
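
The two-stage idea in the fingerprint abstract above (KNN shortlists two classes, then a one-against-one classifier for that pair decides) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class labels `c0` to `c3`, the 2-D toy vectors (standing in for the 384-dimensional Gabor features), and all function names are hypothetical, and the pairwise decision uses a nearest-centroid rule as a plain stand-in where the paper trains an SVM. The rejection option mentioned in the abstract is not modelled.

```python
import math
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_two_classes(train, query, k):
    """Stage 1: the (up to) two most frequent labels among the k nearest patterns."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    counts = Counter(label for _, label in neighbours)
    return [label for label, _ in counts.most_common(2)]


def make_pairwise_models(train):
    """Build one decision function per unordered class pair (one-against-one).

    Stand-in only: each function picks the nearer class centroid.
    The paper uses a trained SVM for each pair instead.
    """
    by_class = {}
    for vec, label in train:
        by_class.setdefault(label, []).append(vec)
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_class.items()
    }
    models = {}
    for a, b in combinations(sorted(centroids), 2):
        def decide(x, a=a, b=b):
            return a if euclidean(x, centroids[a]) <= euclidean(x, centroids[b]) else b
        models[(a, b)] = decide
    return models


def classify(train, models, query, k=5):
    """Stage 1 shortlists two classes; stage 2 applies the matching pairwise model."""
    candidates = top_two_classes(train, query, k)
    if len(candidates) == 1:  # all k neighbours agree, nothing left to decide
        return candidates[0]
    return models[tuple(sorted(candidates))](query)


# Hypothetical toy data: four well-separated classes in 2-D.
train = [
    ((0, 0), "c0"), ((0, 1), "c0"), ((1, 0), "c0"),
    ((10, 0), "c1"), ((10, 1), "c1"), ((11, 0), "c1"),
    ((0, 10), "c2"), ((1, 10), "c2"), ((0, 11), "c2"),
    ((10, 10), "c3"), ((11, 10), "c3"), ((10, 11), "c3"),
]
models = make_pairwise_models(train)
print(len(models), classify(train, models, (7.0, 0.5), k=5))
```

With four classes, `make_pairwise_models` builds one model per unordered pair, which matches the six one-against-one classifiers the abstract mentions; only one of them is evaluated per query.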


Journal ArticleDOI
TL;DR: A framework for a distributed knowledge-based system that integrates case-based reasoning (CBR) and fuzzy logic; its handling of distributed case bases enables the system to construct solutions based on collective experience distributed by discipline, time, and geography.

16 citations


Journal ArticleDOI
TL;DR: A prototype implementation of a cooperative agent-based multimedia retrieval architecture that integrates a set of dissimilar collections of multimedia data on Indian cultural heritage and a comparison of the retrieval results with some existing Internet search tools proves the effectiveness of the architecture.
Abstract: We present the planning scheme for a cooperative agent-based multimedia retrieval architecture that integrates a heterogeneous set of repositories into a coherent information system. The agents in the system collaborate in the context of a conceptual query to formulate unique retrieval strategies for the different collections. The retrieval plan makes need-based use of independent content analysis tools available on the network. The retrieval strategies so formulated for the repositories satisfy the specified constraints on quality of results and the response time requirements. The retrieval plan is reactively updated based on the retrieval performance at the individual repositories. We present some experimental results to show the effectiveness of the planning scheme for repositories with different characteristics and the scalability of the architecture. We present a prototype implementation of this architecture that integrates a set of dissimilar collections of multimedia data on Indian cultural heritage. A comparison of the retrieval results with some existing Internet search tools proves the effectiveness of the architecture.

15 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: A shape-based on-line tracker for simultaneously tracking the motion of both hands is presented; it is robust to background clutter, other moving objects, occlusion of one hand by the other, and a wide range of illumination variations.
Abstract: This paper presents a shape-based on-line tracker for simultaneously tracking the motion of both hands. The tracker is robust to background clutter, other moving objects, occlusion of one hand by the other, and a wide range of illumination variations. It is based on an online predictive eigentracking framework, which allows efficient tracking of articulated objects that change in appearance across views. We show results of successful tracking across all possible cases of motion dynamics of both hands during occlusion and under a wide range of illumination conditions.

10 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: A novel predictive statistical framework is presented to improve the performance of an eigentracker; it incorporates a new importance sampling mechanism which increases the robustness of the eigentracker and enables it to track nonconvex objects better.
Abstract: We present a novel predictive statistical framework to improve the performance of an eigentracker. In addition, we use fast and efficient eigenspace updates to learn new views of the object being tracked on the fly. We also incorporate a new importance sampling mechanism which increases the robustness of the eigentracker and enables it to track nonconvex objects better. Our eigentracker is flexible: it is possible to use it symbiotically with other trackers. We show its successful application in hand gesture analysis, and in face and person tracking.

9 citations


Proceedings ArticleDOI
23 Jan 2004
TL;DR: Heritage+ deals with document images as a distinct media type and implements tools and techniques for browsing and querying document images along with other media elements like video sequences and images; it also proposes a new scheme for encoding and using an ontology for accessing multimedia collections.
Abstract: We present Heritage+, an integrated platform for interactive access to different types of media elements through a unified interface. A unique aspect of Heritage+ is that it deals with document images as a distinct media type and implements tools and techniques for browsing and querying document images along with other media elements like video sequences and images. Further, Heritage+ proposes a new scheme for encoding and using an ontology for accessing multimedia collections. In the context of document images, the ontology specifies the document class-specific semantics of the logical components, which help in automated, semantically meaningful linking of documents and their components with heterogeneous media-type resources. Further, Heritage+ supports conceptual querying of document images along with other media elements. This multifunctional access interface to the document images is provided in Heritage+ using a novel model-guided document image segmentation scheme and a word-image based indexing scheme.

8 citations


Proceedings Article
01 Jan 2004
TL;DR: A scheme is presented for transcoding document images for presentation on handheld devices like PDAs and e-books, along with the use of knowledge of the document model, represented through a standard ontology language, for generating document summaries.
Abstract: In this paper we present a scheme for transcoding document images for presentation on handheld devices like PDAs, e-books, etc. We propose techniques suitable, in particular, for images of documents in Indian languages having Devanagari-based scripts (viz. Hindi, Marathi, Bengali, Assamese, etc.). An appropriate compression scheme for the textual component of document images, exploiting script-specific characteristics, is suggested. We also explore the use of knowledge of the document model, represented through a standard ontology language, for generating document summaries. An experimental system has been developed for validation of these schemes.

7 citations


Proceedings Article
01 Jan 2004
TL;DR: This paper proposes schemes for using learning in video analysis tasks like content based filtering and shot summarization, and explores Independent Component Analysis and extracts Independent Components that act as features for describing the content of a shot.
Abstract: In this paper we propose schemes for using learning in video analysis tasks like content-based filtering and shot summarization. Shot segmentation is performed by our neuro-fuzzy framework, which extracts fuzzy rules for video segmentation from the trained neuro-fuzzy network. We explore Independent Component Analysis and extract Independent Components that act as features for describing the content of a shot. We support our claim by showing simple results of a content-based filtering scheme based on this. We also propose a technique for summarizing the content of a video shot. Unlike keyframe-based approaches, we try to find those "critical windows" from the shot sequence that best describe the content of a shot. Hierarchical clustering of these windows provides the summarization of the shots. This scheme thus preserves the original component objects that make up the video, characterizing the semantically essential information present in the video.

2 citations


Proceedings ArticleDOI
23 Aug 2004
TL;DR: A computational model for generating an interpretation of a video shot based on the proposed principle of perceptual prominence is proposed and a formulation of the perceptual grouping problem in the spatio-temporal domain is provided to identify the perceptual clusters.
Abstract: We propose a computational model for generating an interpretation of a video shot based on our proposed principle of perceptual prominence. We also provide a formulation of the perceptual grouping problem in the spatio-temporal domain to identify the perceptual clusters. We illustrate our approach with experimental results.

2 citations


Proceedings Article
01 Jan 2004
TL;DR: The dynamic information present over different frames of a sports video is exploited to characterize the change in the configuration of players across frames, and thereby the scene dynamics.
Abstract: Dynamic changes of object positions provide an important clue for video characterization. In the present work, we exploit the dynamic information present over different frames of a sports video to characterize the change in the configuration of players across different frames. To characterize scene dynamics, the locations of players are first detected using motion-based segmentation. We then construct a polygon with the players placed at its vertices. Next, the change in the shape of the polygon across different frames is computed in terms of the difference in the moments of the polygon shape. The difference is computed for seven moment invariants. The mean of these difference values, called the mean-difference, is used as the key feature for characterizing the scene dynamics. An evolutionary learning based fuzzy rule-based system is developed for characterizing sports sequences using the difference values.
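
The mean-difference feature described in the abstract above can be illustrated with a small sketch. Several things here are assumptions rather than details from the paper: the seven invariants are taken to be Hu's classical moment invariants, players are treated as unit point masses at the polygon vertices (a simplification of moments computed over the polygon shape), the per-invariant difference is taken as an absolute value, and the names `hu_invariants` and `mean_difference` are hypothetical.

```python
import math


def central_moment(points, p, q):
    """Central moment mu_pq of a set of unit point masses."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in points)


def hu_invariants(points):
    """Seven Hu moment invariants of the point set (assumed choice of invariants)."""
    mu00 = central_moment(points, 0, 0)  # equals the number of points

    def eta(p, q):
        # Standard normalisation of central moments.
        return central_moment(points, p, q) / mu00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def mean_difference(players_a, players_b):
    """Mean absolute difference of the seven invariants between two frames."""
    inv_a = hu_invariants(players_a)
    inv_b = hu_invariants(players_b)
    return sum(abs(u - v) for u, v in zip(inv_a, inv_b)) / len(inv_a)


# Hypothetical player positions in two frames (one player has moved).
frame_a = [(0, 0), (4, 0), (4, 2), (1, 3)]
frame_b = [(0, 0), (4, 0), (4, 2), (1, 6)]
print(mean_difference(frame_a, frame_b))
```

Because the invariants are unchanged by translation and rotation of the point set, a configuration that merely shifts or turns as a whole yields a mean-difference near zero, while a change in the players' relative arrangement does not.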

Proceedings Article
01 Jan 2004
TL;DR: This work describes a novel spatio-temporal perceptual grouping scheme, applied to blobs, that makes use of a specified temporal consistency model and results in blob cliques, i.e., perceptual clusters or subjects in the scene.
Abstract: We focus on the problem of video shot interpretation by applying perceptual grouping principles to the visual primitives (2D blobs) in a video shot. We present a novel scheme for modeling the homogeneous regions in the form of 2D blobs that can be tracked easily across frames. We describe a novel spatio-temporal perceptual grouping scheme, applied to blobs, that makes use of a specified temporal consistency model. The grouping results in blob cliques, i.e., perceptual clusters or subjects in the scene. A high-level semantic interpretation of scenes is obtained using the principle of Perceptual Prominence of the temporal behaviors of the perceptual clusters.

Proceedings Article
01 Jan 2004
TL;DR: This work proposes a scheme for view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras that computes z-buffer values that can be used for handling occlusions in the synthesized view, but requires the computation of the infinite homography.
Abstract: We propose a scheme for view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras. Assuming that the correspondence of three vanishing points in general position is available, our scheme computes z-buffer values that can be used for handling occlusions in the synthesized view. This requires the computation of the infinite homography. We also present an alternate formulation of the technique which works under the same assumptions but does not require infinite homography computation. We present experimental results to establish the validity of both formulations.

Proceedings Article
01 Jan 2004
TL;DR: This paper proposes a scheme for representing the content of an image as a combination of features from multiple examples, for processing multi-example based queries, and designs novel query processing schemes for image retrieval based upon this representation.
Abstract: In this paper, we consider the problem of designing a CBIR (content-based image retrieval) system where multiple query examples can be used to indicate the need to retrieve not only images similar to the individual examples but also those images which actually represent a combination of the content of the query images. We propose a scheme for representing the content of an image as a combination of features from multiple examples. We have designed novel query processing schemes for image retrieval based upon this representation. We have also shown the applicability of a relevance feedback based learning scheme for processing multi-example based queries. Extensive experimental results with facial and natural image databases have validated the effectiveness of our approach.