Showing papers by "Santanu Chaudhury" published in 2007


Journal ArticleDOI
TL;DR: A clustering-based technique has been devised for estimating globally matched wavelet filters using a collection of groundtruth images and a text extraction scheme for the segmentation of document images into text, background, and picture components is extended.
Abstract: In this paper, we have proposed a novel scheme for the extraction of textual areas of an image using globally matched wavelet filters. A clustering-based technique has been devised for estimating globally matched wavelet filters using a collection of groundtruth images. We have extended our text extraction scheme for the segmentation of document images into text, background, and picture components (which include graphics and continuous tone images). Multiple two-class Fisher classifiers have been used for this purpose. We also exploit contextual information by using a Markov random field formulation-based pixel labeling scheme for refinement of the segmentation results. Experimental results have established the effectiveness of our approach.

159 citations


Book ChapterDOI
01 Jan 2007
TL;DR: A new Bayesian Network based probabilistic reasoning framework with M-OWL for semantic interpretation of multimedia data and a new model for ontology integration, based on the similarity of the concepts in the media domain are proposed.
Abstract: An ontology designed for multimedia applications should enable integration of the conceptual and media spaces. We present M-OWL, a new ontology language, that supports this capability. M-OWL supports explicit definition of media properties for the concepts. The language has been defined as an extension of OWL, the standard ontology language for the web. We have proposed a new Bayesian Network based probabilistic reasoning framework with M-OWL for semantic interpretation of multimedia data. We have also proposed a new model for ontology integration, based on the similarity of the concepts in the media domain. It can be used to integrate several multimedia and traditional ontologies.

33 citations


Proceedings ArticleDOI
28 Sep 2007
TL;DR: A reinforcement learning algorithm is proposed for the parameters of the Bayesian Network with the implicit feedback obtained from the clickthrough data to provide personalized ranking of results in a video retrieval system.
Abstract: This paper proposes a new method for using implicit user feedback from clickthrough data to provide personalized ranking of results in a video retrieval system. The annotation based search is complemented with a feature based ranking in our approach. The ranking algorithm uses belief revision in a Bayesian Network, which is derived from a multimedia ontology that captures the probabilistic association of a concept with expected video features. We have developed a content model for videos using discrete feature states to enable Bayesian reasoning and to alleviate on-line feature processing overheads. We propose a reinforcement learning algorithm for the parameters of the Bayesian Network with the implicit feedback obtained from the clickthrough data.

11 citations


Book ChapterDOI
18 Dec 2007
TL;DR: This paper presents a framework for using a case-based reasoning system for stock analysis in the financial market using a hierarchical structure for case representation and incorporates a multi-criteria decision-making algorithm which furnishes the most suitable solution with respect to the current market scenario.
Abstract: This paper presents a framework for using a case-based reasoning (CBR) system for stock analysis in the financial market. The unique aspect of this paper is the use of a hierarchical structure for case representation. The system further incorporates a multi-criteria decision-making algorithm which furnishes the most suitable solution with respect to the current market scenario. Two important aspects of the financial market are addressed in this paper: stock evaluation and investment planning. CBR and multi-criteria decision making, when used in conjunction, offer an effective tool for evaluating the goodness of a particular stock based on certain factors. The system also suggests a suitable investment plan based on the current assets of a particular investor. Stock evaluation maps to a flat case structure, but investment planning offers a scenario more suited for structuring the case into successive detailed layers of information related to different facets. This naturally leads to a hierarchical case structure.

10 citations


Proceedings ArticleDOI
23 Sep 2007
TL;DR: A segmentation based histogram matching scheme for enhancing small portions of the text in these manuscripts that have degraded with time and are not readable is proposed.
Abstract: In this paper we address the issue of enhancement in the quality of scanned images of old manuscripts. Small portions of the text in these manuscripts have degraded with time and are not readable. We propose a segmentation based histogram matching scheme for enhancing these degraded text regions. To automatically identify the degraded text we use a matched wavelet based text extraction algorithm followed by MRF (Markov Random Field) post processing. Additionally we do background clearing to improve the quality of results. This method does not require any a priori information about the font, font size, background texture or geometric transformation. We have tested our method on a variety of manuscript images. The results show the proposed method to be a robust, versatile and effective tool for enhancement of manuscript images.
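The histogram matching step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the degraded and well-preserved text regions have already been segmented and flattened into lists of 8-bit grey levels, and the function names and sample values are hypothetical.

```python
from bisect import bisect_left


def cdf(pixels, levels=256):
    """Return the cumulative distribution of grey levels in `pixels`."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Remap `source` grey levels so their distribution follows `reference`."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    # For each source level, pick the smallest reference level whose CDF
    # reaches the CDF value of that source level.
    lookup = [min(bisect_left(ref_cdf, c), levels - 1) for c in src_cdf]
    return [lookup[p] for p in source]


# Hypothetical data: a faded region and a well-preserved region.
degraded = [100, 110, 110, 120]
clean = [10, 20, 20, 200]
enhanced = match_histogram(degraded, clean)
```

The mapping is applied per region, which is why the segmentation step comes first: a faded region borrows the contrast of a readable one without disturbing the rest of the page.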

8 citations


Proceedings ArticleDOI
23 Sep 2007
TL;DR: It is shown through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision as is the case with electronic text documents.
Abstract: In this paper we present an application of latent semantic analysis (LSA) for indexing and retrieval of document images with text. The query is specified as a set of word images, and the documents which best match the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that use of LSA for document images provides improvements in retrieval precision, as is the case with electronic text documents.

7 citations


Proceedings ArticleDOI
05 Mar 2007
TL;DR: An integrated scheme for document image compression is presented which preserves the layout structure, and still allows the display of textual portions to adapt to the user preferences and screen area, and derives an SVG representation of the complete document image.
Abstract: We present an integrated scheme for document image compression which preserves the layout structure, and still allows the display of textual portions to adapt to the user preferences and screen area. We encode the layout structure of the document images in an XML representation. The textual components and picture components are compressed separately into different representations. We derive an SVG (scalable vector graphics) representation of the complete document image. Compression is achieved since the word-images are encoded using specifications for geometric primitives that compose a word. A document rendered from its SVG representation can be adapted for display and interactive access through common browsers on desktop as well as mobile devices. We demonstrate the effectiveness of the proposed scheme for document access.

5 citations


Proceedings ArticleDOI
23 Sep 2007
TL;DR: An email application in which the users are provided with an authoring and rendering environment to compose, view, and reply to messages in the form of Patra, an integrated document architecture which incorporates handwritten illustrations captured and rendered in a temporal fashion synchronized with audio, video, text, and image data.
Abstract: In this paper we present Patra - an integrated document architecture which incorporates handwritten illustrations captured and rendered in a temporal fashion synchronized with audio, video, text, and image data. The architecture of Patra permits non-linear growth in the form of multiple hierarchically organized play streams. Semantic metadata is also an integral part of Patra which serves a useful purpose of organizing such documents in a collection. We have developed an email application in which the users are provided with an authoring and rendering environment to compose, view, and reply to messages in the form of Patra.

4 citations


Proceedings ArticleDOI
02 Nov 2007
TL;DR: A novel content-based re-ranking scheme for enhancing the precision of video retrieval on the Web that effectively re-ranks results for new text queries submitted to the video retrieval system, leading to better satisfaction of the users' information need.
Abstract: We present a novel content-based re-ranking scheme for enhancing the precision of video retrieval on the Web. We use ontology specified knowledge of the video domain to map user queries to domain-based concepts. The user preferences are learned implicitly from the web logs of users' interaction with a video search engine. A ranking SVM is trained for each concept to learn the ranking function which incorporates user preferences for the concept. The videos are represented by a set of ingeniously derived content-based features which are based on MPEG-7 descriptors. Our re-ranking scheme thus effectively re-ranks results for new text queries submitted to our video retrieval system, leading to better satisfaction of the users' information need.

4 citations


Journal ArticleDOI
TL;DR: A computational model for analyzing a video shot based on a novel principle of perceptual prominence that captures the key aspects of mise-en-scene required for characterizing a video scene.
Abstract: We present a novel approach for applying perceptual grouping principles to the spatio-temporal domain of video. Our perceptual grouping scheme, applied on blobs, makes use of a specified spatio-temporal coherence model. The grouping scheme identifies the blob cliques or perceptual clusters in the scene. We propose a computational model for analyzing a video shot based on a novel principle of perceptual prominence. The principle of perceptual prominence captures the key aspects of mise-en-scene required for characterizing a video scene.

1 citation


Proceedings ArticleDOI
01 Sep 2007
TL;DR: In this paper, a genetic algorithm based FSM synthesis technique is presented for minimizing dynamic power together with leakage power reduction both in combinational and sequential part of FSM, and a trade-off between static and dynamic power also has been done.
Abstract: Leakage power is found to be the dominant contributor to total power consumption at the present technology level. A large amount of power can be saved if leakage is addressed early in the design cycle, during logic synthesis. Most work on FSM synthesis targets optimization of switching activity to minimize dynamic power, yet including an accurate model for static (leakage) power during synthesis can lead to considerable savings in total power consumption. In this paper a genetic algorithm based FSM synthesis technique is presented for minimizing dynamic power together with leakage power reduction in both the combinational and sequential parts of the FSM. Simulation results show 22.18% improvement in static power and 8.02% improvement in dynamic power when compared with NOVA. A trade-off between static and dynamic power has also been made.
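As a rough illustration of the switching-activity objective mentioned in this abstract, the sketch below scores a state encoding by the expected number of state bits that flip per transition. This is a common proxy for dynamic power and is an assumption here, not the paper's fitness function; the leakage model is omitted, and the state names, probabilities, and encodings are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two state codes differ."""
    return bin(a ^ b).count("1")


def switching_cost(encoding, transitions):
    """Sum over transitions of probability times flipped state bits."""
    return sum(p * hamming(encoding[s], encoding[t]) for s, t, p in transitions)


# Hypothetical three-state FSM: (source, target, transition probability).
transitions = [("A", "B", 0.5), ("B", "C", 0.3), ("C", "A", 0.2)]

# Two candidate encodings, as a genetic algorithm might propose them.
gray_like = {"A": 0b00, "B": 0b01, "C": 0b11}
alternative = {"A": 0b00, "B": 0b11, "C": 0b01}

cost_gray_like = switching_cost(gray_like, transitions)
cost_alternative = switching_cost(alternative, transitions)
```

A search procedure would prefer the encoding with the lower cost, here the one that places the most frequent transition between codes differing in a single bit.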

Journal ArticleDOI
01 Feb 2007
TL;DR: Two techniques are proposed for novel view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras under the assumption of availability of the correspondence of three vanishing points.
Abstract: We have attempted the problem of novel view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras. Under the assumption of availability of the correspondence of three vanishing points, in general position, we propose two techniques. The first is a transfer-based scheme which synthesizes new views with only a translation of the virtual camera and computes z-buffer values for handling occlusions in synthesized views. The second is a reconstruction-based scheme which synthesizes arbitrary new views in which the camera can undergo rotation as well as translation. We present experimental results to establish the validity of both formulations.

Proceedings ArticleDOI
05 Mar 2007
TL;DR: A framework which integrates object-model knowledge with the perceptual organization process and demonstrates the advantages of the add-on grouping evidences as contributed by the object models for a more robust perceptual organization in the spatio-temporal domain is presented.
Abstract: In this paper we present a framework which integrates object-model knowledge with the perceptual organization process. We demonstrate the advantages of the add-on grouping evidences as contributed by the object models for a more robust perceptual organization in the spatio-temporal domain. Our system performs detection of foreground objects along with recognition in video