
Showing papers by "Santanu Chaudhury published in 2008"


Proceedings ArticleDOI
30 Oct 2008
TL;DR: This work uses MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved.
Abstract: In this work, we offer an approach to combine standard multimedia analysis techniques with knowledge drawn from conceptual metadata provided by domain experts of a specialized scholarly domain, to learn a domain-specific multimedia ontology from a set of annotated examples. A standard Bayesian network learning algorithm that learns the structure and parameters of a Bayesian network is extended to include media observables in the learning. An expert group provides domain knowledge to construct a basic ontology of the domain as well as to annotate a set of training videos. These annotations help derive the associations between high-level semantic concepts of the domain and low-level MPEG-7 based features representing the audio-visual content of the videos. We construct a more robust and refined version of this ontology by learning from this set of conceptually annotated videos. To encode this knowledge, we use MOWL, a multimedia extension of the Web Ontology Language (OWL), which is capable of describing domain concepts in terms of their media properties and of capturing the inherent uncertainties involved. We use the ontology-specified knowledge for recognizing concepts relevant to a video, to annotate fresh additions to the video database with relevant concepts in the ontology. These conceptual annotations are used to create hyperlinks in the video collection, to provide an effective video browsing interface to the user.

20 citations
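The probabilistic concept recognition that MOWL enables can be reduced, at its simplest, to Bayesian inference over media observables. The sketch below is a hypothetical naive-Bayes instance of that idea; the concepts, observables, and all probability values are invented for illustration and do not come from the paper.

```python
# Toy sketch: probabilistic concept recognition from media observables,
# in the spirit of MOWL's Bayesian-network reasoning. All concepts,
# observables and probabilities here are hypothetical.

# P(observable present | concept), as a (hypothetical) ontology might specify
likelihood = {
    "dance_performance": {"rhythmic_audio": 0.9, "stage_lighting": 0.8},
    "lecture":           {"rhythmic_audio": 0.1, "stage_lighting": 0.3},
}
prior = {"dance_performance": 0.5, "lecture": 0.5}

def posterior(observed):
    """Naive-Bayes posterior over concepts given detected media observables."""
    scores = {}
    for concept, p in prior.items():
        for obs, present in observed.items():
            l = likelihood[concept][obs]
            p *= l if present else (1.0 - l)
        scores[concept] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# both observables detected in a video segment
post = posterior({"rhythmic_audio": True, "stage_lighting": True})
```

A full Bayesian network would additionally model dependencies among observables and between concepts, which is what the structure-learning step in the paper provides.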


Proceedings ArticleDOI
16 Dec 2008
TL;DR: A video coding scheme based on parametric compression of texture is proposed that achieves up to 54.52% more compression compared to the standard H.264/AVC at similar visual quality.
Abstract: In this paper, a video coding scheme based on parametric compression of texture is proposed. Each macro block is characterized either as an edge block, or as a non-edge block containing texture. The non-edge blocks are coded by modeling them as an auto-regressive (AR) process. By applying the AR model in the spatio-temporal domain, we ensure both spatial and temporal consistency. Edge blocks are encoded using the standard H.264/AVC. The proposed algorithm achieves up to 54.52% more compression compared to the standard H.264/AVC at similar visual quality.

20 citations


Proceedings ArticleDOI
16 Dec 2008
TL;DR: This paper uses video epitomes for segmenting foreground objects from background and applies pLSA for finding correlations among these patches to learn usual activities in the scene and extends it to classify a novel video as usual or unusual.
Abstract: In this paper, we address the problem of unsupervised learning of usual patterns of activities in an area under surveillance and detecting deviant patterns. We use video epitomes for segmenting foreground objects from background and obtain approximate shape, trajectory and temporal information in the form of space-time patches. We apply pLSA for finding correlations among these patches to learn usual activities in the scene. We also extend pLSA to classify a novel video as usual or unusual.

18 citations
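pLSA explains a count matrix (here: videos × patch types) with a small number of latent topics via EM. The toy implementation below shows the standard asymmetric-formulation E and M steps; the count data, sizes, and the reading of topics as "activity patterns" are invented for illustration.

```python
import random

# Toy pLSA: "documents" are videos, "words" are space-time patch types,
# latent topics stand in for activity patterns. Counts are made up.
random.seed(0)
counts = [[8, 7, 0, 1],    # video 0: mostly patch types 0-1
          [7, 8, 1, 0],
          [0, 1, 8, 7],    # video 2: mostly patch types 2-3
          [1, 0, 7, 8]]
D, W, Z = len(counts), len(counts[0]), 2

p_wz = [[random.random() for _ in range(W)] for _ in range(Z)]  # P(w|z)
p_zd = [[1.0 / Z] * Z for _ in range(D)]                        # P(z|d)
for z in range(Z):
    s = sum(p_wz[z]); p_wz[z] = [v / s for v in p_wz[z]]

for _ in range(50):                      # EM iterations
    new_wz = [[0.0] * W for _ in range(Z)]
    new_zd = [[0.0] * Z for _ in range(D)]
    for d in range(D):
        for w in range(W):
            if counts[d][w] == 0:
                continue
            post = [p_wz[z][w] * p_zd[d][z] for z in range(Z)]  # E-step
            s = sum(post)
            for z in range(Z):
                r = counts[d][w] * post[z] / s                  # responsibility
                new_wz[z][w] += r
                new_zd[d][z] += r
    for z in range(Z):                   # M-step normalisation
        s = sum(new_wz[z]); p_wz[z] = [v / s for v in new_wz[z]]
    for d in range(D):
        s = sum(new_zd[d]); p_zd[d] = [v / s for v in new_zd[d]]
```

After EM, videos sharing the same dominant topic exhibit the same "usual" activity; a novel video whose patch distribution fits no learnt topic well would be flagged as unusual.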


Proceedings ArticleDOI
01 Dec 2008
TL;DR: A loss functional is formulated to quantify the discrepancy between the state-transition probabilities in the original video and those in the intended summary video; optimizing this functional produces high-quality summaries that capture user perception.
Abstract: We present a video summarization technique based on supervised learning. Within a class of videos of similar nature, the user provides the desired summaries for a subset of videos. Based on this supervised information, the summaries for the other videos in the same class are generated. We derive frame-transitional features and subsequently represent each frame transition as a state. We then formulate a loss functional to quantify the discrepancy between the state-transition probabilities in the original video and those in the intended summary video, and optimize this functional. We experimentally validate the performance of the technique using cross-validation scores on two different classes of videos, and demonstrate that the proposed technique produces high-quality summaries that capture user perception.

14 citations
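One simple way to quantify the discrepancy between two sets of state-transition probabilities is a row-wise KL divergence between the estimated transition matrices. The sketch below is an illustrative stand-in for the paper's loss functional, not its exact definition; the state sequences and smoothing constant are invented.

```python
import math

# Compare transition probabilities of an "original" state sequence with
# those of candidate summaries, using summed row-wise KL divergence.

def transition_probs(state_seq, n_states):
    """Row-normalised transition matrix estimated from a state sequence."""
    m = [[1e-6] * n_states for _ in range(n_states)]   # small smoothing
    for a, b in zip(state_seq, state_seq[1:]):
        m[a][b] += 1.0
    return [[v / sum(row) for v in row] for row in m]

def kl_loss(p, q):
    """Sum over rows i of KL(p_i || q_i)."""
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(len(p)) for j in range(len(p)))

original = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
good_summary = [0, 1, 2, 0, 1, 2]        # preserves the 0 -> 1 -> 2 cycling
bad_summary = [2, 1, 0, 2, 1, 0]         # reverses the transition structure
p = transition_probs(original, 3)
loss_good = kl_loss(p, transition_probs(good_summary, 3))
loss_bad = kl_loss(p, transition_probs(bad_summary, 3))
```

A summarizer would then search over candidate frame selections to minimize this loss against the user-provided training summaries.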


Proceedings ArticleDOI
13 Oct 2008
TL;DR: A scheme for on-line semantic transcoding of the video captured by the smart camera is proposed and a local associative computation based change detection scheme for identifying frames of interest is proposed.
Abstract: Smart cameras are expected to be important components for creating ubiquitous multimedia environments. In this paper, we propose a scheme for on-line semantic transcoding of the video captured by a smart camera. The transcoding process selects frames of importance and regions of interest for use by other processing elements in a ubiquitous computing environment. We propose a local associative computation based change detection scheme for identifying frames of interest. The algorithm also segments out the region of change. The computation is structured for easy implementation in a DSP-based embedded environment. The transcoding scheme enables the camera to communicate only the regions of change in frames of interest to a server or a peer. Consequently, communication and processing overhead is reduced in a networked application environment. Experimental results have established the effectiveness of the transcoding scheme.

10 citations
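The transcoding idea, selecting frames of interest and transmitting only the region of change, can be illustrated with plain frame differencing. Note this is a simplification: the paper's detector is an associative-computation scheme, not the naive differencing below, and the thresholds and frame data are invented.

```python
# Sketch of change-based frame selection: a frame is "of interest" when
# enough pixels differ from the previous frame; the region of change is
# summarised by a bounding box, which is all the camera would transmit.

def changed_pixels(prev, curr, tol=10):
    return [(r, c) for r in range(len(curr)) for c in range(len(curr[0]))
            if abs(curr[r][c] - prev[r][c]) > tol]

def frame_of_interest(prev, curr, min_changed=2):
    """Return the bounding box of change, or None if the frame is static."""
    pts = changed_pixels(prev, curr)
    if len(pts) < min_changed:
        return None
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    return (min(rows), min(cols), max(rows), max(cols))

frame_a = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
frame_b = [[0, 0, 0, 0],
           [0, 200, 180, 0],
           [0, 190, 0, 0]]
box = frame_of_interest(frame_a, frame_b)      # region to transmit
static = frame_of_interest(frame_a, frame_a)   # no change: frame is skipped
```

Sending only `box`-sized crops for frames that pass the test is what yields the reduced communication overhead described in the abstract.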


Journal ArticleDOI
TL;DR: A handcrafted fuzzy rule-based system for segmentation and identification of different tissue types in magnetic resonance (MR) brain images using a combination of histogram and spatial neighborhood-based features to handle variations and variability in features corresponding to different types of tissues.

10 citations
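A fuzzy rule-based tissue classifier of this kind assigns each voxel the tissue whose membership function its features activate most strongly. The sketch below uses only triangular membership functions over a single normalised intensity feature; the tissue breakpoints are invented, and the paper's system additionally uses histogram and spatial-neighbourhood features.

```python
# Toy fuzzy-rule classifier: triangular memberships over (hypothetical,
# normalised) MR intensities; the winning tissue has highest membership.

def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, peaking at 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# invented intensity ranges for three brain tissue types
rules = {
    "csf":          lambda x: tri(x, 0.0, 0.15, 0.35),
    "grey_matter":  lambda x: tri(x, 0.25, 0.45, 0.65),
    "white_matter": lambda x: tri(x, 0.55, 0.75, 1.0),
}

def classify(intensity):
    memberships = {t: f(intensity) for t, f in rules.items()}
    return max(memberships, key=memberships.get), memberships

label, m = classify(0.72)    # a bright voxel
```

The overlap between adjacent membership functions is what lets such systems handle the intensity variability across tissue boundaries that the TL;DR mentions.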


Proceedings ArticleDOI
01 Dec 2008
TL;DR: A classifier unifying local-feature-based representation and subspace-based learning is presented; the system supports hierarchical classification by merging kernel eigenspaces (KES) in the feature space, demonstrated on a dataset of videos collected from the internet.
Abstract: We present a classifier unifying local-feature-based representation and subspace-based learning. We also propose a novel method to merge kernel eigenspaces (KES) in feature space. Subspace methods have traditionally been used with the full appearance of the image. Recently, the local-feature-based bag-of-features (BoF) representation has performed impressively on classification tasks. We use KES with BoF vectors to construct class-specific subspaces and use the distance of a query vector from the database KESs as the classification criterion. The use of local features makes our approach invariant to illumination, rotation, scale, small affine transformations and partial occlusions. The system allows hierarchy by merging the KES in the feature space. The classifier performs competitively on the challenging Caltech-101 dataset under normal and simulated occlusion conditions. We demonstrate the hierarchy on a dataset of videos collected from the internet.

8 citations
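The classification criterion, distance of a query vector from each class subspace, can be shown with a plain linear nearest-subspace classifier. The paper uses kernel eigenspaces rather than this linear version, and the 4-D "BoF histograms" and class names below are toy inventions.

```python
# Nearest-subspace classification: each class is an orthonormal basis of
# its training vectors; the query goes to the class whose subspace
# reconstructs it with the smallest residual.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vecs):
    """Orthonormal basis for the span of vecs."""
    basis = []
    for v in vecs:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = dot(w, w) ** 0.5
        if n > 1e-9:
            basis.append([wi / n for wi in w])
    return basis

def residual(q, basis):
    """Distance from q to the span of the orthonormal basis."""
    proj = [0.0] * len(q)
    for b in basis:
        c = dot(q, b)
        proj = [p + c * bi for p, bi in zip(proj, b)]
    diff = [a - b for a, b in zip(q, proj)]
    return dot(diff, diff) ** 0.5

classes = {
    "bikes": gram_schmidt([[1, 0, 0, 0], [0, 1, 0, 0]]),
    "cars":  gram_schmidt([[0, 0, 1, 0], [0, 0, 0, 1]]),
}
query = [0.9, 0.4, 0.1, 0.0]    # toy BoF histogram, closer to "bikes"
pred = min(classes, key=lambda c: residual(query, classes[c]))
```

Merging two classes' training vectors and re-orthonormalizing gives a coarser parent-class subspace, which is the linear analogue of the KES-merging hierarchy in the paper.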


Proceedings ArticleDOI
16 Dec 2008
TL;DR: A novel framework for automated analysis of surveillance videos is proposed that applies cluster algebra to mine a multi-level summary of the video from multiple perspectives and adapts association learning to automatically select the components that make an event unusual.
Abstract: In this paper, we propose a novel framework for automated analysis of surveillance videos. By analysis, we imply summarizing and mining the information in the video for learning usual patterns and discovering unusual ones. We approach this video analysis problem by acknowledging that a video contains information at multiple levels and in multiple attributes. Each such component, and the co-occurrences of these component values, plays an important role in characterizing an event as usual or unusual. Therefore, we cluster the video data at multiple levels of abstraction and in multiple attributes, and view these clusters as a summary of the information in the video. We apply cluster algebra to mine this summary from multiple perspectives and adapt association learning to automatically select the components that make an event unusual. We also propose a novel incremental clustering algorithm.

8 citations
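Incremental clustering, processing each observation once as it arrives, is commonly done in the leader-follower style: join the nearest cluster if it is within a threshold, otherwise start a new one. The sketch below shows that pattern with an online centroid update; the paper's own incremental algorithm may differ in its update rule, and the points and threshold are invented.

```python
# Leader-follower style incremental clustering with online centroid updates.

def incremental_cluster(points, threshold):
    clusters = []                        # list of (centroid, count)
    labels = []
    for p in points:
        best, best_d = None, None
        for i, (c, _) in enumerate(clusters):
            d = sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            c, n = clusters[best]        # online running-mean update
            clusters[best] = ([(ci * n + pi) / (n + 1)
                               for ci, pi in zip(c, p)], n + 1)
            labels.append(best)
        else:                            # too far from everything: new cluster
            clusters.append((list(p), 1))
            labels.append(len(clusters) - 1)
    return labels, clusters

# toy 2-D event attributes forming two groups
pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2)]
labels, clusters = incremental_cluster(pts, threshold=1.0)
```

A single pass suffices, which is what makes the scheme suitable for continuously arriving surveillance data; events falling in small or singleton clusters are natural candidates for "unusual".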


Proceedings ArticleDOI
01 Dec 2008
TL;DR: The novelty of the approach presented in this paper is the unique object-based video coding framework for videos obtained from a static camera that does not require explicit 2D or 3D models of objects and is general enough to satisfy the need for varying types of objects in the scene.
Abstract: The novelty of the approach presented in this paper is a unique object-based video coding framework for videos obtained from a static camera. As opposed to most existing methods, the proposed method does not require explicit 2D or 3D models of objects and hence is general enough to handle varying types of objects in the scene. The proposed system detects and tracks each object in the scene by learning its appearance model online using a non-traditional uniform-norm-based subspace. At the same time, the object is coded using its projection coefficients onto the orthonormal basis of the learnt subspace. The tracker incorporates a predictive framework based on a Kalman filter for predicting the five motion parameters. The proposed method shows substantially better compression than MPEG-2 based coding with almost no additional complexity.

4 citations
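The predictive part of such a tracker can be illustrated with a minimal constant-velocity Kalman filter on one motion parameter (the paper predicts five). The measurements and noise settings below are invented, and this scalar sketch omits the appearance-subspace side of the system entirely.

```python
# Minimal 1-D constant-velocity Kalman filter: filter noisy position
# measurements and predict the next position.

def kalman_track(measurements, q=1e-3, r=0.25):
    x, v = measurements[0], 0.0              # state: position and velocity
    p00, p01, p11 = 1.0, 0.0, 1.0            # state covariance entries
    estimates = []
    for z in measurements[1:]:
        # predict one step ahead (dt = 1, constant velocity)
        x, v = x + v, v
        p00, p01, p11 = p00 + 2 * p01 + p11 + q, p01 + p11, p11 + q
        # correct with the new position measurement z
        s = p00 + r                          # innovation variance
        k0, k1 = p00 / s, p01 / s            # Kalman gain
        y = z - x                            # innovation
        x, v = x + k0 * y, v + k1 * y
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        estimates.append(x)
    return estimates, x + v                  # filtered positions, next prediction

measurements = [0.0, 1.0, 2.1, 2.9, 4.0]     # noisy positions, invented
track, predicted = kalman_track(measurements)
```

The one-step prediction gives the coder a search window for the object in the next frame before the measurement arrives, which is the role the predictive framework plays in the tracker.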


Proceedings ArticleDOI
16 Dec 2008
TL;DR: A novel framework for object detection and localization in images containing appreciable clutter and occlusions is proposed and a method similar to the recently proposed spatial scan statistic is used to refine the object localization estimates obtained from the sampling process.
Abstract: We propose a novel framework for object detection and localization in images containing appreciable clutter and occlusions. The problem is cast in a statistical hypothesis testing framework. The image under test is converted into a set of local features using affine invariant local region detectors, described using the popular SIFT descriptor. Due to clutter and occlusions, this set is expected to contain features which do not belong to the object. We sample subsets of local features from this set and test for the alternate hypothesis of object present against the null hypothesis of object absent. Further, we use a method similar to the recently proposed spatial scan statistic to refine the object localization estimates obtained from the sampling process. We demonstrate the results of our method on the two datasets TUD Motorbikes and TUD Cars. TUD Cars database has background clutter. TUD Motorbikes dataset is recognized to have substantial variation in terms of scale, background, illumination, viewpoint and occlusions.
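The localization refinement can be pictured as a 1-D scan: slide a window over feature positions and score each placement by how densely it covers matched features. This is only a loose analogue of a spatial scan statistic (which uses a likelihood ratio over 2-D regions); the positions, match flags, window size, and scoring rule below are all invented.

```python
# Scan-style localisation sketch: features are (position, is_match) pairs;
# find the window origin with the best matched-feature concentration.

def best_window(features, width):
    """Return (window_start, score) for the best-scoring window."""
    best = (None, -1.0)
    for start, _ in sorted(features):            # candidate window origins
        inside = [m for x, m in features if start <= x < start + width]
        m = sum(inside)
        if not inside or m == 0:
            continue
        score = m * m / len(inside)              # match count weighted by purity
        if score > best[1]:
            best = (start, score)
    return best

# matches (is_match=1) cluster around x in [10, 15); clutter elsewhere
feats = [(2, 0), (4, 0), (10, 1), (11, 1), (13, 1), (20, 0), (22, 1)]
origin, score = best_window(feats, width=5)
```

Weighting the count by purity keeps a lone matched feature in clutter from outscoring a dense cluster of matches, which is the intuition behind scanning for regions of anomalously high match density.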

Proceedings ArticleDOI
16 Dec 2008
TL;DR: This paper shows how the 3D structure of multiple objects and their affine repetitions may be computed from a single image and used for synthesizing new views of the scene.
Abstract: Symmetry and affine repetitions are common in scenes with man-made structures. In this paper we propose a technique to exploit affine repetitions in a 3D scene for reconstruction and view synthesis from a single image. Assuming three vanishing points in the image, we show how the 3D structure of multiple objects and their affine repetitions may be computed and used for synthesizing new views. The reconstructed objects may also be inserted in other scenes to create augmented images.
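A first step in this kind of single-view analysis is locating vanishing points, which can be computed as intersections of the images of parallel scene lines using homogeneous coordinates. The line segments below are invented for illustration; the paper assumes three such vanishing points are available.

```python
# Vanishing point as the intersection of two image lines, each the image
# of a parallel scene edge, via homogeneous-coordinate cross products.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg1, seg2):
    """Intersection of the two segments' supporting lines (assumes a finite VP)."""
    vp = cross(line_through(*seg1), line_through(*seg2))
    return (vp[0] / vp[2], vp[1] / vp[2])

# two images of parallel scene edges, converging toward (10, 0)
vp = vanishing_point(((0.0, 5.0), (5.0, 2.5)),
                     ((0.0, -5.0), (5.0, -2.5)))
```

With three such vanishing points fixing the scene's principal directions, relative 3D structure of the repeated objects can then be recovered up to scale.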

Journal ArticleDOI
TL;DR: A new scheme for media feature based concept modelling is proposed to address the limitation of traditional ontology based multimedia retrieval systems and supports probabilistic evidential reasoning for robust concept recognition in multimedia documents.
Abstract: This paper presents a new approach to building distributed digital libraries with multimedia contents. The authors propose a new scheme for media-feature-based concept modelling to address the limitations of traditional ontology-based multimedia retrieval systems. The perceptual models can be used for semantic query processing using standard MPEG-7 media content descriptions. The authors have defined a new ontology language, M-OWL (Multimedia Web Ontology Language), to support this perceptual modelling. M-OWL is an extension of OWL (Web Ontology Language) with new constructs for formal representation of the media properties of domain concepts. It supports probabilistic evidential reasoning for robust concept recognition in multimedia documents. The separation of perceptual modelling of concepts from the repository architecture enables seamless integration of diverse multimedia contents. SOA (Service-Oriented Architecture) is used to integrate a large number of distributed information sources, each of which is modelled as an intelligent information agent. The authors have demonstrated the capability of the architecture by building a few research prototypes, namely a virtual encyclopaedia of Indian culture, a document image repository and a multimedia portal.