Showing papers by "Santanu Chaudhury" published in 2012


Journal ArticleDOI
TL;DR: An ontology learning scheme is proposed in this paper which combines standard multimedia analysis techniques with knowledge drawn from conceptual meta-data to learn a domain-specific multimedia ontology from a set of annotated examples.
Abstract: A domain-specific ontology models a specific domain or part of the world. In fact, ontologies have proven to be an excellent medium for capturing the knowledge of a domain. We propose an ontology learning scheme in this paper which combines standard multimedia analysis techniques with knowledge drawn from conceptual meta-data to learn a domain-specific multimedia ontology from a set of annotated examples. A standard machine-learning algorithm that learns structure and parameters of a Bayesian network is extended to include media observables in the learning. An expert group provides domain knowledge to construct a basic ontology of the domain as well as to annotate a set of training videos. These annotations help derive the associations between high-level semantic concepts of the domain and low-level media features. We construct a more robust and refined version of the basic ontology by learning from this set of conceptually annotated data. We show an application of our ontology-based framework for exploration of multimedia content, in the field of cultural heritage preservation. By constructing an ontology for the cultural heritage domain of Indian classical dance, and by offering an application for semantic annotation of the heritage collection of Indian dance videos, we demonstrate the efficacy of our approach.

24 citations


Journal ArticleDOI
TL;DR: The scheme presents the extension of distance based hashing to kernel space for generating the indexing structure based on similarity in kernel space using the concept of multiple kernel learning to incorporate multiple features for defining the image indexing space.
Abstract: The paper presents a novel feature based indexing scheme for image collections. The scheme presents the extension of distance based hashing to kernel space for generating the indexing structure based on similarity in kernel space. The objective of the scheme is to incorporate multiple features for defining the image indexing space using the concept of multiple kernel learning. However, the indexing problems are defined with a unique learning objective; therefore, a novel application of a genetic algorithm is presented for the optimization task. The proposed concept is evaluated extensively by developing a word based document indexing application for Devanagari, Bengali, and English scripts. In addition, the efficacy of the proposed concept is shown by experimental evaluations on handwritten digits and a natural image collection.

22 citations
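
The abstract above combines two ideas: distance based hashing, which only needs pairwise distances, and a weighted sum of kernels, which defines those distances in kernel space. The following is a minimal sketch of that combination, not the authors' implementation. It assumes the line-projection form of distance based hashing with two pivot objects and an interval threshold, and it uses fixed, hand-picked kernel weights where the paper optimizes them with a genetic algorithm. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian (RBF) kernel with bandwidth parameter gamma."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def combine_kernels(kernels: List[Kernel], weights: List[float]) -> Kernel:
    """Non-negative weighted sum of base kernels (weights are assumed given)."""
    def k(x: Vector, y: Vector) -> float:
        return sum(w * kern(x, y) for w, kern in zip(weights, kernels))
    return k


def kernel_sq_dist(k: Kernel, x: Vector, y: Vector) -> float:
    """Squared distance in the feature space induced by k."""
    return max(k(x, x) + k(y, y) - 2.0 * k(x, y), 0.0)


def line_projection(k: Kernel, x: Vector, p1: Vector, p2: Vector) -> float:
    """Project x onto the feature-space line through pivots p1 and p2."""
    d12 = kernel_sq_dist(k, p1, p2)
    if d12 == 0.0:
        raise ValueError("pivots must be distinct in feature space")
    d1 = kernel_sq_dist(k, x, p1)
    d2 = kernel_sq_dist(k, x, p2)
    return (d1 + d12 - d2) / (2.0 * math.sqrt(d12))


HashFunction = Tuple[Vector, Vector, float, float]  # (p1, p2, t1, t2)


def hash_code(k: Kernel, x: Vector, functions: List[HashFunction]) -> Tuple[int, ...]:
    """One bit per hash function: 1 if the projection falls inside [t1, t2]."""
    return tuple(
        1 if t1 <= line_projection(k, x, p1, p2) <= t2 else 0
        for p1, p2, t1, t2 in functions
    )
```

Objects whose codes agree land in the same bucket, so a query only needs to be compared against the objects in its own bucket. Because every step goes through `kernel_sq_dist`, swapping in a different kernel combination changes the indexing space without touching the hashing logic.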


Book ChapterDOI
05 Nov 2012
TL;DR: The novelty of the architecture lies in merging this variety based approach with standard depth image based view synthesis pipeline, without explicitly obtaining sparse or dense 3D points, and subsequently overcomes the problems associated with existing depth based representations.
Abstract: This paper presents a novel parameterized variety based view synthesis scheme for 3DTV and multi-view systems. We have generalized the parameterized image variety approach to image based rendering proposed in [1] to handle full perspective cameras. An algebraic geometry framework is proposed for the parameterization of the variety associated with full perspective images, by image positions of three reference scene points. A complete parameterization of the 3D scene is constructed. This allows realistic novel views to be generated from arbitrary viewpoints without explicit 3D reconstruction, taking a few multi-view images as input from uncalibrated cameras. Another contribution of this paper is to provide a generalised and flexible architecture based on this variety model for multi-view 3DTV. The novelty of the architecture lies in merging this variety based approach with the standard depth image based view synthesis pipeline, without explicitly obtaining sparse or dense 3D points. This integrated framework subsequently overcomes the problems associated with existing depth based representations. The key aspects of this joint framework are: 1) Synthesis of artifact-free novel views from arbitrary camera positions for wide angle viewing. 2) Generation of a signal representation compatible with standard multi-view systems. 3) Extraction of reliable view dependent depth maps from arbitrary virtual viewpoints without recovering exact 3D points. 4) Intuitive interface for virtual view specification based on scene content. Experimental results on standard multi-view sequences are presented to demonstrate the effectiveness of the proposed scheme.

6 citations


Proceedings ArticleDOI
16 Dec 2012
TL;DR: A novel rendering algorithm based on depth image warping to support virtual pan-tilt-zoom (PTZ) functionalities during 3D view generation and a novel selective warping scheme is presented to reduce the computational cost by as much as 40% while maintaining an acceptable quality of rendering results.
Abstract: This paper presents a novel rendering algorithm based on depth image warping to support virtual pan-tilt-zoom (PTZ) functionalities during 3D view generation. A method based on a "3D-ness" knob is proposed for automatically specifying the virtual camera positions along a path, to model the PTZ mechanism in a projective framework. Two novel quality enhancing techniques based on segmentation cues are proposed to add pan, tilt and zoom capabilities during arbitrary view synthesis. In addition, to reduce the computational load that results from providing such functionalities, a novel selective warping scheme is presented that reduces the computational cost by as much as 40% while maintaining an acceptable quality of rendering results. Experiments are performed using the standard "Breakdancers" and "Ballet" video sequences to demonstrate the effectiveness of the proposed methods as compared to currently published results.

4 citations
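
Depth image warping, the basis of the rendering algorithm described above, can be illustrated with a generic pinhole-camera sketch. This is not the paper's algorithm: it omits the "3D-ness" knob, the segmentation-based enhancement and the selective warping scheme. It assumes square pixels, a principal point at the image centre, a pan modelled as a rotation about the vertical axis, and a zoom modelled as a change of focal length. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def pan_rotation(angle_rad: float) -> List[List[float]]:
    """Rotation about the camera's vertical (y) axis, used here as a pan."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def warp_pixel(u: float, v: float, depth: float, f_src: float, f_dst: float,
               cx: float, cy: float, rotation: Matrix3,
               translation: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Map a source pixel with known depth to (u', v', depth') in the virtual view."""
    # Back-project the pixel to a 3D point in the source camera frame.
    p = [(u - cx) * depth / f_src, (v - cy) * depth / f_src, depth]
    # Move the point into the virtual camera frame: q = R p + t.
    q = [sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
         for r in range(3)]
    if q[2] <= 0.0:
        return None  # the point lies behind the virtual camera
    # Re-project with the virtual focal length (a larger f_dst zooms in).
    return (f_dst * q[0] / q[2] + cx, f_dst * q[1] / q[2] + cy, q[2])


def forward_warp(image: List[list], depth_map: List[List[float]], f_src: float,
                 f_dst: float, rotation: Matrix3,
                 translation: Sequence[float]) -> List[list]:
    """Forward-warp a whole image; pixels that receive no value stay None (holes)."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[None] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            hit = warp_pixel(u, v, depth_map[v][u], f_src, f_dst,
                             cx, cy, rotation, translation)
            if hit is None:
                continue
            ut, vt = math.floor(hit[0] + 0.5), math.floor(hit[1] + 0.5)
            # Z-buffer test: keep the surface closest to the virtual camera.
            if 0 <= ut < w and 0 <= vt < h and hit[2] < zbuf[vt][ut]:
                zbuf[vt][ut] = hit[2]
                out[vt][ut] = image[v][u]
    return out
```

The `None` entries left by `forward_warp` are the disocclusion holes that a practical renderer must fill, and the per-pixel loop is the cost that a selective warping scheme would aim to reduce.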


Proceedings ArticleDOI
16 Dec 2012
TL;DR: This paper presents a novel framework that learns optimal parameters, depending on the nature of the document image content for binarization and text/graphics segmentation, using EM algorithm.
Abstract: Most document pre-processing techniques are parameter dependent. In this paper, we present a novel framework that learns optimal parameters, depending on the nature of the document image content, for binarization and text/graphics segmentation. The learning problem has been formulated as an optimization problem using the EM algorithm to adaptively learn optimal parameters. Experimental results have established the effectiveness of our approach.

4 citations


Proceedings ArticleDOI
03 Dec 2012
TL;DR: A novel parameterized variety based model is presented that integrates these different domains into one common framework to accommodate multi-view stereo for multiple exposure input views and to render photo-realistic HDR images from arbitrary virtual viewpoints for high quality 3D reconstruction.
Abstract: Multi-view stereo, novel view synthesis and high dynamic range (HDR) imaging are three pertinent areas of concern for high quality 3D view generation. This paper presents a novel parameterized variety based model that integrates these different domains into one common framework, with the goal of accommodating multi-view stereo for multiple exposure input views and rendering photo-realistic HDR images from arbitrary virtual viewpoints for high quality 3D reconstruction. We extend the parameterized variety approach for rendering presented earlier by Genc and Ponce [1] to handle full perspective cameras. An efficient algebraic framework is proposed to construct an explicit parameterization of the space of all multi-view multi-exposed images. This characterization of differently exposed views allows simultaneous recovery of artifact-free HDR images and reliable depth maps from arbitrary camera viewpoints. A high quality, HDR textured 3D model of the scene is obtained using these images and the recovered geometry information.

4 citations
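
The HDR component of the abstract above rests on merging differently exposed observations of the same scene point. The sketch below shows only that generic merging step, as a weighted average of per-exposure radiance estimates. It is not the paper's variety-based model: it assumes the observations are already pixel-aligned (the paper handles differing viewpoints), a linear sensor response, and pixel values normalised to [0, 1]. The triangle weighting and all names are illustrative choices.

```python
from typing import List, Sequence


def hat_weight(z: float) -> float:
    """Triangle weight: trusts mid-range values, distrusts dark or saturated ones."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_exposures(pixel_values: Sequence[float],
                    exposure_times: Sequence[float]) -> float:
    """Estimate relative radiance at one pixel from several exposures.

    pixel_values[i] is the normalised value observed at exposure_times[i].
    Under a linear response, value / time estimates the radiance.
    """
    num = 0.0
    den = 0.0
    for z, t in zip(pixel_values, exposure_times):
        w = hat_weight(z)
        num += w * z / t
        den += w
    if den == 0.0:
        # Every observation is fully dark or saturated: fall back to the
        # shortest exposure, which is the least likely to be clipped.
        t, z = min(zip(exposure_times, pixel_values))
        return z / t
    return num / den


def merge_images(images: List[List[List[float]]],
                 exposure_times: Sequence[float]) -> List[List[float]]:
    """Apply merge_exposures independently at every pixel of aligned images."""
    h, w = len(images[0]), len(images[0][0])
    return [
        [merge_exposures([img[y][x] for img in images], exposure_times)
         for x in range(w)]
        for y in range(h)
    ]
```

For example, a point of radiance 0.5 observed as 0.5 at exposure time 1.0 and as 0.25 at exposure time 0.5 merges back to 0.5, since both observations agree once divided by their exposure times.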


Proceedings ArticleDOI
09 Jul 2012
TL;DR: The proposed probabilistic model employs a more flexible prior distribution to model topic-topic correlations and utilizes both tag and image information for discovering latent subgroups in a given Flickr Group.
Abstract: Information management systems today face a tremendous challenge considering the growing popularity of social media repositories involving images and video. Considering the growing volume of multimedia content in such online media-sharing communities there is an increasing need for novel ways of organizing content. In this paper we consider the problem of organizing images in a given Flickr Group by discovering latent subgroups. A Flickr Group can be visualized as a collection of such subgroups where each subgroup represents a distinct theme. We model the task of discovering subgroups as that of finding highly correlated topics from a dataset containing images and associated tags. The proposed probabilistic model employs a more flexible prior distribution to model topic-topic correlations and utilizes both tag and image information for discovering such subgroups. Our experiments on Flickr Group data demonstrate that the model is able to successfully discover subgroups without any supervision.

4 citations


Book ChapterDOI
05 Nov 2012
TL;DR: A framework to provide cross-modal semantic linkage between semantically annotated content of a repository of Indian mural paintings, and a collection of labelled text documents of their narratives is proposed, based on a multimedia ontology of the domain.
Abstract: In this paper, we propose an archiving scheme for heritage mural paintings. The mural paintings typically depict stories from folk-lore, mythology and history. These narratives provide content-based correlations between different pieces of art. Our e-heritage scheme for archiving the mural paintings is based on an ontology which captures the background knowledge of these narratives. Media features and patterns derived from the mural content are used to enrich the ontology with multimedia data. We have used the multimedia web ontology language as our ontology representation scheme, as it allows perceptual modelling of domain concepts in terms of their media properties, as well as reasoning with uncertainties. Besides the mural content and its knowledge, the ontology also helps encode other aspects of the mural paintings like their painting style, color, physical location, time-period, etc., which are important parameters of their preservation. We propose a framework to provide cross-modal semantic linkage between semantically annotated content of a repository of Indian mural paintings, and a collection of labelled text documents of their narratives. This framework, based on a multimedia ontology of the domain, helps preserve the cultural heritage encoded in these artefacts.

3 citations


Proceedings ArticleDOI
04 Dec 2012
TL;DR: This paper proposes an approach to tag images in a user's collection based upon the user's personal profile, his/her social context and the context defined by his/her prior image collection, using an Adaptive Context Model created from user related sources.
Abstract: Tagging is nowadays the most predominant technique to make resources searchable. Tagging systems allow users to create and manage tags to annotate and categorize content. In this paper, we propose an approach to tag images in a user's collection based upon the user's personal profile, his/her social context and the context defined by his/her prior image collection. We apply LDA for context modeling. In this scheme, tag similarity and tag relevance are jointly estimated so that they can profit from each other. We have used an Adaptive Context Model created from user related sources to tag images. Experimental validation with users' mobile as well as website based image collections has established the effectiveness of the approach.

1 citation


Proceedings ArticleDOI
18 Sep 2012
TL;DR: An unsupervised model is proposed, called Time pLSA model, that extends the probabilistic Latent Semantic Analysis (pLSA) model to jointly capture the activities and their behaviour over time.
Abstract: In this paper, we address the problem of discovering activities and their temporal significance in an area under surveillance. Discovering activities along with their expectation of occurrence at a particular time plays an important role in many surveillance applications. We propose an unsupervised model, called the Time pLSA model, that extends the probabilistic Latent Semantic Analysis (pLSA) model to jointly capture the activities and their behaviour over time. We use adaptive background subtraction to detect spatio-temporal patches, which are used as the feature representation for activity patterns. Each of these patches is associated with the time slot in which it occurs. Multinomial distributions are used to model both activities, as distributions over spatio-temporal patches, and time significance, as a distribution over the time-line. We demonstrate the effectiveness of our approach on a real life surveillance feed of an outdoor scene.

1 citation
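
The Time pLSA model described above builds on standard pLSA, which is fitted with the EM algorithm. The sketch below implements only that standard baseline, not the temporal extension. Under the assumed reading, each "document" is a video clip, each "word" is a quantised spatio-temporal patch, and each latent topic plays the role of an activity. The function name and the small toy matrix in the usage note are hypothetical.

```python
import math
import random
from typing import List, Tuple


def _normalize(row: List[float]) -> List[float]:
    """Scale a non-negative row to sum to 1 (uniform if it is all zeros)."""
    s = sum(row)
    if s == 0.0:
        return [1.0 / len(row)] * len(row)
    return [x / s for x in row]


def plsa(counts: List[List[float]], n_topics: int, n_iter: int = 50,
         seed: int = 0) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Fit standard pLSA by EM on a document-by-word count matrix.

    Returns P(z|d), P(w|z) and the log-likelihood measured at the start
    of each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total <= 0.0:
                    continue
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_wz[z][w] += r  # expected count of word w under topic z
                    new_zd[d][z] += r  # expected count of topic z in document d
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [_normalize(row) for row in new_wz]
        p_z_d = [_normalize(row) for row in new_zd]
        history.append(log_lik)
    return p_z_d, p_w_z, history
```

Calling `plsa([[4, 3, 0, 1], [5, 2, 1, 0], [0, 1, 4, 5], [1, 0, 3, 6]], n_topics=2)` returns per-clip topic mixtures and per-topic patch distributions. The returned log-likelihood history should never decrease, which is a convenient sanity check on any EM implementation.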


Proceedings Article
01 Nov 2012
TL;DR: It is demonstrated that the proposed generative model Temporal BlockLink LDA is able to successfully extract such user-subgroups, subgroup-themes and associated temporal patterns from data in an unsupervised manner.
Abstract: The last few years have seen an exponential increase in the amount of multimedia content that is available online thanks to collaborative-online communities such as Flickr, YouTube, etc. As opposed to “pure” social networking services, these collaborative-online communities not only allow users to create new social links (e.g. add people to one's friend list) but also allow users to contribute multimedia content and engage in content-driven interactions. A good example of this can be seen in Flickr in general, and in Flickr Groups in particular, where users can comment on or “like” an image contributed by another user. This paper looks at utilizing this within-group user-user interaction information, along with image meta-data, to discover user communities (user-subgroups) that contribute content around specific topics (subgroup-themes) at specific points in time. A good example of this is a group of users (e.g. sports fans) contributing content and interacting with each other only at specific times of the year (e.g. close to their favorite sporting event). We demonstrate that our proposed generative model, Temporal BlockLink LDA, is able to successfully extract such user-subgroups, subgroup-themes and associated temporal patterns from data in an unsupervised manner.

Proceedings ArticleDOI
16 Dec 2012
TL;DR: A novel technique for Document script identification from printed documents, using Empirical Mode Decomposition (EMD), which uses finite set of IMFs (Intrinsic Mode Functions) as feature vectors to distinguish various scripts.
Abstract: In this paper, we describe a novel technique for Document script identification (DSI) from printed documents, using Empirical Mode Decomposition (EMD). The intrinsic decomposition nature can adaptively decompose script images into a series of modes representing different local features of script images. In this method, Radon transformed script images are decomposed into a finite set of IMFs (Intrinsic Mode Functions). The energy concentration in a particular orientation characterises a script texture as it indicates the dominance of an individual script in that direction. We demonstrate how the proposed method uses these IMFs as feature vectors to distinguish various scripts.


Book ChapterDOI
12 Jan 2012
TL;DR: An asynchronous multimedia conferencing application in which the users are provided with an authoring and rendering environment to record and view lectures and allows the users to ask and reply to doubts in the previously stored lectures making it a fully interactive but asynchronous system.
Abstract: In this paper we present Shiksha, an integrated architecture which incorporates handwritten illustrations captured and rendered in a temporal fashion synchronized with audio and video data. The architecture of Shiksha permits non-linear growth in the form of multiple hierarchically organized play streams. We have developed an asynchronous multimedia conferencing application in which the users are provided with an authoring and rendering environment to record and view lectures. It also allows the users to ask and reply to doubts in the previously stored lectures, making it a fully interactive but asynchronous system.

Proceedings Article
01 Nov 2012
TL;DR: The Parameterized Image Variety approach for rendering proposed earlier by Genc and Ponce to handle full perspective cameras is extended and a fast and efficient algebraic framework is proposed for the parameterized representation of 3D scene, in terms of image pixel positions corresponding to only three reference scene points.
Abstract: This paper presents a novel parameterized variety based architecture for interactive 3DTV and Free viewpoint TV (FTV) applications. The proposed signal representation scheme allows free viewpoint images to be rendered from only a few sample images of the scene acquired by arbitrary, uncalibrated cameras. We extend the Parameterized Image Variety approach for rendering proposed earlier by Genc and Ponce [1] to handle full perspective cameras. A fast and efficient algebraic framework is proposed for the parameterized representation of the 3D scene, in terms of image pixel positions corresponding to only three reference scene points. The key aspects of the novel FTV architecture based on this variety model are i) interactive stereoscopic view synthesis from an arbitrary viewpoint, ii) an intuitive interface for content based virtual view specification, and iii) support for adding special effects like 3D scene augmentation.