
Showing papers by "Santanu Chaudhury" published in 2014


Proceedings Article•DOI•
01 Dec 2014
TL;DR: This paper introduces the concept of a common latent semantic space, spanning multiple domains, built using topic modeling of semantically clustered vocabularies of distinct domains, and shows a marked improvement in the precision of predicting user preferences for items in one domain when given the preferences in another domain.
Abstract: Cross-domain recommendation systems exploit tags, textual descriptions or ratings available for items in one domain to recommend items in multiple domains. Handling unstructured/unannotated item information is, however, a challenge. Topic modeling offers a popular method for deducing structure in such data corpora. In this paper, we introduce the concept of a common latent semantic space, spanning multiple domains, using topic modeling of semantically clustered vocabularies of distinct domains. The intuition here is to use explicitly determined semantic relationships between non-identical, but possibly semantically equivalent, words in multiple domain vocabularies, in order to capture relationships across information obtained in distinct domains. The popular WordNet-based ontology is used to measure semantic relatedness between textual words. The experimental results show a marked improvement in the precision of predicting user preferences for items in one domain when given the preferences in another domain.
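For illustration, a minimal sketch of the shared-semantic-space idea, assuming NLTK's WordNet corpus is available; the greedy clustering, similarity threshold and toy vocabularies are illustrative assumptions, not the paper's exact procedure:

```python
# Cluster words from two domain vocabularies by WordNet path similarity,
# then represent documents from either domain over the shared clusters.
from nltk.corpus import wordnet as wn

def relatedness(w1, w2):
    """Max WordNet path similarity over all synset pairs (0 if none)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def cluster_vocab(words, threshold=0.5):
    """Greedy clustering: a word joins the first cluster whose seed it is
    sufficiently related to, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of words; clusters[i][0] is the seed
    for w in words:
        for c in clusters:
            if relatedness(w, c[0]) >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters

def doc_to_shared_space(tokens, clusters):
    """Bag-of-clusters vector: count tokens falling in each semantic cluster."""
    index = {w: i for i, c in enumerate(clusters) for w in c}
    vec = [0] * len(clusters)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec

# Toy vocabularies from two domains (movies and books).
clusters = cluster_vocab(["film", "actor", "thriller", "novel", "author", "mystery"])
print(doc_to_shared_space(["film", "mystery", "actor"], clusters))
```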

25 citations


Proceedings Article•DOI•
11 Aug 2014
TL;DR: This paper uses latent Dirichlet allocation (LDA) to learn latent properties of items, expressed in terms of topic proportions derived from their textual descriptions, and infers a user's topic preferences, or user profile, in the same latent space based on her historical ratings.
Abstract: Standard Collaborative Filtering (CF) algorithms make use of interactions between users and items in the form of implicit or explicit ratings alone for generating recommendations. Similarity among users or items is calculated purely based on rating overlap in this case, without considering explicit properties of the users or items involved, limiting their applicability in domains with very sparse rating spaces. In many domains such as movies, news or electronic commerce recommenders, considerable contextual data in text form describing item properties is available along with the rating data, which could be utilized to improve recommendation quality. In this paper, we propose a novel approach to improve standard CF-based recommenders by utilizing latent Dirichlet allocation (LDA) to learn latent properties of items, expressed in terms of topic proportions, derived from their textual description. We infer a user's topic preferences, or user profile, in the same latent space based on her historical ratings. While computing similarity between users, we make use of a combined similarity measure involving rating overlap as well as similarity in the latent topic space. This approach alleviates the sparsity problem, as it allows calculation of similarity between users even if they have not rated any items in common. Our experiments on multiple public datasets indicate that the proposed hybrid approach significantly outperforms standard User-Based and Item-Based CF recommenders in terms of classification accuracy metrics such as precision, recall and F-measure.
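A minimal sketch of the combined similarity measure described above (illustrative only: the blend weight, the rating-weighted topic profile and the toy topic proportions are assumptions, not the authors' exact formulation):

```python
# Blend rating-overlap similarity with similarity of LDA topic-space user profiles.
import numpy as np

def topic_profile(user_ratings, item_topics):
    """User profile = rating-weighted average of topic proportions of rated items.
    user_ratings: {item_id: rating}; item_topics: {item_id: topic vector}."""
    vecs = np.array([item_topics[i] * r for i, r in user_ratings.items()])
    return vecs.sum(axis=0) / sum(user_ratings.values())

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rating_similarity(r_u, r_v):
    """Cosine similarity over co-rated items; 0 when there is no overlap."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    a = np.array([r_u[i] for i in common])
    b = np.array([r_v[i] for i in common])
    return cosine(a, b)

def combined_similarity(r_u, r_v, item_topics, alpha=0.5):
    """Non-zero even with no co-rated items, which is the point of the hybrid."""
    s_topic = cosine(topic_profile(r_u, item_topics),
                     topic_profile(r_v, item_topics))
    return alpha * rating_similarity(r_u, r_v) + (1 - alpha) * s_topic

# Toy item topic proportions (e.g. from an LDA model over item descriptions).
item_topics = {"m1": np.array([0.9, 0.1]), "m2": np.array([0.8, 0.2]),
               "m3": np.array([0.1, 0.9])}
print(combined_similarity({"m1": 5}, {"m2": 4}, item_topics))  # no co-rated items
```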

23 citations


Journal Article•DOI•
TL;DR: The proposed signal representation improves the interactivity of dense point-based methods, making them appropriate for modeling scene semantics and free-viewpoint 3DTV applications, and a "selective" warping technique is proposed that takes advantage of temporal coherence to reduce the computational overhead.

23 citations


Proceedings Article•DOI•
07 Apr 2014
TL;DR: A novel learning-based framework to extract articles from newspaper images using a Fixed-Point Model, which uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping that assigns a unique label to every block.
Abstract: This paper presents a novel learning based framework to extract articles from newspaper images using a Fixed-Point Model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model which works in two stages. In the first stage, a semantic label (heading, sub-heading, text-blocks, image and caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.
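A rough sketch of the iterative fixed-point labeling idea, with a hand-written rule standing in for the learned classifier and a neighbour-label histogram standing in for the paper's contextual features (all names here are illustrative):

```python
# Each block is relabeled from its own features plus the current labels of its
# neighbouring blocks; the update is iterated until the labeling stops changing.
def context_vector(block_id, labels, neighbours, label_set):
    """Histogram of the current labels of a block's neighbours."""
    hist = {l: 0 for l in label_set}
    for n in neighbours[block_id]:
        hist[labels[n]] += 1
    return [hist[l] for l in sorted(label_set)]

def fixed_point_labeling(blocks, neighbours, classify, label_set,
                         init_label, max_iters=20):
    """blocks: {id: feature_list}; classify(features, context) -> label."""
    labels = {b: init_label for b in blocks}
    for _ in range(max_iters):
        new_labels = {b: classify(f, context_vector(b, labels, neighbours, label_set))
                      for b, f in blocks.items()}
        if new_labels == labels:        # reached a fixed point
            break
        labels = new_labels
    return labels

# Toy usage with a hand-written rule in place of the learned classifier.
blocks = {"b1": [1.0], "b2": [0.2], "b3": [0.1]}
neighbours = {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
rule = lambda f, ctx: "heading" if f[0] > 0.5 else "text"
print(fixed_point_labeling(blocks, neighbours, rule, {"heading", "text"}, "text"))
```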

17 citations


Journal Article•DOI•
TL;DR: A novel binary multiple kernel learning-based classification architecture is demonstrated for large-class binary pattern problems, including characters/primitives and symbols, for fast and efficient performance.
Abstract: The paper presents a novel framework for the large-class binary pattern classification problem based on a learning-based combination of multiple features. In particular, the class of binary patterns comprising characters/primitives and symbols is considered in the scope of this work. We demonstrate a novel binary multiple kernel learning-based classification architecture for such problems, aimed at fast and efficient performance. The character/primitive classification problem primarily concentrates on Gujarati and Bangla character recognition from the analytical and experimental context. A novel feature representation scheme for symbol images is introduced, containing the necessary elastic and non-elastic deformation invariance properties. The experimental efficacy of the proposed framework for symbol classification is demonstrated on two public data sets.
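A minimal sketch of classifying with a combination of multiple features via a weighted sum of base kernels; the fixed weights and random toy features are stand-ins for the learned multiple kernel combination of the paper:

```python
# Combine two feature types through a weighted kernel sum and train an SVM
# on the combined (precomputed) kernel.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def combined_kernel(feats_a, feats_b, weights):
    """feats_*: list of feature matrices, one per feature type."""
    k = weights[0] * rbf_kernel(feats_a[0], feats_b[0])
    k += weights[1] * linear_kernel(feats_a[1], feats_b[1])
    return k

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(40, 16)), rng.normal(size=(40, 8))   # two feature types
y = rng.integers(0, 2, size=40)                                # binary labels

weights = [0.7, 0.3]            # assumed here; learned in the MKL setting
K_train = combined_kernel([X1, X2], [X1, X2], weights)
clf = SVC(kernel="precomputed").fit(K_train, y)

K_test = combined_kernel([X1[:5], X2[:5]], [X1, X2], weights)
print(clf.predict(K_test))
```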

13 citations


Proceedings Article•DOI•
15 Dec 2014
TL;DR: A robust online-signature-based cryptosystem that hides the secret by binding it with invariant online signature templates; it works well for all kinds of signatures and is independent of the number of zero-crossing and high-curvature points in the signature trajectory.
Abstract: Cryptography is the backbone of security systems. The main challenge in using cryptosystems is maintaining the confidentiality of the cryptographic key. A cryptosystem which encrypts data using biometric features improves the security of the data and overcomes the problems of key management and key confidentiality. The Fuzzy Vault Scheme proposed by Juels and Sudan [1] binds the secret key and the biometric template, so that extraction of the secret without the biometric data is infeasible. The physical signature is a biometric that is widely accepted and is used for proving the authenticity of a person in legal documents, bank transactions, etc. Electronic devices such as digital tablets capture azimuth, altitude and pressure along with x and y coordinates at fixed time intervals. This paper describes a robust online-signature-based cryptosystem that hides the secret by binding it with invariant online signature templates. The invariant templates of the signature are derived from an artificial neural network based classifier. The entire signature is divided into a fixed number of time slices. Important features are extracted based on the consistency of each feature across the slices of the genuine signatures. A binary back-propagation-based neural network is trained for each feature and each subset of slices of a user using a weighted back-propagation algorithm. The decisions of these networks are combined using the AdaBoost algorithm. The proposed scheme is highly robust, as it works well for all kinds of signatures and is independent of the number of zero-crossing and high-curvature points in the signature trajectory.
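An illustrative sketch of the slicing and consistency-based feature selection step, assuming the signature is sampled as (x, y, pressure, azimuth, altitude) tuples; the per-slice means and the variance-based consistency ranking are assumptions, not the paper's exact features:

```python
# Divide a sampled signature trajectory into a fixed number of time slices
# and keep the (slice, feature) pairs most consistent across genuine samples.
import numpy as np

def slice_features(signature, n_slices=8):
    """signature: array of shape (T, 5) -> x, y, pressure, azimuth, altitude.
    Returns per-slice mean feature vectors, shape (n_slices, 5)."""
    chunks = np.array_split(signature, n_slices)
    return np.array([c.mean(axis=0) for c in chunks])

def consistent_features(genuine_samples, n_slices=8, keep=10):
    """Rank (slice, feature) pairs by variance across genuine signatures
    and keep the most stable ones (lowest variance)."""
    stacks = np.stack([slice_features(s, n_slices) for s in genuine_samples])
    variance = stacks.var(axis=0)                    # (n_slices, 5)
    idx = np.dstack(np.unravel_index(np.argsort(variance, axis=None),
                                     variance.shape))[0]
    return idx[:keep]                                # list of (slice, feature)

rng = np.random.default_rng(1)
genuine = [rng.normal(size=(200, 5)) for _ in range(5)]  # stands in for real samples
print(consistent_features(genuine))
```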

11 citations


Journal Article•DOI•
TL;DR: A scheme for multiple-feature-based identity establishment using multi-kernel learning with a genetic algorithm is presented, and the efficacy of the framework using individual features and their combinations is demonstrated for Devanagari script input.

11 citations


Posted Content•
TL;DR: In this article, a hybrid approach is proposed to improve standard collaborative filtering algorithms by utilizing latent Dirichlet allocation (LDA) to learn latent properties of items, expressed in terms of topic proportions, derived from their textual description.
Abstract: Standard Collaborative Filtering (CF) algorithms make use of interactions between users and items in the form of implicit or explicit ratings alone for generating recommendations. Similarity among users or items is calculated purely based on rating overlap in this case, without considering explicit properties of the users or items involved, limiting their applicability in domains with very sparse rating spaces. In many domains such as movies, news or electronic commerce recommenders, considerable contextual data in text form describing item properties is available along with the rating data, which could be utilized to improve recommendation quality. In this paper, we propose a novel approach to improve standard CF-based recommenders by utilizing latent Dirichlet allocation (LDA) to learn latent properties of items, expressed in terms of topic proportions, derived from their textual description. We infer a user's topic preferences, or persona, in the same latent space based on her historical ratings. While computing similarity between users, we make use of a combined similarity measure involving rating overlap as well as similarity in the latent topic space. This approach alleviates the sparsity problem, as it allows calculation of similarity between users even if they have not rated any items in common. Our experiments on multiple public datasets indicate that the proposed hybrid approach significantly outperforms standard User-Based and Item-Based CF recommenders in terms of classification accuracy metrics such as precision, recall and F-measure.

10 citations


Patent•
13 Jun 2014
TL;DR: In this article, a method and system for detection, classification and prediction of user behavior trends using correspondence analysis is disclosed, which reduces the n-dimensional feature space to a lower-dimensional space for easy processing, improved quality of emerging clusters and superior prediction accuracies.
Abstract: A method and system for detection, classification and prediction of user behavior trends using correspondence analysis is disclosed. The method and system reduce the n-dimensional feature space to a lower-dimensional space for easy processing, improved quality of emerging clusters and superior prediction accuracies. Further, the method applies correspondence analysis so that each user is assigned a new coordinate in the lower-dimensional space which maintains the similarities, differences and relationships between the variables. Once the correspondence analysis is completed, clustering or grouping of the coordinates based on the similar trends of the users is performed. Further, unlabeled cluster members are assigned class membership proportional to the labeled samples in the cluster. Finally, the method predicts the future actions of the users based on the past trends observed from the labeled clusters.
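A minimal sketch of the dimensionality-reduction-then-clustering pipeline: classical correspondence analysis on a user-by-behaviour count table followed by k-means on the resulting coordinates (the tiny table and the choice of k-means are illustrative):

```python
# Correspondence analysis via SVD of standardized residuals, then clustering.
import numpy as np
from sklearn.cluster import KMeans

def correspondence_analysis(counts, n_components=2):
    P = counts / counts.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sigma) / np.sqrt(r)[:, None]       # principal row coordinates
    return row_coords[:, :n_components]

# Rows = users, columns = behaviour categories (e.g. click/buy/return counts).
counts = np.array([[20, 2, 1],
                   [18, 3, 0],
                   [1, 15, 9],
                   [0, 14, 11]], dtype=float)
coords = correspondence_analysis(counts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(coords.round(2), labels)
```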

9 citations


Proceedings Article•DOI•
24 Aug 2014
TL;DR: This paper presents a novel low-cost hybrid Kinect-variety based content generation scheme for 3DTV displays and demonstrates that proposed robust integration provides guarantees on the completeness and consistency of the algorithm.
Abstract: This paper presents a novel low-cost hybrid Kinect-variety based content generation scheme for 3DTV displays. The integrated framework constructs an efficient, consistent image-space parameterization of the 3D scene structure using only sparse depth information of a few reference scene points. Under a full-perspective camera model, the enforced Euclidean constraints simplify the synthesis of high-quality novel multiview content for distinct camera motions. The algorithm does not rely on complete, precise scene geometry information, and is unaffected by complex geometric properties of the scene, unconstrained environmental variations and illumination conditions. It therefore performs fairly well under a wider set of operating conditions where 3D range sensors fail or the reliability of depth-based algorithms is suspect. In the robust integration, the vision algorithm and the visual sensing scheme compensate for each other's shortcomings. It opens new opportunities for envisioning vision-sensing applications in uncontrolled environments. We demonstrate that the proposed robust integration provides guarantees on the completeness and consistency of the algorithm. This leads to improved reliability on an extensive set of experimental results.

7 citations


Proceedings Article•DOI•
11 Aug 2014
TL;DR: An unsupervised trend discovery approach that detects and correlates event patterns from videos temporally as well as spatially and utilizes geographic ontology (Geoontology) for identifying the various spatial patterns that exist corresponding to an event in a document.
Abstract: With the increasing amount of information (video, text) available today, it has become non-trivial to develop techniques to categorize documents into contextually meaningful classes. The information available in the documents is composed of sequences of events termed patterns. It is important to identify the trends observed from patterns that emerge over a specific time period and space. For identifying the patterns, we must focus on the semantic meaning of documents. Tracing such patterns in videos or texts manually is a time-consuming, cumbersome or impossible task. So, in this paper we have devised an unsupervised trend discovery approach that detects and correlates event patterns from videos temporally as well as spatially. We begin by building our own document collection on the basis of the contextual meaning of documents. This helps in associating an input video with other video or text documents on the basis of their semantic meaning. This approach helps in accumulating the variety of information that is scattered over the web, thus providing relatively complete information about the video. The highly correlated words are grouped into a topic using Latent Dirichlet Allocation (LDA). To identify topics, an E-MOWL based ontology is used. This event ontology helps in discovering associations and relations between the various events. With this kind of representation, users can infer different concepts as they emerge over time. For identifying the various spatial patterns that exist corresponding to an event in a document, we have utilized a geographic ontology (Geoontology). We establish the validity of our approach using experimental results.

Book Chapter•DOI•
01 Nov 2014
TL;DR: An online method wherein the gait spaces of individuals are created as they are tracked, and person identification is carried out on-the-fly based on the uniqueness of gait, using Grassmann discriminant analysis.
Abstract: In this paper, we propose a novel online multi-camera framework for person identification based on gait recognition using Grassmann Discriminant Analysis. We propose an online method wherein the gait spaces of individuals are created as they are tracked. The gait space is view invariant and the recognition process is carried out in a distributed manner. We assume that only a fixed, known set of people is allowed to enter the area under observation. During the training phase, multi-view data of each individual is collected from each camera in the network and their global gait space is created and stored. During the test phase, as an unknown individual is observed by the network of cameras, simultaneously or sequentially, his/her gait space is created. Grassmann manifold theory is applied for classifying the individual. The gait space of an individual is a point on a Grassmann manifold, and the distance between two gait spaces is the distance between two points on the Grassmann manifold. Person identification is, therefore, carried out on-the-fly based on the uniqueness of gait, using Grassmann discriminant analysis.
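A minimal sketch of the Grassmann-manifold matching idea: each gait space is an orthonormal basis, and gait spaces are compared via principal angles; the nearest-neighbour decision here is a simplified stand-in for Grassmann discriminant analysis:

```python
# Represent each person's gait space as a subspace (a point on a Grassmann
# manifold) and compare gait spaces by their principal angles.
import numpy as np

def gait_subspace(frames, dim=5):
    """frames: (n_frames, n_features) gait features for one person.
    Returns an orthonormal basis of the top `dim` principal directions."""
    U, _, _ = np.linalg.svd(frames.T, full_matrices=False)
    return U[:, :dim]

def grassmann_distance(A, B):
    """Geodesic distance from principal angles between subspaces A and B."""
    sigma = np.clip(np.linalg.svd(A.T @ B, compute_uv=False), -1.0, 1.0)
    theta = np.arccos(sigma)                 # principal angles
    return np.linalg.norm(theta)

def identify(probe, gallery):
    """gallery: {person_id: basis}. Return id of the closest gait space."""
    return min(gallery, key=lambda p: grassmann_distance(probe, gallery[p]))

rng = np.random.default_rng(2)
gallery = {p: gait_subspace(rng.normal(size=(50, 30))) for p in ["alice", "bob"]}
probe = gait_subspace(rng.normal(size=(50, 30)))   # stands in for tracked test data
print(identify(probe, gallery))
```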

Proceedings Article•DOI•
14 Dec 2014
TL;DR: This is the first work to report scene text recognition using deep belief networks; the proposed network achieves improved recognition results on the Chars74K English, Chars74K Kannada and SVT-CHAR datasets in comparison to state-of-the-art algorithms.
Abstract: This paper focuses on the recognition and analysis of text embedded in scene images using deep learning. The proposed approach uses deep learning architectures for automated higher-order feature extraction, thereby improving classification accuracy in comparison to traditionally used handcrafted features. Exhaustive experiments have been performed with Deep Belief Networks and Convolutional Deep Neural Networks with varied training algorithms such as Contrastive Divergence and De-noising Score Matching, and supervised learning algorithms such as logistic regression and the Multi-layer Perceptron. These algorithms have been validated on 4 standard datasets: Chars74K English, Chars74K Kannada, the ICDAR 2003 Robust OCR dataset and the SVT-CHAR dataset. The proposed network achieves improved recognition results on the Chars74K English, Kannada and SVT-CHAR datasets in comparison to state-of-the-art algorithms. For the ICDAR 2003 dataset, the proposed network is marginally worse than deep convolutional networks. Although deep belief networks have been used considerably for several applications, to the best of the authors' knowledge, this is the first paper to report scene text recognition using deep belief networks.
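A shallow, illustrative stand-in for the deep belief network pipeline: unsupervised RBM feature learning (trained with contrastive divergence in scikit-learn) followed by logistic regression, on random data standing in for character patches:

```python
# RBM pretraining + logistic regression, a single-layer analogue of DBN training.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(5)
X = rng.random((200, 64))                # stands in for 8x8 character patches in [0, 1]
y = rng.integers(0, 10, size=200)        # stands in for character class labels

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=128, learning_rate=0.05,
                         n_iter=10, random_state=0)),   # contrastive-divergence pretraining
    ("clf", LogisticRegression(max_iter=1000)),          # supervised read-out layer
])
model.fit(X, y)
print(model.score(X, y))
```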

Book Chapter•DOI•
01 Nov 2014
TL;DR: The text layer is separated from the non-text layer using the proposed cumulants-based Blind Source Extraction method, the enhanced images are stored in a digital library with their corresponding historic information, and these images are then retrieved from the database using image search based on the Bag-of-Words (BoW) method.
Abstract: In this paper we present a technique for the enhancement and retrieval of historic inscription images. Inscription images in general have no distinction between the text layer and the background layer due to the absence of color difference, and possess highly correlated signals and noise; as a result, retrieval of such images using search based on feature matching returns inaccurate results. Hence, there is a need to first enhance the readability and then binarize the images to create a digital database for retrieval. Our technique provides a suitable method for this by separating the text layer from the non-text layer using the proposed cumulants-based Blind Source Extraction (BSE) method and storing the results in a digital library with their corresponding historic information. These images are retrieved from the database using image search based on the Bag-of-Words (BoW) method.

Proceedings Article•DOI•
11 Aug 2014
TL;DR: A novel unified model of a bio-inspired mechanism for modeling spatially situated pervasive computing scenarios and exploring the emergence of self-adaptive behaviors in individuals within an urban super-organism is proposed, and simulation results demonstrate the efficacy of the proposed approach.
Abstract: Modern urban areas consist of a rich mix of mobile user devices enabled with sensors and the capability of being continuously connected to existing social networking infrastructure. In the future, cities can act as a single sociotechnical super-organism with the capability to generate large-scale adaptive urban dynamics expressing various forms of urban intelligence. Such an advanced, situation-aware super-organism will be difficult to manage with traditional approaches, and there is a need for innovative mechanisms for coordination in such an open, unpredictable and dynamic pervasive environment. In this paper we propose a novel unified model of a bio-inspired mechanism for modeling spatially situated pervasive computing scenarios and exploring the emergence of self-adaptive behaviors in individuals within the urban super-organism. Simulation results demonstrate the efficacy of the proposed approach.

Proceedings Article•DOI•
14 Jul 2014
TL;DR: This paper proposes a model for tagging multimedia data on the basis of contextual meaning, which has practical applicability in the sense that whenever a new video is uploaded to some media sharing site, the context and content information gets attached to the video automatically.
Abstract: To exhibit multi-modal information and to facilitate people in finding multimedia resources, tagging plays a significant role. Various public events like protests and demonstrations are always consequences of the breakout of some public outrage resulting from prolonged exploitation and harassment. This outrage can be seen in news footage, blogs, text news and other web data. So, aggregating this variety of data from heterogeneous sources is a prerequisite step for tagging multimedia data with appropriate content. Since content has no meaning without a context, a video should be tagged with its relevant context and content information to assist the user in multimedia retrieval. This paper proposes a model for tagging multimedia data on the basis of contextual meaning. Since context is knowledge based, it has to be guided and learned by an ontology, which helps fragmented information to be represented in a more meaningful way. Our tagging approach is novel and has practical applicability in the sense that whenever a new video is uploaded to some media sharing site, the context and content information gets attached to the video automatically, thus providing relatively complete information associated with the video.

Proceedings Article•DOI•
24 Aug 2014
TL;DR: A novel image indexing method based on multiple kernel learning is presented, which combines multiple features through combinatorial optimization of time and search complexity; the resulting multiobjective formulation is solved in a genetic-algorithm-based framework to obtain the Pareto-optimal solutions.
Abstract: Approximate nearest neighbor (ANN) search provides a computationally viable option for retrieval from large document collections. Hashing-based techniques are widely regarded as the most efficient methods for ANN-based retrieval. It has been established that combining multiple features in a multiple kernel learning setup can significantly improve the effectiveness of hash codes. The paper presents a novel image indexing method based on multiple kernel learning, which combines multiple features through combinatorial optimization of time and search complexity. The framework is built upon distance-based hashing, where the existing kernel distance-based hashing formulation adopts a linear combination of kernels tuned for optimum search accuracy. In this direction, a novel multiobjective formulation for optimizing the search time as well as accuracy is proposed, which is subsequently solved in a genetic-algorithm-based solution framework to obtain the Pareto-optimal solutions. We have performed extensive experimental evaluation of the proposed concepts on different datasets, showing improvement in comparison with existing methods.
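A minimal sketch of the multiobjective selection step: among candidate hash configurations scored on search time and retrieval error, keep the Pareto-optimal ones; in the paper such candidates would come from the genetic algorithm, and the numbers below are made up for illustration:

```python
# Keep candidates that are not dominated on either objective (both minimized).
def pareto_front(candidates):
    """candidates: list of (name, search_time, retrieval_error)."""
    front = []
    for name, t, e in candidates:
        dominated = any(t2 <= t and e2 <= e and (t2 < t or e2 < e)
                        for _, t2, e2 in candidates)
        if not dominated:
            front.append((name, t, e))
    return front

configs = [("h1", 0.8, 0.30), ("h2", 1.2, 0.20),
           ("h3", 1.5, 0.25), ("h4", 2.0, 0.15)]
print(pareto_front(configs))   # h3 is dominated by h2; the rest form the front
```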

Proceedings Article•DOI•
24 Aug 2014
TL;DR: An unsupervised method is proposed that jointly models "social" interaction and content metadata in YouTube to discover user communities and the nature of the topics being discussed in these communities.
Abstract: Most popular multimedia sharing websites, such as YouTube and Flickr, not only allow users to author and upload content but also facilitate "social" networking amongst users. These social interactions can be in the form of user-to-user interactions, i.e. adding existing users to a friend or contact list, or user-to-content interactions: commenting on a video or picture, marking a picture/video as "favorite", subscribing to a user-created "channel", etc. Analyzing these social interactions jointly with the content metadata (such as the description of the video, keywords associated with the image/video, etc.) can reveal interesting insights about user activity on these social media platforms. In this paper, we propose an unsupervised method that jointly models "social" interaction and content metadata in YouTube to discover user communities and the nature of the topics being discussed in these communities. We report the effectiveness of the proposed method on a real-world dataset.

Proceedings Article•DOI•
24 Aug 2014
TL;DR: This paper investigates the use of social tag information for visual diversification of image search results in Flickr by modeling search result diversity as an instance of the p-dispersion problem, and demonstrates the effectiveness of the proposed method on a real-world data set.
Abstract: Unlike traditional multimedia content, content generated on social media platforms such as YouTube and Flickr is usually annotated with a rich set of social tags such as keywords, textual description, category information, the author's profile, etc. In this paper we investigate the use of such social tag information for visual diversification of image search results in Flickr. We model search result diversity as an instance of the p-dispersion problem, where the objective is to choose p out of n given points so that the minimum distance between any pair of chosen points is maximized. The distance metric used in the p-dispersion problem is learnt from the data itself by combining candidate similarity measures defined on the social tags. We demonstrate the effectiveness of our proposed method on a real-world data set.
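A minimal sketch of the p-dispersion step using a standard greedy heuristic; plain Euclidean distance on random vectors stands in for the learned tag-based distance metric of the paper:

```python
# Greedily pick p results so that the minimum pairwise distance among the
# chosen items stays large.
import numpy as np

def greedy_p_dispersion(points, p):
    """points: (n, d) array. Returns indices of p mutually far-apart points."""
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    # Start from the two farthest points, then repeatedly add the point whose
    # distance to the already-chosen set is largest.
    chosen = list(np.unravel_index(np.argmax(dists), dists.shape))
    while len(chosen) < p:
        remaining = [i for i in range(len(points)) if i not in chosen]
        nxt = max(remaining, key=lambda i: dists[i, chosen].min())
        chosen.append(nxt)
    return chosen

rng = np.random.default_rng(4)
results = rng.random((30, 8))     # stands in for tag/visual feature vectors
print(greedy_p_dispersion(results, 5))
```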

Proceedings Article•DOI•
14 Dec 2014
TL;DR: The proposed technique provides a suitable method to separate the text layer from the historic inscription images by considering the problem as blind source separation which aims to calculate the independent components from a linear mixture of source signals, by maximizing a contrast function based on higher order cumulants.
Abstract: In this paper a novel method to address the problem of enhancement and binarization of historic inscription images is presented. Inscription images in general have no distinction between the text layer and the background layer due to the absence of color difference, and possess highly correlated signals and noise. The proposed technique provides a suitable method to separate the text layer from historic inscription images by treating the problem as blind source separation, which aims to calculate the independent components from a linear mixture of source signals by maximizing a contrast function based on higher-order cumulants. Further, the results are compared with existing ICA-based techniques such as NGFICA and FastICA.
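An illustrative FastICA baseline in the spirit of the comparison above: the RGB channels of an inscription image are treated as linear mixtures and two independent components (text layer versus background) are extracted; this is the standard FastICA approach, not the cumulant-based method of the paper:

```python
# Separate a text layer and a background layer from an RGB image with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

def separate_layers(rgb_image):
    """rgb_image: (H, W, 3) float array. Returns two (H, W) component images."""
    h, w, _ = rgb_image.shape
    mixtures = rgb_image.reshape(-1, 3)          # each pixel: 3 mixed observations
    ica = FastICA(n_components=2, random_state=0)
    sources = ica.fit_transform(mixtures)        # (H*W, 2) independent components
    return [sources[:, i].reshape(h, w) for i in range(2)]

rng = np.random.default_rng(3)
demo = rng.random((64, 64, 3))                   # stands in for an inscription scan
text_layer, background = separate_layers(demo)
print(text_layer.shape, background.shape)
```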

Proceedings Article•DOI•
14 Dec 2014
TL;DR: A novel content-aware compositing technique is presented that faithfully preserves the salient structures of cloned source and target content, and avoids major conflicting stereopsis cues to maintain a pleasant 3D illusion altogether.
Abstract: This paper addresses the challenges in creating good-quality composite 3D content for 3DTV applications and post-production visual effects. We present a novel content-aware compositing technique that faithfully preserves the salient structures of cloned source and target content, and avoids major conflicting stereopsis cues to maintain a pleasant 3D illusion altogether. Our approach learns the appearance layouts of both source and target scene elements. The system extracts an object's significance prior maps using classified labels and derives geometric transforms to compensate for the 3D perspective mismatches between source and target images using a novel depth image-based rendering procedure. For seamless cloning, we apply a new depth-consistent interpolant technique which utilizes the classified likelihood confidences in weighting the salient or low-significance regions and re-estimating the plausible depth values of the cloned region in accordance with the target 3D structure. Further, we adopt a novel content-preserving local warping scheme to reduce the apparent distortions in object shape, size and perspective. Finally, we propose a content-aware mean value cloning technique that seamlessly merges the warped cloned patches with the geometric-appearance context of the new background and homogenizes vague boundaries with the aid of an object salience map to remove smudging effects. The overall process is formulated as an energy minimization problem and optimally regularized for large warps, vertical disparities, and stereo baseline changes. Plausible results are demonstrated to show the effectiveness of our approach.