Showing papers by "Santanu Chaudhury" published in 2013


Journal ArticleDOI
TL;DR: This paper introduces a novel, macroblock-level, visual-saliency-guided video compression algorithm, modelled as a two-step process, viz. salient region detection and frame foveation, and proposes a novel video compression architecture incorporating saliency, to save a tremendous amount of computation.
Abstract: Recently, saliency maps computed from input images have been used to detect interesting regions in images and videos and to focus processing on these salient regions. This paper introduces a novel, macroblock-level, visual-saliency-guided video compression algorithm. It is modelled as a two-step process, viz. salient region detection and frame foveation. Visual saliency is modelled as a combination of low-level features and high-level features which become important at the higher-level visual cortex. A relevance vector machine is trained over three-dimensional feature vectors pertaining to global, local and rarity measures of conspicuity, to yield probabilistic values which form the saliency map. These saliency values are used for non-uniform bit allocation over video frames. To achieve these goals, we also propose a novel video compression architecture, incorporating saliency, to save a tremendous amount of computation. This architecture is based on thresholding of mutual information between successive frames for flagging frames requiring re-computation of saliency, and on the use of motion vectors for propagation of saliency values.

46 citations
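
The frame-flagging idea in the abstract above (thresholding mutual information between successive frames) can be sketched as follows. This is only an illustration under stated assumptions: the histogram-based estimator, the bin count, the function names and the threshold value are all hypothetical and are not taken from the paper.

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    """Histogram estimate of mutual information (bits) between two
    equally sized 8-bit grayscale frames."""
    joint, _, _ = np.histogram2d(
        frame_a.ravel(), frame_b.ravel(),
        bins=bins, range=[[0, 256], [0, 256]],
    )
    p_ab = joint / joint.sum()              # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of frame_a
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of frame_b
    nz = p_ab > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

def needs_saliency_recompute(prev_frame, cur_frame, threshold_bits=1.0):
    """Flag a frame when it shares little information with its predecessor.
    threshold_bits is an assumed, illustrative value."""
    return mutual_information(prev_frame, cur_frame) < threshold_bits
```

Frames that are not flagged would reuse the previous saliency map, which the abstract says is propagated using motion vectors; that propagation step is not shown here.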


Journal ArticleDOI
TL;DR: A new perceptual modeling technique for reasoning with media properties observed in multimedia instances and the latent concepts is proposed, and a probabilistic reasoning scheme for belief propagation across domain concepts through observation of media properties is introduced.
Abstract: Several multimedia applications need to reason with concepts and their media properties in specific domain contexts. Media properties of concepts exhibit some unique characteristics that cannot be dealt with by the conceptual modeling schemes followed in existing ontology representation and reasoning schemes. We have proposed a new perceptual modeling technique for reasoning with media properties observed in multimedia instances and the latent concepts. Our knowledge representation scheme uses a causal model of the world where concepts manifest in media properties with uncertainties. We introduce a probabilistic reasoning scheme for belief propagation across domain concepts through observation of media properties. In order to support the perceptual modeling and reasoning paradigm, we propose a new ontology language, Multimedia Web Ontology Language (MOWL). Our primary contribution in this article is to establish the need for the new ontology language and to introduce the semantics of its novel language constructs. We establish the generality of our approach with two disparate knowledge-intensive applications involving reasoning with media properties of concepts.

27 citations


Patent
13 Sep 2013
TL;DR: In this article, a system and method for identifying erroneous videos and assessing video quality is provided, where feature vectors are generated corresponding to a plurality of frames associated with the one or more videos.
Abstract: System and method for identifying erroneous videos and assessing video quality is provided. Feature vectors are generated corresponding to a plurality of frames associated with the one or more videos. The feature vectors are subsequently subjected to anomaly detection to obtain first and second normalized path lengths and normalized anomaly measures. The first and second normalized path lengths and normalized anomaly measures are provided to a regression model to identify the erroneous video.

17 citations


Journal ArticleDOI
TL;DR: The exhaustive experimental evaluation of the proposed framework on a collection of documents belonging to Devanagari, Bengali and English scripts has yielded encouraging results.
Abstract: In this paper, we propose a novel feature representation for binary patterns by exploiting the object shape information. Initial evaluation of the representation is performed for Bengali and Gujarati script character classification. The extension of the representation to word images is presented subsequently. The proposed feature representation, in combination with distance-based hashing, is applied to define a novel word image-based document image indexing and retrieval framework. The concept of hierarchical hashing is utilized to reduce the retrieval time complexity. In addition, with the objective of reducing the size of the hashing data structure, the concept of multi-probe hashing is extended to binary mapping functions. The exhaustive experimental evaluation of the proposed framework on a collection of documents belonging to Devanagari, Bengali and English scripts has yielded encouraging results.

17 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A distributed camera and processing based face detection and recognition system which can generate information for finding spatiotemporal movement pattern of individuals over a large monitored space is proposed.
Abstract: Large spaces monitored by many cameras require huge storage and computational power to process the captured data for surveillance applications. In this paper we propose a distributed camera and processing based face detection and recognition system which can generate information for finding spatiotemporal movement patterns of individuals over a large monitored space. The system is built upon the Hadoop Distributed File System using the MapReduce programming model. A novel key generation scheme using a distance-based hashing technique has been used for distribution of the face matching task. Experimental results have established the effectiveness of the technique.

11 citations
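
To make the key generation step in the abstract above more concrete, here is a minimal sketch of one common formulation of distance-based hashing, in which each bit comes from projecting a feature vector onto the line through two pivot vectors. The abstract does not give the paper's actual scheme, so the median threshold, the Euclidean distance and all function names here are assumptions for illustration only.

```python
import numpy as np

def line_projection(x, p1, p2):
    """Project x onto the line through pivots p1 and p2 using only distances.
    Assumes p1 and p2 are distinct."""
    d12 = np.linalg.norm(p1 - p2)
    d1 = np.linalg.norm(x - p1)
    d2 = np.linalg.norm(x - p2)
    return (d1 ** 2 + d12 ** 2 - d2 ** 2) / (2 * d12)

def make_hash_bits(sample, n_bits, rng):
    """Choose pivot pairs from a sample of feature vectors and use the
    median projection as each bit's threshold (an assumed choice)."""
    bits = []
    for _ in range(n_bits):
        i, j = rng.choice(len(sample), size=2, replace=False)
        p1, p2 = sample[i], sample[j]
        t = np.median([line_projection(x, p1, p2) for x in sample])
        bits.append((p1, p2, t))
    return bits

def hash_key(x, bits):
    """Binary string key; similar vectors tend to share the same key."""
    return "".join(
        "1" if line_projection(x, p1, p2) > t else "0" for p1, p2, t in bits
    )
```

In a MapReduce setting, a key of this kind could serve as the map output key so that similar face descriptors are routed to the same reducer for matching; that routing is an interpretation of the abstract, not a detail it states.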


Journal ArticleDOI
TL;DR: The proposed method improves word and character recognition accuracies of the OCR system by 65.3% and 54.3%, respectively, and is a suitable method for separating signals from a mixture of highly correlated signals.
Abstract: This paper addresses the problems encountered during digitization and preservation of inscriptions such as perspective distortion and minimal distinction between foreground and background. In general inscriptions possess neither standard size and shape nor colour difference between the foreground and background. Hence the existing methods like variance based extraction and Fast ICA based analysis fail to extract text from these inscription images. Natural gradient flexible ICA (NGFICA) is a suitable method for separating signals from a mixture of highly correlated signals, as it minimizes the dependency among the signals by considering the slope of the signal at each point. We propose an NGFICA based enhancement of inscription images. The proposed method improves word and character recognition accuracies of the OCR system by 65.3% (from 10.1% to 75.4%) and 54.3% (from 32.4% to 86.7%), respectively.

11 citations


Proceedings ArticleDOI
TL;DR: The algorithm enables online pellet pose determination and pick-up using KUKA KR5 robot and a multiple-view based pose recognition system is proposed for occluded pellets.
Abstract: Pose estimation of a cylindrical pellet using a single camera-in-hand configuration of a robot is discussed in this paper. Approaches to estimate pose in both isolated and occluded environments are discussed. The pellet contour from the segmented image of the scene was compared with contours in the database to ascertain the matching orientation. For occluded pellets, a multiple-view based pose recognition system is proposed. Later, the estimated pose was communicated to the robot to enable it to pick up the pellet. This has been experimentally implemented for cylindrical pellets and the performance is discussed. The algorithm enables online pellet pose determination and pick-up using a KUKA KR5 robot.

9 citations


Proceedings ArticleDOI
28 Mar 2013
TL;DR: The proposed method improves word and character recognition accuracies of the OCR system by 65.3% and 54.3%, respectively, and is a suitable method for separating signals from a mixture of highly correlated signals.
Abstract: This paper addresses the problems encountered during digitization and preservation of inscriptions such as perspective distortion and minimal distinction between foreground and background. In general inscriptions neither possess standard size and shape nor colour difference between the foreground and background. Hence the existing methods like variance based extraction and Fast-ICA based analysis fail to extract text from these inscription images. Natural gradient Flexible ICA (NGFICA) is a suitable method for separating signals from a mixture of highly correlated signals, as it minimizes the dependency among the signals by considering the slope of the signal at each point. We propose an NGFICA based enhancement of inscription images. The proposed method improves word and character recognition accuracies of the OCR system by 65.3% (from 10.1% to 75.4%) and 54.3% (from 32.4% to 86.7%) respectively.

8 citations


Proceedings ArticleDOI
01 Oct 2013
TL;DR: A video based adaptive traffic signaling scheme for reducing waiting period of vehicles at road junctions without detecting or tracking vehicles is proposed and found to be a much faster and more effective control strategy.
Abstract: The ability to exert real-time, adaptive control of the transportation process is the core of an intelligent traffic system. We propose a video based adaptive traffic signaling scheme for reducing the waiting period of vehicles at road junctions without detecting or tracking vehicles. The traffic signal timing parameters at a given intersection are adjusted automatically as functions of the local traffic conditions. The video sequences recorded at junctions are used for generating Spatial Interest Points (SIP) and Spatio-Temporal Interest Points (STIP). The traffic congestion at the junction is estimated using SIP and STIP. The decision rules are based on a definitive analogy between road traffic and computer data traffic wherein road vehicles are compared with data packets on the network. The system is similar in approach to the technique of Weighted Round Robin (WRR) queuing, a scheduling discipline used in data communication networks. Local traffic information is used to adjust the phase split, keeping the cycle time constant. Two methods have been proposed. The first method, Optimal Weight Calculator (OWC), minimizes traffic at an intersection by determining the optimal phase splits or weights. The second method, Fair Weight Calculator (FWC), calculates weights relative to the road with minimum traffic to bring more fairness. After applying the respective algorithms mathematically on varying traffic conditions, OWC was found to be more equitable in the allocation of green time, which is suitable for highly weight-sensitive junctions. For traffic with road priorities, FWC was found to be a much faster and more effective control strategy.

8 citations
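
The Weighted Round Robin analogy in the abstract above (weights relative to the least congested road, constant cycle time) can be illustrated with a small sketch. It is neither OWC nor FWC as defined in the paper; the integer rounding, the function name and the input format are assumptions made for illustration.

```python
def phase_split(congestion, cycle_time):
    """Split a fixed cycle time among approaches in proportion to integer
    weights computed relative to the least congested approach.

    congestion: positive congestion estimates, one per approach
                (assumed here to come from interest-point counts).
    cycle_time: total green time available per cycle, in seconds.
    """
    if min(congestion) <= 0:
        raise ValueError("congestion estimates must be positive")
    base = min(congestion)
    # WRR-style integer weights: the lightest approach gets weight 1.
    weights = [round(c / base) for c in congestion]
    total = sum(weights)
    return [cycle_time * w / total for w in weights]
```

For example, `phase_split([10, 20, 10], 120)` gives the busier approach twice the green time of the other two while the three phases still sum to the 120-second cycle.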


Book ChapterDOI
09 Sep 2013
TL;DR: A multimedia ontology encoded in the Multimedia Web Ontology Language (MOWL) is used to illustrate this paradigm by correlating the digital artefacts with their history as well as their living context in today's world.
Abstract: Heritage preservation requires preserving the tangibles (monuments, sculpture, coinage, etc.) and the intangibles (history, traditions, stories, dance, etc.). Besides these artefacts, there is a huge amount of background knowledge that correlates all these resources and establishes their context. In this work, we present a new paradigm for heritage preservation: 'an Intellectual Journey into the past', which is more advanced than physical explorations of heritage sites and virtual explorations of monuments and museums. This paradigm proposes an experiential expedition into a historical era by using an ontology to inter-link the digital heritage artefacts with their background knowledge. A multimedia ontology encoded in the Multimedia Web Ontology Language (MOWL) is used to illustrate this paradigm by correlating the digital artefacts with their history as well as their living context in today's world. The user experience of this paradigm involves a virtual traversal of a heritage site, with an ontology guided navigation through space and time and a dynamic display of different kinds of media.

7 citations


Journal ArticleDOI
TL;DR: A self-organizing sensor network that is inspired by real-life systems for sampling a region in an energy-efficient manner is proposed and results indicate that the model is more effective than a conventional model with fixed-rate sampling.
Abstract: Nature offers several examples of self-organizing systems that automatically adjust to changing conditions without adversely affecting the system goals. We propose a self-organizing sensor network that is inspired by real-life systems for sampling a region in an energy-efficient manner. Mobile nodes in our network execute certain rules by processing local information. These rules enable the nodes to divide the sampling task in such a manner that the nodes self-organize to reduce the total power consumed and improve the accuracy with which the phenomena are sampled. The digital hormone-based model that encapsulates these rules provides a theoretical framework for examining this class of systems. This model has been simulated and implemented on cricket motes. Our results indicate that the model is more effective than a conventional model with fixed-rate sampling.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A framework for “User's Personalized Workspace” is presented by augmenting the physical paper and digital document and by synchronizing a physical paper and its digital version in a seamless fashion from a user's perspective.
Abstract: In this paper, we present a framework for a “User's Personalized Workspace” by augmenting the physical paper and digital document. Paper-based interactions are seamlessly integrated with digital document-based interactions for reading as an activity. For instance, when a user is involved in a reading activity, writing becomes complementary. In an academic system, the paper-based presentation mode has facilitated such exercises. Besides rendering the annotation on the digital document and storing it in the database, the content of the paper that is encircled or underlined is used to hyperlink the document. Synchronizing a physical paper and its digital version in a seamless fashion from a user's perspective is the main objective of this work. We have also compared existing systems, which focus on one activity or the other, with our proposed system.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A novel script independent CRF based inferencing framework for character recognition that considers a word as a sequence of connected components using multiple hypothesis tree to form the correct sequence of alphabets.
Abstract: The paper presents a novel script-independent CRF based inferencing framework for character recognition. In this framework we consider a word as a sequence of connected components. The connected components are obtained using different binarization schemes, and different possible sequences are considered using a tree structure. The CRF uses contextual information to learn perfect primitive sequences and finds the most probable labeling of the sequence of primitives using a multiple-hypothesis tree to form the correct sequence of alphabets. This approach is particularly suitable for degraded printed document images as it considers multiple alternate hypotheses for a correct decision.

01 Jan 2013
TL;DR: The need for a fundamentally different approach for a representation and reasoning scheme with ontologies for semantic interpretation of multimedia contents is established and a new ontology representation scheme is introduced that enables reasoning with uncertain media properties of concepts in a domain context.
Abstract: This paper provides an overview of the contents of a tutorial on the subject by one of the authors at WI-2013 Conference. The domination of multimedia contents on the web in recent times has motivated research in their semantic analysis. This tutorial aims to provide a critical overview of the technology, and focuses on application of ontologies for multimedia applications. It establishes the need for a fundamentally different approach for a representation and reasoning scheme with ontologies for semantic interpretation of multimedia contents. It introduces a new ontology representation scheme that enables reasoning with uncertain media properties of concepts in a domain context and a language “Multimedia Web Ontology Language” (MOWL) to support the representation scheme. We discuss the approaches to semantic modeling and ontology learning with specific reference to the probabilistic framework of MOWL. We present a couple of illustrative application examples. Further, we discuss the issues of distributed multimedia information systems and how the new ontology representation scheme can create semantic interoperability across heterogeneous multimedia data sources.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A novel multi-modal document indexing framework for retrieval of old and degraded text documents by combining OCR'ed text and image based representation using learning is proposed.
Abstract: The paper proposes a novel multi-modal document image retrieval framework by exploiting the information of text and graphics regions. The framework applies a multiple kernel learning based hashing formulation for generation of composite document indexes using different modalities. The existing multimedia management methods for imaged text documents have not addressed the requirements of old and degraded documents. In the subsequent contribution, we propose a novel multi-modal document indexing framework for retrieval of old and degraded text documents by combining OCR'ed text and image based representation using learning. The evaluation of the proposed concepts is demonstrated on sampled magazine cover pages and documents in Devanagari script.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: The combination of 360°-rotational symmetry and camera center is used to identify two orthogonal planes called axis plane and Orthogonal axis plane, which are the basis for the proposed reconstruction framework and virtual camera configuration.
Abstract: First, we describe how 360°-rotational symmetry may be used for three dimensional reconstruction of repeated cylinders from a single perspective image. In our experiments, we consider translational and affine repetition of cylinders with vertical and random orientations. Later, we create a virtual camera configuration for retrieving pose and location of repeated cylinders. The combination of 360°-rotational symmetry and camera center is used to identify two orthogonal planes called axis plane and orthogonal axis plane. These two planes are the basis for the proposed reconstruction framework and virtual camera configuration. Furthermore, we discuss possible extension of our method in vision tasks based on motion analysis.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: An approach for selecting best discriminative primitives for writer recognition is presented and a hybrid system by combining both writer recognition and handwriting recognition for improved accuracy is proposed.
Abstract: Writer recognition based on peculiarities of handwriting is an important aspect of any forensic analysis. We present an approach for selecting the best discriminative primitives for writer recognition. After selecting the primitives, we also propose a hybrid system combining both writer recognition and handwriting recognition for improved accuracy. We have also validated the performance of the selected primitives on a publicly available dataset. We have performed this study on the Devanagari script. Experimental results verified the effectiveness of the proposed framework.

Proceedings ArticleDOI
28 Mar 2013
TL;DR: An adaptive recommendation model is discussed that overcomes various deficiencies associated with existing solutions and recommends suitable recharge packs to subscribers based on their usage history and affordability.
Abstract: In today's competitive market, mobile service providers are very keen on improving customer satisfaction by providing personalized services. Recommending recharge packs that suit the subscribers' personal profiles is one such important personalized service. But a solution to this problem is not simple, as it requires careful analysis of the subscribers' usage behavior and involves a very large volume of data generated by the subscribers' frequent interaction with the telecom network. Also, this solution needs to ensure a fine balance between customer satisfaction and profitability of service providers. This paper discusses an adaptive recommendation model which overcomes various deficiencies associated with existing solutions. The model recommends suitable recharge packs to subscribers based on their usage history and affordability. Further, it accommodates a configurable fairness parameter that ensures a balance between the profitability factor, conversion probability and relevance of the recommendations. Due to the sheer volume of the data involved, the model is implemented using a distributed framework. The validity of the model is evaluated on the basis of statistical properties and conversion factor.

Proceedings ArticleDOI
17 Nov 2013
TL;DR: The paper proposes a novel approach of generating online adaptive response in assisting search-and-rescue operations using situation awareness built from real-time heterogeneous spatio-temporal data streams.
Abstract: Cyber physical space potentially hosts innumerable spatio-temporal data streams due to the increasing use of social networking platforms as real-time information dissemination systems and the world-wide deployment of sensors for continuous monitoring of physical phenomena. In this paper we address the problem of how cyber physical space can be used for sensing and responding to global calamities such as earthquakes. The paper proposes a novel approach of generating online adaptive response in assisting search-and-rescue operations using situation awareness built from real-time heterogeneous spatio-temporal data streams. Online adaptive response is achieved by using agent-based cooperative task sharing and modeling agent decision making as self-organized emergent behavior based on concepts of complex adaptive systems. An implemented simulation platform uses concepts of situation modeling, domain task network, contract net protocol based negotiation and complex adaptive systems to generate adaptive plans. Preliminary simulation results are promising, as we have been able to demonstrate a repertoire of self-organized emergent behaviors.

Proceedings ArticleDOI
24 Aug 2013
TL;DR: A novel framework for learning optimal parameters for text graphic separation in the presence of complex layouts of Indian newspaper is proposed.
Abstract: Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes, font styles and random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text graphic separation in the presence of complex layouts. The learning problem has been formulated as an optimization problem using the EM algorithm to learn optimal parameters depending on the nature of the document content.

04 Jul 2013
TL;DR: Advances in Robotics (AIR) series is to create a forum to present and exchange new ideas by the researchers and developers of robotic and allied systems from India and abroad.
Abstract: Robotics Society of India (RSI) was established in July 2011. From this year (2013), we initiated a new conference series called Advances in Robotics (AIR) to be held on a regular basis. The intention of the AIR series is to create a forum to present and exchange new ideas by the researchers and developers of robotic and allied systems from India and abroad. We were proud to host the first conference in the DRDO's R&DE (Engrs), Pune, India during July 4-6, 2013.

Proceedings ArticleDOI
01 Feb 2013
TL;DR: A dynamic texture based compression scheme is devised for videos for the analysis of motion patterns in a video on the basis of optic flow data and then clusters of different motion patterns are created.
Abstract: In this paper, a dynamic texture based compression scheme is devised for videos. Correspondence analysis is explored for the analysis of motion patterns in a video on the basis of optic flow data, and then clusters of different motion patterns are created. Dynamic textures tend to disobey Horn and Schunck's assumption of brightness constancy and hence the optic flow residual is used as an indicator of their presence. The correspondence analysis results and the optic flow residual are combined in a new segmentation scheme. The optic flow data tracks the motion of groups of pixels to generate the flow lines. These flow lines are used in a synthesis scheme for creating an illusion of continuously flowing texture. The integration of this synthesis scheme in the compression format gives us considerable bit stream reduction corresponding to the dynamic texture regions. The scheme is integrated with the standard H.264/AVC model; texture regions that do not fall under the above precinct of dynamic textures, as well as non-texture regions, are encoded and decoded by the H.264 model directly.
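
The brightness constancy assumption mentioned in the abstract above states that I_x·u + I_y·v + I_t = 0 for a flow field (u, v). The sketch below computes the magnitude of the left-hand side as a per-pixel residual. It assumes the flow field is already available from some estimator; the finite-difference derivatives and the function name are illustrative choices, not the paper's implementation.

```python
import numpy as np

def flow_residual(prev_frame, cur_frame, u, v):
    """Per-pixel violation of the brightness constancy constraint.

    prev_frame, cur_frame: 2-D grayscale frames of equal shape.
    u, v: horizontal and vertical flow (scalars or arrays of frame shape).
    """
    prev_frame = prev_frame.astype(float)
    cur_frame = cur_frame.astype(float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    i_y, i_x = np.gradient(prev_frame)
    i_t = cur_frame - prev_frame            # temporal derivative
    return np.abs(i_x * u + i_y * v + i_t)
```

Pixels where this residual stays large even under the best available flow estimate are candidates for dynamic texture, in the sense the abstract describes; the cut-off for "large" would have to be chosen separately.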

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A framework for finding the best possible augmentation of a classifier for the character recognition problem using minimum number of crowd labeled samples and inherently rejects the noisy data and tries to accept a subset of correctly labeled data to maximize the classifier performance.
Abstract: Active learning and crowd sourcing are becoming increasingly popular in the machine learning community for fast and cost effective generation of labels for large volumes of data. However, such labels may be noisy. So, it becomes important to ignore the noisy labels for building of a good classifier. We propose a framework for finding the best possible augmentation of a classifier for the character recognition problem using minimum number of crowd labeled samples. The approach inherently rejects the noisy data and tries to accept a subset of correctly labeled data to maximize the classifier performance.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: A novel parameterized variety-based 3D exploration model is presented to comprehend the sparse unstructured collection of photographs, and automatically plan virtual 3D tours of the world's landmarks through interesting viewpoints without explicit 3D reconstruction.
Abstract: This paper presents a novel parameterized variety-based 3D exploration model to comprehend a sparse unstructured collection of photographs, and to automatically plan virtual 3D tours of the world's landmarks through interesting viewpoints without explicit 3D reconstruction. The proposed system analyzes the collection of unstructured but related image data containing the same location or environment to create a parameterized scene graph: a data structure that conveys spatial relations and enables smooth virtual navigation between photos. A novel statistical-heuristic criterion is evolved, exploiting the scene spatial layout and appearance, to automatically identify the best available portals between photographs. Once well connected, the graph is parameterized and consistently rendered, choosing visually compelling 3D transition paths and maintaining a pleasing essence of parallax. The system's ability is demonstrated on several casually captured personal photo collections of heritage sites and imagery gathered from “Flickr” data.

Proceedings ArticleDOI
29 Jun 2013
TL;DR: This paper presents a novel image variety-based approach that elegantly models the space of a broad class of perspective and non-perspective stereo varieties within a single, unified framework and provides an effective tool for montaging, indexing and virtual navigation.
Abstract: This paper presents a novel image variety-based approach that elegantly models the space of a broad class of perspective and non-perspective stereo varieties within a single, unified framework. The basic concept of parameterized variety presented earlier by Genc and Ponce [1] is extended to represent the nonlinear space of images. An efficient algebraic framework is constructed to parameterize the variety associated with full perspective cameras. The algorithm seeks the manifolds that constrain this space of six-dimensional variety to generate compelling multi-perspective 3D effects from arbitrary virtual viewpoints. Combining the geometric space of multiple uncalibrated perspective views with the appearance space in a globally optimized way leads to numerous potential applications, especially in content creation for multi-perspective 3DTV. The proposed approach works for uncalibrated static/dynamic scenes containing parallax and unstructured object motion. It even seamlessly deals with images or video sequences that do not share a common origin, thus providing an effective tool for montaging, indexing and virtual navigation.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: An architecture of a smart space built with robots and distributed smart cameras connected via a network using vision as the basic sensing mechanism and a MAP based object identification scheme which works on Grassmannian manifold is presented.
Abstract: This paper presents an architecture of a smart space built with robots and distributed smart cameras connected via a network. We present a framework for tracking and recording presence of objects over a large space using vision as the basic sensing mechanism. A MAP based object identification scheme which works on Grassmannian manifold has been presented. Experimental results establish validity of the approach.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper is aimed at exploring the potential of using discriminatory primitives containing words for the task of detecting skilled forgeries, and considers handwritten Devanagari documents for this work.
Abstract: This paper is aimed at exploring the potential of using discriminatory primitives containing words for the task of detecting skilled forgeries. We consider handwritten Devanagari documents for this work. We have obtained experimental handwriting data from subjects who have contributed handwriting samples in their natural handwriting. Other authors are asked to imitate the writing style of the subjects to produce a skilled forgery sample. Most of the literature dealing with writer recognition focuses on signatures, and very few reports have addressed the problem of detecting forgeries for handwritten Indian scripts. We also use multiple-word based classification for the targeted task of forgery detection. Our experiments show encouraging results.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A framework for retrieving metric information for repeated objects from a single perspective image based on relative affine structure along X, Y and Z axes is proposed, along with a possible extension of this framework to motion analysis, namely structure from motion and motion segmentation.
Abstract: We propose a framework for retrieving metric information for repeated objects from a single perspective image. Relative affine structure, which is an invariant, is directly proportional to the Euclidean distance of a three-dimensional point from a reference plane. The proposed method is based on this fundamental concept. The first object undergoes a 4 × 4 transformation and results in a repeated object. We represent this transformation in terms of three relative affine structures along the X, Y and Z axes. Additionally, we propose a possible extension of this framework to motion analysis, namely structure from motion and motion segmentation.

Book ChapterDOI
10 Dec 2013
TL;DR: An optimized learning method for large feature-sets using AdaBoost to produce hardware-efficient boosted decision stumps and a method for training decision stumps to construct the ensemble is proposed.
Abstract: This paper proposes an optimized learning method for large feature-sets using AdaBoost to produce hardware-efficient boosted decision stumps. The paper also proposes a method for training decision stumps to construct the ensemble. AdaBoost sequentially searches for the best weak classifier in the pool and adds it to the ensemble, using weighted training samples. In the proposed method, Particle Swarm Optimization quickens the selection of decision stumps. It is shown experimentally that the optimized method is more than 60% faster than the exhaustive search method.
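
For reference, the exhaustive-search baseline that the abstract above compares against can be sketched as plain AdaBoost over decision stumps. This is a generic textbook version, not the paper's method: the Particle Swarm Optimization step that speeds up stump selection is not implemented, and all function names are illustrative.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def best_stump(X, y, w):
    """Exhaustively search features, thresholds and polarities for the
    stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                err = w[stump_predict(X, f, t, pol) != y].sum()
                if err < best[0]:
                    best = (err, f, t, pol)
    return best

def adaboost(X, y, rounds):
    """Train an ensemble of (alpha, feature, threshold, polarity) stumps.
    Labels y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, f, t, pol = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, f, t, pol))
        w /= w.sum()                            # re-normalise sample weights
        ensemble.append((alpha, f, t, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

The triple loop in `best_stump` is the cost that grows with the size of the feature set, which is the part the abstract says is accelerated.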

Book ChapterDOI
TL;DR: A novel formulation of multiple kernel learning in hashing for multimedia indexing is presented, which learns a combination of multiple features/modalities for defining composite document indices in a genetic algorithm based framework.
Abstract: In this paper, we explore the use of machine learning for multimedia indexing and retrieval involving single/multiple features. Indexing of large image collections has been a well-researched problem. However, machine learning for the combination of features in an image indexing and retrieval framework has not been explored. In this context, the paper presents a novel formulation of multiple kernel learning in hashing for multimedia indexing. The framework learns a combination of multiple features/modalities for defining composite document indices in a genetic algorithm based framework. We have demonstrated the evaluation of the framework on a dataset of handwritten digit images. Subsequently, the utility of the framework is explored for the development of multi-modal retrieval of document images.