
Showing papers by "Santanu Chaudhury published in 2011"


Journal ArticleDOI
TL;DR: The efficacy of the ontology-based approach is demonstrated by constructing an ontology for the cultural heritage domain of Indian classical dance, and a browsing application is developed for semantic access to the heritage collection of Indian dance videos.
Abstract: Preservation of intangible cultural heritage, such as music and dance, requires encoding of background knowledge together with digitized records of the performances. We present an ontology-based approach for designing a cultural heritage repository for that purpose. Since dance and music are recorded in multimedia format, we use Multimedia Web Ontology Language (MOWL) to encode the domain knowledge. We propose an architectural framework that includes a method to construct the ontology with a labeled set of training data and use of the ontology to automatically annotate new instances of digital heritage artifacts. The annotations enable creation of a semantic navigation environment in a cultural heritage repository. We have demonstrated the efficacy of our approach by constructing an ontology for the cultural heritage domain of Indian classical dance, and have developed a browsing application for semantic access to the heritage collection of Indian dance videos.

66 citations


Patent
30 Dec 2011
TL;DR: In this paper, a video compression framework based on parametric object and background compression is proposed, where an object is detected and frames are segmented into regions corresponding to the foreground object and the background.
Abstract: A video compression framework based on parametric object and background compression is proposed. At the encoder, an object is detected and frames are segmented into regions corresponding to the foreground object and the background. The encoder generates object motion and appearance parameters. The motion or warping parameters may include at least two parameters for object translation; two parameters for object scaling in two primary axes and one object orientation parameter indicating a rotation of the object. Particle filtering may be employed to generate the object motion parameters. The proposed methodology is the formalization of the concept and usability for perceptual quality scalability layer for Region(s) of Interest. A coded video sequence format is proposed which aims at “network friendly” video representation supporting appearance and generalized motion of object(s).

60 citations
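The patent names particle filtering for generating the object motion parameters but leaves the mechanics abstract. As a hedged sketch (not the patented method; the function name, the 5-parameter state layout [tx, ty, sx, sy, theta], and the random-walk motion model are illustrative assumptions), one predict/update/resample cycle might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, noise_std):
    """One predict/update/resample cycle over object-motion particles.

    Each particle is a 5-vector [tx, ty, sx, sy, theta]: translation,
    per-axis scale, and rotation of the tracked foreground object.
    """
    # predict: random-walk diffusion of the motion parameters
    particles = particles + rng.normal(0.0, noise_std, particles.shape)
    # update: re-weight each hypothesis by its appearance likelihood
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # resample: draw particles in proportion to weight, reset to uniform
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

In a tracker, `likelihood` would compare the warped object appearance against the current frame; here it is whatever scoring function the caller supplies.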


Proceedings ArticleDOI
17 Sep 2011
TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages; it is currently being enhanced to handle space and time constraints, achieve higher recognition accuracies, and add new functionalities.
Abstract: This paper presents an integration and testing scheme for managing a large Multilingual OCR Project. The project is an attempt to implement an integrated platform for OCR of different Indian languages. Software engineering, workflow management and testing processes are discussed in this paper. The OCR has now been experimentally deployed for some specific applications and is currently being enhanced to handle space and time constraints, achieve higher recognition accuracies, and add new functionalities.

26 citations



Book ChapterDOI
27 Jun 2011
TL;DR: An improved, macroblock (MB) level visual saliency algorithm aimed at video compression is presented, and a video compression architecture for propagation of saliency values, saving a tremendous amount of computation, is proposed.
Abstract: In this paper, an improved, macroblock (MB) level visual saliency algorithm aimed at video compression is presented. A Relevance Vector Machine (RVM) is trained over 3-dimensional feature vectors, pertaining to global, local and rarity measures of conspicuity, to yield probabilistic values which form the saliency map. These saliency values are used for non-uniform bit allocation over video frames. A video compression architecture for propagation of saliency values, saving a tremendous amount of computation, is also proposed.

15 citations
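The non-uniform bit allocation the abstract describes can be pictured with a toy proportional scheme (an illustrative assumption; the paper's actual rate-control rule is not specified here): each macroblock receives a share of the frame budget proportional to its saliency value.

```python
def allocate_bits(saliency, total_bits):
    """Non-uniform bit allocation: each macroblock gets a share of the
    frame's bit budget proportional to its saliency value."""
    s = sum(saliency)
    return [total_bits * v / s for v in saliency]
```

A real encoder would map these targets onto quantisation parameters per MB; the proportional split just shows where the saliency map enters the pipeline.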


Proceedings ArticleDOI
18 Sep 2011
TL;DR: A novel framework for segmentation of documents with complex layouts, performed by a combination of clustering and conditional random fields (CRF) based modeling, is proposed and has been extensively tested on multi-colored document images with text overlapping graphics/images.
Abstract: In this paper, we propose a novel framework for segmentation of documents with complex layouts. The document segmentation is performed by a combination of clustering and conditional random fields (CRF) based modeling. The bottom-up approach for segmentation assigns each pixel to a cluster plane based on color intensity. A CRF based discriminative model is learned to extract the local neighborhood information in different cluster/color planes. The final category assignment is done by a top-level CRF based on the semantic correlation learned across clusters. The proposed framework has been extensively tested on multi-colored document images with text overlapping graphics/images.

12 citations
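A minimal sketch of the bottom-up step that assigns each pixel to a cluster plane by color intensity. This is plain 1-D k-means with deterministic quantile initialisation, an illustrative stand-in rather than the paper's exact procedure; the per-plane and top-level CRF stages are omitted.

```python
import numpy as np

def kmeans_color_planes(pixels, k=3, iters=20):
    """Assign pixel intensities to k cluster planes (the bottom-up step
    before a CRF refines the segmentation per plane).

    Quantile-based initialisation keeps the sketch deterministic.
    """
    pixels = np.asarray(pixels, dtype=float)
    centers = np.quantile(pixels, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        # assign each pixel to its nearest center
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # move each center to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels, centers
```

On a real document image, `pixels` would be the flattened intensity (or color) values of a page, and each label plane would then be handed to the CRF stage.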


Proceedings ArticleDOI
17 Sep 2011
TL;DR: The proposed framework presents a top-down approach by performing page, block/paragraph and word level script identification in multiple stages by utilizing texture and shape based information embedded in the documents at different levels for feature extraction.
Abstract: Script identification in a multi-lingual document environment has numerous applications in the field of document image analysis, such as indexing and retrieval, or as an initial step towards optical character recognition. In this paper, we propose a novel hierarchical framework for script identification in bi-lingual documents. The framework presents a top-down approach by performing page, block/paragraph and word level script identification in multiple stages. We utilize texture and shape based information embedded in the documents at different levels for feature extraction. The prediction task at different levels of the hierarchy is performed by a Support Vector Machine (SVM) and a rejection-based classifier defined using AdaBoost. Experimental evaluation of the proposed concept on document collections of Hindi/English and Bangla/English scripts has shown promising results.

12 citations


Journal ArticleDOI
01 Jun 2011
TL;DR: A scheme to automatically generate fuzzy rules for MR image segmentation to classify tissue is proposed, based on a hybrid of two popular genetic algorithm based machine learning (GBML) techniques, the Michigan and Pittsburgh approaches.
Abstract: A magnetic resonance system generates image data where the contrast depends on various parameters such as proton density (PD), spin-lattice relaxation time (T1), spin-spin relaxation time (T2), chemical shift, flow effects, diffusion, and perfusion. There is a lot of variability in the intensity pattern of magnetic resonance (MR) image data for various reasons: for example, a T2-weighted image of the same patient can be generated by different pulse sequences (Spin Echo, Fast Spin Echo, Inversion Recovery, etc.), on different MR systems (1T, 1.5T, 3T, etc.), or using different RF coil systems. Hence, there is a need for an adaptive segmentation scheme that can be modified depending on the imaging scheme and the nature of the MR images. This paper proposes a scheme to automatically generate fuzzy rules for MR image segmentation to classify tissue. The scheme is based on a hybrid of two popular genetic algorithm based machine learning (GBML) techniques, the Michigan and Pittsburgh approaches. The proposed method uses a training data set generated from manually segmented images with the help of an expert in magnetic resonance imaging (MRI). Features from the image histogram and the spatial neighbourhood of pixels have been used in the fuzzy rules. The method is tested on classifying brain T2-weighted 2-D axial images acquired by different pulse sequences into three primary tissue types: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Results were matched against manual segmentation by experts, and the performance of our scheme was comparable.

8 citations
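The fuzzy rules the GBML hybrid evolves can be pictured with two primitives: a membership function over an intensity feature and a min-based (AND) rule firing strength. A hedged sketch, with the triangular shape and all breakpoints purely illustrative, not taken from the paper:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a to peak b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fire_rule(memberships):
    """Firing strength of a fuzzy rule: min (AND) over its antecedents,
    e.g. 'IF intensity is MEDIUM AND neighborhood is BRIGHT THEN tissue = GM'."""
    return min(memberships)
```

A GA would search over the breakpoints (a, b, c) and over which antecedents each rule combines; classification picks the tissue class whose rules fire most strongly.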


Proceedings ArticleDOI
15 Dec 2011
TL;DR: A novel dance posture based annotation model that combines features using Multiple Kernel Learning (MKL), together with a novel feature representation capturing the local texture properties of the image, is proposed.
Abstract: We present a novel dance posture based annotation model combining features using Multiple Kernel Learning (MKL). We have proposed a novel feature representation which captures the local texture properties of the image. The annotation model is defined over a directed acyclic graph structure using the binary MKL algorithm. The bag-of-words model is applied for image representation. The experiments have been performed on an image collection belonging to two Indian classical dances (Bharatnatyam and Odissi). The annotation model has been tested using SIFT and the proposed feature individually, and by optimally combining both features. The experiments have shown promising results.

7 citations
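At its core, binary MKL classifies with a convex combination of base kernels, e.g. one Gram matrix from SIFT bag-of-words histograms and one from the proposed texture feature. A minimal sketch of the combination step (in MKL the weights are learned jointly with the classifier; here they are supplied by the caller):

```python
import numpy as np

def combined_kernel(kernels, weights):
    """MKL-style kernel: convex combination of base Gram matrices.

    `kernels` is a list of (n, n) kernel matrices over the same samples;
    `weights` are non-negative and are normalised to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0), "kernel weights must be non-negative"
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))
```

The combined matrix stays positive semi-definite because it is a non-negative sum of PSD matrices, so it can be fed to any kernel classifier (e.g. an SVM) unchanged.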


Journal ArticleDOI
TL;DR: The proposed method provides good coverage, as it dynamically relocates sensors either to avoid coverage holes or to improve the quality of event sensing.
Abstract: We propose a self-deploying mobile sensor network empowered with event-based relocation of redundant sensors for enhancing the quality of event sensing. An energy-efficient cell-quorum-based protocol is used for communication between the event location and redundant sensors. A computationally light cascaded reorganization of sensors is suggested for relocation. The proposed method provides good coverage, as it dynamically relocates sensors either to avoid coverage holes or to improve the quality of event sensing.

6 citations


Proceedings ArticleDOI
15 Dec 2011
TL;DR: Multi-Objective Genetic Algorithm is used to maximize the camera coverage with optimum illumination of the sensing space and this paper outlines the camera and light source location optimization problem with multiple objective functions.
Abstract: Optimal placement of visual sensors, along with good lighting conditions, is indispensable for the successful execution of surveillance applications. Limited field-of-view, depth-of-field, and occlusion due to the presence of different objects in the scene form the major constraints for visual sensor placement, while over/under-exposed objects, shadowing, and light rays directly incident on the camera lens are some of the constraints for light source placement. Because of the nature of the constraints and the complexity of the problem, the placement problem is considered to be a multi-objective global optimization problem. The paper formulates the camera and light source location optimization problem with multiple objective functions. A Multi-Objective Genetic Algorithm is used to maximize camera coverage with optimum illumination of the sensing space.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: A novel approach for event detection in sports videos by topic based graphical model learning, where characteristic features defining various sport events are extracted by contextual grouping of low-level video and audio features using topic modeling.
Abstract: The paper presents a novel approach for event detection in sports videos by topic based graphical model learning. The characteristic features defining various sport events are extracted by contextual grouping of low-level video and audio features using topic modeling. Event detection is performed by learning the structure of the context based distribution of characteristic features with a CRF based graphical model. Experimental evaluation of the proposed concept is presented on recorded videos of handball and soccer games.

Proceedings ArticleDOI
18 Sep 2011
TL;DR: A novel document indexing framework which accounts for document digitization errors in the indexing process to improve overall retrieval accuracy, based on topic modeling using Latent Dirichlet Allocation (LDA).
Abstract: Indexing and retrieval performance over a digitized document collection depends significantly on the performance of the available Optical Character Recognition (OCR). The paper presents a novel document indexing framework which accounts for document digitization errors in the indexing process to improve overall retrieval accuracy. The proposed indexing framework is based on topic modeling using Latent Dirichlet Allocation (LDA). The OCR's confidence in correctly recognizing a symbol is propagated into the topic learning process so that the semantic grouping of word examples carefully distinguishes between commonly confused words. We present a novel application of Lucene with topic modeling for document indexing. The experimental evaluation of the proposed framework is presented on a document collection belonging to the Devanagari script.
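One way to picture propagating OCR confidence into topic learning (an illustrative simplification, not the paper's exact formulation) is to let each recognized word contribute a fractional count equal to its confidence before the LDA statistics are collected, so low-confidence words carry less weight:

```python
from collections import defaultdict

def confidence_weighted_counts(words_with_conf):
    """Fractional term counts for topic estimation: each OCR'd word
    contributes its recognition confidence instead of a hard count of 1,
    so possibly mis-recognized words influence the topics less."""
    counts = defaultdict(float)
    for word, conf in words_with_conf:
        counts[word] += conf
    return dict(counts)
```

These soft counts would replace the integer document-term counts that a standard LDA (or Lucene term index) consumes.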

Proceedings ArticleDOI
19 Feb 2011
TL;DR: DCT and DWT based transform-domain watermarking techniques were implemented on the RROI, and the performance of the two was compared on the basis of PSNR to check the robustness of the watermark.
Abstract: Forensic watermarking is a tool for secure transmission of forensic questioned documents. Reported cases of fraudulent manipulation of forensic case documents are the basic motivation behind the application proposed in this paper. Earlier approaches in forensic watermarking embed the watermark onto the whole document, which is highly susceptible to attack. We have proposed an RROI based forensic watermarking of case documents, where the RROI represents the rectangular region of a document (fingerprint/signature) that needs to be watermarked. DCT and DWT based transform-domain watermarking techniques were implemented on the RROI, and the performance of the two was compared on the basis of PSNR. Moreover, attacks such as scaling and addition of noise have been performed to check the robustness of the watermark.
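A hedged sketch of DCT-domain embedding and the PSNR measure used for the comparison. The block size, the mid-band coefficient position, and the embedding strength `alpha` are illustrative choices, not the paper's parameters:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; M @ block @ M.T is the 2-D DCT."""
    j = np.arange(n)
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    M[0, :] = np.sqrt(1.0 / n)
    return M

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def embed_bit(block, bit, alpha=8.0, pos=(3, 4)):
    """Carry one watermark bit by forcing a mid-band DCT coefficient to +/-alpha."""
    M = dct_matrix(block.shape[0])
    D = M @ np.asarray(block, float) @ M.T
    D[pos] = alpha if bit else -alpha
    return M.T @ D @ M  # inverse 2-D DCT

def extract_bit(block, pos=(3, 4)):
    """Recover the bit from the sign of the marked coefficient."""
    M = dct_matrix(block.shape[0])
    return (M @ np.asarray(block, float) @ M.T)[pos] > 0
```

Embedding in a mid-band coefficient is the usual compromise: low-band changes are visible, high-band changes are destroyed by compression and noise.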

Proceedings ArticleDOI
15 Dec 2011
TL;DR: An approach for correcting character recognition errors of an OCR which can recognise Indic scripts, achieving a maximum error-rate reduction of 33% over a simple character recognition system.
Abstract: In this paper we present an approach for correcting character recognition errors of an OCR which can recognise Indic scripts. A suffix tree is used to index the lexicon in lexicographical order to facilitate the probabilistic search. To obtain the best probable match for a mis-recognised string, it is compared with the sub-strings (edges of the suffix tree) using weighted Levenshtein distance as the similarity measure, where confusion probabilities of characters (Unicode code points) are used as substitution costs, until the specified cost k is exceeded. Retrieved candidates are sorted and selected on the basis of lowest edit cost. Exploiting this information, the system can correct non-word errors and achieves a maximum error-rate reduction of 33% over a simple character recognition system.
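The weighted Levenshtein distance with confusion-probability substitution costs can be sketched directly (the cost table passed in is illustrative; the paper derives it from the OCR's character confusion statistics, and applies the distance along suffix-tree edges rather than to whole strings):

```python
def weighted_levenshtein(s, t, confusion=None, default_sub=1.0):
    """Edit distance where substituting a character pair the OCR often
    confuses is cheaper than an ordinary substitution.

    `confusion` maps (source_char, target_char) -> cost in [0, 1];
    unlisted pairs cost `default_sub`. Insert/delete cost 1.
    """
    if confusion is None:
        confusion = {}
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                sub = 0.0
            else:
                sub = confusion.get((s[i - 1], t[j - 1]), default_sub)
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # (confusion-weighted) substitution
    return d[m][n]
```

With all costs at 1 this reduces to the classic distance; lowering the cost of a frequently confused pair pulls the right lexicon word closer to the mis-recognised string.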


Proceedings ArticleDOI
18 Sep 2011
TL;DR: The algorithm proposes a language-independent tool for recording a user's handwriting in its original essence; results obtained from preliminary testing on MATLAB and Android platforms show significant improvement in compression ratio over traditional storage and compression schemes.
Abstract: We present an idea for compressing user-fed data from a touch screen input interface for storage and transmission over relatively lower bandwidth. The input is taken in the form of hand-written text, graphics, symbols or patterns and recorded as strokes in order of their temporal occurrence. The patterns are segmented into primitive forms, each of which is then modeled with third order B-Spline curves. The number of control points driving the spline curve is determined beforehand by recognizing the dominant points in the pattern. The significant reduction of redundancy in the data can be exploited in a wide application base, including low-cost handheld device communication. The algorithm hence provides a language-independent tool for recording a user's handwriting in its original essence. Results obtained from preliminary testing on MATLAB and Android platforms show significant improvement in compression ratio over traditional storage and compression schemes.
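The dominant-point step that fixes the number of B-spline control points can be sketched as a turning-angle test over the stroke samples (the 30-degree threshold is an illustrative assumption; the B-spline fitting itself is omitted):

```python
import math

def dominant_points(stroke, angle_thresh_deg=30.0):
    """Keep the endpoints and the points where the pen direction turns
    sharply; the survivors serve as candidate control points for the
    cubic B-spline fit of the stroke.

    `stroke` is a list of (x, y) samples in temporal order.
    """
    if len(stroke) <= 2:
        return list(stroke)
    keep = [stroke[0]]
    for prev, cur, nxt in zip(stroke, stroke[1:], stroke[2:]):
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        turn = abs(math.degrees(a2 - a1))
        turn = min(turn, 360.0 - turn)  # handle the angle wrap-around
        if turn > angle_thresh_deg:
            keep.append(cur)
    keep.append(stroke[-1])
    return keep
```

On an L-shaped stroke only the corner survives, which is exactly the redundancy reduction the compression scheme exploits.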

Proceedings ArticleDOI
01 Oct 2011
TL;DR: This paper proposes a hardware accelerator capable of detecting real-time changes in a scene using a clustering based change detection scheme; the system is designed and simulated using VHDL and implemented on a Xilinx XUP Virtex-II Pro FPGA board.
Abstract: Smart cameras are important components in human-computer interaction. In any remote surveillance scenario, smart cameras have to take intelligent decisions to select frames of significant change to minimize communication and processing overhead. Among the many algorithms for change detection, a clustering based scheme was proposed for smart camera systems. However, such an algorithm could achieve only a low frame rate, far from real-time requirements, on the general-purpose processors (like PowerPC) available on FPGAs. This paper proposes a hardware accelerator capable of detecting changes in a scene in real time, using the clustering based change detection scheme. The system is designed and simulated using VHDL and implemented on a Xilinx XUP Virtex-II Pro FPGA board. The resulting frame rate is 30 frames per second for QVGA resolution in gray scale.

Proceedings ArticleDOI
15 Dec 2011
TL;DR: An image analogy based super-resolution technique that can be used as an effective tool for document image compression and multi-resolution viewing of the document.
Abstract: In this work, we propose an image analogy based super-resolution technique that can be used as an effective tool for document image compression and multi-resolution viewing of documents. The technique uses the Dugad and Ahuja method for resizing document images; the image analogies framework is then applied to add the missing high frequency information. The encoder allows the user to compress a spatially lower resolution version of the image using any standard image compression technique, thus enabling substantial compression. At the decoder end, the image is resized using the Dugad and Ahuja method and then enhanced using image analogy, appending the missing high frequency details using a training pair from the same class of document image.

Journal ArticleDOI
TL;DR: An embedded platform based framework for implementing summary generation scheme using HW-SW Co-Design based methodology is proposed and the complete system is implemented on Xilinx XUP Virtex-II Pro FPGA board.
Abstract: In any remote surveillance scenario, smart cameras have to take intelligent decisions to generate summary frames to minimize communication and processing overhead. Video summary generation, in the context of a smart camera, is the process of merging the information from multiple frames. A summary generation scheme based on a clustering based change detection algorithm has been implemented in our smart camera system for generating frames that deliver the requisite information. In this paper we propose an embedded platform based framework for implementing the summary generation scheme using a HW-SW co-design based methodology. The complete system is implemented on a Xilinx XUP Virtex-II Pro FPGA board. The overall algorithm runs on a PowerPC405, and some of the blocks which are computationally intensive and frequently called are implemented in hardware using VHDL. The system is designed using the Xilinx Embedded Development Kit (EDK).

Proceedings ArticleDOI
18 Sep 2011
TL;DR: A novel word image based document indexing scheme by combination of string matching and hashing is presented for two document image collections belonging to Devanagari and Bengali script.
Abstract: We present a novel word image based document indexing scheme by a combination of string matching and hashing. The word image representation is defined by string codes obtained by unsupervised learning over graphical primitives. The indexing framework is defined by a distance based hashing function which projects objects to the hash space while preserving their distances. We have used edit distance based string matching for defining the hashing function and for approximate nearest neighbor based retrieval. The application of the proposed indexing framework is presented for two document image collections belonging to Devanagari and Bengali scripts.
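Distance based hashing with an edit-distance metric can be sketched with pivot codes: a word-image string code is keyed by the set of pivots it lies within a given radius of, so near-identical codes collide in the same buckets. The pivots and radius here are illustrative assumptions; the paper learns its hash functions rather than fixing them.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two string codes (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def distance_hash(code, pivots, radius=2):
    """Bucket key for a word-image string code: the indices of the pivot
    codes within `radius` edits. Codes from near-identical word images
    share pivots and therefore share buckets."""
    return tuple(i for i, p in enumerate(pivots)
                 if edit_distance(code, p) <= radius)
```

Retrieval then only compares the query against codes in its matching buckets, giving approximate nearest-neighbor search without scanning the whole index.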

Patent
25 May 2011
TL;DR: In this paper, a method, device and system are provided to classify an image based on content, improving the accuracy of image classification and reducing processing time; the electronic device (105) includes an identification unit (160), an extracting unit (165), a determining unit (170), a grouping unit (175), an index unit (180), and a classifying unit (185).
Abstract: PURPOSE: A method, device and system are provided to classify an image based on content, improving the accuracy of image classification and reducing processing time. CONSTITUTION: An electronic device (105) includes an identification unit (160), an extracting unit (165), a determining unit (170), a grouping unit (175), an index unit (180), and a classifying unit (185). The identification unit identifies at least one interest area from a plurality of images related to a category. The extracting unit extracts a plurality of pixels from the identified interest area(s). The determining unit determines color values for the extracted pixels. The grouping unit groups the color values in a code book corresponding to the category.

Proceedings ArticleDOI
18 Dec 2011
TL;DR: This talk presents an overview of applications of distributed mobile robots in disaster management, focusing on control and coordination for multiple robots that have to move as a group with user-defined relative positions, i.e., in formations for performing different tasks.
Abstract: Disasters themselves are not limited to specific parts of the world, though certain areas might be more prone to certain specific types of disasters. Some countries are more prone to terrorist activities, some coastal areas are more prone to cyclones, some areas are more prone to floods, while some other areas are prone to oil spills. Loss of human life and property are obvious consequences of disasters. However, the level of preparedness is the key element that can limit the extent of damage. Use of sensor network based technologies can enhance the level of preparedness and the ability to handle the consequences of a disaster. This higher level of preparedness can provide better control over the loss. A team of mobile robots can quickly set up a network of mobile sensors and actuators for rapid action. This talk presents an overview of applications of distributed mobile robots in disaster management. Applications which carry human risk, such as handling of nuclear waste and identification of the location of explosives, show the potential of mobile robots functioning as a group. Mobile robots have been used in the search and rescue operations of the World Trade Center terrorist attack and the Hanshin-Awaji earthquake. In such situations mobile robots can enter voids too small or deep for a person, and can begin surveying larger voids that people are not permitted to enter until a fire has been put out or the structure has been reinforced. Robots can carry cameras, thermal imagers, hazardous material detectors, and medical payloads into the interior of a rubble pile and set up a communication link with a human operator using the ad-hoc network set up by these robots. Each robot, equipped with an accelerometer, gyroscope and magnetic compass as sensor devices, can plan its navigational path with reference to the others, and the sensor network can be dynamically relocated.
A team of mobile robots equipped with appropriate sensors and distributed, cooperative planning algorithms can also autonomously generate maps for oil spills or radiation leaks. In this context, the protocol for coalition formation between multiple robots obviously becomes an important issue. Formation control strategies have been developed focusing on control and coordination for multiple robots that have to move as a group with user-defined relative positions, i.e., in formations, for performing different tasks. In the case of disaster management, with a human in the loop, a new problem, that of coalition formation in a team consisting of multiple robots and human beings, needs to be addressed.

Book ChapterDOI
27 Jun 2011
TL;DR: A novel perception-driven approach to low-cost tele-presence systems, supporting an immersive experience of continuity between the projected video and the conferencing room, using geometric and spectral correction to impart perceptual continuity to the whole scene.
Abstract: We present a novel perception-driven approach to low-cost tele-presence systems, to support an immersive experience of continuity between the projected video and the conferencing room. We use geometric and spectral correction to impart perceptual continuity to the whole scene. The geometric correction comes from a learning-based approach to identifying horizontal and vertical surfaces. Our method redraws the projected video to match its vanishing point with that of the conference room in which it is projected. We quantify intuitive concepts such as depth-of-field using a Gabor filter analysis of overall images of the conference room. We equalise spectral features across the projected video and the conference room, for spectral continuity between the two.

Book ChapterDOI
27 Jun 2011
TL;DR: A novel hierarchical framework for scene categorization is proposed using Conditional Random Fields in a hierarchical setting for discovering the global context of latent topics extracted by Latent Dirichlet Allocation.
Abstract: We propose a novel hierarchical framework for scene categorization. The scene representation is defined by latent topics extracted by Latent Dirichlet Allocation. The interaction of these topics across scene categories is learned by probabilistic graphical modelling. We use Conditional Random Fields in a hierarchical setting for discovering the global context of these topics. The learned random fields are then used for categorization of a new scene. The experimental results of the proposed framework are presented on standard datasets and on an image collection obtained from the internet.

Journal ArticleDOI
TL;DR: A hierarchical system to perform automatic categorization and reorientation of images using content analysis is presented; it finds applications in various digital media products and brings pattern recognition solutions to the consumer electronics domain.
Abstract: A hierarchical system to perform automatic categorization and reorientation of images using content analysis is presented. The proposed system first categorizes images into some a priori defined categories using rotation-invariant features. At the second stage, it detects their correct orientation out of {0°, 90°, 180°, 270°} using a category-specific model. The system has been specially designed for embedded device applications, using only low-level color and edge features. Machine learning algorithms optimized to suit embedded implementation, like support vector machines (SVMs) and scalable boosting, have been used to develop classifiers for categorization and orientation detection. Results are presented on a collection of about 7000 consumer images collected from open resources. The proposed system finds applications in various digital media products and brings pattern recognition solutions to the consumer electronics domain.
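The second-stage orientation detection reduces to scoring the four candidate rotations with the category-specific model and keeping the best. A minimal sketch, with `score` standing in for whatever learned classifier (SVM, boosted model) is plugged in:

```python
import numpy as np

def detect_orientation(img, score):
    """Return the rotation (0, 90, 180 or 270 degrees) under which the
    category-specific model `score` rates the image highest."""
    best_k = max(range(4), key=lambda k: score(np.rot90(img, k)))
    return best_k * 90
```

Only the scoring model is category-specific; the search over the four rotations is shared by every category.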

Book ChapterDOI
27 Jun 2011
TL;DR: A novel architecture for an interactive 3DTV system based on multiple uncalibrated cameras placed at general positions is proposed and is compatible with the standard multi view coding framework making it amenable to using existing coding and compression algorithms.
Abstract: In this paper we propose a novel architecture for an interactive 3DTV system based on multiple uncalibrated cameras placed at general positions. The signal representation scheme proposed is compatible with the standard multi view coding framework making it amenable to using existing coding and compression algorithms. The proposed scheme also fits naturally to the concept of true 3DTV viewing experience where the viewer can choose a novel viewpoint based on the contents of the scene.

Proceedings ArticleDOI
15 Dec 2011
TL;DR: A novel multi-resolution robust methodology that is invariant to a large range of distortions, illumination changes, and is relatively resilient to noise and unmodelled objects present as clutter is proposed.
Abstract: Given a part of a document image taken with any camera at an arbitrary orientation and sometimes far-from-perfect illumination, an important problem is to match this query image to the corresponding full image in a document database. We propose a novel multi-resolution robust methodology for this task. The method combines information from independent sources of measurement in a probabilistic framework. The proposed method is invariant to a large range of distortions and illumination changes, and is relatively resilient to noise and unmodelled objects present as clutter. To the best of our knowledge, no related work addresses all these issues.