
Showing papers by "Svetha Venkatesh" published in 2004


Journal ArticleDOI
01 Jan 2004
TL;DR: A multiple window incremental learning algorithm that distinguishes between virtual concept drift and real concept drift, and uses a novel approach to tracking concept drift that involves the use of competing windows to interpret the data.
Abstract: In this paper we present a multiple window incremental learning algorithm that distinguishes between virtual concept drift and real concept drift. The algorithm is unsupervised and uses a novel approach to tracking concept drift that involves the use of competing windows to interpret the data. Unlike previous methods, which use a single window to determine the drift in the data, our algorithm uses three windows of different sizes to estimate the change in the data. The advantage of this approach is that it allows the system to progressively adapt and predict the change, thus enabling it to deal more effectively with different types of drift. We give a detailed description of the algorithm and present the results obtained from its application to two real-world problems: background image processing and sound recognition. We also compare its performance with FLORA, an existing concept drift tracking algorithm.

129 citations


Proceedings Article
01 Jan 2004
TL;DR: Results indicated that the threshold obtained via the proposed technique provides balanced recognition in terms of precision and recall, and demonstrated that the energy histogram algorithm outperformed the well-known Eigenface algorithm.
Abstract: In this paper, we investigate the face recognition problem via the energy histogram of the DCT coefficients. Several issues related to recognition performance are discussed, in particular histogram bin sizes and feature sets. In addition, we propose a technique for selecting the classification threshold incrementally. Experimentation was conducted on the Yale face database, and results indicated that the threshold obtained via the proposed technique provides balanced recognition in terms of precision and recall. Furthermore, the results demonstrated that the energy histogram algorithm outperformed the well-known Eigenface algorithm.

87 citations


Proceedings Article
25 Jul 2004
TL;DR: In this paper, a general hierarchical hidden Markov model (HHMM) is presented, in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures.
Abstract: The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten character recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.

86 citations
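The numerical scaling mentioned in the abstract above can be illustrated on an ordinary (flat) HMM. The following is a minimal sketch of the standard scaled forward pass, not the paper's lattice-structured HHMM procedure; the two-state model, its parameters, and the function names are assumptions chosen for illustration.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward pass for a flat discrete HMM; returns log P(obs)."""
    n = len(pi)
    # alpha holds the forward variables for the current time step.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                for j in range(n)
            ]
        scale = sum(alpha)                  # P(o_t | o_1 .. o_{t-1})
        alpha = [a / scale for a in alpha]  # renormalise so values stay near 1
        log_lik += math.log(scale)          # accumulate in the log domain
    return log_lik

def forward_likelihood_unscaled(pi, A, B, obs):
    """Plain forward pass; only usable for short sequences."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
short_obs = [0, 1, 0, 0, 1]
long_obs = [0, 1] * 2000   # 4000 observations

print(forward_log_likelihood(pi, A, B, short_obs))
print(forward_log_likelihood(pi, A, B, long_obs))
```

On the short sequence the scaled and unscaled passes agree up to rounding. On the long sequence the true likelihood is far smaller than the smallest positive double, so only the log-domain result remains usable.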


Proceedings Article
01 Jan 2004
TL;DR: This paper presents a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures, and provides a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences.

85 citations


Proceedings ArticleDOI
05 Jan 2004
TL;DR: This work uses a Hidden Markov Model approach for gesture modeling with both isolated gestures and gestures segmented from a stream to augment video with data obtained from accelerometers worn as wrist bands by one or more officials.
Abstract: We present results on the recognition of intentional human gestures for video annotation and retrieval. We define a gesture as a particular, repeatable, human movement having a predefined meaning. An obvious application of the work is in sports video annotation where umpire gestures indicate specific events. Our approach is to augment video with data obtained from accelerometers worn as wrist bands by one or more officials. We present the recognition performance using a Hidden Markov Model approach for gesture modeling with both isolated gestures and gestures segmented from a stream.

28 citations


Book ChapterDOI
09 Aug 2004
TL;DR: It is shown how the implicit state duration in the HMM can create a situation in which a highly abnormal deviation, either shorter or longer than the usually observed activity duration, can fail to be detected, and how the explicit state duration HMM (ESD-HMM) helps alleviate the problem.
Abstract: Much of the current work in human behaviour modelling concentrates on activity recognition, recognising actions and events through pose, movement, and gesture analysis. Our work focuses on learning and detecting abnormality in higher level behavioural patterns. The hidden Markov model (HMM) is one approach for learning such behaviours given a vision tracker recording observations about a person's activity. Duration of human activity is an important consideration if we are to accurately model a person's behavioural patterns. We show how the implicit state duration in the HMM can create a situation in which a highly abnormal deviation, either shorter or longer than the usually observed activity duration, can fail to be detected, and how the explicit state duration HMM (ESD-HMM) helps alleviate the problem.

26 citations
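The duration issue described above follows from a well-known property of the standard HMM: the time spent in a state with self-transition probability a is geometrically distributed, so the shortest stay is always the most probable one. The sketch below contrasts that implicit distribution with an explicit duration table. The activity, the table values, and the function names are hypothetical, and this is not the ESD-HMM from the chapter.

```python
def implicit_duration_prob(self_transition, d):
    """P(staying exactly d steps) in a standard HMM state (geometric)."""
    return (self_transition ** (d - 1)) * (1.0 - self_transition)

def explicit_duration_prob(duration_table, d, floor=1e-6):
    """P(d) looked up in an explicit per-state duration table.

    Durations absent from the table get a small floor probability.
    """
    return duration_table.get(d, floor)

# Hypothetical activity that usually lasts about 10 time steps.
a = 0.9  # self-transition probability; mean duration is 1 / (1 - a) = 10
table = {8: 0.1, 9: 0.2, 10: 0.4, 11: 0.2, 12: 0.1}

for d in (1, 10, 30):
    print(d, implicit_duration_prob(a, d), explicit_duration_prob(table, d))
```

Under the geometric model a one-step stay is more probable than the typical ten-step stay, so an abnormally short activity is not penalised. The explicit table assigns it only the floor probability.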


Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper approaches the problem of segmenting higher-level activities into their component sub-actions using hidden Markov models modified to handle missing data in the observation vector by controlling the use of missing data, thus performing segmentation and classification simultaneously.
Abstract: Segmentation of individual actions from a stream of human motion is an open problem in computer vision. This paper approaches the problem of segmenting higher-level activities into their component sub-actions using hidden Markov models modified to handle missing data in the observation vector. By controlling the use of missing data, action labels can be inferred from the observation vector during inferencing, thus performing segmentation and classification simultaneously. The approach is able to segment both prominent and subtle actions, even when subtle actions are grouped together. The advantage of this method over sliding windows and Viterbi state sequence interrogation is that segmentation is performed as a trainable task, and the temporal relationship between actions is encoded in the model and used as evidence for action labelling.

18 citations


Journal ArticleDOI
TL;DR: This research takes an action-centered approach to automatically learning and classifying functional objects by using the human-object interaction signature to find and classify objects on the basis of how humans interact with those objects.
Abstract: Our research takes an action-centered approach to automatically learning and classifying functional objects. Our premise is that interpreting human motion is much easier than recognizing arbitrary objects because the human body has constraints on its motion. Moreover, humans tend to interact differently with different objects, so you should be able to identify an object by analyzing how people move when they manipulate it. We call these motions the human-object interaction signature. An interaction signature is a method to find and classify objects on the basis of how humans interact with those objects. The method addresses many key problems encountered in smart-home monitoring systems.

14 citations


Book ChapterDOI
TL;DR: The problem of automatic segmentation and robust gesture classification is solved using a hierarchical hidden Markov model in conjunction with a filler model to handle extraneous umpire movements.
Abstract: We present results on an extension to our approach for automatic sports video annotation. Sports video is augmented with accelerometer data from wrist bands worn by umpires in the game. We solve the problem of automatic segmentation and robust gesture classification using a hierarchical hidden Markov model in conjunction with a filler model. The hierarchical model allows us to consider gestures at different levels of abstraction, and the filler model allows us to handle extraneous umpire movements. Results are presented for labeling video of a game of cricket.

13 citations


Journal ArticleDOI
TL;DR: Algorithms for forming a line, circle, and regular polygon from a given set of random positions are presented and demonstrated that the algorithms are robust against random errors in the sensors and actuators.
Abstract: Mobile wireless sensor networks (MWSNs) will enable information systems to gather detailed information about the environment on an unprecedented scale. These self-organizing, distributed networks of sensors, processors, and actuators that are capable of movement have a broad range of potential applications, including military reconnaissance, surveillance, planetary exploration, and geophysical mapping. In many of the foreseen applications, the MWSN will need to form a geometric pattern without assistance from the user. In military reconnaissance, for example, the nodes will be dropped onto the battlefield from a plane and land at random positions. The nodes will be expected to arrange themselves into a predetermined formation in order to perform a specific task. Thus, we present algorithms for forming a line, circle, and regular polygon from a given set of random positions. The algorithms are distributed and use no communication between the nodes to minimize energy consumption. Unlike past studies of geometric problems where algorithms are either tested in simulations where each node has global knowledge of all the other nodes or implemented on a small number of robots, the robustness of our algorithms has been studied with simulations that model the sensor system in detail. The simulations demonstrate that the algorithms are robust against random errors in the sensors and actuators. © 2004 Wiley Periodicals, Inc.

13 citations


Book ChapterDOI
09 Aug 2004
TL;DR: A modified "median value" model is presented in which the detection threshold adapts to global changes in illumination, and the responses of several models are compared, demonstrating the effectiveness of the new model.
Abstract: Background elimination models are widely used in motion tracking systems. Our aim is to develop a system that performs reliably under adverse lighting conditions. In particular, this includes indoor scenes lit partly or entirely by diffuse natural light. We present a modified "median value" model in which the detection threshold adapts to global changes in illumination. The responses of several models are compared, demonstrating the effectiveness of the new model.
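One way to picture a median background model whose detection threshold follows global illumination is sketched below. The abstract does not give the update rule, so the rule used here (base threshold plus a gain times the median frame-to-background difference) is an assumption for illustration only, as are the function names, parameters, and pixel values.

```python
from statistics import median

def median_background(frames):
    """Per-pixel median over a list of equally sized frames (flat lists)."""
    return [median(pixel_values) for pixel_values in zip(*frames)]

def detect_foreground(frame, background, base_threshold=20.0, gain=1.0):
    """Flag pixels whose difference from the background exceeds a threshold.

    Assumed rule: the threshold grows with the scene-wide brightness shift,
    estimated robustly as the median of all per-pixel differences.
    """
    diffs = [p - b for p, b in zip(frame, background)]
    global_shift = median(diffs)
    threshold = base_threshold + gain * abs(global_shift)
    mask = [abs(d) > threshold for d in diffs]
    return mask, threshold

# Three hypothetical 5-pixel training frames of a static scene.
frames = [
    [100, 100, 100, 100, 100],
    [102, 99, 101, 100, 98],
    [98, 101, 99, 100, 102],
]
background = median_background(frames)

# The whole scene brightens by 25; a moving object covers pixel 2.
frame = [125, 125, 185, 125, 125]
mask, threshold = detect_foreground(frame, background)
print(threshold, mask)
```

With a fixed threshold of 20 every pixel in this frame would be flagged, because the brightening alone exceeds it. The adapted threshold leaves only the object pixel.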

Book ChapterDOI
TL;DR: The techniques that can be used to learn the different camera noise models and the human movement models to be used in this distributed surveillance system are described and results are provided showing the system is able to identify behaviours of people from their movement signatures.
Abstract: In surveillance systems for monitoring people's behaviours, it is important to build systems that can adapt to the signatures of people’s tasks and movements in the environment. At the same time, it is important to cope with noisy observations produced by a set of cameras with possibly different characteristics. In previous work, we have implemented a distributed surveillance system designed for complex indoor environments [1]. The system uses the Abstract Hidden Markov mEmory Model (AHMEM) for modelling and specifying complex human behaviours that can take place in the environment. Given a sequence of observations from a set of cameras, the system employs approximate probabilistic inference to compute the likelihood of different possible behaviours in real time. This paper describes the techniques that can be used to learn the different camera noise models and the human movement models to be used in this system. The system is able to monitor and classify people's behaviours as data is being gathered, and we provide classification results showing that the system is able to identify behaviours of people from their movement signatures.

01 Jan 2004
TL;DR: Results indicate that the automatic eigenvector and threshold selection methods provide optimum recognition in terms of precision and recall rates, and it is shown that the eigenvector selection method outperforms the energy and stretching dimension methods in terms of the number of eigenvectors selected and computation cost.
Abstract: In this paper, we investigate the parameter selection issues for Eigenfaces. Our focus is on the eigenvectors and threshold selection issues. We propose a systematic approach in selecting the eigenvectors based on the relative errors of the eigenvalues. In addition, we have designed a method for selecting the classification threshold that utilizes the information obtained from the training database effectively. Experimentation was conducted on the ORL and AMP face databases with results indicating that the automatic eigenvectors and threshold selection methods provide an optimum recognition in terms of precision and recall rates. Furthermore, we show that the eigenvector selection method outperforms energy and stretching dimension methods in terms of selected number of eigenvectors and computation cost.

01 Jan 2004
TL;DR: This paper proposes to shift the responsibility for dealing with missing pose data away from the pose estimator and onto the action classifier, with data missing during both training and classification, to address the problem of occlusions.
Abstract: Currently, most human action recognition systems are trained with feature sets that have no missing data. Unfortunately, the use of human pose estimation models to provide more descriptive features also entails an increased sensitivity to occlusions, meaning that incomplete feature information will be unavoidable for realistic scenarios. To address the problem of occlusions, this paper proposes to shift the responsibility for dealing with missing pose data away from the pose estimator and onto the action classifier, with data missing during both training and classification. This allows the use of a simple, real-time pose estimation technique that does not estimate the positions of limbs it cannot find quickly. The technique produces a stick-figure skeleton whose features are used with discrete Hidden Markov Models to recognise limb-level motions such as sitting, standing, walking, typing, drinking and reading. Results show that recognition in the presence of missing data can produce good classification accuracy, though accuracy drops off sharply once more than around 50% of the data is missing.

Book ChapterDOI
09 Aug 2004
TL;DR: The approach is to shift the responsibility for dealing with occluded pose data away from the pose estimator and onto the action classifier, which allows the use of a simple, real-time pose estimation (stick-figure) that does not estimate the positions of limbs it cannot find quickly.
Abstract: Currently, most human action recognition systems are trained with feature sets that have no missing data. Unfortunately, the use of human pose estimation models to provide more descriptive features also entails an increased sensitivity to occlusions, meaning that incomplete feature information will be unavoidable for realistic scenarios. To address this, our approach is to shift the responsibility for dealing with occluded pose data away from the pose estimator and onto the action classifier. This allows the use of a simple, real-time pose estimation (stick-figure) that does not estimate the positions of limbs it cannot find quickly. The system tracks people via background subtraction and extracts the (possibly incomplete) pose skeleton from their silhouette. Hidden Markov Models modified to handle missing data are then used to successfully classify several human actions using the incomplete pose features.

Proceedings ArticleDOI
10 Oct 2004
TL;DR: A PDA platform is used to deliver 3D visualizations of shot directives (instructions to the user about the type of footage to capture), and issues connected with realizing high-level representations in concrete first-person animations are discussed.
Abstract: We present a new aspect of our ongoing research aimed at providing technology for the amateur home videographer. We aim to enable the production of quality video presentations that are well structured and use the expressive properties of the medium to full effect, regardless of the technical or artistic abilities of the user. This task requires that help be given to the user at or before capture time. We use a PDA platform to deliver 3D visualizations of shot directives, which are instructions to the user about the type of footage to capture, and discuss issues connected with realizing high-level representations in concrete first-person animations. Additionally, we discuss the mechanism for mating that metadata with captured footage, as well as implementation issues.

Journal ArticleDOI
TL;DR: This paper is all about authoring multimedia authoring tools, and discourse theory, domain distinctives, and multimedia data description standards all have a part to play.
Abstract: If we are to create effective multimedia authoring tools, we must avail ourselves of the various disciplines that we normally throw in the "someone-else's-problem" basket. Discourse theory, domain distinctives such as media aesthetics, human-computer interface issues, and multimedia data description standards all have a part to play. However, none of these fields stands still, so we need to continually query them for new insights that might impact our multimedia authoring endeavor. This paper is all about authoring multimedia authoring tools.

Proceedings ArticleDOI
10 Oct 2004
TL;DR: Various issues related to the multimedia information retrieval and media access are discussed and the feasible solutions for automatic signal-based analysis of media content are analyzed.
Abstract: Various issues related to the multimedia information retrieval and media access are discussed. The feasible solutions for automatic signal-based analysis of media content are analyzed. The extent of user involvement in the content creation process is emphasized. The applications driving the creation and usage of context and metadata are also elaborated.

Proceedings ArticleDOI
05 Jan 2004
TL;DR: Certain filmic elements such as montage, centre/cutaway, dialogue, temporal flow, zone change, dramatic progression, shot association, scene introduction, scene resolution, master shot and editing orchestration can be identified from a scene through the signature arrangements of nodes and edges in the DR-TTD.
Abstract: In this paper, we study the application of a scene structure visualizing technique called double-Ring Take-Transition-Diagram (DR-TTD). This technique presents takes and their transitions during a film scene via nodes and edges of a 'graph' consisting of two rings as its back-bone. We describe how certain filmic elements such as montage, centre/cutaway, dialogue, temporal flow, zone change, dramatic progression, shot association, scene introduction, scene resolution, master shot and editing orchestration can be identified from a scene through the signature arrangements of nodes and edges in the DR-TTD.

Proceedings ArticleDOI
30 Jun 2004
TL;DR: An integrated media creation environment is discussed and its efficacy in generating two simple home movies is demonstrated; it applies film principles automatically, replaces tedious low-level editing with media transformations expressed as high-level film constructs, and supports content repurposing through those same transformations.
Abstract: We discuss the design and implementation of an integrated media creation environment, and demonstrate its efficacy in the generation of two simple home movies. The significance for the average user seeking to create home movies lies in the flexible and automatic application of film principles to the task, removal of tedious low-level editing by means of well-formed media transformations in terms of high-level film constructs (e.g. tempo), and content repurposing powered by those same transformations added to the rich semantic information maintained at each phase of the process.

Proceedings ArticleDOI
05 Jan 2004
TL;DR: An application designed to improve the quality of amateur video production that leverages the age-old communicative powers of story to answer the what and how of home movie material.
Abstract: In this paper, we present an application designed to improve the quality of amateur video production. The majority of home movie material is negatively impacted by two factors: lack of narrative content - "what to shoot?", and the absence or inappropriate use of cinesthestic elements for effective reinforcement of content - "how to shoot?" We leverage the age-old communicative powers of story to answer the what. For the second problem, the how, we turn to the corpus of aesthetic principles that constitute the film profession, which impact both technical and cinematic considerations for a given project.

Book ChapterDOI
TL;DR: This paper models the hierarchy of topical structures in educational videos with an HHMM and demonstrates the usefulness of the model in detecting topic transitions.
Abstract: In this paper, we present an application of the hierarchical HMM (HHMM) for structure discovery in educational videos. The HHMM has recently been extended to accommodate the concept of shared structure, i.e., a state may inherit from more than one parent. Utilising the expressiveness of this model, we concentrate on a specific class of video, educational videos, in which the hierarchy of semantic units is simpler and clearly defined in terms of topics and their sub-units. We model the hierarchy of topical structures by an HHMM and demonstrate the usefulness of the model in detecting topic transitions.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper uses the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy, and then studies this hierarchy and examines the nature of the structure at different levels of abstraction.
Abstract: In this paper we present a coherent approach using the hierarchical HMM with shared structures to extract the structural units that form the building blocks of an education/training video. Rather than using hand-crafted approaches to define the structural units, we use the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy. We then study this hierarchy and examine the nature of the structure at different levels of abstraction. Since the observable is continuous, we also show how to extend the parameter learning in the HHMM to deal with continuous observations.

Book ChapterDOI
30 Nov 2004
TL;DR: In this article, three types of detectors for detecting abnormal activity are developed using negative selection, and results have shown that the classifier is able to discriminate abnormal from normal activities in terms of both trajectory and time spent at a location.
Abstract: Inspired by the human immune system, and in particular the negative selection algorithm, we propose a learning mechanism that enables the detection of abnormal activities. Three types of detectors for detecting abnormal activity are developed using negative selection. Tracks gathered from people's movements in a room are used for experimentation, and results have shown that the classifier is able to discriminate abnormal from normal activities in terms of both trajectory and time spent at a location.
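The general idea of negative selection can be sketched as follows: candidate detectors that match any normal ("self") sample are discarded, and a new sample is flagged as abnormal when a surviving detector matches it. This is a minimal sketch of the generic scheme, not the three detector types developed in the chapter. The two features (normalised position and normalised dwell time), the matching radius, and the use of a regular grid in place of the usual random candidate generation are all assumptions made to keep the example deterministic.

```python
import math

def train_detectors(self_samples, radius, grid_steps=10):
    """Keep candidate detectors that match no normal ("self") sample.

    Candidates are laid out on a regular grid over the unit square.
    """
    candidates = [
        (i / grid_steps, j / grid_steps)
        for i in range(grid_steps + 1)
        for j in range(grid_steps + 1)
    ]
    return [
        c for c in candidates
        if all(math.dist(c, s) > radius for s in self_samples)
    ]

def is_abnormal(sample, detectors, radius):
    """A sample is abnormal if it lies within the radius of any detector."""
    return any(math.dist(sample, d) <= radius for d in detectors)

# Hypothetical normal behaviour: (normalised position, normalised dwell time).
self_samples = [(0.2, 0.2), (0.25, 0.2), (0.2, 0.3)]
radius = 0.15
detectors = train_detectors(self_samples, radius)

print(len(detectors))
print(is_abnormal((0.2, 0.2), detectors, radius))   # a training sample
print(is_abnormal((0.9, 0.9), detectors, radius))   # far from normal data
```

By construction no detector lies within the radius of a training sample, so training samples are never flagged. Samples close to, but not identical to, the training data may still be flagged, depending on the radius.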