
Showing papers on "Motion analysis published in 2002"


Journal ArticleDOI
TL;DR: A coupled multiphase propagation is proposed that imposes mutually exclusive propagating curves and increases both robustness and convergence rate; it has been validated using three important applications in computer vision.

406 citations


Journal ArticleDOI
TL;DR: Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories and applied to recognize 40 hand gestures of American Sign Language.
Abstract: We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions between consecutive frames are then matched to obtain two-view correspondences. Affine transformations are computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories.
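The affine-transformation step above — computing a 2D affine map from each pair of corresponding regions — can be illustrated as a least-squares fit over matched points. This is a generic sketch, not the authors' implementation; the function names are ours:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points to dst points.

    src, dst: (N, 2) arrays of corresponding points (N >= 3, non-collinear).
    Returns a 2x3 matrix A such that dst ~= [x, y, 1] @ A.T
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    ones = np.ones((len(src), 1))
    X = np.hstack([src, ones])            # (N, 3) design matrix
    # Solve X @ A.T = dst in the least-squares sense
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_T.T                          # 2x3 affine matrix

def apply_affine(A, pts):
    pts = np.asarray(pts, float)
    ones = np.ones((len(pts), 1))
    return np.hstack([pts, ones]) @ A.T

# Example: recover a known rotation + translation from 4 matched points
theta = np.deg2rad(30.0)
A_true = np.array([[np.cos(theta), -np.sin(theta),  2.0],
                   [np.sin(theta),  np.cos(theta), -1.0]])
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
dst = apply_affine(A_true, src)
A_est = fit_affine(src, dst)
```

With exact correspondences the fit recovers the transform; with noisy region matches the least-squares solution averages the error across points.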

366 citations


Journal ArticleDOI
TL;DR: A complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion and layer appearance are modeled and estimated in a maximum a-posteriori (MAP) framework using the generalized expectation-maximization (EM) algorithm is introduced.
Abstract: Decomposing video frames into coherent 2D motion layers is a powerful method for representing videos. Such a representation provides an intermediate description that enables applications such as object tracking, video summarization and visualization, video insertion, and sprite-based video compression. Previous work on motion layer analysis has largely concentrated on two-frame or multi-frame batch formulations. The temporal coherency of motion layers and the domain constraints on shapes have not been exploited. This paper introduces a complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion and layer appearance are modeled and estimated in a maximum a-posteriori (MAP) framework using the generalized expectation-maximization (EM) algorithm. In order to limit the computational complexity of tracking arbitrarily shaped layer ownership, we propose a shape prior that parameterizes the representation of shape and prevents motion layers from evolving into arbitrary shapes. In this work, a Gaussian shape prior is chosen to specifically develop a near-real-time tracker for vehicle tracking in aerial videos. However, the general idea of using a parametric shape representation as part of the state of a tracker is a powerful one that can be extended to other domains as well. Based on the dynamic layer representation, an iterative algorithm is developed for continuous object tracking over time. The proposed method has been successfully applied in an airborne vehicle tracking system. Its performance is compared with that of a correlation-based tracker and a motion change-based tracker to demonstrate the advantages of the new method. Examples of tracking when the backgrounds are cluttered and the vehicles undergo various rigid motions and complex interactions such as passing, turning, and stop-and-go demonstrate the strength of the complete dynamic layer representation.

348 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a first-generation smart camera system that can detect people and analyze their movement in real time, a leading-edge application for embedded system research.
Abstract: Recent technological advances are enabling a new generation of smart cameras that represent a quantum leap in sophistication. While today's digital cameras capture images, smart cameras capture high-level descriptions of the scene and analyze what they see. These devices could support a wide variety of applications including human and animal detection, surveillance, motion analysis, and facial identification. Video processing has an insatiable demand for real-time performance. Smart cameras leverage very large-scale integration to meet this need in a low-cost, low-power system with substantial memory. Moving well beyond pixel processing and compression, these VLSI systems run a wide range of algorithms to extract meaning from streaming video. Recently, Princeton University researchers developed a first-generation smart camera system that can detect people and analyze their movement in real time. Because they push the design space in so many dimensions, these smart cameras are a leading-edge application for embedded system research.

278 citations


Book ChapterDOI
25 Sep 2002
TL;DR: The overall goal of the ongoing project is to develop methods for spatio-temporal analysis of relative motion within groups of moving point objects, such as GPS-tracked animals, using the analysis concept called REMO (RElative MOtion).
Abstract: The overall goal of the ongoing project is to develop methods for spatio-temporal analysis of relative motion within groups of moving point objects, such as GPS-tracked animals. Whereas recent efforts of dealing with dynamic phenomena within the GIScience community mainly concentrated on modeling and representation, this research project concentrates on the analytic task. The analysis is performed on a process level and does not use the traditional cartographic approach of comparing snapshots. The analysis concept called REMO (RElative MOtion) is based on the comparison of motion parameters of objects over time. Therefore the observation data is transformed into a 2.5-dimensional analysis matrix, featuring a time axis, an object axis and motion parameters. This matrix reveals basic searchable relative movement patterns. The current approach handles points in a pure featureless space. Case study data of GPS-observed animals and political entities in an ideological space are used for illustration purposes.

168 citations


Proceedings ArticleDOI
11 Aug 2002
TL;DR: An algorithm is developed that is capable of extracting the elements of human motion, including the action performed and a motion signature that captures the distinctive pattern of movement of a particular individual, and recombining them in novel ways.
Abstract: Human motion is the composite consequence of multiple elements, including the action performed and a motion signature that captures the distinctive pattern of movement of a particular individual. We develop an algorithm that is capable of extracting these motion elements and recombining them in novel ways. The algorithm analyzes motion data spanning multiple subjects performing different actions. The analysis yields a generative motion model that can synthesize new motions in the distinctive styles of these individuals. Our algorithms can also recognize people and actions from new motions by comparing motion signatures and action parameters.

166 citations


Journal ArticleDOI
TL;DR: It is shown that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time.
Abstract: In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene's shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene's trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region's dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method's ability to explain pixels and pixel variations in terms of their underlying causes—shape, reflectance, motion, illumination, and visibility.

154 citations


Proceedings ArticleDOI
20 May 2002
TL;DR: This paper presents a method to recover the full motion (3 rotations and 3 translations) of the head using a cylindrical model and uses the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion.
Abstract: This paper presents a method to recover the full motion (3 rotations and 3 translations) of the head using a cylindrical model. The robustness of the approach is achieved by a combination of three techniques. First, we use the iteratively re-weighted least squares (IRLS) technique in conjunction with the image gradient to deal with non-rigid motion and occlusion. Second, while tracking, the templates are dynamically updated to diminish the effects of self-occlusion and gradual lighting changes and to keep tracking the head when most of the face is not visible. Third, because the dynamic templates may cause error accumulation, we re-register images to a reference frame when the head pose is close to a reference pose. The performance of the real-time tracking program was evaluated in three separate experiments using image sequences (both synthetic and real) for which the ground truth head motion is known. The real sequences included pitch and yaw as large as 40° and 75°, respectively. The average recovery accuracy of the 3D rotations was found to be about 3°.
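The IRLS idea in the first technique — downweighting pixels whose residuals are large, so that non-rigid motion and occlusion behave as outliers — can be sketched generically with Huber weights on a toy regression. This is an illustration of the principle, not the paper's cylindrical-model formulation:

```python
import numpy as np

def irls(X, y, n_iter=20, delta=1.0):
    """Iteratively re-weighted least squares with Huber weights.

    Minimizes a robust loss over residuals r = y - X @ beta by
    re-solving a weighted least-squares problem each iteration.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary LS start
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for small residuals, delta/|r| for outliers
        w = np.where(np.abs(r) <= delta, 1.0,
                     delta / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Line fit (slope 2, intercept 1) with one gross outlier
x = np.linspace(0, 10, 50)
X = np.column_stack([x, np.ones_like(x)])
y_noisy = 2.0 * x + 1.0
y_noisy[10] += 100.0                       # single outlier
beta_ols = np.linalg.lstsq(X, y_noisy, rcond=None)[0]
beta_irls = irls(X, y_noisy)
```

The outlier pulls the ordinary least-squares slope noticeably, while the IRLS estimate stays close to the true line because the outlier's weight collapses to roughly delta/|r|.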

150 citations


Proceedings ArticleDOI
TL;DR: A two-step detection/tracking method using a night vision video camera installed on the vehicle is proposed for pedestrian detection and tracking, to deal with the nonrigid nature of human appearance on the road.
Abstract: This paper presents a method for pedestrian detection and tracking using a night vision video camera installed on the vehicle. To deal with the nonrigid nature of human appearance on the road, a two-step detection/tracking method is proposed. The detection phase is performed by a support vector machine (SVM) with size-normalized pedestrian candidates and the tracking phase is a combination of Kalman filter prediction and mean shift tracking. The detection phase is further strengthened by information obtained by a road detection module that provides key information for pedestrian validation. Experimental comparisons have been carried out on gray-scale SVM recognition vs. binary SVM recognition and entire body detection vs. upper body detection.

135 citations


Journal ArticleDOI
TL;DR: This paper shows how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself, and shows that these constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models.
Abstract: When a rigid scene is imaged by a moving camera, the set of all displacements of all points across multiple frames often resides in a low-dimensional linear subspace. Linear subspace constraints have been used successfully in the past for recovering 3D structure and 3D motion information from multiple frames (e.g., by using the factorization method of Tomasi and Kanade (1992, International Journal of Computer Vision, 9:137–154)). These methods assume that the 2D correspondences have been precomputed. However, correspondence estimation is a fundamental problem in motion analysis. In this paper we show how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself. We show that the multi-frame subspace constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models. The multi-frame subspace constraints are first translated from constraints on correspondences to constraints directly on image measurements (e.g., image brightness quantities). These brightness-based subspace constraints are then used for estimating the correspondences, by requiring that all corresponding points across all video frames reside in the appropriate low-dimensional linear subspace. The multi-frame subspace constraints are geometrically meaningful, and are not violated at depth discontinuities, nor when the camera motion changes abruptly. These constraints can therefore replace heuristic constraints commonly used in optical-flow estimation, such as spatial or temporal smoothness.
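The underlying rank constraint is easy to verify numerically: for affine cameras, the 2F×N matrix of point displacements across F frames lies in a subspace of dimension at most four (three for shape plus one for translation). A minimal synthetic check of that fact, not tied to the paper's brightness-based formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic rigid scene: N 3D points observed by F affine cameras
N, F = 20, 6
P3 = rng.standard_normal((N, 3))            # 3D points
W_rows = []
for _ in range(F):
    M = rng.standard_normal((2, 3))         # 2x3 affine camera matrix
    t = rng.standard_normal(2)              # image-plane translation
    proj = P3 @ M.T + t                     # (N, 2) projected points
    W_rows.append(proj.T)                   # two rows per frame
W = np.vstack(W_rows)                       # (2F, N) measurement matrix

# Every row of W is a linear combination of the 4 rows of [P3.T; 1],
# so rank(W) <= 4 regardless of how many frames are stacked
s = np.linalg.svd(W, compute_uv=False)
rank = int(np.sum(s > 1e-9 * s[0]))
```

This is the same low-rank structure the Tomasi–Kanade factorization exploits; the paper's contribution is to push the constraint back onto the correspondence estimation itself.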

121 citations


Journal ArticleDOI
TL;DR: This paper proposes a new and fast FS motion estimation algorithm that eliminates inappropriate motion vectors more quickly by using efficient matching units derived from localizing complex areas in the image data, and suggests two fast matching scan algorithms.
Abstract: To reduce the amount of computation in full search (FS) motion estimation, we propose a new and fast FS motion estimation algorithm. The computational reduction comes from fast elimination of impossible motion vectors. We eliminate inappropriate motion vectors more quickly by using efficient matching units derived from localizing complex areas in the image data. In this paper, we show three properties of block matching in motion estimation and suggest two fast matching scan algorithms: one based on adaptive matching scan and the other on a fixed dithering order. Experimentally, our proposed algorithm removes about 30% of the unnecessary computations compared with conventional fast FS algorithms.
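The flavor of "fast elimination of impossible motion vectors" can be sketched with partial distortion elimination inside a full-search SAD loop: accumulation for a candidate stops as soon as it exceeds the best SAD found so far. This is a generic illustration of the principle, not the paper's adaptive matching-scan algorithm:

```python
import numpy as np

def full_search_pde(ref, cur, bx, by, bsize=8, srange=4):
    """Full-search block matching with partial distortion elimination (PDE).

    For the block at (by, bx) in `cur`, scan all candidate displacements
    in `ref`; a candidate's SAD accumulation is abandoned as soon as it
    exceeds the best SAD found so far.
    """
    block = cur[by:by + bsize, bx:bx + bsize].astype(np.int64)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue
            cand = ref[y:y + bsize, x:x + bsize].astype(np.int64)
            sad = 0
            for row in range(bsize):       # row-by-row partial SAD
                sad += int(np.abs(block[row] - cand[row]).sum())
                if sad >= best_sad:        # impossible candidate: stop early
                    break
            else:                          # loop finished: new best match
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

# Shift a random frame by a known motion and recover it; the best match
# for the cur block lies at offset (-3, +2) in ref
rng = np.random.default_rng(2)
ref = rng.integers(0, 256, (32, 32), dtype=np.uint8)
cur = np.roll(np.roll(ref, 3, axis=0), -2, axis=1)
mv, sad = full_search_pde(ref, cur, bx=12, by=12)
```

The early break is what makes a full search "fast" while still guaranteeing the same best match as the exhaustive scan.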

Journal ArticleDOI
TL;DR: A method is presented to measure soft tissue motion in the orbit in three dimensions during gaze: T1-weighted magnetic resonance (MR) imaging volume sequences are acquired during gaze, and soft tissue motion is quantified using a generalization of the Lucas and Kanade optical flow algorithm to three dimensions.
Abstract: This work presents a method to measure the soft tissue motion in three dimensions in the orbit during gaze. It has been shown that two-dimensional (2-D) quantification of soft tissue motion in the orbit is effective in the study of orbital anatomy and motion disorders. However, soft tissue motion is a three-dimensional (3-D) phenomenon and part of the kinematics is lost in any 2-D measurement. Therefore, T1-weighted magnetic resonance (MR) imaging volume sequences are acquired during gaze and soft tissue motion is quantified using a generalization of the Lucas and Kanade optical flow algorithm to three dimensions. New techniques have been developed for visualizing the 3-D flow field as a series of color-texture mapped 2-D slices or as a combination of volume rendering for display of the anatomy and scintillation rendering for the display of the motion field. We have studied the performance of the algorithm on four-dimensional volume sequences of synthetic motion, simulated motion of a static object imaged by MR, an MR-imaged rotating object and MR-imaged motion in the human orbit during gaze. The accuracy of the analysis is sufficient to characterize motion in the orbit and scintillation rendering is an effective visualization technique for 3-D motion in the orbit.
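The core of the 3D generalization of Lucas and Kanade is solving the optical-flow normal equations over a volumetric window instead of a 2D patch. A toy sketch on a synthetic translating blob (illustrative only; the function and variable names are ours):

```python
import numpy as np

def lucas_kanade_3d(vol1, vol2, center, radius=3):
    """Single-window 3D Lucas-Kanade flow estimate.

    Solves  sum_w (grad I . v + I_t)^2 -> min  over a cubic window,
    i.e. the 3x3 normal equations  (G^T G) v = -G^T I_t.
    """
    cz, cy, cx = center
    sl = tuple(slice(c - radius, c + radius + 1) for c in (cz, cy, cx))
    # Spatial gradients from the first volume, temporal difference between them
    gz, gy, gx = np.gradient(vol1.astype(float))
    It = vol2.astype(float) - vol1.astype(float)
    G = np.stack([g[sl].ravel() for g in (gz, gy, gx)], axis=1)
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(G, b, rcond=None)
    return v   # (vz, vy, vx)

# Synthetic: a 3D Gaussian blob translated by a small subvoxel motion
z, y, x = np.mgrid[0:24, 0:24, 0:24].astype(float)
def blob(cz, cy, cx):
    return np.exp(-((z - cz)**2 + (y - cy)**2 + (x - cx)**2) / 18.0)
vol1 = blob(12.0, 12.0, 12.0)
vol2 = blob(12.3, 12.0, 11.6)        # true motion (0.3, 0.0, -0.4)
v = lucas_kanade_3d(vol1, vol2, center=(12, 12, 12))
```

On real MR volumes this single-window solve would be repeated at every voxel (with coarse-to-fine refinement) to produce the dense 3D flow field the paper visualizes.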

Journal ArticleDOI
TL;DR: The similarities between the two different marker techniques throughout the motion cycle were high, so applying the video-based motion analysis system with surface markers to thumb kinematics is warranted.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: The mapping between these gait modes clusters better than the original signatures and can be used for recognition purposes alone, or to buttress both of the signatures.
Abstract: The intimate relationship between human walking and running lies within the skeleto-muscular structure. This is expressed as a mapping that can transform computer vision derived gait signatures from running to walking and vice versa, for purposes of deployment in gait as a biometric or for animation in computer graphics. The computer vision technique can extract leg motion by temporal template matching with a model defined by forced coupled oscillators as the basis. The (biometric) signature is derived from Fourier analysis of the variation in the motion of the thigh and lower leg. In fact, the mapping between these gait modes clusters better than the original signatures (of which running is the more potent) and can be used for recognition purposes alone, or to buttress both of the signatures. Moreover, the two signatures can be made invariant to gait mode by using the new mapping.

Journal ArticleDOI
TL;DR: The inverted distance transform of the edge map is used as an edge indicator function for contour detection and the problem of background clutter can be relaxed by taking the object motion into account.
Abstract: We propose a new method for contour tracking in video. The inverted distance transform of the edge map is used as an edge indicator function for contour detection. Using the concept of topographical distance, the watershed segmentation can be formulated as a minimization. This new viewpoint gives a way to combine the results of the watershed algorithm on different surfaces. In particular, our algorithm determines the contour as a combination of the current edge map and the contour predicted from the tracking result in the previous frame. We also show that the problem of background clutter can be relaxed by taking the object motion into account. Compensating for object motion makes it possible to detect and remove spurious edges in the background. The experimental results confirm the expected advantages of the proposed method over the existing approaches.
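The edge indicator can be illustrated with a brute-force Euclidean distance transform of a toy edge map; inverting it makes the surface peak on the contour, which is the kind of topographic surface a watershed formulation operates on. A sketch under simplified assumptions (real systems use a linear-time distance transform):

```python
import numpy as np

# Toy edge map: a single square contour in a 16x16 image
edges = np.zeros((16, 16), dtype=bool)
edges[4, 4:12] = edges[11, 4:12] = True
edges[4:12, 4] = edges[4:12, 11] = True

# Brute-force Euclidean distance transform: distance from every pixel
# to the nearest edge pixel
yy, xx = np.nonzero(edges)
gy, gx = np.mgrid[0:16, 0:16]
dist = np.sqrt((gy[..., None] - yy)**2 + (gx[..., None] - xx)**2).min(axis=-1)

# "Inverting" the transform gives an edge indicator surface that is
# maximal on the contour itself and falls off away from it
indicator = dist.max() - dist
```

The distance transform is zero exactly on edge pixels, so the inverted surface has its ridge along the contour, which is where the minimization wants the tracked contour to settle.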

Proceedings ArticleDOI
11 Aug 2002
TL;DR: An application-oriented solution is developed that has proven accurate, reliable, and efficient, as demonstrated by experiments in numerous real situations.
Abstract: The paper is concerned with the detection and tracking of obstacles from a camera mounted on a vehicle with a view to driver assistance. To achieve this goal, we have designed a technique entirely based on image motion analysis. We perform the robust estimation of the dominant image motion assumed to be due to the camera motion. Then by considering the outliers to the estimated dominant motion, we can straightforwardly detect obstacles in order to assist car driving. We have added to the detection step a tracking module that also relies on a motion consistency criterion. Time-to-collision is then computed for each validated obstacle. We have thus developed an application-oriented solution which has proven accurate, reliable and efficient as demonstrated by experiments on numerous real situations.

Book ChapterDOI
28 May 2002
TL;DR: In this article, a generative method combining statistical models and algorithms from both texture and motion analysis is proposed to detect textured motion patterns in natural scenes, such as falling snow, raining, flying birds, firework and waterfall.
Abstract: Natural scenes contain rich stochastic motion patterns which are characterized by the movement of a large number of small elements, such as falling snow, rain, flying birds, fireworks and waterfalls. In this paper, we call these motion patterns textured motion and present a generative method that combines statistical models and algorithms from both texture and motion analysis. The generative method includes the following three aspects. 1) Photometrically, an image is represented as a superposition of linear bases in atomic decomposition using an overcomplete dictionary, such as Gabor or Laplacian. Such a base representation is known to be generic for natural images, and it is low dimensional, as the number of bases is often 100 times smaller than the number of pixels. 2) Geometrically, each moving element (called a moveton), such as an individual snowflake or bird, is represented by a deformable template which is a group of several spatially adjacent bases. Such templates are learned through clustering. 3) Dynamically, the movetons are tracked through the image sequence by a stochastic algorithm maximizing a posterior probability. A classic second-order Markov chain model is adopted for the motion dynamics. The sources and sinks of the movetons are modeled by birth and death maps. We adopt an EM-like stochastic gradient algorithm for inference of the hidden variables: bases, movetons, birth/death maps, and the parameters of the dynamics. The learned models are also verified by synthesizing random textured motion sequences which bear a visual appearance similar to the observed sequences.

Journal ArticleDOI
01 Aug 2002
TL;DR: A frame rate up-conversion algorithm using adaptive motion compensation (MC) that reduces the blocking artifacts due to block-based processing is proposed.
Abstract: We propose a new frame rate up-conversion (FRC) algorithm using the adaptive motion compensation (MC) that reduces the blocking artifacts due to block-based processing. In the proposed scheme, after conventional motion estimation (ME) between two adjacent frames is performed to construct the motion vectors for the frame to be interpolated, the motion analysis is used to determine the type of motion and the motion-compensated interpolation (MCI) is applied adaptively. Unlike conventional MCI algorithms, the proposed technique utilizes similar neighboring motion vectors to produce multiple motion trajectories. When the proposed MCI is applied, multiple motion trajectories are considered in order to increase the accuracy of the MCI. The proposed method provides high quality format conversion with significantly reduced blocking artifacts.
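The basic motion-compensated interpolation step — averaging the two frames along the motion trajectory, each sampled at half the displacement — can be sketched for a single global, even-valued motion vector. This is a simplification of the block-wise, multi-trajectory scheme described above:

```python
import numpy as np

def mci_midframe(f0, f1, mv):
    """Motion-compensated interpolation of the frame halfway between f0, f1.

    mv = (dy, dx) is a global motion vector taking f0 to f1 (assumed even
    for simplicity, so the half step is integer-valued). Each interpolated
    pixel averages the two frames along the motion trajectory, sampled at
    half the displacement on each side.
    """
    dy, dx = mv
    hy, hx = dy // 2, dx // 2
    back = np.roll(f0, (hy, hx), axis=(0, 1))            # half step from f0
    fwd = np.roll(f1, (hy - dy, hx - dx), axis=(0, 1))   # half step back from f1
    return ((back.astype(np.int64) + fwd.astype(np.int64)) // 2).astype(f0.dtype)

# A frame translating by (2, 4): the true midpoint frame is shifted by (1, 2)
rng = np.random.default_rng(3)
f0 = rng.integers(0, 256, (16, 16), dtype=np.uint8)
f1 = np.roll(f0, (2, 4), axis=(0, 1))
mid = mci_midframe(f0, f1, mv=(2, 4))
expected = np.roll(f0, (1, 2), axis=(0, 1))
```

When the motion vector is accurate, the two samples along the trajectory agree and the interpolated frame is sharp; the paper's adaptive use of multiple neighboring trajectories addresses the blocking artifacts that arise when a single block-based vector is wrong.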

Journal ArticleDOI
TL;DR: Three algorithms for target motion analysis from range and range-rate measurements are developed and compared to theoretical performance bounds; results of applying the developed theory to ISAR data collected in recent trials with the Ingara radar are also presented.

Journal ArticleDOI
TL;DR: This paper proposes a new version of the Kalman filter, called the structural Kalman filter, which can estimate motion information even under deteriorating conditions such as occlusion; experimental results show that the suggested approach is very effective in reliably estimating and tracking non-rigid moving objects.

Journal ArticleDOI
TL;DR: In this paper, a low-cost ultrasonic motion analysis system is described that is capable of measuring these temporal and spatial parameters while subjects walk on the floor by using the propagation delay of sound when transmitted in air.

Proceedings ArticleDOI
03 Dec 2002
TL;DR: The proposed kinematic-based approach for automatic human motion analysis from IR image sequences achieves good performance in gait analysis with different view angles with respect to the walking direction, and is promising for further gait recognition.
Abstract: In an infrared (IR) image sequence of human walking, the human silhouette can be reliably extracted from the background regardless of lighting conditions and the colors of the human surfaces and backgrounds in most cases. Moreover, some important regions containing skin, such as the face and hands, can be accurately detected in IR image sequences. In this paper, we propose a kinematic-based approach for automatic human motion analysis from IR image sequences. The proposed approach estimates 3D human walking parameters by performing a modified least-squares fit of the 3D kinematic model to the 2D silhouette extracted from a monocular IR image sequence, where the continuity and symmetry of human walking and the detected hand regions are also considered in the optimization function. Experimental results show that the proposed approach achieves good performance in gait analysis with different view angles with respect to the walking direction, and is promising for further gait recognition.

Proceedings ArticleDOI
Yu-Fei Ma1, Hong-Jiang Zhang1
10 Dec 2002
TL;DR: This paper proposes a new motion descriptor that captures the motion pattern of a video clip by transforming the motion vector field into a number of directional slices of energy, which form a multi-dimensional vector called motion texture.
Abstract: Motion is an important cue for video content perception. However, the lack of an effective motion representation is a barrier to automatic video content analysis. In this paper, we propose a new motion descriptor to capture the motion pattern of a video clip. First, we transform the motion vector field into a number of directional slices of energy. Then, these slices are measured by a set of moments. As a result, a multi-dimensional vector, called motion texture, is formed. The effectiveness and efficiency of the proposed representation have been validated by motion-based shot retrieval experiments.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: A new combination of medical knowledge, image processing and regression analysis can be used to label human motion in image sequences.
Abstract: We describe a new method for analyzing and extracting human gait motion by combining statistical methods with image processing. The periodic motion of human gait is modeled by trigonometric-polynomial interpolant functions. The gait description is derived by topological analysis guided by medical studies that selects areas from which joint angles are derived by regression analysis. Then, the interpolant functions are fitted to the gait data and whilst showing fidelity to earlier medical studies, also show recognition capability. As such, a new combination of medical knowledge, image processing and regression analysis can be used to label human motion in image sequences.
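The trigonometric-polynomial modeling of periodic gait can be sketched as ordinary least squares on a truncated Fourier basis. The "joint angle" signal below is synthetic and the function is our own illustration, not the authors' interpolants:

```python
import numpy as np

def fit_trig_series(t, y, period, n_harmonics=3):
    """Least-squares fit of a truncated Fourier (trigonometric) series.

    y(t) ~= a0 + sum_k [a_k cos(k w t) + b_k sin(k w t)],  w = 2*pi/period.
    Returns the coefficient vector [a0, a1, b1, a2, b2, ...] and the fit.
    """
    w = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * w * t))
        cols.append(np.sin(k * w * t))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Synthetic "knee angle" over one gait cycle: offset + two harmonics
t = np.linspace(0.0, 1.0, 100, endpoint=False)
y = 20.0 + 15.0 * np.sin(2 * np.pi * t) + 5.0 * np.cos(4 * np.pi * t)
coef, y_fit = fit_trig_series(t, y, period=1.0, n_harmonics=3)
```

Because the basis is orthogonal over a whole number of cycles, the fitted coefficients recover the underlying harmonics exactly, which is what makes such coefficients usable as a compact, comparable gait description.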

Proceedings ArticleDOI
11 Aug 2002
TL;DR: A hierarchical finite state machine constructed from 3D motion capture data serves as a prior motion model and motion templates are used as the observation model for a "tracking as recognition" approach.
Abstract: Estimating mode (walking/running/standing) and phases of human locomotion is important for video understanding. We present a "tracking as recognition" approach. A hierarchical finite state machine constructed from 3D motion capture data serves as a prior motion model. Motion templates are used as the observation model. Robustness is achieved by making inferences in the prior motion model which resolves the short-term ambiguity of the observations that may cause a regular tracking formulation to fail. Experiments show very promising results on some difficult sequences.

01 Jan 2002
TL;DR: Two methods for gesture analysis and mapping to music are presented, both independent of specific orientation or location of the subject, and the first deals with gestural segmentation, while the second uses pattern recognition.
Abstract: We report research performed on gesture analysis and mapping to music. Various movements were recorded using 3D optical motion capture. Using this system, we produced animations from movements/dance and generated, in parallel, the soundtrack from the dancer's movements. Prior to the actual sound mapping process, we performed various motion analyses. We present here two methods, both independent of the specific orientation or location of the subject. The first deals with gestural segmentation, while the second uses pattern recognition.

Proceedings ArticleDOI
05 Dec 2002
TL;DR: A robust subspace approach to extracting layers from images reliably is presented by taking advantage of the fact that homographies induced by planar patches in the scene form a low dimensional linear subspace, which provides a constraint for detecting outliers in the local measurements, thus making the algorithm robust to outliers.
Abstract: Representing images with layers has many important applications, such as video compression, motion analysis, and 3D scene analysis. The paper presents a robust subspace approach to extracting layers from images reliably by taking advantage of the fact that homographies induced by planar patches in the scene form a low dimensional linear subspace. Such a subspace provides not only a feature space where layers in the image domain are mapped onto denser and better-defined clusters, but also a constraint for detecting outliers in the local measurements, thus making the algorithm robust to outliers. By enforcing the subspace constraint, spatial and temporal redundancy from multiple frames are simultaneously utilized, and noise can be effectively reduced. Good layer descriptions are shown to be extracted in the experimental results.

Proceedings ArticleDOI
08 May 2002
TL;DR: In this paper, a structural analysis based method for fault diagnosis purposes is presented, which uses the structural model of the system and utilizes the matching idea to extract system's inherent redundant information.
Abstract: The paper presents a structural analysis based method for fault diagnosis purposes. The method uses the structural model of the system and utilizes the matching idea to extract the system's inherent redundant information. The structural model is represented by a bipartite directed graph. FDI possibilities are examined by further analysis of the obtained information. The method is illustrated by applying it to the LTI model of motion of a fixed-wing aircraft.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The results suggest that the body dynamics have rich complexity in phase space, and within an envelope, small nodes may exist to give variation and controllability without damaging stability.
Abstract: A novel concept for controlling a humanoid robot, global dynamics, is investigated by motion capture experiments. This concept generalises human/humanoid body motion as successive transitions of "envelopes", where body dynamics is exploited and high level control input is adopted only at the "nodes", where the body is unstable and control input is necessary. Dynamical rising is chosen for our experiment and full body motion is measured. By evaluating the coordination between joint angles, we have seen variation of motion and corresponding envelope volume (stable region within the phase space). Also, by analysis in the phase space (including variables both of positions and their time derivatives), we have seen variations not only according to boundary conditions for the motion (i.e., adopting physical restriction) but also according to experience. Our results suggest that the body dynamics have rich complexity in phase space. Within an envelope, small nodes may exist to give variation and controllability without damaging stability.

Journal ArticleDOI
TL;DR: A gesture-tracking system using real-time local range on-demand and a method performing range processing only when necessary and where necessary, which results in dynamic regional range images that contain only information needed by the system.
Abstract: This paper presents a new approach to the use of range data in a gesture-tracking system. The use of three-dimensional data is essential for human motion analysis; however, the cost of complete range estimation prevents its inclusion in most real-time systems. This work describes a gesture-tracking system using real-time local range on demand. The system represents a gesture-controlled interface for interactive visual exploration of large data sets. The paper describes a method that performs range processing only when and where necessary. Range data is processed only for non-static regions of interest. This is accomplished by a set of filters on the color, motion, and range data. The speed-up achieved is between 1.70 and 2.15. The algorithm also includes robust skin-color segmentation insensitive to illumination changes. Selective range processing results in dynamic regional range images that contain only the information needed by the system.