Showing papers on "Motion analysis published in 1997"


Proceedings ArticleDOI
16 Jun 1997
TL;DR: The paper gives an overview of the various tasks involved in motion analysis of the human body, and focuses on three major areas related to interpreting human motion: motion analysis involving human body parts, tracking of human motion using single or multiple cameras, and recognizing human activities from image sequences.
Abstract: Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man-machine interfaces, content-based image storage and retrieval, and video conferencing. The paper gives an overview of the various tasks involved in motion analysis of the human body. The authors focus on three major areas related to interpreting human motion: 1) motion analysis involving human body parts, 2) tracking of human motion using single or multiple cameras, and 3) recognizing human activities from image sequences. Motion analysis of human body parts involves the low-level segmentation of the human body into segments connected by joints, and recovers the 3D structure of the human body using its 2D projections over a sequence of images. Tracking human motion using a single camera or multiple cameras focuses on higher-level processing, in which moving humans are tracked without identifying specific parts of the body structure. After successfully matching the moving human image from one frame to another in image sequences, understanding the human movements or activities comes naturally, which leads to a discussion of recognizing human activities. The review is illustrated by examples.

1,665 citations


Journal ArticleDOI
TL;DR: A computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion.
Abstract: We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face's independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the facial action coding system (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate, representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.

877 citations


Journal ArticleDOI
TL;DR: A new registration algorithm based on spline representations of the displacement field which can be specialized to solve all of the problems in multiframe image analysis, including the computation of optic flow, stereo correspondence, structure from motion, and feature tracking.
Abstract: The problem of image registration subsumes a number of problems and techniques in multiframe image analysis, including the computation of optic flow (general pixel-based motion), stereo correspondence, structure from motion, and feature tracking. We present a new registration algorithm based on spline representations of the displacement field which can be specialized to solve all of the above mentioned problems. In particular, we show how to compute local flow, global (parametric) flow, rigid flow resulting from camera egomotion, and multiframe versions of the above problems. Using a spline-based description of the flow removes the need for overlapping correlation windows, and produces an explicit measure of the correlation between adjacent flow estimates. We demonstrate our algorithm on multiframe image registration and the recovery of 3D projective scene geometry. We also provide results on a number of standard motion sequences.
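
The paper's core representation is compact enough to sketch. Below is a minimal NumPy illustration (not the authors' implementation) of how a coarse grid of spline control-point displacements induces a dense flow field, here by bilinear interpolation; the paper supports higher-order splines and estimates the control displacements by minimizing an intensity error.

```python
# A minimal sketch of a spline-controlled dense flow field: the displacement
# field is parameterized by a coarse grid of control points, interpolated
# here bilinearly to every pixel.
import numpy as np

def dense_flow_from_control_grid(ctrl_u, ctrl_v, image_shape):
    """Interpolate a coarse (gy, gx) grid of control displacements to a
    dense per-pixel flow field of shape image_shape."""
    H, W = image_shape
    gy, gx = ctrl_u.shape
    # Pixel coordinates expressed in control-grid units.
    ys = np.linspace(0, gy - 1, H)
    xs = np.linspace(0, gx - 1, W)
    y0 = np.clip(np.floor(ys).astype(int), 0, gy - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, gx - 2)
    wy = (ys - y0)[:, None]   # vertical interpolation weights, (H, 1)
    wx = (xs - x0)[None, :]   # horizontal interpolation weights, (1, W)

    def interp(c):
        c00 = c[y0][:, x0]; c01 = c[y0][:, x0 + 1]
        c10 = c[y0 + 1][:, x0]; c11 = c[y0 + 1][:, x0 + 1]
        return ((1 - wy) * (1 - wx) * c00 + (1 - wy) * wx * c01
                + wy * (1 - wx) * c10 + wy * wx * c11)

    return interp(ctrl_u), interp(ctrl_v)

# Example: a 4x4 control grid drives a dense 240x320 flow field.
u, v = dense_flow_from_control_grid(np.zeros((4, 4)), np.ones((4, 4)), (240, 320))
```

Estimating the control displacements then becomes a small optimization per control point, which is what removes the need for overlapping correlation windows.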

535 citations


Journal ArticleDOI
TL;DR: This work has shown that the paraperspective factorization method can be applied to a much wider range of motion scenarios, including image sequences containing motion toward the camera and aerial image sequences of terrain taken from a low-altitude airplane.
Abstract: The factorization method, first developed by Tomasi and Kanade (1992), recovers both the shape of an object and its motion from a sequence of images, using many images and tracking many feature points to obtain highly redundant feature position information. The method robustly processes the feature trajectory information using singular value decomposition (SVD), taking advantage of the linear algebraic properties of orthographic projection. However, an orthographic formulation limits the range of motions the method can accommodate. Paraperspective projection, first introduced by Ohta et al. (1981), is a projection model that closely approximates perspective projection by modeling several effects not modeled under orthographic projection, while retaining linear algebraic properties. Our paraperspective factorization method can be applied to a much wider range of motion scenarios, including image sequences containing motion toward the camera and aerial image sequences of terrain taken from a low-altitude airplane.
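
The rank-3 structure the method exploits is easy to make concrete. The sketch below shows the simpler orthographic (Tomasi-Kanade) core; the paraperspective variant adds per-frame normalization terms but keeps the same SVD-based decomposition. The synthetic check is mine, not from the paper.

```python
# Minimal factorization sketch: W stacks the centered feature tracks,
# 2F x P for F frames and P points, and is (ideally) rank 3.
import numpy as np

def factorize(W):
    """Decompose a centered measurement matrix into motion and shape."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])          # 2F x 3 motion matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]   # 3 x P shape matrix
    return M, S

# Synthetic check: rank-3 data factorizes exactly.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 20))  # 5 frames, 20 points
M, S = factorize(W)
print(np.allclose(M @ S, W))  # True; M and S individually carry a 3x3 ambiguity
```

In both projection models the remaining 3x3 ambiguity is resolved by metric constraints on the rows of M.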

511 citations


Proceedings ArticleDOI
17 Jun 1997
TL;DR: This work presents a variant of the EM algorithm that can segment image sequences by fitting multiple smooth flow fields to the spatiotemporal data and shows how the estimation of a single smooth flow field can be performed in closed form, thus making the multiple model estimation computationally feasible.
Abstract: Grouping based on common motion, or "common fate" provides a powerful cue for segmenting image sequences. Recently a number of algorithms have been developed that successfully perform motion segmentation by assuming that the motion of each group can be described by a low dimensional parametric model (e.g. affine). Typically the assumption is that motion segments correspond to planar patches in 3D undergoing rigid motion. Here we develop an alternative approach, where the motion of each group is described by a smooth dense flow field and the stability of the estimation is ensured by means of a prior distribution on the class of flow fields. We present a variant of the EM algorithm that can segment image sequences by fitting multiple smooth flow fields to the spatiotemporal data. Using the method of Green's functions, we show how the estimation of a single smooth flow field can be performed in closed form, thus making the multiple model estimation computationally feasible. Furthermore, the number of models is estimated automatically using similar methods to those used in the parametric approach. We illustrate the algorithm's performance on synthetic and real image sequences.
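
The E-step/M-step alternation is worth spelling out. The sketch below deliberately replaces the paper's smooth nonparametric flow fields (solved in closed form via Green's functions) with constant-translation models per group; the structure of the EM loop is the same.

```python
# Illustrative EM segmentation loop over observed flow vectors, flow: (N, 2).
# Each group here is a constant-translation model, a simplification of the
# paper's smooth flow fields.
import numpy as np

def em_flow_segmentation(flow, K=2, iters=50, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    models = flow[rng.choice(len(flow), K, replace=False)]  # init from data
    for _ in range(iters):
        # E-step: soft responsibility of each model for each flow vector.
        d2 = ((flow[:, None, :] - models[None]) ** 2).sum(-1)
        r = np.exp(-d2 / (2 * sigma ** 2))
        r /= r.sum(axis=1, keepdims=True) + 1e-12
        # M-step: each model becomes a responsibility-weighted mean.
        models = (r[..., None] * flow[:, None, :]).sum(0) / (r.sum(0)[:, None] + 1e-12)
    return r, models
```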

251 citations


Journal ArticleDOI
TL;DR: To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion, analysing the semantic content of the video.

188 citations


Journal ArticleDOI
TL;DR: This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art, and derives necessary and sufficient conditions for an image sequence to be the projection of a 3D repeating motion, accounting for changes in viewpoint and other camera parameters.
Abstract: This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art. First, the assumption that a motion be perfectly even from one cycle to the next is relaxed. Real repeating motions tend not to be perfectly even, i.e., the length of a cycle varies through time because of physically important changes in the scene. A generalization of period is defined for repeating motions that makes this temporal variation explicit. This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity. Second, the requirement that the observer be stationary is removed. Observer motion complicates image analysis because an object that undergoes a 3D repeating motion will generally not produce a repeating sequence of images. Using principles of affine invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a 3D repeating motion, accounting for changes in viewpoint and other camera parameters. Unlike previous work in visual invariance, however, our approach is applicable to objects and scenes whose motion is highly non-rigid. Experiments on real image sequences demonstrate how the approach may be used to detect several types of purely temporal motion features, relating to motion trends and irregularities. Applications to athletic and medical motion analysis are discussed.
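
To give a feel for a purely temporal, per-frame notion of period, here is a hedged sketch that estimates a local cycle length from frame self-similarity, so uneven cycles show up as a time-varying estimate. This illustrates the flavor of the representation only; it is not the paper's affine-invariant formulation.

```python
# Hedged illustration: per-frame cycle-length estimate from self-similarity.
import numpy as np

def local_period(frames, max_lag, window=3):
    """frames: (T, D) per-frame feature vectors. Returns a per-frame lag
    estimate (0 where no valid comparison exists)."""
    T = len(frames)
    periods = np.zeros(T, dtype=int)
    for t in range(max(window, max_lag), T):
        best, best_lag = np.inf, 0
        for lag in range(1, max_lag + 1):
            if t - lag - window < 0:
                continue
            # Distance between the current window and the window one lag back.
            d = np.linalg.norm(frames[t - window:t] - frames[t - lag - window:t - lag])
            if d < best:
                best, best_lag = d, lag
        periods[t] = best_lag
    return periods
```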

171 citations


Journal ArticleDOI
TL;DR: The decomposition of image motion into a 2D parametric motion and residual epipolar parallax displacements avoids many of the inherent ambiguities and instabilities associated with decomposing the image motion into its rotational and translational components, and hence makes the computation of ego-motion or 3D structure estimation more robust.
Abstract: A method for computing the 3D camera motion (the ego-motion) in a static scene is described, where initially a detected 2D motion between two frames is used to align corresponding image regions. We prove that such a 2D registration removes all effects of camera rotation, even for those image regions that remain misaligned. The resulting residual parallax displacement field between the two region-aligned images is an epipolar field centered at the FOE (Focus-of-Expansion). The 3D camera translation is recovered from the epipolar field. The 3D camera rotation is recovered from the computed 3D translation and the detected 2D motion. The decomposition of image motion into a 2D parametric motion and residual epipolar parallax displacements avoids many of the inherent ambiguities and instabilities associated with decomposing the image motion into its rotational and translational components, and hence makes the computation of ego-motion or 3D structure estimation more robust.
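
The translation-recovery step reduces to linear least squares: each residual parallax vector must point along the line through the FOE. A minimal NumPy sketch follows, with a synthetic check of my own.

```python
# Recovering the focus of expansion from the residual parallax field: each
# residual vector (u, v) at pixel (x, y) satisfies (p - e) x (u, v) = 0,
# one linear equation per pixel in the FOE e.
import numpy as np

def estimate_foe(xy, uv):
    """xy: (N, 2) pixel positions; uv: (N, 2) residual parallax vectors."""
    x, y = xy[:, 0], xy[:, 1]
    u, v = uv[:, 0], uv[:, 1]
    # (x - ex) * v - (y - ey) * u = 0  =>  v * ex - u * ey = v * x - u * y
    A = np.stack([v, -u], axis=1)
    b = v * x - u * y
    e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return e  # (ex, ey)

rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(200, 2))
uv = 0.1 * (xy - np.array([50.0, 40.0]))   # pure expansion about (50, 40)
print(estimate_foe(xy, uv))                # ~ [50. 40.]
```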

160 citations


Proceedings ArticleDOI
TL;DR: A novel technique for motion analysis is proposed in which motion vectors are classified according to their stability and collinearity, and an iterative operation is performed to obtain an accurate solution.
Abstract: A three-dimensional reconstruction system -- the 3DR system -- has been developed. The main feature of the 3DR system is that it converts monocular image sequences to stereoscopic ones. This provides the following advantages: (1) stereoscopic images can be reproduced even from films taken in the past; (2) a compact 3D-scene capturing system using a monocular camera is realized. The key 3DR technology is depth sensing based on motion parallax. A novel technique for motion analysis is proposed in which motion vectors are classified according to their stability and collinearity, and an iterative operation is performed to obtain an accurate solution. Preliminary evaluations have shown that not only was the motion parallax analyzed very accurately, but also stereoscopic images of high quality were generated.

59 citations


Journal ArticleDOI
TL;DR: It was concluded that the patterns of hindfoot motion are similar amongst individuals in some joint parameters but not in others, and dorsiflexion/plantarflexion had a very consistent pattern, followed by inversion/eversion and to a lesser extent by compression/distraction.

58 citations


Proceedings ArticleDOI
17 Jun 1997
TL;DR: The authors present a fast electronic image stabilization system that compensates for 3D rotation using the extended Kalman filter framework to estimate the rotation between frames, which is represented using unit quaternions.
Abstract: The authors present a fast electronic image stabilization system that compensates for 3D rotation. The extended Kalman filter framework is employed to estimate the rotation between frames, which is represented using unit quaternions. A small set of automatically selected and tracked feature points are used as measurements. The effectiveness of this technique is also demonstrated by constructing mosaic images from the motion estimates and comparing them to mosaics built from 2D stabilization algorithms. Two different stabilization schemes are presented. The first, implemented on a real-time platform based on a Datacube MV200 board, estimates the motion between two consecutive frames and is able to process gray-level images of resolution 128×120 at 10 Hz. The second scheme estimates the motion between the current frame and an inverse mosaic; this allows better estimation without the need for indexing the new image frames. Experimental results for both schemes using real and synthetic image sequences are presented.
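
The geometric heart of rotation-only stabilization can be sketched without the filtering machinery. Below, the inter-frame rotation of feature bearing vectors is estimated in closed form (Kabsch/SVD) and accumulated; the paper instead estimates it recursively with an extended Kalman filter over unit quaternions, so treat this as an illustrative stand-in.

```python
# Closed-form rotation between tracked feature bearings, accumulated per frame.
import numpy as np

def rotation_between(a, b):
    """Least-squares rotation R with b_i ~ R a_i, for (N, 3) unit bearings."""
    H = a.T @ b
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def accumulate(prev_bearings, curr_bearings, R_total):
    """Update total camera rotation; its inverse warps (stabilizes) the frame."""
    return rotation_between(prev_bearings, curr_bearings) @ R_total
```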

Proceedings ArticleDOI
26 Oct 1997
TL;DR: A system has been developed to analyze and index surveillance videos based on the motions of objects in the scene, using a segmentation and tracking system that extracts trajectories from compressed video; the trajectories are represented in a multiresolution manner and stored in a database.
Abstract: A system has been developed to analyze and index surveillance videos based on the motions of objects in the scene. A segmentation and tracking system extracts trajectories from compressed video, which are represented in a multiresolution manner and stored in a database. Hand-drawn queries can be submitted to the system for imprecise searches. The system was tested on real video footage with numerous moving objects; the success rates for tracking and recall in most cases exceeded 70 percent, a promising result.
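
One plausible way to realize a multiresolution trajectory index of the kind described (this is a sketch, not the paper's actual encoding): store successively smoothed and decimated copies of each trajectory and compare a hand-drawn query at the coarsest level first.

```python
# Multiresolution trajectory pyramid and a coarse, imprecise match score.
import numpy as np

def pyramid(traj, levels=3):
    """traj: (N, 2) image trajectory. Returns fine-to-coarse copies."""
    out = [np.asarray(traj, float)]
    for _ in range(levels - 1):
        t = out[-1]
        n = len(t) // 2 * 2
        out.append(0.5 * (t[:n:2] + t[1:n:2]))  # pairwise mean: smooth+decimate
    return out

def coarse_distance(query, traj, levels=3):
    """Compare at the coarsest level after resampling to equal length."""
    q, t = pyramid(query, levels)[-1], pyramid(traj, levels)[-1]
    m = min(len(q), len(t))
    qi = np.linspace(0, len(q) - 1, m).astype(int)
    ti = np.linspace(0, len(t) - 1, m).astype(int)
    return np.linalg.norm(q[qi] - t[ti]) / m
```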

Proceedings ArticleDOI
17 Jun 1997
TL;DR: This paper first extracts a spatial texture-based partition using an unsupervised MRF approach and then groups the regions obtained according to a motion-based criterion, which relies on two motion estimation techniques and exploits contextual information between regions.
Abstract: This paper deals with the problem of motion-based segmentation of image sequences. Such partitions serve multiple purposes in dynamic scene analysis. We first extract a spatial texture-based partition using an unsupervised MRF approach. The regions obtained are then grouped according to a motion-based criterion. This grouping process relies on two motion estimation techniques and exploits contextual information between regions. In contrast with clustering techniques, region grouping is formalized as a motion-based graph labeling process within a Markovian framework. Results on real-world image sequences are shown and validate the proposed method.
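
As a rough illustration of the grouping step: neighboring texture regions can be merged when their estimated motion parameters agree. The paper formulates this as Markovian graph labeling driven by two motion estimators; the greedy union-find pass below only conveys the idea, and all names are hypothetical.

```python
# Greedy motion-based region grouping on an adjacency graph.
# motions: region id -> motion parameter vector; edges: adjacent region pairs.
import numpy as np

def group_regions(motions, edges, tol=0.5):
    parent = {r: r for r in motions}
    def find(r):                           # union-find with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r
    for a, b in edges:
        if np.linalg.norm(np.asarray(motions[a]) - np.asarray(motions[b])) < tol:
            parent[find(a)] = find(b)      # similar motion: merge groups
    return {r: find(r) for r in motions}

print(group_regions({0: [1, 0], 1: [1.1, 0], 2: [5, 5]}, [(0, 1), (1, 2)]))
```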

Journal ArticleDOI
TL;DR: This work uses adaptive Hough transform to iteratively refine the relevant parameter space in a "coarse-to-fine" fashion and can robustly recover 3D motion parameters, reject outliers of the flow estimates, and deal with multiple moving objects present in the scene.
Abstract: We present a method to determine 3D motion and structure of multiple objects from two perspective views, using adaptive Hough transform. In our method, segmentation is determined based on a 3D rigidity constraint. Instead of searching candidate solutions over the entire five-dimensional translation and rotation parameter space, we only examine the two-dimensional translation space. We divide the input image into overlapping patches, and, for each sample of the translation space, we compute the rotation parameters of patches using least-squares fit. Every patch votes for a sample in the five-dimensional parameter space. For a patch containing multiple motions, we use a redescending M-estimator to compute rotation parameters of a dominant motion within the patch. To reduce computational and storage burdens of standard multidimensional Hough transform, we use adaptive Hough transform to iteratively refine the relevant parameter space in a "coarse-to-fine" fashion. Our method can robustly recover 3D motion parameters, reject outliers of the flow estimates, and deal with multiple moving objects present in the scene. Applications of the proposed method to both synthetic and real image sequences are demonstrated with promising results.
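
The coarse-to-fine refinement that makes the Hough search tractable is generic enough to sketch: evaluate votes on a coarse grid over the 2D translation space, zoom into the winning cell, and repeat. The `score` callback below stands in for the paper's patch-voting computation.

```python
# Adaptive (coarse-to-fine) Hough search over a 2D parameter space.
import numpy as np

def adaptive_hough(score, lo=(-1.0, -1.0), hi=(1.0, 1.0), grid=8, rounds=4):
    lo, hi = np.array(lo, float), np.array(hi, float)
    for _ in range(rounds):
        xs = np.linspace(lo[0], hi[0], grid)
        ys = np.linspace(lo[1], hi[1], grid)
        votes = np.array([[score((x, y)) for y in ys] for x in xs])
        i, j = np.unravel_index(np.argmax(votes), votes.shape)
        cell = (hi - lo) / (grid - 1)
        center = np.array([xs[i], ys[j]])
        lo, hi = center - cell, center + cell   # shrink around the winner
    return center

peak = np.array([0.3, -0.2])                    # synthetic vote surface
print(adaptive_hough(lambda t: -np.sum((np.array(t) - peak) ** 2)))  # ~ peak
```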

Proceedings ArticleDOI
17 Jun 1997
TL;DR: A novel application of the Expectation-Maximization algorithm to the global analysis of articulated motion utilizes a kinematic model to constrain the motion estimates, producing a segmentation of the flow field into parts with different articulated motions.
Abstract: We present a novel application of the Expectation-Maximization algorithm to the global analysis of articulated motion. The approach utilizes a kinematic model to constrain the motion estimates, producing a segmentation of the flow field into parts with different articulated motions. Experiments with synthetic and real images are described.

Proceedings ArticleDOI
09 Nov 1997
TL;DR: This work describes an approach that evaluates the motion information contained in optical flow vectors; it has been implemented on a hardware platform consisting of the MiniVISTA system and two Motorola PowerPCs and installed in an experimental road vehicle.
Abstract: Within a video-based driver assistance system, it is important to detect and track nearby objects. We describe an approach that evaluates the motion information contained in optical flow vectors. Optical flow carries information about the motion of the camera and about the scene's 3D structure. We use that information to detect obstacles in front of the vehicle. Since the detection is based on motion, no a priori knowledge of obstacle shape is required. Optical flow vectors are estimated from spatio-temporal derivatives of the gray value function, which are computed at video frame rate by the MiniVISTA hardware. To eliminate outliers and to speed up obstacle detection, the estimated vectors are clustered before they are passed to the obstacle test. The obstacle test separates moving objects from the stationary environment and elevated objects from the ground plane. This test is a state estimation problem and enables us to enlarge the motion-stereo baseline by applying a Kalman filter to track optical flow vectors over subsequent image frames. After 3D grouping of tracked flow vectors of the same obstacle class, a state description of each moving object is automatically initialized and updated using a Kalman filter. The algorithm has been implemented on a hardware platform consisting of the MiniVISTA system and two Motorola PowerPCs, installed in an experimental road vehicle. Results obtained under very different weather conditions show the robustness of the approach.
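
A constant-velocity Kalman filter of the kind that could track a flow vector's image position over frames, enlarging the motion-stereo baseline as the paper describes, looks roughly like this; the state model and noise levels are illustrative choices, not the authors' tuning.

```python
# Constant-velocity Kalman filter over noisy image positions.
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1e-1):
    """measurements: (T, 2) noisy image positions. Returns filtered positions."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt        # state: [x, y, vx, vy]
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1  # we observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x = np.array([*measurements[0], 0.0, 0.0])
    P = np.eye(4)
    out = []
    for z in measurements:
        x, P = F @ x, F @ P @ F.T + Q                   # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
        x = x + K @ (z - H @ x)                         # update
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)
```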

Proceedings ArticleDOI
21 Apr 1997
TL;DR: This paper proposes the use of motion analysis (MA) to adapt to scene content and can achieve from 2% to 13.9% savings in bits while maintaining similar quality.
Abstract: The MPEG video compression standard effectively exploits spatial, temporal, and coding redundancies in the algorithm. In its generic form, however, only a minimal amount of scene adaptation is performed. Video can be further compressed by taking advantage of scenes where the temporal statistics allow larger inter-reference frame distances. This paper proposes the use of motion analysis (MA) to adapt to scene content. The actual picture type (I, P, or B) decision is made by examining the accumulation of motion measurements since the last reference frame was labeled. Depending on the video content, the proposed algorithm can achieve from 2% to 13.9% savings in bits while maintaining similar quality.
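
The decision rule lends itself to a compact sketch: accumulate a per-frame motion measure since the last reference frame and promote a frame to a reference once the accumulation crosses a threshold, so static stretches get longer runs of B frames. The measure, the threshold, and the periodic intra refresh below are assumptions for illustration.

```python
# Motion-adaptive picture-type assignment (illustrative thresholds).
def assign_picture_types(motion_per_frame, thresh=10.0, gop=30):
    types, acc, since_i = [], 0.0, gop     # force an I picture at the start
    for m in motion_per_frame:
        since_i += 1
        acc += m
        if since_i > gop:                  # periodic intra refresh
            types.append('I'); acc = 0.0; since_i = 0
        elif acc >= thresh:                # accumulated motion: new reference
            types.append('P'); acc = 0.0
        else:
            types.append('B')              # low motion: keep bi-prediction
    return types

print(assign_picture_types([1, 1, 1, 8, 1, 1, 12, 1]))
# ['I', 'B', 'B', 'P', 'B', 'B', 'P', 'B']
```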

Proceedings ArticleDOI
10 Jan 1997
TL;DR: In this article, a temporal segmentation algorithm is proposed to automatically extract key frames for each video object in an MPEG-4 compressed sequence based on the prediction model chosen by the encoder for individual macroblocks.
Abstract: The MPEG-4 object-based coding standard, designed as a common platform for all multimedia applications, is inherently well-suited for video indexing applications. To fully exploit the advantages offered by MPEG-4, however, a reconsideration of existing indexing strategies is required. This paper proposes a new object-based framework for video indexing and retrieval that treats the object itself as the basic indexing unit, where changes in content are detected through observations made on the objects in the video sequence. We present a temporal segmentation algorithm designed to automatically extract key frames for each video object in an MPEG-4 compressed sequence, based on the prediction model chosen by the encoder for individual macroblocks. An extension to the existing MPEG-4 syntax is presented for facilitating searches of large databases. The data presented in the proposed 'indexing field' are: the birth and death frames of individual objects, global motion characteristics/camera operations observed in the scene, representative key frames that capture the major transformations each object undergoes, and the dominant motion characteristics of each object throughout its lifetime. We demonstrate the validity of the proposed scheme with results obtained on several MPEG-4 test sequences.
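
The compressed-domain cue is simple to state in code: macroblocks the encoder cannot predict from the previous VOP fall back to intra coding, so a spike in the intra-coded fraction flags a major transformation of the object and a candidate key frame. The threshold below is illustrative, not from the paper.

```python
# Key-frame candidates from per-VOP macroblock prediction modes.
def key_frames(mb_modes_per_vop, intra_ratio=0.3):
    """mb_modes_per_vop: list of lists of macroblock modes ('I', 'P', ...)."""
    keys = [0]  # the first VOP (the object's "birth") is always a key frame
    for t, modes in enumerate(mb_modes_per_vop[1:], start=1):
        if modes and sum(m == 'I' for m in modes) / len(modes) > intra_ratio:
            keys.append(t)
    return keys
```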

Journal ArticleDOI
TL;DR: A search paradigm based on geometric properties of the normal flow field, which considers a family of search subspaces to estimate the egomotion parameters; various algorithms are proposed within this framework.
Abstract: We address the problem of egomotion estimation for a monocular observer moving under arbitrary translation and rotation in an unknown environment. The method we propose is based solely on the spatio-temporal image derivatives, or the normal flow. We introduce a search paradigm based on geometric properties of the normal flow field, which consists in considering a family of search subspaces to estimate the egomotion parameters. Various algorithms are proposed within this framework. To decrease the noise sensitivity of the estimation methods, we use statistical tools based on robust regression theory. Finally, we present and discuss a wide variety of experiments with synthetic and real images, for various kinds of camera motion.
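
The input quantity the method is built on, normal flow, is directly computable from spatio-temporal derivatives, with no need to solve the aperture problem. A minimal NumPy sketch:

```python
# Normal flow: the component of image motion along the intensity gradient,
# from brightness constancy (u, v) . grad(I) = -I_t.
import numpy as np

def normal_flow(prev, curr, eps=1e-6):
    """prev, curr: consecutive grayscale frames (H, W). Returns (u_n, v_n)."""
    Iy, Ix = np.gradient(curr.astype(float))
    It = curr.astype(float) - prev.astype(float)
    g2 = Ix ** 2 + Iy ** 2 + eps
    return -It * Ix / g2, -It * Iy / g2
```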

Patent
25 Nov 1997
TL;DR: In this paper, a scene change detector sums the correlation maximum values of correlation surfaces produced by a motion vector determining circuit across a current image and compares this with a threshold value (Thres) to detect scene changes.
Abstract: A scene change detector sums the correlation maximum values of correlation surfaces produced by a motion vector determining circuit (2) across a current image and compares this with a threshold value (Thres) to detect scene changes. A statistical analysis of signals (Vx, Vy, Y) representing the current image may be made and a resulting value differentiated. Peaks in this differentiated value represent scene changes. Finally, rapid changes in the number of valid vectors found in a motion analysis of a current image may also be used to indicate a scene change.
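
The first of the three cues reduces to a few lines: sum the per-block correlation maxima into a frame-level match score and flag frames where it collapses below the threshold. A hedged sketch (the patent's circuit-level details are of course richer):

```python
# Frame-level scene-change flag from per-block correlation maxima.
import numpy as np

def scene_changes(corr_maxima_per_frame, thres):
    """corr_maxima_per_frame: (T, B) best correlation value of each block."""
    scores = np.asarray(corr_maxima_per_frame).sum(axis=1)
    return np.nonzero(scores < thres)[0]  # frames whose total match is poor
```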

Proceedings ArticleDOI
14 Jul 1997
TL;DR: Experiments show that the peak signal-to-noise ratio (PSNR) between camera and reconstructed synthetic images can be increased by up to 7 dB compared to global illumination compensation, and the average estimation error of the motion parameters is reduced by 40%.
Abstract: We present a model-based algorithm for the estimation of three-dimensional motion parameters of an object moving in 3D space. Photometric effects are taken into account by adding different illumination models to the virtual scene. Using the additional information from three-dimensional geometric models of the scene leads to linear, computationally efficient algorithms for estimating the parameters of the illumination models. Experiments show that the peak signal-to-noise ratio (PSNR) between camera and reconstructed synthetic images can be increased by up to 7 dB compared to global illumination compensation, while the average estimation error of the motion parameters is reduced by 40%.
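
The "linear in the illumination parameters" property shows up already in the simplest such model, a global gain and offset between the rendered and the camera image, which least squares solves per frame; the paper's illumination models are richer but share this structure.

```python
# Per-frame gain/offset illumination fit: I_cam ~ a * I_model + b.
import numpy as np

def fit_gain_offset(I_model, I_cam):
    A = np.stack([I_model.ravel(), np.ones(I_model.size)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, I_cam.ravel(), rcond=None)
    return a, b
```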

Proceedings ArticleDOI
07 Sep 1997
TL;DR: This work analyzes the maximum task workspace of executability, the optimal robot-task configuration, and the range of necessary or possible scaling factors for motion and forces, to investigate whether a compliant motion trajectory obtained by human demonstration can be executed on a given robot.
Abstract: Compliant motion tasks can be programmed by human demonstration without the use of the actual robot system. In space applications and sensor off-line programming systems the task is planned and programmed in a virtual reality environment which simulates the world model of the task to be performed. Furthermore, in industrial and service applications teaching devices measuring the force and motion of the human hand allow the operator to demonstrate a task in a very natural way. By using dynamic manipulability analysis of motion and force, we investigate if a compliant motion trajectory obtained by the above described demonstration processes can be executed on a given robot. We analyze the maximum task workspace of executability, the optimal robot-task configuration and the range of necessary or possible scaling factors for motion and forces. The approach is demonstrated using a numerical example.
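
For context, the classical kinematic manipulability measure this line of analysis builds on is Yoshikawa's w = sqrt(det(J J^T)); the paper's dynamic manipulability analysis additionally folds in the robot's inertia and the task's force and motion demands. A one-function sketch:

```python
# Yoshikawa's manipulability measure at a given configuration.
import numpy as np

def manipulability(J):
    """J: (6, n) geometric Jacobian, n >= 6."""
    return float(np.sqrt(np.linalg.det(J @ J.T)))
```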

Journal ArticleDOI
TL;DR: In this algorithm, the segmentation into regions is based on an advanced motion analysis, successively achieving a multiresolution motion field estimation and a segmentation based on a Markovian statistical approach, which ensures good temporal coherence of the segmentation maps and identification of covered areas.
Abstract: This paper describes a region-based coding algorithm developed by Thomson Multimedia and submitted to the MPEG-4 tests of November 1995 and January 1996. In this algorithm, the segmentation into regions is based on an advanced motion analysis, successively achieving a multiresolution motion field estimation and a segmentation based on a Markovian statistical approach, which ensures good temporal coherence of the segmentation maps and identification of covered areas. Moreover, the accurate motion description allows interpolation of intermediate frames, which can increase the displayed frame rate. Simulation results and the MPEG-4 tests of January 1996 have shown that the algorithm is as efficient as block-based coding algorithms like MPEG-1 or H.263 for the compression functionality, while offering temporal scalability through higher frame rates. Moreover, the description of the scene as coherent motion regions may be seen as an intermediate step toward object-based functionalities.

Journal ArticleDOI
01 Sep 1997-Spine
TL;DR: A system has been developed capable of calculating three‐dimensional spinal motion based on measurements of a series of computed tomography images that has an accuracy similar to that of current motion analysis methods, but future studies will be necessary to apply this system in vivo.
Abstract: STUDY DESIGN A three-dimensional, noninvasive motion analysis method was developed by monitoring the orientation of the principal axes of each vertebra. OBJECTIVES To develop a method of performing three-dimensional, noninvasive motion analysis of the spine using computed tomography data. SUMMARY OF BACKGROUND DATA The concept of using principal axes of the moment of inertia tensor to measure the orientation and position of a rigid body has been applied to the wrist and subtalar joints, but has not yet been applied to the spine. METHODS Scans were taken of two isolated vertebrae in various known positions. Centroids, area, moments, and product of inertia of each scan were determined using a commercial program. Custom software combined data using the parallel axis theorem to give three-dimensional data for each vertebra. Changes in the centroid and principal axes were used to calculate translation and rotation, respectively. RESULTS The system accuracy was within 1.0 degree in rotation and 1.0 mm in translation. Some errors occurred in minor motions when a smaller number of scans were used. System resolution was 0.43 mm. CONCLUSIONS A system has been developed capable of calculating three-dimensional spinal motion based on measurements of a series of computed tomography images. The system has an accuracy similar to that of current motion analysis methods, but future studies will be necessary to apply this system in vivo.
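
The core computation is compact: the centroid and principal axes of a voxelized vertebra follow from the second moments of its binary mask, and the rotation between two scans can be read off from the change in axes. A NumPy sketch under that simplification (the study works from per-slice moments combined via the parallel axis theorem, and axis sign and ordering need care in practice):

```python
# Centroid and principal axes of a segmented vertebra, and the relative
# rotation between two poses.
import numpy as np

def principal_axes(mask):
    """mask: 3D boolean array of one vertebra. Returns (centroid, axes)."""
    pts = np.argwhere(mask).astype(float)
    c = pts.mean(axis=0)
    cov = np.cov((pts - c).T)          # second central moments, (3, 3)
    _, vecs = np.linalg.eigh(cov)      # columns: principal axes (ascending)
    return c, vecs

def relative_rotation(axes_a, axes_b):
    """Rotation taking pose A's axes to pose B's (sign/order caveats apply)."""
    return axes_b @ axes_a.T
```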

Journal ArticleDOI
TL;DR: Inverse dynamic modeling can be an effective tool for motion analysis in patients with cerebellar disorders and gives further insight into the parameters that may be controlled by the central nervous system, as well as improving the sensitivity of motion analysis of limb movements.

Journal ArticleDOI
TL;DR: The effects on selected temporal characteristics were that, as speed increased, the cadence changed and the affected-side single support time as a percentage of the gait cycle was altered.
Abstract: The aim of this study was to assess, by means of gait analysis, the effect on the gait of a transtibial amputee of altering the mass and the moment of inertia of a dynamic elastic response prosthesis. One male amputee was analysed for four to five walking trials at normal and fast cadences, using the VICON system of motion analysis and an AMTI force plate. The kinematic variables of cadence, swing time, single support time and joint angles for the knee and hip on the affected and intact sides were analysed. The ground reaction force was also analysed. The sample size was limited to one as an example to indicate the changes which are possible through simply changing the inertial characteristics. Descriptive statistics are used to demonstrate these changes. Three mass conditions for the prosthesis were analysed: m1: 1080 g; m2: 1080 + 530 g; m3: 1080 + 1460 g. The m1 condition is the mass of the prosthesis with no added weight, while m2 and m3 were attachments of the same geometrical shape but made from different materials. It was felt that the large mass range would highlight biomechanical adjustments as a result of its alteration. The effects on selected temporal characteristics were that, as speed increased, the cadence changed and the affected-side single support time as a percentage of the gait cycle was altered. The effect on the joint …

Journal ArticleDOI
TL;DR: There was a high correlation between the mean motion paths produced by the two systems, indicating that they were very similar, and the electromagnetic motion analysis system was able to produce these similar results in a fraction of the time required by the video-based system.
Abstract: The purpose of this study was to compare two-dimensional rearfoot motion during walking measured by a traditional video-based motion analysis system to that of an electromagnetic analysis system. Twenty-five individuals (15 men, 10 women) with a mean age of 29.8 years served as subjects for this study. The results of the study showed that there was a high correlation (r = 0.945) between the mean motion paths produced by the two systems, indicating that they were very similar. The electromagnetic motion analysis system was able to produce these similar results in a fraction of the time required by the video-based system.
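
The reported r = 0.945 is a plain Pearson correlation between the two systems' mean motion curves, which a one-liner reproduces given curves sampled at the same instants:

```python
# Pearson correlation between two angle-vs-time curves.
import numpy as np

def path_correlation(path_a, path_b):
    """path_a, path_b: 1D motion curves sampled at the same instants."""
    return np.corrcoef(path_a, path_b)[0, 1]
```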

Proceedings ArticleDOI
26 Feb 1997
TL;DR: Experimental results demonstrate robust, real-time recognition and tracking of multiple cars over thousands of image frames.
Abstract: A hard real time vision system has been developed that recognizes and tracks multiple cars from video sequences taken from a car driving on highways and country roads. Recognition is accomplished by combining the analysis of single image frames with the analysis of the motion information provided by multiple consecutive image frames. In single image frames, cars are recognized by matching deformable gray-scale templates, by detecting image features, such as corners, and by evaluating how these features relate to each other. Cars are also recognized by tracking motion parameters that are typical for cars. The vision system utilizes the hard real-time operating system Maruti which guarantees that the timing constraints on the various vision processes are satisfied. The dynamic creation and termination of tracking processes optimizes the amount of computational resources spent and allows fast detection and tracking of multiple cars. Experimental results demonstrate robust, real-time recognition and tracking over thousands of image frames.

Journal ArticleDOI
TL;DR: In this paper, a method of tracking moving objects in a cluttered scene is described, in which the contours are determined by using optical flow and edges in a long sequence, and the object boundary is determined by the edges near the predicted contours.
Abstract: This paper describes a method of tracking moving objects in a cluttered scene. If the motion of an object is similar to that of the background, the contour cannot be determined from optical flow alone. It is also difficult to determine the contour from edges alone, because many edges may be extracted in the background and no edges may be extracted on some parts of the contour. The contour can, however, be determined by using optical flow and edges together over a long sequence. When an object's motion differs from that of its neighborhood, the motion boundary is extracted as the contour. Edges near the motion boundary are extracted and replace the corresponding parts of the motion boundary. While the object is moving, more edges are added to the contour. When two objects with similar motion overlap, their contours are predicted from the stored contours and the current optical flow, and the object boundary is determined by the edges near the predicted contours. Experimental results for synthetic and real scenes show the usefulness of the method.

Proceedings ArticleDOI
21 Apr 1997
TL;DR: It turns out that the wavelet transform can be used efficiently in a Kalman filtering framework to perform detection and tracking of moving objects in digital image sequences.
Abstract: This paper addresses the problem of detecting and tracking moving objects in digital image sequences. The main goal is to detect and select mobile objects in a scene, construct the trajectories, and eventually reconstruct the target objects or their signatures. It is assumed that the image sequences are acquired from imaging sensors. The method is based on spatio-temporal continuous wavelet transforms, discretized for digital signal analysis. It turns out that the wavelet transform can be used efficiently in a Kalman filtering framework to perform detection and tracking. Several families of wavelets are considered for motion analysis according to the specific spatio-temporal transformation. Their construction is based on mechanical parameters describing uniform motion, translation, rotation, acceleration, and deformation. The main idea is that each kind of motion generates a specific signal transformation, which is analyzed by a suitable family of continuous wavelets. The analysis is therefore associated with a set of operators that describe the signal transformations at hand. These operators are then associated with a set of selectivity criteria. This leads to a set of filters that are tuned to the moving objects of interest.