Proceedings Article

Human motion analysis: a review

16 Jun 1997, pp. 90-102
TL;DR: The paper gives an overview of the various tasks involved in motion analysis of the human body, and focuses on three major areas related to interpreting human motion: motion analysis involving human body parts, tracking of human motion using single or multiple cameras, and recognizing human activities from image sequences.
Abstract: Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man-machine interfaces, content-based image storage and retrieval, and video conferencing. The paper gives an overview of the various tasks involved in motion analysis of the human body. The authors focus on three major areas related to interpreting human motion: 1) motion analysis involving human body parts, 2) tracking of human motion using single or multiple cameras, and 3) recognizing human activities from image sequences. Motion analysis of human body parts involves the low-level segmentation of the human body into segments connected by joints, and recovers the 3D structure of the human body using its 2D projections over a sequence of images. Tracking human motion using single or multiple cameras focuses on higher-level processing, in which moving humans are tracked without identifying specific parts of the body structure. After successfully matching the moving human image from one frame to another in image sequences, understanding the human movements or activities comes naturally, which leads to a discussion of recognizing human activities. The review is illustrated by examples.
Citations
Journal Article
TL;DR: This article reviews state-of-the-art tracking methods, classifies them into categories, identifies new trends, and discusses important tracking issues, including the use of appropriate image features, selection of motion models, and detection of objects.
Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

5,318 citations

Journal Article
TL;DR: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed, which employs a metric derived from the Bhattacharyya coefficient as similarity measure, and uses the mean shift procedure to perform the optimization.
Abstract: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.

4,996 citations
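
The similarity measure named in the abstract above can be illustrated with a small sketch. It assumes the two feature histograms are already normalized to sum to 1, and it uses one common distance derived from the Bhattacharyya coefficient, d = sqrt(1 - rho). The function names and the toy histograms are hypothetical, and the kernel masking and mean shift optimization described in the abstract are not shown.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Return sum_u sqrt(p[u] * q[u]) for two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient: sqrt(1 - rho), clamped at 0."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))


# Hypothetical 3-bin color histograms (assumed normalized).
target = [0.5, 0.3, 0.2]
candidate = [0.4, 0.4, 0.2]
print(bhattacharyya_distance(target, candidate))
```

A tracker built on this measure would move the candidate window toward the location that lowers the distance; identical histograms give a distance near 0 and non-overlapping ones give 1.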

Proceedings Article
23 Aug 2004
TL;DR: This paper constructs video representations in terms of local space-time features, integrates them with SVM classification schemes for recognition, and presents action recognition results on a new video database.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other relative approaches for action recognition.

3,238 citations


Cites background from "Human motion analysis: a review"

  • ...All of these conditions introduce challenging problems that have been addressed in computer vision in the past (see [1, 11] for a review)....

Journal Article
TL;DR: A view-based approach to the representation and recognition of human movement is presented, and a recognition method matching temporal templates against stored instances of views of known actions is developed.
Abstract: A view-based approach to the representation and recognition of human movement is presented. The basis of the representation is a temporal template, a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using aerobics exercises as a test domain, we explore the representational power of a simple, two-component version of the templates: the first value is a binary value indicating the presence of motion and the second value is a function of the recency of motion in a sequence. We then develop a recognition method matching temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real-time on standard platforms.

2,932 citations
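
The two-component template in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary per-frame motion mask is already available, and it uses one common recency rule (moving pixels are set to a duration `tau`, all others decay by 1 toward 0). The function names, `tau`, and the toy 1x4 frames are hypothetical.

```python
def update_motion_history(history, motion_mask, tau):
    """One time step: moving pixels are set to tau, others decay by 1 toward 0."""
    return [
        [tau if moving else max(0, h - 1) for h, moving in zip(h_row, m_row)]
        for h_row, m_row in zip(history, motion_mask)
    ]


def motion_energy(history):
    """Binary component: 1 wherever motion occurred within the last tau steps."""
    return [[1 if h > 0 else 0 for h in row] for row in history]


# Toy 1x4 image in which a single moving pixel sweeps left to right.
masks = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]
history = [[0, 0, 0, 0]]
for mask in masks:
    history = update_motion_history(history, mask, tau=3)

print(history)                 # recency component
print(motion_energy(history))  # presence-of-motion component
```

In the recency component, larger values mark more recent motion, so the direction of the sweep is visible in a single static image.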

Journal Article
TL;DR: The context for socially interactive robots is discussed, emphasizing the relationship to other research fields and the different forms of “social robots”, and a taxonomy of design methods and system components used to build socially interactive robots is presented.

2,869 citations

References
Proceedings Article
12 Nov 1981
TL;DR: A method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
Abstract: Optical flow cannot be computed locally, since only one independent measurement is available from the image sequence at a point, while the flow velocity has two components. A second constraint is needed. A method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image. An iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences. The algorithm is robust in that it can handle image sequences that are quantized rather coarsely in space and time. It is also insensitive to quantization of brightness levels and additive noise. Examples are included where the assumption of smoothness is violated at singular points or along lines in the image.

8,078 citations
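
The two constraints described in the optical flow abstract above can be written out for orientation. This is a sketch in commonly used notation; the symbols are assumptions not spelled out in the abstract: $I_x, I_y, I_t$ are image brightness derivatives, $(u, v)$ is the flow, $\alpha$ weights the smoothness term, and $\bar{u}, \bar{v}$ are local averages of the flow.

```latex
% Brightness constancy: one equation per pixel, two unknowns (u, v)
I_x u + I_y v + I_t = 0

% Second constraint: smoothness, added as a regularizer with weight \alpha
E(u, v) = \iint \left[ \left( I_x u + I_y v + I_t \right)^2
        + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx \, dy

% Iterative update using the local averages \bar{u}, \bar{v}
u^{(n+1)} = \bar{u}^{(n)} - \frac{I_x \left( I_x \bar{u}^{(n)} + I_y \bar{v}^{(n)} + I_t \right)}{\alpha^2 + I_x^2 + I_y^2}
\qquad
v^{(n+1)} = \bar{v}^{(n)} - \frac{I_y \left( I_x \bar{u}^{(n)} + I_y \bar{v}^{(n)} + I_t \right)}{\alpha^2 + I_x^2 + I_y^2}
```

The first line alone fixes only the flow component along the brightness gradient, which is why the abstract says a second constraint is needed.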

Journal Article
TL;DR: Pfinder is a real-time system for tracking people and interpreting their behavior that uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions.
Abstract: Pfinder is a real-time system for tracking people and interpreting their behavior. It runs at 10 Hz on a standard SGI Indy computer, and has performed reliably on thousands of people in many different physical locations. The system uses a multiclass statistical model of color and shape to obtain a 2D representation of head and hands in a wide range of viewing conditions. Pfinder has been successfully used in a wide range of applications including wireless interfaces, video databases, and low-bandwidth coding.

4,280 citations

Journal Article
TL;DR: The human visual process is studied by examining the computational problems associated with deriving useful information from retinal images, and this approach is applied to the problem of representing three-dimensional shapes for the purpose of recognition.
Abstract: The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images. In this paper, we apply this approach to the problem of representing three-dimensional shapes for the purpose of recognition. 1. Three criteria, accessibility, scope and uniqueness, and stability and sensitivity, are presented for judging the usefulness of a representation for shape recognition. 2. Three aspects of a representation's design are considered, (i) the representation's coordinate system, (ii) its primitives, which are the primary units of shape information used in the representation, and (iii) the organization the representation imposes on the information in its descriptions. 3. In terms of these design issues and the criteria presented, a shape representation for recognition should: (i) use an object-centred coordinate system, (ii) include volumetric primitives of varied sizes, and (iii) have a modular organization. A representation based on a shape's natural axes (for example the axes identified by a stick figure) follows directly from these choices. 4. The basic process for deriving a shape description in this representation must involve: (i) a means for identifying the natural axes of a shape in its image and (ii) a mechanism for transforming viewer-centred axis specifications to specifications in an object-centred coordinate system. 5. Shape recognition involves: (i) a collection of stored shape descriptions, and (ii) various indexes into the collection that allow a newly derived description to be associated with an appropriate stored description. The most important of these indexes allows shape recognition to proceed conservatively from the general to the specific based on the specificity of the information available from the image. 6. New constraints supplied by a conservative recognition process can be used to extract more information from the image. A relaxation process for carrying out this constraint analysis is described.

2,256 citations

Proceedings Article
15 Jun 1992
TL;DR: The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
Abstract: A human action recognition method based on a hidden Markov model (HMM) is proposed. It is a feature-based bottom-up approach that is characterized by its learning capability and time-scale invariability. To apply HMMs, one set of time-sequential images is transformed into an image feature vector sequence, and the sequence is converted into a symbol sequence by vector quantization. In learning human action categories, the parameters of the HMMs, one per category, are optimized so as to best describe the training sequences from the category. To recognize an observed sequence, the HMM which best matches the sequence is chosen. Experimental results for real time-sequential images of sports scenes show recognition rates higher than 90%. The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.

1,477 citations


"Human motion analysis: a review" refers background or methods in this paper

  • ...In our later discussion, human body motion is addressed by the movement of the limbs and hands [50, 28, 6, 33], such as the velocities of the hand or limb segments, or the angular velocity of various body parts....

  • ...HMM has been very popular in speech recognition, but only recently has it been adopted for recognition of human motion sequences in computer vision [50]....

  • ...The work by Yamato et al. [50] is perhaps the first one on recognition of human action in this category....
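
The recognition step in the HMM abstract above ("the HMM which best matches the sequence is chosen") can be sketched with the standard forward algorithm for discrete-output HMMs. This is an illustration under stated assumptions, not the cited authors' code: the symbol sequence is assumed to come from a vector quantizer with a 2-symbol codebook, and the category names, model parameters, and function names are hypothetical. Training of the per-category parameters is not shown.

```python
def sequence_likelihood(symbols, start, trans, emit):
    """Forward algorithm: P(symbols | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][symbols[0]] for s in range(n)]
    for sym in symbols[1:]:
        alpha = [
            emit[j][sym] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


def classify(symbols, models):
    """Return the category whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: sequence_likelihood(symbols, *models[name]))


# Hypothetical two-state, left-to-right models: (start, transition, emission).
models = {
    "serve": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "volley": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}
observed = [0, 0, 1, 1]  # symbol sequence produced by vector quantization
print(classify(observed, models))
```

For long sequences the raw likelihoods underflow, so a practical version would work with scaled or log probabilities.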

Journal Article
TL;DR: The author uses projective relations as the theoretical foundation of his investigations of visual space and motion and concludes that during locomotion the components of the human visual environment are interpreted as rigid structures in relative motion.
Abstract: In this article the author uses projective relations as the theoretical foundation of his investigations of visual space and motion. Several laboratory experiments involving perceptual vector analysis and its geometric basis are described. In most of the experiments the visual stimuli consisted of computer-controlled patterns displayed on a television-like screen and projected into the eyes of subjects by means of a collimating device that removed parallax as well as the possibility of seeing the screen. A common characteristic of the experiments was that the observer was evidently not free to choose between a Euclidean interpretation of the changing geometry of the figure in the display and a projective interpretation. For example, the observer could not persuade himself that what he was seeing was simply a square growing larger and smaller in the same visual plane; his visual system insisted on telling him that he was seeing a square of constant size approaching and receding. Hence he perceived rigid motion in depth, rotation in a specific slant, bending in depth and so on, paired with the highest possible degree of object constancy. Further experiments were conducted to determine if the principles of perceptual analysis hold true for the more complex patterns of motion encountered in everyday life. These experiments led to the conclusion that during locomotion the components of the human visual environment are interpreted as rigid structures in relative motion.

930 citations


"Human motion analysis: a review" refers background in this paper

  • ...This concept was initially considered by Johansson [23], who marked joints as moving light displays (MLD)....
