scispace - formally typeset
Search or ask a question
Proceedings Article•DOI•

Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems

TL;DR: In this paper, the location of each fingertip is located in each input infrared image frame and correspondences of detected fingertips between successive image frames are determined based on a prediction technique, which is particularly advantageous for human-computer interaction.
Abstract: We propose a fast and robust method for tracking a user's hand and multiple fingertips; we then demonstrate gesture recognition based on measured fingertip trajectories for augmented desk interface systems. Our tracking method is capable of tracking multiple fingertips in a reliable manner even in a complex background under a dynamically changing lighting condition without any markers. First, based on its geometrical features, the location of each fingertip is located in each input infrared image frame. Then, correspondences of detected fingertips between successive image frames are determined based on a prediction technique. Our gesture recognition system is particularly advantageous for human-computer interaction (HCI) in that users can achieve interactions based on symbolic gestures at the same time that they perform direct manipulation with their own hands and fingers. The effectiveness of our proposed method has been successfully demonstrated via a number of experiments.
Citations
More filters
Journal Article•DOI•
TL;DR: A literature review on the second research direction, which aims to capture the real 3D motion of the hand, which is a very challenging problem in the context of HCI.

901 citations

Patent•
Kikuo Fujimura1, Xia Liu1•
13 May 2005
TL;DR: Sign-understanding technology can be used for remote control of home devices, mouse-less operation of computer consoles, gaming, and man-robot communication to give instructions among others.
Abstract: Communication is an important issue in man-to-robot interaction. Signs can be used to interact with machines by providing user instructions or commands. Embodiment of the present invention include human detection, human body parts detection, hand shape analysis, trajectory analysis, orientation determination, gesture matching, and the like. Many types of shapes and gestures are recognized in a non-intrusive manner based on computer vision. A number of applications become feasible by this sign-understanding technology, including remote control of home devices, mouse-less (and touch-less) operation of computer consoles, gaming, and man-robot communication to give instructions among others. Active sensing hardware is used to capture a stream of depth images at a video rate, which is consequently analyzed for information extraction.

587 citations

Proceedings Article•DOI•
27 Jun 2004
TL;DR: A method is presented for robust tracking in highly cluttered environments that makes effective use of 3D depth sensing technology, resulting in illumination-invariant tracking.
Abstract: A method is presented for robust tracking in highly cluttered environments. The method makes effective use of 3D depth sensing technology, resulting in illumination-invariant tracking. A few applications using tracking are presented including face tracking and hand tracking.

507 citations

Proceedings Article•DOI•
13 Oct 2004
TL;DR: By segmenting the hand regions from the video images and then augmenting them transparently into a graphical interface, the Visual Touchpad provides a compelling direct manipulation experience without the need for more expensive tabletop displays or touch-screens, and with significantly less self-occlusion.
Abstract: This paper presents the Visual Touchpad, a low-cost vision-based input device that allows for fluid two-handed interactions with desktop PCs, laptops, public kiosks, or large wall displays. Two downward-pointing cameras are attached above a planar surface, and a stereo hand tracking system provides the 3D positions of a user's fingertips on and above the plane. Thus the planar surface can be used as a multi-point touch-sensitive device, but with the added ability to also detect hand gestures hovering above the surface. Additionally, the hand tracker not only provides positional information for the fingertips but also finger orientations. A variety of one and two-handed multi-finger gestural interaction techniques are then presented that exploit the affordances of the hand tracker. Further, by segmenting the hand regions from the video images and then augmenting them transparently into a graphical interface, our system provides a compelling direct manipulation experience without the need for more expensive tabletop displays or touch-screens, and with significantly less self-occlusion.

343 citations

Patent•
14 Jun 2004
TL;DR: In this article, a system for estimating orientation of a target based on real-time video data using depth data included in the video to determine the estimated orientation is presented, which includes a time-of-flight camera capable of depth sensing within a depth window.
Abstract: A system for estimating orientation of a target based on real-time video data uses depth data included in the video to determine the estimated orientation. The system includes a time-of-flight camera capable of depth sensing within a depth window. The camera outputs hybrid image data (color and depth). Segmentation is performed to determine the location of the target within the image. Tracking is used to follow the target location from frame to frame. During a training mode, a target-specific training image set is collected with a corresponding orientation associated with each frame. During an estimation mode, a classifier compares new images with the stored training set to determine an estimated orientation. A motion estimation approach uses an accumulated rotation/translation parameter calculation based on optical flow and depth constrains. The parameters are reset to a reference value each time the image corresponds to a dominant orientation.

327 citations

References
More filters
Journal Article•DOI•
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

4,546 citations

Journal Article•DOI•
TL;DR: A fraction of the recycle slurry is treated with sulphuric acid to convert at least some of the gypsum to calcium sulphate hemihydrate and the slurry comprising hemihYDrate is returned to contact the mixture of phosphate rock, phosphoric acid and recycle Gypsum slurry.
Abstract: The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. This discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences in the gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models offer a way of more elaborate modeling of hand gestures but lead to computational hurdles that have not been overcome given the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.

1,973 citations

Proceedings Article•DOI•
15 Jun 1992
TL;DR: The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
Abstract: A human action recognition method based on a hidden Markov model (HMM) is proposed. It is a feature-based bottom-up approach that is characterized by its learning capability and time-scale invariability. To apply HMMs, one set of time-sequential images is transformed into an image feature vector sequence, and the sequence is converted into a symbol sequence by vector quantization. In learning human action categories, the parameters of the HMMs, one per category, are optimized so as to best describe the training sequences from the category. To recognize an observed sequence, the HMM which best matches the sequence is chosen. Experimental results for real time-sequential images of sports scenes show recognition rates higher than 90%. The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer. >

1,477 citations

Journal Article•DOI•
Pierre Wellner1•
TL;DR: The DigitalDesk is built around an ordinary physical desk and can be used as such, but it has extra capabilities, including a video camera mounted above the desk that can detect where the user is pointing, and it can read documents that are placed on the desk.

1,127 citations

Dissertation•
01 Feb 1995
TL;DR: Using hidden Markov models (HMM's), an unobstrusive single view camera system is developed that can recognize hand gestures, namely, a subset of American Sign Language (ASL), achieving high recognition rates for full sentence ASL using only visual cues.
Abstract: : Using hidden Markov models (HMM's), an unobstrusive single view camera system is developed that can recognize hand gestures, namely, a subset of American Sign Language (ASL) Previous systems have concentrated on finger spelling or isolated word recognition, often using tethered electronic gloves for input We achieve high recognition rates for full sentence ASL using only visual cues A forty word lexicon consisting of personal pronouns, verbs, nouns, and adjectives is used to create 494 randomly constructed five word sentences that are signed by the subject to the computer The data is separated into a 395 sentence training set and an independent 99 sentence test set While signing, the 2D position, orientation, and eccentricity of bounding ellipses of the hands are tracked in real time with the assistance of solidly colored gloves Simultaneous recognition and segmentation of the resultant stream of feature vectors occurs five times faster than real time on an HP 735 With a strong grammar, the system achieves an accuracy of 97%; with no grammar, an accuracy of 91% is reached (95% correct)

758 citations