Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems

doi:10.1109/AFGR.2002.1004191

Proceedings Article•DOI•

Kenji Oka¹, Yoichi Sato¹, Hideki Koike²•Institutions (2)

University of Tokyo¹, University of Electro-Communications²

20 May 2002-pp 429-434

TL;DR: A fast and robust method detects each fingertip in every input infrared image frame, determines correspondences of detected fingertips between successive frames with a prediction technique, and recognizes gestures from the measured fingertip trajectories for human-computer interaction.

Abstract: We propose a fast and robust method for tracking a user's hand and multiple fingertips; we then demonstrate gesture recognition based on measured fingertip trajectories for augmented desk interface systems. Our tracking method is capable of tracking multiple fingertips in a reliable manner even in a complex background under a dynamically changing lighting condition without any markers. First, based on its geometrical features, each fingertip is detected in each input infrared image frame. Then, correspondences of detected fingertips between successive image frames are determined based on a prediction technique. Our gesture recognition system is particularly advantageous for human-computer interaction (HCI) in that users can achieve interactions based on symbolic gestures at the same time that they perform direct manipulation with their own hands and fingers. The effectiveness of our proposed method has been successfully demonstrated via a number of experiments.
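
The correspondence step described in the abstract can be illustrated with a minimal sketch. The abstract only says that a "prediction technique" is used, so everything below is an assumption made for illustration: the constant-velocity predictor, the greedy nearest-neighbour matching, the `max_dist` gate, and the function names `predict` and `match_fingertips` are all hypothetical and are not taken from the paper.

```python
from math import hypot


def predict(prev, curr):
    """Constant-velocity guess (an assumption): next = curr + (curr - prev)."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


def match_fingertips(predicted, detected, max_dist=30.0):
    """Greedily pair each predicted position with its closest free detection.

    Returns a dict mapping track index -> detection index. Pairs farther
    apart than max_dist (a hypothetical gate, in pixels) are left unmatched.
    """
    # All candidate pairs, sorted by Euclidean distance (closest first).
    pairs = sorted(
        (hypot(p[0] - d[0], p[1] - d[1]), i, j)
        for i, p in enumerate(predicted)
        for j, d in enumerate(detected)
    )
    assignment, used = {}, set()
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if i in assignment or j in used:
            continue  # track or detection already paired
        assignment[i] = j
        used.add(j)
    return assignment


# Two tracked fingertips seen in the two previous frames.
prev_frame = [(10.0, 10.0), (50.0, 10.0)]
curr_frame = [(12.0, 14.0), (50.0, 15.0)]
predicted = [predict(p, c) for p, c in zip(prev_frame, curr_frame)]

# Detections in the new frame, in arbitrary order, plus one spurious blob.
detections = [(51.0, 21.0), (15.0, 17.0), (200.0, 200.0)]
assignment = match_fingertips(predicted, detections)
print(assignment)
```

In this toy case each track is paired with the detection nearest its predicted position, and the far-away blob stays unmatched. A real system would also need to start tracks for new fingertips and drop tracks that go unmatched, which the sketch leaves out.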

Citations

Journal Article•DOI•

Vision-based hand pose estimation: A review

Ali Erol¹, George Bebis¹, Mircea Nicolescu¹, Richard Boyle², Xander Twombly²•Institutions (2)

University of Nevada, Reno¹, Ames Research Center²

01 Oct 2007-Computer Vision and Image Understanding

TL;DR: A literature review of vision-based hand pose estimation methods that aim to capture the real 3D motion of the hand, a very challenging problem in the context of HCI.

901 citations

Patent•

Sign based human-machine interaction

Kikuo Fujimura¹, Xia Liu¹•Institutions (1)

Ohio State University¹

13 May 2005

TL;DR: Sign-understanding technology can be used for remote control of home devices, mouse-less operation of computer consoles, gaming, and man-robot communication to give instructions among others.

Abstract: Communication is an important issue in man-to-robot interaction. Signs can be used to interact with machines by providing user instructions or commands. Embodiments of the present invention include human detection, human body parts detection, hand shape analysis, trajectory analysis, orientation determination, gesture matching, and the like. Many types of shapes and gestures are recognized in a non-intrusive manner based on computer vision. A number of applications become feasible by this sign-understanding technology, including remote control of home devices, mouse-less (and touch-less) operation of computer consoles, gaming, and man-robot communication to give instructions, among others. Active sensing hardware is used to capture a stream of depth images at a video rate, which is subsequently analyzed for information extraction.

587 citations

Proceedings Article•DOI•

Visual Tracking Using Depth Data

H. Nanda¹, Kikuo Fujimura²•Institutions (2)

University of Maryland, College Park¹, Honda²

27 Jun 2004

TL;DR: A method is presented for robust tracking in highly cluttered environments that makes effective use of 3D depth sensing technology, resulting in illumination-invariant tracking.

Abstract: A method is presented for robust tracking in highly cluttered environments. The method makes effective use of 3D depth sensing technology, resulting in illumination-invariant tracking. A few applications using tracking are presented including face tracking and hand tracking.

507 citations

Proceedings Article•DOI•

Visual touchpad: a two-handed gestural input device

Shahzad Malik¹, Joe Laszlo¹•Institutions (1)

University of Toronto¹

13 Oct 2004

TL;DR: By segmenting the hand regions from the video images and then augmenting them transparently into a graphical interface, the Visual Touchpad provides a compelling direct manipulation experience without the need for more expensive tabletop displays or touch-screens, and with significantly less self-occlusion.

Abstract: This paper presents the Visual Touchpad, a low-cost vision-based input device that allows for fluid two-handed interactions with desktop PCs, laptops, public kiosks, or large wall displays. Two downward-pointing cameras are attached above a planar surface, and a stereo hand tracking system provides the 3D positions of a user's fingertips on and above the plane. Thus the planar surface can be used as a multi-point touch-sensitive device, but with the added ability to also detect hand gestures hovering above the surface. Additionally, the hand tracker not only provides positional information for the fingertips but also finger orientations. A variety of one and two-handed multi-finger gestural interaction techniques are then presented that exploit the affordances of the hand tracker. Further, by segmenting the hand regions from the video images and then augmenting them transparently into a graphical interface, our system provides a compelling direct manipulation experience without the need for more expensive tabletop displays or touch-screens, and with significantly less self-occlusion.

343 citations

Patent•

Target orientation estimation using depth sensing

Kikuo Fujimura¹, Youding Zhu²•Institutions (2)

Honda¹, Ohio State University²

14 Jun 2004

TL;DR: In this article, a system for estimating orientation of a target based on real-time video data using depth data included in the video to determine the estimated orientation is presented, which includes a time-of-flight camera capable of depth sensing within a depth window.

Abstract: A system for estimating orientation of a target based on real-time video data uses depth data included in the video to determine the estimated orientation. The system includes a time-of-flight camera capable of depth sensing within a depth window. The camera outputs hybrid image data (color and depth). Segmentation is performed to determine the location of the target within the image. Tracking is used to follow the target location from frame to frame. During a training mode, a target-specific training image set is collected with a corresponding orientation associated with each frame. During an estimation mode, a classifier compares new images with the stored training set to determine an estimated orientation. A motion estimation approach uses an accumulated rotation/translation parameter calculation based on optical flow and depth constraints. The parameters are reset to a reference value each time the image corresponds to a dominant orientation.

327 citations

References

Journal Article•DOI•

An introduction to hidden Markov models

Lawrence R. Rabiner¹, Biing-Hwang Juang•Institutions (1)

Bell Labs¹

01 Jan 1986-IEEE Assp Magazine

TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

4,546 citations

Journal Article•DOI•

Visual interpretation of hand gestures for human-computer interaction: a review

Vladimir Pavlovic¹, Rajeev Sharma¹, Thomas S. Huang¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Jul 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A survey of the literature on visual interpretation of hand gestures in the context of its role in HCI, organized by the method used for modeling, analyzing, and recognizing gestures.

Abstract: The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. This discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences in the gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models offer a way of more elaborate modeling of hand gestures but lead to computational hurdles that have not been overcome given the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.

1,973 citations

Proceedings Article•DOI•

Recognizing human action in time-sequential images using hidden Markov model

Junji Yamato, J. Ohya, K. Ishii

15 Jun 1992

TL;DR: The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.

Abstract: A human action recognition method based on a hidden Markov model (HMM) is proposed. It is a feature-based bottom-up approach that is characterized by its learning capability and time-scale invariability. To apply HMMs, one set of time-sequential images is transformed into an image feature vector sequence, and the sequence is converted into a symbol sequence by vector quantization. In learning human action categories, the parameters of the HMMs, one per category, are optimized so as to best describe the training sequences from the category. To recognize an observed sequence, the HMM which best matches the sequence is chosen. Experimental results for real time-sequential images of sports scenes show recognition rates higher than 90%. The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
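
The recognition step in this abstract, choosing the HMM that best matches a symbol sequence, can be sketched with the standard forward algorithm for discrete HMMs. The two toy models, their parameters, and the names `forward_likelihood` and `classify` are hypothetical and chosen only for illustration. Training (optimizing the HMM parameters) and vector quantization are not shown.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][k]  : probability that state s emits symbol k
    """
    n = len(start)
    # Initialisation: joint probability of the first symbol and each state.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Induction: propagate through the transitions, then weight by emission.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the category whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Two hypothetical 2-state, 2-symbol left-to-right models, one per category.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "rising": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "falling": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 1, 1], models))
print(classify([1, 1, 0, 0], models))
```

For long sequences the unscaled forward variables underflow, so a practical version would rescale `alpha` at each step or work in log space.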

1,477 citations

Journal Article•DOI•

Interacting with paper on the DigitalDesk

Pierre Wellner¹•Institutions (1)

Xerox¹

01 Jul 1993-Communications of The ACM

TL;DR: The DigitalDesk is built around an ordinary physical desk and can be used as such, but it has extra capabilities, including a video camera mounted above the desk that can detect where the user is pointing, and it can read documents that are placed on the desk.

1,127 citations

Dissertation•

Visual Recognition of American Sign Language Using Hidden Markov Models.

Thad Starner

01 Feb 1995

TL;DR: Using hidden Markov models (HMM's), an unobtrusive single view camera system is developed that can recognize hand gestures, namely, a subset of American Sign Language (ASL), achieving high recognition rates for full sentence ASL using only visual cues.

Abstract: Using hidden Markov models (HMM's), an unobtrusive single view camera system is developed that can recognize hand gestures, namely, a subset of American Sign Language (ASL). Previous systems have concentrated on finger spelling or isolated word recognition, often using tethered electronic gloves for input. We achieve high recognition rates for full sentence ASL using only visual cues. A forty word lexicon consisting of personal pronouns, verbs, nouns, and adjectives is used to create 494 randomly constructed five word sentences that are signed by the subject to the computer. The data is separated into a 395 sentence training set and an independent 99 sentence test set. While signing, the 2D position, orientation, and eccentricity of bounding ellipses of the hands are tracked in real time with the assistance of solidly colored gloves. Simultaneous recognition and segmentation of the resultant stream of feature vectors occurs five times faster than real time on an HP 735. With a strong grammar, the system achieves an accuracy of 97%; with no grammar, an accuracy of 91% is reached (95% correct).

758 citations