
Showing papers by "Rajeev Sharma published in 1998"


Journal ArticleDOI
01 May 1998
TL;DR: It is clear that further research is needed for interpreting and fusing multiple sensing modalities in the context of HCI, and for addressing the fundamental issues in integrating them at various levels, from early signal level to intermediate feature level to late decision level.
Abstract: Recent advances in various signal processing technologies, coupled with an explosion in the available computing power, have given rise to a number of novel human-computer interaction (HCI) modalities: speech, vision-based gesture recognition, eye tracking, electroencephalograph, etc. Successful embodiment of these modalities into an interface has the potential of easing the HCI bottleneck that has become noticeable with the advances in computing and communication. It has also become increasingly evident that the difficulties encountered in the analysis and interpretation of individual sensing modalities may be overcome by integrating them into a multimodal human-computer interface. We examine several promising directions toward achieving multimodal HCI. We consider some of the emerging novel input modalities for HCI and the fundamental issues in integrating them at various levels, from early signal level to intermediate feature level to late decision level. We discuss the different computational approaches that may be applied at the different levels of modality integration. We also briefly review several demonstrated multimodal HCI systems and applications. Despite all the recent developments, it is clear that further research is needed for interpreting and fusing multiple sensing modalities in the context of HCI. This research can benefit from many disparate fields of study that increase our understanding of the different human communication modalities and their potential role in HCI.
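As a rough illustration of the late (decision-level) integration mentioned above, the sketch below combines per-class scores from two recognizers with a weighted sum; the modality names, class labels, weights, and scores are hypothetical, not taken from the paper.

```python
# Minimal sketch of late (decision-level) fusion of two HCI modalities.
# The modality names, class labels, weights, and scores are hypothetical
# illustrations, not values from the paper.

def fuse_decisions(speech_scores, gesture_scores, w_speech=0.6, w_gesture=0.4):
    """Combine per-class posterior scores from two recognizers by a weighted sum."""
    fused = {}
    for label in set(speech_scores) | set(gesture_scores):
        fused[label] = (w_speech * speech_scores.get(label, 0.0)
                        + w_gesture * gesture_scores.get(label, 0.0))
    return max(fused, key=fused.get), fused

if __name__ == "__main__":
    speech = {"select": 0.7, "move": 0.2, "delete": 0.1}
    gesture = {"select": 0.4, "move": 0.5, "delete": 0.1}
    best, scores = fuse_decisions(speech, gesture)
    print(best, scores)   # 'select' wins under these example weights
```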

330 citations


Proceedings ArticleDOI
23 Jun 1998
TL;DR: Proposes a multiple cue-based localization scheme combined with a tracking framework to reliably track human arm dynamics in unconstrained environments, together with an interaction scheme between tracking and localization that improves the estimation process while reducing the computational requirements.
Abstract: The use of hand gestures provides an attractive means of interacting naturally with a computer-generated display. Using one or more video cameras, the hand movements can potentially be interpreted as meaningful gestures. One key problem in building such an interface without a restricted setup is the ability to localize and track the human arm robustly in image sequences. This paper proposes a multiple cue-based localization scheme combined with a tracking framework to reliably track the human arm dynamics in unconstrained environments. The localization scheme integrates the multiple cues of motion, shape, and color for locating a set of key image features. Using constraint fusion, these features are tracked by a modified Extended Kalman Filter that exploits the articulated structure of the arm. We also propose an interaction scheme between tracking and localization for improving the estimation process while reducing the computational requirements. The performance of the framework is validated with the help of extensive experiments and simulations.
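For a concrete flavor of the tracking side, here is a minimal sketch of a Kalman predict/update cycle for a single image feature under a constant-velocity model; the state layout and noise levels are assumptions, and the paper's modified Extended Kalman Filter additionally fuses constraints from the articulated structure of the arm.

```python
import numpy as np

# Sketch of a constant-velocity Kalman predict/update for one tracked image
# feature (e.g. a hand or elbow point). State = [x, y, vx, vy]. The noise
# levels are illustrative assumptions; the paper's modified EKF additionally
# fuses constraints from the articulated structure of the arm.

dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)      # constant-velocity dynamics
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)      # only (x, y) is observed
Q = 0.01 * np.eye(4)                           # process noise (assumed)
R = 4.0 * np.eye(2)                            # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x = np.array([100.0, 50.0, 0.0, 0.0])          # initial feature position
P = 10.0 * np.eye(4)
for z in [np.array([102.0, 51.0]), np.array([104.5, 52.2])]:
    x, P = predict(x, P)
    x, P = update(x, P, z)
print(x[:2])                                   # filtered feature position
```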

63 citations


01 Jan 1998
TL;DR: A continuous HMM based gesture recognition framework is implemented and the possibility of improving continuous gesture recognition results based on the co-occurrence analysis of different gestures with some spoken keywords is shown.
Abstract: In order to incorporate naturalness in the design of Human Computer Interfaces (HCI), it is desirable to develop recognition techniques capable of handling continuous natural gesture and speech inputs. Hidden Markov Models (HMMs) provide a good framework for continuous gesture recognition and also for multimodal fusion. Many researchers have reported high recognition rates for gesture recognition using HMMs. However, the gestures used in those studies were precisely defined and bound by syntactical and grammatical constraints, whereas natural gestures do not string together in such syntactical bindings. Moreover, a strict classification of natural gestures is not feasible. In this paper we examine hand gestures made in a very natural domain, that of a weather person narrating in front of a weather map. The gestures made by the weather person are embedded in a narration. This provides us with abundant data from an uncontrolled environment to study the interaction between speech and gesture in the context of a display. We hypothesize that this domain is very similar to that of a natural HCI interface. We have implemented a continuous HMM-based gesture recognition framework. In order to understand the interaction between gesture and speech, we have performed a co-occurrence analysis of different gestures with some spoken keywords. We also show the possibility of improving continuous gesture recognition results based on the co-occurrence analysis. Fast feature extraction and tracking are accomplished by the use of predictive Kalman filtering on a color-segmented stream of video images. The results in the weather domain should be a step toward natural gesture/speech HCI.
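To make the co-occurrence analysis concrete, the sketch below counts how often each recognized gesture class overlaps in time with each spoken keyword; the gesture labels, keywords, and time intervals are hypothetical examples, not data from the weather-narration corpus.

```python
from collections import defaultdict

# Sketch of gesture/keyword co-occurrence counting. Each gesture and each
# keyword is given as (label, start_time, end_time); two events co-occur if
# their intervals overlap. Labels and times here are hypothetical examples.

def cooccurrence(gestures, keywords):
    counts = defaultdict(int)
    for g_label, g_start, g_end in gestures:
        for k_label, k_start, k_end in keywords:
            if g_start < k_end and k_start < g_end:   # interval overlap
                counts[(g_label, k_label)] += 1
    return counts

gestures = [("point", 1.0, 2.5), ("contour", 4.0, 6.0)]
keywords = [("here", 1.8, 2.2), ("front", 5.0, 5.4)]
print(dict(cooccurrence(gestures, keywords)))
# {('point', 'here'): 1, ('contour', 'front'): 1}
```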

41 citations


Journal ArticleDOI
TL;DR: This work proposes the use of additional “exploratory motion” in the direction in which it is most needed, thus considerably improving the estimation of the image Jacobian, and studies the role such exploratory motion can play in a visual servoing task.
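As a rough illustration of online image-Jacobian estimation, the sketch below applies a Broyden-style rank-one update from observed image-feature changes; the dimensions and data are assumptions, and it does not implement the paper's specific strategy of injecting exploratory motion where the estimate is least reliable.

```python
import numpy as np

# Sketch of a Broyden-style rank-one update of an estimated image Jacobian J,
# which maps robot joint increments dq to image-feature increments ds.
# Dimensions and data are illustrative assumptions; the paper's contribution
# is choosing extra "exploratory" motions dq in the directions where the
# current estimate of J is least reliable.

def broyden_update(J, dq, ds):
    """Update J so that J @ dq better matches the observed ds."""
    denom = float(dq @ dq)
    if denom < 1e-12:
        return J
    return J + np.outer(ds - J @ dq, dq) / denom

J = np.zeros((2, 3))                      # 2 image features, 3 joints (assumed)
dq = np.array([0.01, 0.00, 0.02])         # commanded joint increment
ds = np.array([0.5, -0.3])                # observed image-feature change
J = broyden_update(J, dq, ds)
print(J)
```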

30 citations


Journal ArticleDOI
TL;DR: This paper uses a geometric error analysis method that models the quantization error as projected pyramids and the uncertainty region as an ellipsoid around the polyhedron intersection of the pyramids to determine the uncertainty ellipsoid for an arbitrary number of cameras.
Abstract: An important source of error when estimating the 3-D position of a point from two (stereo), three (trinocular), or more cameras is that of quantization error on the image planes. In this paper, we are concerned with bounding the quantization errors when using multiple cameras defined in terms of uncertainty regions in 3-D. We use a geometric error analysis method that models the quantization error as projected pyramids and the uncertainty region as an ellipsoid around the polyhedron intersection of the pyramids. We present a computational technique for determining the uncertainty ellipsoid for an arbitrary number of cameras. A numerical measure of uncertainty bound such as the volume of the ellipsoid can then be computed for aiding camera placement, trajectory planning, and various other multiple camera applications.
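A minimal sketch of turning an intersection polyhedron into a numerical uncertainty measure is shown below: it fits an enclosing ellipsoid to the polyhedron's vertices and reports its volume. The covariance-based fitting heuristic and the example vertices are assumptions for illustration, not the paper's construction from the projected quantization pyramids.

```python
import numpy as np

# Sketch: fit an enclosing ellipsoid to the vertices of an intersection
# polyhedron and report its volume as an uncertainty measure. The fitting
# heuristic (covariance shape scaled to cover all vertices) is an assumption
# for illustration; the paper derives the ellipsoid from the projected
# quantization pyramids of the cameras.

def enclosing_ellipsoid_volume(vertices):
    V = np.asarray(vertices, dtype=float)
    center = V.mean(axis=0)
    C = np.cov((V - center).T) + 1e-9 * np.eye(3)   # shape matrix from vertices
    Cinv = np.linalg.inv(C)
    # Scale so every vertex satisfies (v - c)^T A (v - c) <= 1.
    scale = max((v - center) @ Cinv @ (v - center) for v in V)
    A = Cinv / scale
    # Volume of {x : x^T A x <= 1} in 3D is (4/3) * pi / sqrt(det(A)).
    return (4.0 / 3.0) * np.pi / np.sqrt(np.linalg.det(A))

verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
         (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
print(enclosing_ellipsoid_volume(verts))
```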

29 citations


Proceedings ArticleDOI
14 Apr 1998
TL;DR: A multimodal localization scheme, which uses the multiple cues of motion, shape and color to locate a set of image features, is combined with a tracking framework that exploits the articulated structure of the arm.
Abstract: A key problem in building an unrestricted interface in which the user controls a computer-generated display with hand gestures is the ability to localize and track the human arm in image sequences. The paper proposes a multimodal localization scheme combined with a tracking framework that exploits the articulated structure of the arm. The localization uses the multiple cues of motion, shape and color to locate a set of image features. Using constraint fusion, these features are tracked by a modified extended Kalman filter. An interaction scheme between tracking and localization is proposed in order to improve the estimation while decreasing the computational requirements. The results of extensive simulations and experiments with real data are described, including a large database of hand gestures used in display control.
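The interaction between tracking and localization can be sketched as prediction-gated search: the tracker's predicted feature position and uncertainty restrict the image window in which the more expensive multi-cue localization runs. The window sizing rule below is an illustrative assumption.

```python
# Sketch of the tracking/localization interaction: use the tracker's predicted
# feature position and its uncertainty to restrict the image window in which
# the (more expensive) multi-cue localization is run. Sizing the window from
# the predicted variance is an illustrative assumption.

def search_window(pred_xy, pred_var, image_size, n_sigma=3.0):
    """Return (x0, y0, x1, y1) bounds of the region to localize in."""
    px, py = pred_xy
    half = n_sigma * pred_var ** 0.5
    w, h = image_size
    x0, y0 = max(0, int(px - half)), max(0, int(py - half))
    x1, y1 = min(w, int(px + half)), min(h, int(py + half))
    return x0, y0, x1, y1

print(search_window((320.0, 240.0), pred_var=25.0, image_size=(640, 480)))
# (305, 225, 335, 255) -- localization only needs to scan this window
```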

21 citations


Journal ArticleDOI
TL;DR: An efficient Vector Associative Map (VAM)-based learning scheme is proposed to learn a joint-based representation of 3D targets that is invariant to changing camera configurations for a robotic active vision system.
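A minimal sketch of error-driven learning in the spirit of a Vector Associative Map is shown below: a linear map from a target representation to joint angles is adjusted from difference vectors until the commanded joints reach the target. The linear form, dimensions, learning rate, and stand-in forward model are assumptions, not the paper's network.

```python
import numpy as np

# Sketch of an error-driven update in the spirit of a Vector Associative Map:
# a linear map W from a target representation t to joint angles q is adjusted
# so that the commanded joints reach the target. The linear form, dimensions,
# learning rate, and stand-in "correct" mapping are illustrative assumptions.

rng = np.random.default_rng(0)
true_map = rng.normal(size=(3, 3))            # stand-in mapping the arm obeys
W = np.zeros((3, 3))                          # learned joint-based representation
lr = 0.05

for _ in range(1000):
    t = rng.normal(size=3)                    # random 3D target
    q = W @ t                                 # commanded joints
    q_desired = true_map @ t                  # joints that would reach the target
    W += lr * np.outer(q_desired - q, t)      # difference-vector learning step

print(np.round(W - true_map, 3))              # residual error shrinks toward zero
```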

10 citations


Proceedings ArticleDOI
14 Mar 1998
TL;DR: An interactive evaluation tool is developed, which uses augmentation schemes for visualizing and evaluating assembly sequences and guides the user step-by-step through an assembly sequence to help evaluate the feasibility and efficiency of a particular sequence to assemble a mechanical object from its components.
Abstract: Summary form only given. Augmented reality (AR) provides an intuitive interface to enhance the user's understanding of a scene. We consider the problem of scene augmentation in the context of assembly of a mechanical object. Concepts from robot assembly planning are used to develop a systematic framework for presenting augmentation stimuli for this assembly domain. An interactive evaluation tool is developed, which uses augmentation schemes for visualizing and evaluating assembly sequences. This system also guides the user step-by-step through an assembly sequence. Computer vision provides the sensing mechanism necessary to interpret the assembly scene. The goal of this system is to help evaluate the feasibility and efficiency of a particular sequence to assemble a mechanical object from its components. This is done by guiding the operator through each step in the sequence. The augmentation is provided with the help of a see-through head-mounted display that superimposes 3D graphics over the assembly scene and on nearby computer monitors. We incorporate these ideas into the design of an integrated system that we call AREAS (Augmented Reality System for Evaluating Assembly Sequences) and explore its use for evaluating assembly sequences using the concept of mixed prototyping.
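A toy sketch of the step-by-step guidance idea is shown below: the assembly sequence is an ordered list of steps, and the guide advances only when the sensing side confirms the expected state. The step names and confirmation events are hypothetical.

```python
# Toy sketch of step-by-step assembly guidance: the sequence is an ordered
# list of steps, and the guide advances only when the sensing side reports
# the expected state. Step names and confirmation events are hypothetical.

def guide_assembly(steps, observed_states):
    """Walk through `steps`, yielding the instruction to display at each point."""
    it = iter(observed_states)
    for step, expected in steps:
        yield f"Augment scene with: {step}"
        for observed in it:                  # wait for sensing to confirm
            if observed == expected:
                break
            yield f"Waiting for: {expected}"

steps = [("place base plate", "base_placed"),
         ("insert shaft into base", "shaft_inserted"),
         ("attach cover", "cover_attached")]
observed = ["base_placed", "noise", "shaft_inserted", "cover_attached"]
for message in guide_assembly(steps, observed):
    print(message)
```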

10 citations


Journal ArticleDOI
01 May 1998-Robotica
TL;DR: A novel self-organizing neural network is proposed that learns a calibration-free spatial representation of 3D point targets in a manner that is invariant to changing camera configurations and decouples active camera control from robot control.
Abstract: Assembly robots that use an active camera system for visual feedback can achieve greater flexibility, including the ability to operate in an uncertain and changing environment. Incorporating active vision into a robot control loop involves some inherent difficulties, including calibration, and the need for redefining the servoing goal as the camera configuration changes. In this paper, we propose a novel self-organizing neural network that learns a calibration-free spatial representation of 3D point targets in a manner that is invariant to changing camera configurations. This representation is used to develop a new framework for robot control with active vision. The salient feature of this framework is that it decouples active camera control from robot control. The feasibility of this approach is established with the help of computer simulations and experiments with the University of Illinois Active Vision System (UIAVS).
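A minimal sketch of a Kohonen-style self-organizing update over 3D point targets is shown below; the lattice size, learning rate, and neighborhood width are assumptions, and the paper's network additionally makes the learned representation invariant to the active cameras' configuration.

```python
import numpy as np

# Minimal Kohonen-style self-organizing update over 3D point targets.
# Lattice size, learning rate, and neighborhood width are illustrative
# assumptions; the network proposed in the paper additionally makes the
# learned representation invariant to the active cameras' configuration.

rng = np.random.default_rng(1)
lattice = rng.uniform(0.0, 1.0, size=(10, 10, 3))   # 10x10 nodes, 3D weights

def som_step(lattice, target, lr=0.2, sigma=1.5):
    dists = np.linalg.norm(lattice - target, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)  # best-matching node
    ii, jj = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    lattice += lr * h[..., None] * (target - lattice)         # pull neighborhood
    return lattice

for _ in range(2000):
    lattice = som_step(lattice, rng.uniform(0.0, 1.0, size=3))
print(lattice.reshape(-1, 3).mean(axis=0))                    # roughly [0.5 0.5 0.5]
```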

3 citations


Book ChapterDOI
01 Jan 1998
TL;DR: In this article, the authors discuss the advantages of using active vision for visual servoing, with the aim of improving the measurement of image parameters, interpreting the image parameters in terms of corresponding world parameters, and the control of a robot.
Abstract: A purposeful change of camera parameters or “active vision” can be used to improve the process of extracting visual information. Thus if a robot visual servo loop incorporates active vision, it can lead to a better performance while increasing the scope of the control tasks. Although significant advances have been made in this direction, much of the potential improvement is still unrealized. This chapter discusses the advantages of using active vision for visual servoing. It reviews some of the past research in active vision relevant to visual servoing, with the aim of improving: (1) the measurement of image parameters, (2) the process of interpreting the image parameters in terms of the corresponding world parameters, and (3) the control of a robot in terms of the visual information extracted.

3 citations

