Posted Content

Toward Natural Gesture/Speech Control of a Large Display



Citations
Patent


Andrew D. Wilson
26 Feb 2009
TL;DR: A system and process that controls a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem also employing the pointing device are combined to determine which component a user wants to control and which control action is desired.
Abstract: The present invention is directed toward a system and process that controls a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, gesture recognition subsystem employing a wireless pointing device and pointing analysis subsystem also employing the pointing device, are combined to determine what component a user wants to control and what control action is desired. In this multimodal integration scheme, the desired action concerning an electronic component is decomposed into a command and a referent pair. The referent can be identified using the pointing device to identify the component by pointing at the component or an object associated with it, by using speech recognition, or both. The command may be specified by pressing a button on the pointing device, by a gesture performed with the pointing device, by a speech recognition event, or by any combination of these inputs.
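
As a rough sketch of the command/referent decomposition described in the abstract above, the Python snippet below fuses pointing, speech, and button events into a (command, referent) pair. The event types, precedence rules, and device names are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch of command/referent fusion, assuming hypothetical event
# types and precedence rules (not the patent's actual implementation).
from dataclasses import dataclass
from typing import Optional

@dataclass
class PointingEvent:          # e.g. from the wireless pointing device
    target: str               # component the device is aimed at

@dataclass
class SpeechEvent:            # e.g. from the speech recognizer
    referent: Optional[str]   # named component, if any
    command: Optional[str]    # spoken command, if any

@dataclass
class ButtonEvent:            # button press or gesture on the pointing device
    command: str

def fuse(pointing: Optional[PointingEvent],
         speech: Optional[SpeechEvent],
         button: Optional[ButtonEvent]):
    """Combine modalities into a (command, referent) pair, or None if incomplete."""
    # Referent: pointing takes precedence; speech can supply it instead.
    referent = pointing.target if pointing else (speech.referent if speech else None)
    # Command: button/gesture or speech may supply it.
    command = None
    if button:
        command = button.command
    elif speech and speech.command:
        command = speech.command
    return (command, referent) if command and referent else None

# Example: the user points at the lamp and says "turn on".
print(fuse(PointingEvent("lamp"), SpeechEvent(None, "turn on"), None))
# ('turn on', 'lamp')
```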

59 citations

Patent


07 Dec 2012
TL;DR: A mobile device that provides a user interface experience to a user operating the device within a vehicle, using mode functionality that receives inference-input information from one or more input sources and infers the vehicle's state from it.
Abstract: A mobile device is described herein that provides a user interface experience to a user who is operating the mobile device within a vehicle. The mobile device provides the user interface experience using mode functionality. The mode functionality operates by receiving inference-input information from one or more input sources. At least one input source corresponds to at least one movement-sensing device, provided by the mobile device, that determines movement of the mobile device. The mode functionality then infers a state of the vehicle based on the inference-input information and presents a user interface experience that is appropriate for the vehicle state. In one scenario, the mode functionality can also infer that the vehicle is in a distress condition. In response, the mode functionality can solicit assistance for the user.
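
A minimal sketch of the kind of inference the abstract describes, assuming hypothetical speed and acceleration features, threshold values, and state labels; the patent's actual mode functionality is not specified here.

```python
# Rough sketch of inferring a coarse vehicle state from movement-sensing
# input. The feature names, thresholds, and state labels are assumptions
# for illustration only.
from statistics import mean, pvariance

def infer_vehicle_state(speeds_mps, accel_magnitudes):
    """Map recent speed and acceleration samples to a coarse state label."""
    avg_speed = mean(speeds_mps)
    accel_var = pvariance(accel_magnitudes)
    if avg_speed < 0.5 and accel_var < 0.1:
        return "parked"                    # full touch UI is appropriate
    if avg_speed < 0.5:
        return "stopped_in_traffic"
    if accel_var > 5.0:
        return "rough_or_erratic_motion"   # candidate distress condition
    return "moving"                        # simplified, voice-first UI

print(infer_vehicle_state([0.1, 0.2, 0.0], [0.01, 0.02, 0.03]))   # parked
print(infer_vehicle_state([12.0, 13.5, 12.8], [0.4, 0.6, 0.5]))   # moving
```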

43 citations

Proceedings ArticleDOI


13 Oct 2004
TL;DR: An empirical study is described showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures, and an automatic gesture classification system is presented based solely on an n-gram model of linguistic context.
Abstract: Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal; the definitions make reference to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures. Then we present an automatic gesture classification system based solely on an n-gram model of linguistic context; the system is intended to supplement a visual classifier, but achieves 66% accuracy on a three-class classification problem on its own. This represents higher accuracy than human raters achieve when presented with the same information.
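
The linguistic-context classifier lends itself to a compact illustration. The sketch below trains a per-class, add-one-smoothed bigram model on a toy corpus and labels a gesture from its surrounding words; the class names, training phrases, and smoothing details are assumptions, not the paper's corpus or exact model.

```python
# Minimal sketch of an n-gram (here, bigram) classifier over the words
# surrounding a gesture. The three class names, the toy training phrases,
# and the add-one smoothing are assumptions for illustration only.
from collections import Counter, defaultdict
import math

TRAIN = {
    "deictic": ["put that over there", "this one right here"],
    "iconic":  ["the fish was about this big", "it spun around like this"],
    "beat":    ["and then we went to the store", "so anyway we kept talking"],
}

unigrams = defaultdict(Counter)   # context word counts per class
bigrams = defaultdict(Counter)    # bigram counts per class
vocab = set()
for label, sentences in TRAIN.items():
    for s in sentences:
        words = ["<s>"] + s.split()
        vocab.update(words)
        for a, b in zip(words, words[1:]):
            unigrams[label][a] += 1
            bigrams[label][(a, b)] += 1

def score(label, words):
    """Add-one-smoothed bigram log-probability of the word window under a class."""
    words = ["<s>"] + words
    V = len(vocab)
    return sum(math.log((bigrams[label][(a, b)] + 1) / (unigrams[label][a] + V))
               for a, b in zip(words, words[1:]))

def classify(words):
    return max(TRAIN, key=lambda label: score(label, words))

print(classify("put that box over there".split()))  # deictic (on this toy data)
```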

32 citations

Proceedings ArticleDOI


30 Jul 2006
TL;DR: An empirical study is described showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures, and an automatic gesture classification system is presented based solely on an n-gram model of linguistic context.
Abstract: Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal; the definitions make reference to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures. Then we present an automatic gesture classification system based solely on an n-gram model of linguistic context; the system is intended to supplement a visual classifier, but achieves 66% accuracy on a three-class classification problem on its own. This represents higher accuracy than human raters achieve when presented with the same information.

23 citations

Patent


Andrew D. Wilson
18 Apr 2008
TL;DR: A system and process that controls a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem also employing the pointing device are combined to determine which component a user wants to control and which control action is desired.
Abstract: The present invention is directed toward a system and process that controls a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, gesture recognition subsystem employing a wireless pointing device and pointing analysis subsystem also employing the pointing device, are combined to determine what component a user wants to control and what control action is desired. In this multimodal integration scheme, the desired action concerning an electronic component is decomposed into a command and a referent pair. The referent can be identified using the pointing device to identify the component by pointing at the component or an object associated with it, by using speech recognition, or both. The command may be specified by pressing a button on the pointing device, by a gesture performed with the pointing device, by a speech recognition event, or by any combination of these inputs.

9 citations


References
Journal ArticleDOI


TL;DR: Surveys computer vision-based analysis and interpretation of hand gestures for human-computer interaction, organized by the method used for modeling, analyzing, and recognizing gestures, and contrasting 3D hand models with appearance-based models.
Abstract: The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. This discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences in the gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models offer a way of more elaborate modeling of hand gestures but lead to computational hurdles that have not been overcome given the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.
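
As a toy illustration of the appearance-based approach the survey contrasts with 3D hand models, the sketch below matches a binarized hand silhouette against stored templates by normalized correlation; the templates and gesture labels are assumptions for illustration.

```python
# Toy sketch of appearance-based gesture recognition: match a binarized
# hand silhouette against stored templates by normalized correlation.
# The templates and gesture labels are assumptions for illustration.
import numpy as np

def normalized_correlation(a, b):
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

# Tiny 8x8 binary silhouettes standing in for segmented hand images.
open_hand = np.zeros((8, 8), int); open_hand[1:7, 1:7] = 1
pointing  = np.zeros((8, 8), int); pointing[1:7, 3:5] = 1
TEMPLATES = {"open_hand": open_hand, "pointing": pointing}

def classify_silhouette(mask):
    """Return the template label with the highest correlation to the mask."""
    return max(TEMPLATES, key=lambda k: normalized_correlation(mask, TEMPLATES[k]))

query = np.zeros((8, 8), int); query[1:7, 3:5] = 1; query[1, 2] = 1  # roughly a pointing pose
print(classify_silhouette(query))  # pointing
```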

1,906 citations

Book


01 Jan 1976
TL;DR: Summarizes experimental research on the role of gaze in human social interaction, much of it carried out by the research group on non-verbal communication that Michael Argyle set up at Oxford with Ted Crossman and Adam Kendon and that Mark Cook later joined.
Abstract: One of the first psychologists to investigate experimentally the role of gaze in human behaviour was Michael Argyle. In 1963 he set up a research group at Oxford with Ted Crossman and Adam Kendon, to study non-verbal communication in human social interaction, which included gaze as an important aspect of this behaviour. Shortly afterwards, Mark Cook joined this group which was funded until 1975, during which time considerable research on gaze had been carried out both at Oxford and elsewhere. This book summarises much of the work done in this field up until that time.

1,423 citations

Proceedings ArticleDOI


27 Mar 1997
TL;DR: The present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system and revealed that the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence.
Abstract: Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary semantic information, rather than redundant. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.
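
A minimal sketch of classifying a pen-plus-speech construction as simultaneous or sequential and measuring the inter-modal lag, assuming hypothetical timestamped intervals; the overlap rule is an assumption, not the study's coding scheme.

```python
# Minimal sketch of labeling a pen + speech construction as simultaneous
# or sequential and measuring the inter-modal lag. The event structure and
# the overlap rule are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # seconds
    end: float

def integration_pattern(pen: Interval, speech: Interval):
    """Return (pattern, lag_seconds). Lag > 0 means pen input preceded speech."""
    overlap = min(pen.end, speech.end) - max(pen.start, speech.start)
    lag = speech.start - pen.start
    pattern = "simultaneous" if overlap > 0 else "sequential"
    return pattern, lag

# Pen input starts 0.5 s before the spoken location phrase and overlaps it.
print(integration_pattern(Interval(1.0, 2.5), Interval(1.5, 3.0)))
# ('simultaneous', 0.5)
```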

435 citations

Book


01 Jan 1995
TL;DR: Discusses the nature of gesture, the different organization of signed and spoken languages, and the origin of syntax, treating gesture as name and relation.
Abstract: This book proposes a radical alternative to dominant views of the evolution of language, and in particular the origins of syntax. The authors argue that manual and vocal communication developed in parallel, and that the basic elements of syntax are intrinsic to gesture. They draw on evidence from areas such as primatology, anthropology, and linguistics, to present a groundbreaking account of the notion that language emerged through visible bodily action. They go on to examine the implications of their findings for linguistic theory and theories of the biological evolution of the capacity for language. Written in a clear and accessible style, Gesture and the Nature of Language will be indispensable reading for all those interested in the origins of language.

404 citations

Journal ArticleDOI


01 May 1998
TL;DR: Examines novel input modalities for human-computer interaction and the fundamental issues in integrating them at various levels, from early signal level to intermediate feature level to late decision level, concluding that further research is still needed.
Abstract: Recent advances in various signal processing technologies, coupled with an explosion in the available computing power, have given rise to a number of novel human-computer interaction (HCI) modalities: speech, vision-based gesture recognition, eye tracking, electroencephalograph, etc. Successful embodiment of these modalities into an interface has the potential of easing the HCI bottleneck that has become noticeable with the advances in computing and communication. It has also become increasingly evident that the difficulties encountered in the analysis and interpretation of individual sensing modalities may be overcome by integrating them into a multimodal human-computer interface. We examine several promising directions toward achieving multimodal HCI. We consider some of the emerging novel input modalities for HCI and the fundamental issues in integrating them at various levels, from early signal level to intermediate feature level to late decision level. We discuss the different computational approaches that may be applied at the different levels of modality integration. We also briefly review several demonstrated multimodal HCI systems and applications. Despite all the recent developments, it is clear that further research is needed for interpreting and fitting multiple sensing modalities in the context of HCI. This research can benefit from many disparate fields of study that increase our understanding of the different human communication modalities and their potential role in HCI.
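
As a sketch of the late, decision-level integration the survey mentions, the snippet below combines per-modality class probabilities with a weighted log-linear rule; the modality names, weights, and command classes are assumptions for illustration.

```python
# Sketch of late (decision-level) fusion: combine per-modality class
# probabilities with a weighted log-linear rule. Modality names, weights,
# and command classes are assumptions for illustration only.
import math

def fuse_decisions(modality_scores, weights):
    """modality_scores: {modality: {class: probability}}; returns the best class."""
    classes = next(iter(modality_scores.values())).keys()
    def combined(c):
        # Weighted sum of per-modality log-probabilities (small epsilon avoids log 0).
        return sum(weights[m] * math.log(scores[c] + 1e-9)
                   for m, scores in modality_scores.items())
    return max(classes, key=combined)

scores = {
    "speech":  {"select": 0.7, "move": 0.2, "delete": 0.1},
    "gesture": {"select": 0.4, "move": 0.5, "delete": 0.1},
}
print(fuse_decisions(scores, {"speech": 0.6, "gesture": 0.4}))  # select
```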

319 citations