Open Access · Posted Content

Toward Natural Gesture/Speech Control of a Large Display

TLDR
In this article, a structured approach for studying patterns of multimodal language in the context of 2D-display control is proposed, in which gestures are analyzed from observable kinematic primitives to their semantics as part of a linguistic structure.
Abstract
In recent years, owing to advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Together with improved speech-processing technology, this is an important step toward natural multimodal HCI. However, the inclusion of non-predefined continuous gestures in a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We consider systematic analysis of gestures, from observable kinematic primitives to their semantics, as pertinent to a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration, which was used to develop an interactive testbed (iMAP). The testbed enabled elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI. The conducted user studies illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping. Furthermore, co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.
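As a rough, hypothetical illustration of the kind of gesture/speech co-occurrence analysis the abstract refers to (the paper itself publishes no code; the data classes, the 0.5 s alignment slack, and the category names below are assumptions), the sketch pairs time-stamped gesture segments with temporally overlapping speech tokens and tallies gesture-category/part-of-speech co-occurrences:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class GestureSegment:
    category: str   # e.g. one of the six deictic categories proposed in the paper
    start: float    # seconds
    end: float

@dataclass
class SpeechToken:
    word: str
    pos: str        # part of speech, e.g. "NOUN", "ADV"
    start: float
    end: float

def overlaps(g: GestureSegment, t: SpeechToken, slack: float = 0.5) -> bool:
    """True if the token falls within the gesture interval, allowing a small
    temporal slack to account for gesture/speech asynchrony (assumed value)."""
    return t.start < g.end + slack and t.end > g.start - slack

def cooccurrence_counts(gestures, tokens):
    """Count how often each (gesture category, part of speech) pair co-occurs."""
    counts = Counter()
    for g in gestures:
        for t in tokens:
            if overlaps(g, t):
                counts[(g.category, t.pos)] += 1
    return counts

# Toy usage with made-up data
gestures = [GestureSegment("point", 1.0, 1.8)]
tokens = [SpeechToken("there", "ADV", 1.2, 1.5), SpeechToken("move", "VERB", 0.2, 0.4)]
print(cooccurrence_counts(gestures, tokens))  # Counter({('point', 'ADV'): 1})
```

In practice the alignment window would need tuning, since the user studies summarized above stress that gesture strokes and their co-expressive speech are not strictly simultaneous.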


Citations
Patent

Controlling objects via gesturing

TL;DR: This patent describes a system and process for controlling a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem employing the same device are combined to determine which component a user wants to control and which control action is desired.
Patent

Providing a user interface experience based on inferred vehicle state

TL;DR: This patent describes a mobile device that provides a user interface experience to a user operating the device within a vehicle, using mode functionality that works by receiving inference-input information from one or more input sources.
Proceedings ArticleDOI

Visual and linguistic information in gesture classification

TL;DR: An empirical study is described showing that removing auditory information significantly impairs human raters' ability to classify gestures, and an automatic gesture classification system based solely on an n-gram model of linguistic context is presented.
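A minimal sketch of how a purely linguistic, n-gram-based gesture classifier could work (hypothetical; the bigram features, add-one smoothing, and class labels below are assumptions, not a description of the cited system): each gesture class is scored by how often the word bigrams surrounding the gesture were seen with that class during training.

```python
from collections import Counter, defaultdict

def bigrams(words):
    return list(zip(words, words[1:]))

class NgramGestureClassifier:
    """Score gesture classes by bigram counts of the words surrounding a gesture."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # class label -> Counter of bigrams

    def train(self, examples):
        # examples: iterable of (context_words, gesture_class) pairs
        for words, label in examples:
            self.counts[label].update(bigrams(words))

    def classify(self, words):
        # Add-one smoothing so unseen bigrams do not zero out a class
        scores = {
            label: sum(c[bg] + 1 for bg in bigrams(words))
            for label, c in self.counts.items()
        }
        return max(scores, key=scores.get)

# Toy usage with made-up training data
clf = NgramGestureClassifier()
clf.train([(["put", "it", "over", "there"], "deictic"),
           (["a", "really", "big", "circle"], "iconic")])
print(clf.classify(["move", "it", "over", "there"]))  # "deictic"
```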
Patent

Determining a position of a pointing device

TL;DR: This patent describes a system and process for controlling a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem employing the same device are combined to determine which component a user wants to control and which control action is desired.
References
Book ChapterDOI

Movement Phase in Signs and Co-Speech Gestures, and Their Transcriptions by Human Coders

TL;DR: A syntagmatic rule system for movement phases, applicable to both co-speech gestures and signs, is proposed; it can be used in the automatic recognition of signs and co-speech gestures to segment continuous production and identify the potentially meaning-bearing phase.
Proceedings ArticleDOI

Multimodal interfaces for dynamic interactive maps

TL;DR: In this paper, the authors analyzed the performance difficulties associated with speech-only map interactions, including elevated performance errors, spontaneous disfluencies, and lengthier task completion times, all of which declined substantially when people could interact multimodally with the map.
Proceedings ArticleDOI

Reliable tracking of human arm dynamics by multiple cue integration and constraint fusion

TL;DR: A multiple-cue-based localization scheme combined with a tracking framework is proposed to reliably track human arm dynamics in unconstrained environments, along with an interaction scheme between tracking and localization that improves the estimation process while reducing computational requirements.
Journal ArticleDOI

Understanding gestures in multimodal human computer interaction

TL;DR: Results of user studies revealed that gesture primitives, originally extracted from weather-map narration, form patterns of co-occurrence with speech parts in association with their meaning in a visual display control system, suggesting a direction for interpretation in natural gesture/speech interfaces.
Book ChapterDOI

Are Listeners Paying Attention to the Hand Gestures of an Anthropomorphic Agent? An Evaluation Using a Gaze Tracking Method

TL;DR: A pilot study using a gaze tracking method examined relevant aspects of an anthropomorphic agent's hand gestures in a real-time setting; it revealed that a highly informative, one-handed gesture with seemingly interactive speech attracted attention when it had a slower stroke and/or a long post-stroke hold in the Center-Center space and upper position.