Author

S. Greenberg

Bio: S. Greenberg is an academic researcher. The author has contributed to research in the topics Semantics & Gesture recognition, has an h-index of 1, and has co-authored 1 publication receiving 37 citations.

Papers
Journal Article
TL;DR: In this paper, a structured approach for studying patterns of multimodal language in the context of 2D-display control is proposed, in which the systematic analysis of gestures, from observable kinematical primitives to their semantics, is treated as pertinent to a linguistic structure.
Abstract: In recent years, because of advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Together with improved speech processing technology, this is an important step toward natural multimodal HCI. However, the inclusion of non-predefined continuous gestures into a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We consider the systematic analysis of gestures, from observable kinematical primitives to their semantics, as pertinent to a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration, which was used to develop an interactive testbed (iMAP). The testbed enabled the elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI. The user studies conducted illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping. Furthermore, co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.
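As a rough illustration of the kind of temporal-alignment check the abstract describes, the sketch below tests whether a gesture segment and a speech segment overlap enough in time to be considered for joint semantic mapping. The segment structure, category label, and overlap threshold are assumptions made for illustration, not details taken from the paper or the iMAP testbed.

```python
# Minimal sketch (assumed, not the paper's iMAP implementation): treat a
# gesture and a spoken phrase as candidates for joint semantic mapping only
# if they overlap in time by at least a chosen threshold.
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # gesture category or spoken phrase (illustrative labels)
    start: float  # seconds
    end: float    # seconds


def temporal_overlap(a: Segment, b: Segment) -> float:
    """Overlap duration in seconds between two segments (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def aligned(gesture: Segment, speech: Segment, min_overlap: float = 0.2) -> bool:
    """Consider the pair co-referential only if the overlap is long enough."""
    return temporal_overlap(gesture, speech) >= min_overlap


# Example: a pointing gesture co-occurring with the phrase "this one".
point = Segment("deictic_point", start=1.10, end=1.85)
phrase = Segment("this one", start=1.30, end=1.70)
print(aligned(point, phrase))  # True -> candidate for joint semantic mapping
```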

38 citations


Cited by
Journal ArticleDOI
08 Sep 2003
TL;DR: The importance of multimodal interfaces in various aspects of crisis management is established and many issues in realizing successful speech-gesture driven, dialogue-enabled interfaces for crisis management are explored.
Abstract: Emergency response requires strategic assessment of risks, decisions, and communications that are time critical, while requiring teams of individuals to have fast access to large volumes of complex information and technologies that enable tightly coordinated work. Access to this information by crisis management teams in emergency operations centers can be facilitated through various human-computer interfaces. Unfortunately, these interfaces are hard to use, require extensive training, and often impede rather than support teamwork. Dialogue-enabled devices, based on natural, multimodal interfaces, have the potential to make a variety of information technology tools accessible during crisis management. This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialogue-enabled interfaces for crisis management. The paper is organized in five parts. The first part discusses the needs of crisis management that can potentially be met by the development of appropriate interfaces. The second part discusses the issues related to the design and development of multimodal interfaces in the context of crisis management. The third part discusses the state of the art in both the theories and practices involving these human-computer interfaces. In particular, it describes the evolution and implementation details of two representative systems, Crisis Management (XISM) and Dialog Assisted Visual Environment for Geoinformation (DAVE_G). The fourth part speculates on the short-term and long-term research directions that will help address the outstanding challenges in interfaces that support dialogue and collaboration. Finally, the fifth part concludes the paper.

159 citations

Journal ArticleDOI
TL;DR: The main novelty of this paper is a complete description of the GDL script language, its validation on a large dataset (1,600 recorded movement sequences) and the presentation of its possible application.
Abstract: In this paper we propose a classifier capable of recognizing human body static poses and body gestures in real time. The method is called the gesture description language (GDL). The proposed methodology is intuitive, easily taught, and reusable for any kind of body gesture. The very heart of our approach is an automated reasoning module. It performs forward-chaining reasoning (like a classic expert system) with its inference engine every time a new portion of data arrives from the feature extraction library. All rules of the knowledge base are organized in GDL scripts, which take the form of text files parsed with a LALR(1) grammar. The main novelty of this paper is a complete description of our GDL script language, its validation on a large dataset (1,600 recorded movement sequences), and the presentation of its possible applications. The recognition rate for the examined gestures is within the range of 80.5–98.5%. We have also implemented an application that uses our method: a three-dimensional desktop for visualizing 3D medical datasets that is controlled by gestures recognized by the GDL module.
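To make the forward-chaining idea concrete, here is a small sketch in the spirit of the GDL approach: every new frame of extracted features is checked against all rules, and any rule whose conditions hold fires. The joint coordinates and rule definitions are invented for illustration; real GDL rules are written in script files parsed with a LALR(1) grammar rather than in Python.

```python
# Illustrative forward-chaining sketch (assumed rules, not actual GDL syntax):
# evaluate every rule against the newest feature frame and report which fire.
def hands_above_head(frame):
    return (frame["left_hand_y"] > frame["head_y"]
            and frame["right_hand_y"] > frame["head_y"])


def hands_apart(frame):
    return abs(frame["left_hand_x"] - frame["right_hand_x"]) > 0.6


RULES = {
    "HandsUp": hands_above_head,
    "TPose": lambda f: hands_apart(f) and not hands_above_head(f),
}


def infer(frame):
    """Run all rules on the latest frame; return the names of rules that fire."""
    return [name for name, rule in RULES.items() if rule(frame)]


frame = {"left_hand_y": 1.9, "right_hand_y": 1.8, "head_y": 1.6,
         "left_hand_x": -0.3, "right_hand_x": 0.4}
print(infer(frame))  # ['HandsUp']
```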

100 citations

Proceedings ArticleDOI
14 Oct 2002
TL;DR: This paper presents a framework for designing a natural multimodal human-computer interaction (HCI) system and reports that the system performed according to its specifications in 95% of the cases and that users showed ad-hoc proficiency, indicating natural acceptance of such systems.
Abstract: This paper presents a framework for designing a natural multimodal human-computer interaction (HCI) system. The core of the proposed framework is a principled method for combining information derived from audio and visual cues. To achieve natural interaction, both audio and visual modalities are fused along with feedback through a large screen display. Careful design, along with due consideration of all aspects of a system's interaction cycle and integration, has resulted in a successful system. The performance of the proposed framework has been validated through the development of several prototype systems as well as commercial applications for the retail and entertainment industries. To assess the impact of these multimodal systems (MMS), informal studies have been conducted. It was found that the system performed according to its specifications in 95% of the cases and that users showed ad-hoc proficiency, indicating natural acceptance of such systems.
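The abstract does not spell out the fusion method, so the sketch below shows one common option, a late fusion of per-modality confidence scores; the weights, command labels, and scores are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of late fusion (assumed approach, not the paper's method):
# combine per-command confidences from the speech and vision modules with
# fixed weights and pick the highest-scoring command.
def fuse(audio_scores, visual_scores, w_audio=0.6, w_visual=0.4):
    """Return the command with the highest weighted combined confidence."""
    commands = set(audio_scores) | set(visual_scores)

    def score(cmd):
        return (w_audio * audio_scores.get(cmd, 0.0)
                + w_visual * visual_scores.get(cmd, 0.0))

    return max(commands, key=score)


# Speech recognizer favours "select"; the pointing detector favours "move".
print(fuse({"select": 0.7, "move": 0.2}, {"select": 0.3, "move": 0.6}))  # select
```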

88 citations

Patent
Michael V. Johnston, Derya Ozkan
01 Dec 2011
TL;DR: In this article, the authors present a system that continuously monitors an audio stream associated with a gesture input stream and detects a speech event in the audio stream. The system then identifies a temporal window associated with the time of the speech event and analyzes data from the gesture input stream within that window to identify a gesture event.
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
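The windowing step in the abstract can be sketched as follows: once a speech event is detected, only gesture samples that fall inside a temporal window around it are kept, and their coordinates are averaged. The window size and sample layout are assumptions for illustration, not details from the patent.

```python
# Sketch of the described windowing step (window size and data layout assumed):
# average the gesture coordinates observed within a window around the speech event.
def gesture_in_window(samples, t_speech, half_window=0.5):
    """samples: list of (timestamp, x, y); returns the mean (x, y) inside the window."""
    in_win = [(x, y) for t, x, y in samples if abs(t - t_speech) <= half_window]
    if not in_win:
        return None  # no gesture data close enough to the speech event
    n = len(in_win)
    return (sum(x for x, _ in in_win) / n, sum(y for _, y in in_win) / n)


samples = [(0.9, 0.40, 0.55), (1.0, 0.42, 0.57), (1.1, 0.45, 0.60), (2.5, 0.90, 0.10)]
print(gesture_in_window(samples, t_speech=1.0))  # averages the first three samples
```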

76 citations

Journal ArticleDOI
TL;DR: It is shown that gesture phases exhibit particular distributions of the features, distinguishing one phase from another, and that changes in the execution of phases in linear successions can be described by means of features.
Abstract: This paper presents a proposal for the description of gesture phases derived from articulatory characteristics observable in their execution. Based on the results of an explorative study examining the execution of gesture phases of ten German speakers, the paper presents two sets of articulatory features, i.e., distinctive and additional features by which gesture phases are characterized from a context-independent and context-sensitive point of view. It will be shown that gesture phases show a particular distribution of the features, thus distinguishing one phase from another. Furthermore, changes in the execution of phases in linear successions can be described by means of features. Contrary to other accounts, whose focus on gesture phases is primarily in relation to speech and/or adjacent phases, this proposal concentrates on the visible physical characteristics of gesture phases.
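As a toy illustration of the idea that gesture phases can be told apart by bundles of articulatory features, the sketch below matches an observed feature bundle against a small phase inventory. The feature names and phase labels are generic placeholders, not the distinctive and additional features proposed in the paper.

```python
# Toy sketch (assumed feature names, not the paper's feature sets): each phase
# is characterized by a bundle of articulatory features, and an observation is
# assigned to the phases whose bundles it matches.
PHASE_FEATURES = {
    "preparation": {"moving": True, "tense": False, "toward_target": True},
    "stroke": {"moving": True, "tense": True, "toward_target": True},
    "hold": {"moving": False, "tense": True, "toward_target": False},
    "retraction": {"moving": True, "tense": False, "toward_target": False},
}


def classify(observed):
    """Return every phase whose feature bundle matches the observation."""
    return [phase for phase, feats in PHASE_FEATURES.items()
            if all(observed.get(k) == v for k, v in feats.items())]


print(classify({"moving": False, "tense": True, "toward_target": False}))  # ['hold']
```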

48 citations