Author

Sanshzar Kettebekov

Bio: Sanshzar Kettebekov is an academic researcher from Pennsylvania State University. The author has contributed to research in the topics of Gesture and Gesture Recognition, has an h-index of 9, and has co-authored 11 publications receiving 329 citations.

Papers
Proceedings ArticleDOI
14 Oct 2002
TL;DR: This paper presents a framework for designing a natural multimodal human-computer interaction (HCI) system; informal studies found that the system performed according to its specifications in 95% of the cases and that users showed ad-hoc proficiency, indicating natural acceptance of such systems.
Abstract: This paper presents a framework for designing a natural multimodal human-computer interaction (HCI) system. The core of the proposed framework is a principled method for combining information derived from audio and visual cues. To achieve natural interaction, both audio and visual modalities are fused, along with feedback through a large screen display. Careful design, with due consideration of the possible aspects of a system's interaction cycle and integration, has resulted in a successful system. The performance of the proposed framework has been validated through the development of several prototype systems as well as commercial applications for the retail and entertainment industries. To assess the impact of these multimodal systems (MMS), informal studies have been conducted. It was found that the system performed according to its specifications in 95% of the cases and that users showed ad-hoc proficiency, indicating natural acceptance of such systems.
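
As a reading aid only, here is a minimal late-fusion sketch in Python of the general idea the abstract describes: combining per-modality confidence scores for candidate commands. The weights, score scale, and command vocabulary are assumptions made for illustration; the paper does not publish its fusion equations.

```python
# Hypothetical late-fusion sketch: combine speech and gesture hypothesis scores.
# Weights, score ranges, and the command vocabulary are illustrative assumptions,
# not values taken from the paper.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    command: str   # e.g. "select", "move", "zoom"
    score: float   # modality-specific confidence in [0, 1]

def fuse(speech, gesture, w_speech=0.6, w_gesture=0.4):
    """Return the command with the highest weighted combined confidence."""
    combined = {}
    for weight, hypotheses in ((w_speech, speech), (w_gesture, gesture)):
        for h in hypotheses:
            combined[h.command] = combined.get(h.command, 0.0) + weight * h.score
    return max(combined, key=combined.get)

print(fuse([Hypothesis("zoom", 0.8)],
           [Hypothesis("zoom", 0.7), Hypothesis("select", 0.4)]))  # -> "zoom"
```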

88 citations

Journal ArticleDOI
TL;DR: Results of user studies revealed that gesture primitives, originally extracted from weather map narration, form patterns of co-occurrence with speech parts in association with their meaning in a visual display control system, defining a direction for approaching interpretation in natural gesture-speech interfaces.
Abstract: In recent years, because of advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Gestures in combination with speech can be an important step toward natural, multimodal HCI. However, interpretation of gestures in a multimodal setting can be a particularly challenging problem. In this paper, we propose an approach for studying multimodal HCI in the context of a computerized map. An implemented testbed allows us to conduct user studies and address issues in understanding hand gestures in a multimodal computer interface. The absence of an adequate gesture classification in HCI makes gesture interpretation difficult. We formalize a method for bootstrapping the interpretation process by a semantic classification of gesture primitives in the HCI context. We distinguish two main categories of gesture classes based on their spatio-temporal deixis. Results of user studies revealed that gesture primitives, originally extracted from weather map narration, form patterns of co-occurrence with speech parts in association with their meaning in a visual display control system. The results of these studies indicated two levels of gesture meaning: individual stroke and motion complex. These findings define a direction for approaching interpretation in natural gesture-speech interfaces.
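
Purely as an illustration of the kind of co-occurrence analysis the abstract mentions (not the paper's own method or annotation scheme), a short Python sketch that tallies which gesture strokes overlap which speech words in time:

```python
# Toy co-occurrence tally between gesture strokes and temporally overlapping
# speech words. The interval format and labels are assumptions made for this
# sketch; the study's annotation scheme and statistics are richer than this.
from collections import Counter

def overlaps(a0, a1, b0, b1):
    return a0 < b1 and b0 < a1

def cooccurrence(strokes, words):
    """strokes: [(stroke_label, t0, t1)], words: [(word, t0, t1)]."""
    pairs = Counter()
    for label, g0, g1 in strokes:
        for word, w0, w1 in words:
            if overlaps(g0, g1, w0, w1):
                pairs[(label, word)] += 1
    return pairs

print(cooccurrence([("point", 0.2, 0.9)],
                   [("here", 0.5, 0.8), ("move", 1.0, 1.3)]))
# -> Counter({('point', 'here'): 1})
```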

43 citations

Journal Article
TL;DR: In this paper, a structured approach for studying patterns of multimodal language in the context of 2D-display control is proposed, in which the systematic analysis of gestures, from observable kinematical primitives to their semantics, is treated as pertinent to a linguistic structure.
Abstract: In recent years, because of advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Together with improved speech processing technology, this is an important step toward natural multimodal HCI. However, the inclusion of non-predefined continuous gestures in a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We consider the systematic analysis of gestures, from observable kinematical primitives to their semantics, as pertinent to a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration which was used to develop an interactive testbed (iMAP). The testbed enabled elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI. The user studies conducted illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping. Furthermore, co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.
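
To make the idea of mapping observable kinematic primitives to semantic gesture classes concrete, here is a deliberately simplified rule-based sketch in Python. The thresholds and class names are placeholders; they do not reproduce the six categories defined in the paper.

```python
# Placeholder rule-based classifier from stroke kinematics to a coarse deictic
# class. Thresholds and class names are invented for illustration and are not
# the six semantic categories defined in the paper.
def classify_stroke(path_length_norm: float, hold_time_s: float) -> str:
    """path_length_norm: stroke path length normalized by display size."""
    if path_length_norm < 0.05 and hold_time_s > 0.3:
        return "static_deixis"    # e.g. holding a point over one location
    if path_length_norm >= 0.05:
        return "dynamic_deixis"   # e.g. tracing a path or outlining a region
    return "unclassified"

print(classify_stroke(0.02, 0.5))  # -> "static_deixis"
```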

38 citations

Proceedings ArticleDOI
03 Dec 2002
TL;DR: Insight into the design aspects of the XISM system is provided, addressing the issues of extraction and fusion of gesture and speech modalities to allow more natural interactive behavior.
Abstract: This paper presents a multimodal crisis management system (XISM). It employs processing of natural gesture and speech commands elicited by a user to efficiently manage complex, dynamic emergency scenarios on a large display. The developed prototype system demonstrates the means of incorporating unconstrained free-hand gestures and speech in a real-time interactive interface. This paper provides insights into the design aspects of the XISM system. In particular, it addresses the issues of extraction and fusion of the gesture and speech modalities to allow more natural interactive behavior. Performance characteristics of the current prototype and considerations for future work are discussed. A series of studies indicated a positive response with respect to the ease of interacting with the current system.
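
As a sketch only, assuming time-stamped recognition events (the event format and window size below are assumptions, not values from the XISM system), one simple way to bind a recognized speech command to the gesture stroke nearest in time looks like this:

```python
# Minimal temporal-binding sketch: attach each recognized speech command to the
# gesture stroke closest in time, within a tolerance window. The event format
# and the 1-second window are assumptions, not values from XISM.
def bind(speech_events, gesture_events, window=1.0):
    """speech_events: [(command, t)], gesture_events: [(stroke_label, t)]."""
    bound = []
    for cmd, t_cmd in speech_events:
        candidates = [(abs(t_cmd - t_g), label)
                      for label, t_g in gesture_events
                      if abs(t_cmd - t_g) <= window]
        bound.append((cmd, min(candidates)[1] if candidates else None))
    return bound

print(bind([("move here", 2.1)], [("point", 1.9), ("circle", 5.0)]))
# -> [('move here', 'point')]
```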

36 citations

Book ChapterDOI
11 May 2001
TL;DR: This chapter proposes a structured approach for studying patterns of multimodal language in the context of 2D-display control; co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.
Abstract: In recent years, because of advances in computer vision research, free-hand gestures have been explored as a means of human-computer interaction (HCI). Together with improved speech processing technology, this is an important step toward natural multimodal HCI. However, the inclusion of non-predefined continuous gestures in a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We consider the systematic analysis of gestures, from observable kinematical primitives to their semantics, as pertinent to a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration which was used to develop an interactive testbed (iMAP). The testbed enabled elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI. The user studies conducted illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping. Furthermore, co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.

33 citations


Cited by
Journal ArticleDOI
TL;DR: This paper reviews the major approaches to multimodal human-computer interaction, giving an overview of the field from a computer vision perspective, and focuses on body, gesture, gaze, and affective interaction.

948 citations

Patent
01 Dec 2003
TL;DR: This patent describes a perceptual user interface system that includes a tracking component that detects object characteristics of at least one of a plurality of objects within a scene and tracks the respective object.
Abstract: Architecture for implementing a perceptual user interface. The architecture comprises alternative modalities for controlling computer application programs and manipulating on-screen objects through hand gestures or a combination of hand gestures and verbal commands. The perceptual user interface system includes a tracking component that detects object characteristics of at least one of a plurality of objects within a scene, and tracks the respective object. Detection of object characteristics is based at least in part upon image comparison of a plurality of images relative to a coarse mapping of the images. A seeding component iteratively seeds the tracking component with object hypotheses based upon the presence of the object characteristics and the image comparison. A filtering component selectively removes the tracked object from the object hypotheses and/or at least one object hypothesis from the set of object hypotheses based upon predetermined removal criteria.
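
To illustrate the seed/track/filter structure the abstract describes, here is a skeletal Python sketch. The scoring, decay, and pruning rules are placeholders; the patent's image-comparison and hypothesis-generation details are not reproduced.

```python
# Skeletal seed / track / filter loop in the spirit of the hypothesis-based
# tracker described in the patent abstract. Confidence decay and the removal
# threshold are placeholder values chosen only for this sketch.
from dataclasses import dataclass, field

@dataclass
class ObjectHypothesis:
    position: tuple       # (x, y) in image coordinates
    confidence: float

@dataclass
class Tracker:
    hypotheses: list = field(default_factory=list)

    def seed(self, detections):
        """Seeding component: add a hypothesis for each new detection."""
        self.hypotheses.extend(ObjectHypothesis(p, c) for p, c in detections)

    def update(self, decay=0.9):
        """Tracking component stand-in: decay confidence between observations."""
        for h in self.hypotheses:
            h.confidence *= decay

    def filter(self, min_confidence=0.2):
        """Filtering component: drop hypotheses below a removal threshold."""
        self.hypotheses = [h for h in self.hypotheses if h.confidence >= min_confidence]

tracker = Tracker()
tracker.seed([((120, 80), 0.9)])
tracker.update()
tracker.filter()
print(tracker.hypotheses)  # hypothesis kept, confidence decayed to 0.81
```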

876 citations

Journal ArticleDOI
TL;DR: Body posture and finger pointing are a natural modality for human-machine interaction, but first the system must know what it's seeing.
Abstract: Body posture and finger pointing are a natural modality for human-machine interaction, but first the system must know what it's seeing.

641 citations

Journal ArticleDOI
TL;DR: This paper provides an overview of Human-Computer Interaction (HCI): basic definitions and terminology, a survey of existing technologies and recent advances in the field, common architectures used in the design of HCI systems (including unimodal and multimodal configurations), and the applications of HCI.
Abstract: The intention of this paper is to provide an overview of the subject of Human-Computer Interaction. The overview includes the basic definitions and terminology, a survey of existing technologies and recent advances in the field, common architectures used in the design of HCI systems (including unimodal and multimodal configurations), and finally the applications of HCI. This paper also offers a comprehensive set of references for each concept, method, and application in HCI.

351 citations

Book ChapterDOI
21 Oct 2005
TL;DR: This paper focuses on body, gesture, gaze, and affective interaction (facial expression recognition, and emotion in audio) in multimodal human computer interaction from a computer vision perspective.
Abstract: In this paper we review the major approaches to multimodal human computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition, and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.

298 citations