Author

Karen Kuhn

Other affiliations: Portland State University
Bio: Karen Kuhn is an academic researcher from Oregon Health & Science University. The author has contributed to research on the topics of Multimodal interaction and Modality (human–computer interaction), has an h-index of 4, and has co-authored 4 publications receiving 585 citations. Previous affiliations of Karen Kuhn include Portland State University.

Papers
Proceedings ArticleDOI
27 Mar 1997
TL;DR: The present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system and revealed that the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence.
Abstract: Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary semantic information, rather than redundant. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.
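
To make the integration patterns concrete, the following is a minimal Python sketch, under assumed event fields and thresholds rather than the study's actual coding scheme, of how one command's speech and pen signals could be labeled as sequential, simultaneous, or point-and-speak, and how modality precedence and inter-modal lag could be measured.

```python
"""Illustrative sketch only: the Signal fields, the overlap test, and the
point-and-speak rule are assumptions, not the paper's coding scheme."""

from dataclasses import dataclass


@dataclass
class Signal:
    mode: str               # "speech" or "pen"
    start: float            # onset time in seconds
    end: float              # offset time in seconds
    is_point: bool = False  # True when the pen input is a simple pointing gesture


def classify_integration(speech: Signal, pen: Signal) -> dict:
    """Label one command's integration pattern and measure precedence and lag."""
    overlap = min(speech.end, pen.end) - max(speech.start, pen.start)

    if overlap > 0:    # the two signals co-occur in time
        pattern = "point-and-speak" if pen.is_point else "simultaneous"
        lag = 0.0
    else:              # one signal finishes before the other starts
        pattern = "sequential"
        lag = max(speech.start, pen.start) - min(speech.end, pen.end)

    precedes = "pen" if pen.start < speech.start else "speech"
    return {"pattern": pattern, "precedes": precedes, "inter_modal_lag_s": round(lag, 3)}


# Example: the user circles a map location, then says "add a hospital here".
pen = Signal(mode="pen", start=0.0, end=0.8)
speech = Signal(mode="speech", start=1.4, end=3.1)
print(classify_integration(speech, pen))
# {'pattern': 'sequential', 'precedes': 'pen', 'inter_modal_lag_s': 0.6}
```

With per-command labels like these, the frequency of each pattern and the distribution of lags could be aggregated over a corpus, in the spirit of the predictive modeling goal named at the end of the abstract.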

441 citations

Proceedings ArticleDOI
11 Jul 1997
TL;DR: The present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system and revealed that the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence.
Abstract: Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary semantic information, rather than redundant. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.

74 citations

Proceedings ArticleDOI
03 Oct 1996
TL;DR: Speech during error resolution primarily was longer in duration, including both elongation of the speech segment and substantial relative increases in the number and duration of pauses, and contained more clear speech phonological features and fewer spoken disfluencies.
Abstract: Hyperarticulate speech to computers remains a poorly understood phenomenon, in spite of its association with elevated recognition errors. The research presented analyzes the type and magnitude of linguistic adaptations that occur when people engage in error resolution with computers. A semi-automatic simulation method incorporating a novel error-generation capability was used to collect speech data immediately before and after system recognition errors, and under conditions varying in error base rates. Data on original and repeated spoken input, which were matched on speaker and lexical content, were then examined for type and magnitude of linguistic adaptations. Results indicated that speech during error resolution primarily was longer in duration, including both elongation of the speech segment and substantial relative increases in the number and duration of pauses. It also contained more clear speech phonological features and fewer spoken disfluencies. Implications of these findings are discussed for the development of more user-centered and robust error handling in next-generation systems.
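
As a rough illustration of the duration and pause measures the abstract describes, here is a small Python sketch that compares an original utterance with its repetition after a recognition error; the Word representation, the 200 ms pause threshold, and the reported measures are assumptions, not the study's measurement protocol.

```python
"""Hedged sketch: word-level timestamps are assumed to be available from a
transcription tool; the pause threshold and metrics are illustrative only."""

from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def pause_stats(words: List[Word], min_pause: float = 0.2):
    """Count inter-word gaps of at least min_pause seconds and sum their durations."""
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    return len(pauses), sum(pauses)


def adaptation_report(original: List[Word], repeat: List[Word]) -> dict:
    """Compare total duration and pausing before vs. after an error."""
    def duration(words: List[Word]) -> float:
        return words[-1].end - words[0].start

    n_orig, t_orig = pause_stats(original)
    n_rep, t_rep = pause_stats(repeat)
    return {
        "duration_change_s": round(duration(repeat) - duration(original), 3),
        "pause_count_change": n_rep - n_orig,
        "pause_time_change_s": round(t_rep - t_orig, 3),
    }


original = [Word("delete", 0.0, 0.4), Word("that", 0.45, 0.7)]
repeat = [Word("delete", 0.0, 0.6), Word("that", 1.0, 1.4)]
print(adaptation_report(original, repeat))
# {'duration_change_s': 0.7, 'pause_count_change': 1, 'pause_time_change_s': 0.4}
```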

56 citations

Proceedings Article
01 Jan 1998
Abstract: This research was supported by Grant No. IRI-9530666 from the National Science Foundation and Grant No. DABT63-95-C-007 from DARPA. First author: Center for Human-Computer Communication, Department of Computer Science, Oregon Graduate Institute of Science & Technology, P.O. Box 91000, Portland, OR 97291 (oviatt@cse.ogi.edu; http://www.cse.ogi.edu/CHCC/). Second author: Al Tech Corp., Boston, Mass.

27 citations


Cited by
Journal ArticleDOI
08 Sep 2003
TL;DR: It is argued that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence, the ability to recognize a user's affective states, in order to become more human-like, more effective, and more efficient.
Abstract: The ability to recognize affective states of a person we are communicating with is the core of emotional intelligence. Emotional intelligence is a facet of human intelligence that has been argued to be indispensable, and perhaps the most important, for successful interpersonal social interaction. This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence, the ability to recognize a user's affective states, in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In a face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet the design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys the past work in solving these problems by a computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI: an automatic personalized analyzer of a user's nonverbal affective feedback.
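
As a purely hypothetical sketch of what an "automatic personalized analyzer of a user's nonverbal affective feedback" could look like at its simplest, the Python below fuses per-modality arousal estimates (face, voice, physiology) against a per-user neutral baseline; the modality names, weights, and calibration scheme are assumptions, not a method proposed in the survey.

```python
"""Hypothetical sketch only: a weighted, baseline-corrected fusion of
per-modality affect estimates. Nothing here comes from the surveyed work."""

from statistics import mean
from typing import Dict, List


class PersonalizedAffectAnalyzer:
    def __init__(self, weights: Dict[str, float]):
        self.weights = weights        # e.g. {"face": 0.5, "voice": 0.3, "physio": 0.2}
        self.baselines: Dict[str, List[float]] = {m: [] for m in weights}

    def calibrate(self, neutral_sample: Dict[str, float]) -> None:
        """Store a neutral-state reading per modality for this particular user."""
        for modality, value in neutral_sample.items():
            self.baselines[modality].append(value)

    def arousal(self, sample: Dict[str, float]) -> float:
        """Weighted deviation of the current reading from the user's own baseline."""
        score = 0.0
        for modality, weight in self.weights.items():
            baseline = mean(self.baselines[modality]) if self.baselines[modality] else 0.0
            score += weight * (sample[modality] - baseline)
        return score


analyzer = PersonalizedAffectAnalyzer({"face": 0.5, "voice": 0.3, "physio": 0.2})
analyzer.calibrate({"face": 0.1, "voice": 0.2, "physio": 0.1})  # neutral reading for this user
print(round(analyzer.arousal({"face": 0.6, "voice": 0.5, "physio": 0.4}), 2))  # 0.4
```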

823 citations

Journal ArticleDOI
TL;DR: Well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other.
Abstract: Multimodal systems process combined natural input modes, such as speech, pen, touch, hand gestures, eye gaze, and head and body movements, in a coordinated manner with multimedia system output. These systems represent a new direction for computing that draws from novel input and output technologies currently becoming available. Since the appearance of Bolt's [1] "Put That There" demonstration system, which processed speech in parallel with manual pointing, a variety of multimodal systems has emerged. Some rudimentary ones process speech combined with mouse pointing, such as the early CUBRICON system [8]. Others recognize speech while determining the location of pointing from users' manual gestures or gaze [5]. Recent multimodal systems now recognize a broader range of signal integrations, which are no longer limited to the simple point-and-speak combinations handled by earlier systems. For example, the Quickset system integrates speech with pen input that includes drawn graphics, symbols, gestures, and pointing. It uses a semantic unification process to combine the meaningful multimodal information carried by two input signals, both of which are rich and multidimensional. Quickset also uses a multi-agent architecture and runs on a handheld PC [3]. Figure 1 illustrates Quickset's response to the multimodal command "Airstrips... facing this way, facing this way, and facing this way," which was spoken while the user drew arrows placing three airstrips in correct orientation on a map. Multimodal systems represent a research-level paradigm shift away from conventional windows-icons-menus-pointers (WIMP) interfaces toward providing users with greater expressive power, naturalness, flexibility, and portability. Well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other. Such systems potentially can function more robustly than unimodal systems that involve a single recognition-based technology such as speech, pen, or vision.
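
The "semantic unification process" mentioned above can be illustrated with a minimal sketch: a partial meaning frame from speech is merged with a partial frame from the pen's drawn arrow, succeeding only when the two do not conflict. The frame keys and values below are illustrative; Quickset's actual implementation used typed feature structures within a multi-agent architecture, not this code.

```python
"""Minimal sketch of frame unification, not Quickset's implementation:
two partial attribute-value frames merge unless they disagree on a value."""

from typing import Optional


def unify(speech_frame: dict, pen_frame: dict) -> Optional[dict]:
    """Merge two partial frames; return None if any shared key conflicts."""
    merged = dict(speech_frame)
    for key, value in pen_frame.items():
        if key in merged and merged[key] != value:
            return None    # conflicting information: no joint interpretation
        merged[key] = value
    return merged


# Speech supplies the action and object type; the drawn arrow supplies
# where the airstrip goes and which way it faces.
speech_frame = {"action": "create", "object": "airstrip"}
pen_frame = {"location": (45.52, -122.68), "heading_deg": 310}
print(unify(speech_frame, pen_frame))
# {'action': 'create', 'object': 'airstrip', 'location': (45.52, -122.68), 'heading_deg': 310}
```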

658 citations

Book
01 Jan 2002
TL;DR: This chapter will review the main types of multimodal interfaces, their advantages and cognitive science underpinnings, primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.
Abstract: Multimodal systems process two or more combined user input modes, such as speech, pen, touch, manual gestures, gaze, and head and body movements, in a coordinated manner with multimedia system output. This class of systems represents a new direction for computing, and a paradigm shift away from conventional WIMP interfaces. Since the appearance of Bolt’s (1980) “Put That There” demonstration system, which processed speech in parallel with touch-pad pointing, a variety of new multimodal systems has emerged. This new class of interfaces aims to recognize naturally occurring forms of human language and behavior, which incorporate at least one recognition-based technology (e.g., speech, pen, vision). The development of novel multimodal systems has been enabled by the myriad input and output technologies currently becoming available, including new devices and improvements in recognition-based technologies. This chapter will review the main types of multimodal interfaces, their advantages and cognitive science underpinnings, primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.

528 citations

Proceedings ArticleDOI
01 Nov 1997
Abstract: QuickSet: Multimodal Interaction for Distributed Applications. Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen, and Josh Clow, Center for Human-Computer Communication, Oregon Graduate Institute of Science and Technology.

516 citations

Journal ArticleDOI
TL;DR: The emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner are summarized, including early and late fusion approaches and the new hybrid symbolic-statistical approach.
Abstract: The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
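
As a hedged sketch of the late (decision-level) fusion idea named in the abstract, the Python below lets each recognizer return an n-best list and has the integrator rank cross-modal pairs that are semantically compatible and temporally close; the compatibility test, score combination, and 4-second window are illustrative assumptions, not a published architecture.

```python
"""Illustrative late-fusion sketch: assumed hypothesis fields, scoring, and
temporal window; early fusion would instead combine low-level feature streams
before recognition."""

from itertools import product
from typing import Dict, List


def compatible(speech_hyp: Dict, pen_hyp: Dict) -> bool:
    """A spoken command that needs a location can pair with a pen point or stroke."""
    return speech_hyp["needs_location"] and pen_hyp["kind"] in {"point", "stroke"}


def late_fusion(speech_nbest: List[Dict], pen_nbest: List[Dict],
                max_lag_s: float = 4.0) -> List[Dict]:
    """Rank joint interpretations built from each modality's n-best hypotheses."""
    joint = []
    for s, p in product(speech_nbest, pen_nbest):
        if not compatible(s, p) or abs(s["time"] - p["time"]) > max_lag_s:
            continue
        joint.append({"command": s["text"], "target": p["coords"],
                      "score": s["score"] * p["score"]})
    return sorted(joint, key=lambda h: h["score"], reverse=True)


speech_nbest = [
    {"text": "zoom here", "needs_location": True, "score": 0.8, "time": 2.1},
    {"text": "zoom out", "needs_location": False, "score": 0.6, "time": 2.1},
]
pen_nbest = [{"kind": "point", "coords": (120, 88), "score": 0.9, "time": 1.6}]
print(late_fusion(speech_nbest, pen_nbest)[0])  # best joint hypothesis: "zoom here" at (120, 88)
```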

381 citations