Author

Sharon Oviatt

Bio: Sharon Oviatt is an academic researcher from Monash University. The author has contributed to research in topics: Multimodal interaction & Spoken language. The author has an h-index of 48 and has co-authored 145 publications receiving 9,672 citations. Previous affiliations of Sharon Oviatt include Artificial Intelligence Center & Oregon National Primate Research Center.


Papers
Journal ArticleDOI
TL;DR: Well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other.
Abstract: Multimodal systems process combined natural input modes—such as speech, pen, touch, hand gestures, eye gaze, and head and body movements—in a coordinated manner with multimedia system output. These systems represent a new direction for computing that draws from novel input and output technologies currently becoming available. Since the appearance of Bolt's [1] "Put That There" demonstration system, which processed speech in parallel with manual pointing, a variety of multimodal systems has emerged. Some rudimentary ones process speech combined with mouse pointing, such as the early CUBRICON system [8]. Others recognize speech while determining the location of pointing from users' manual gestures or gaze [5]. Recent multimodal systems now recognize a broader range of signal integrations, which are no longer limited to the simple point-and-speak combinations handled by earlier systems. For example, the QuickSet system integrates speech with pen input that includes drawn graphics, symbols, gestures, and pointing. It uses a semantic unification process to combine the meaningful multimodal information carried by the two input signals, both of which are rich and multidimensional. QuickSet also uses a multi-agent architecture and runs on a handheld PC [3]. Figure 1 illustrates QuickSet's response to the multimodal command "Airstrips... facing this way, facing this way, and facing this way," which was spoken while the user drew arrows placing three airstrips in correct orientation on a map. Multimodal systems represent a research-level paradigm shift away from conventional windows-icons-menus-pointers (WIMP) interfaces toward providing users with greater expressive power, naturalness, flexibility, and portability. Well-designed multimodal systems integrate complementary modalities to yield a highly synergistic blend in which the strengths of each mode are capitalized upon and used to overcome weaknesses in the other. Such systems can potentially function more robustly than unimodal systems that involve a single recognition-based technology such as speech, pen, or vision.

658 citations
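
A rough Python sketch of the semantic unification idea described in the abstract above: each input mode contributes a partial interpretation, and fusion succeeds only when the two structures are compatible. The flat dictionaries and field names here are illustrative assumptions; QuickSet's actual unifier operates over typed feature structures inside a multi-agent architecture.

def unify(speech_frame, pen_frame):
    """Merge two partial interpretations; return None on a feature clash."""
    merged = dict(speech_frame)
    for key, value in pen_frame.items():
        if key in merged and merged[key] != value:
            return None  # incompatible interpretations: no multimodal parse
        merged[key] = value
    return merged

# Speech supplies the object type and orientation; the pen gesture supplies location.
speech = {"command": "create", "object": "airstrip", "orientation_deg": 45}
pen = {"command": "create", "location": (122.4, 45.1)}

print(unify(speech, pen))
# {'command': 'create', 'object': 'airstrip', 'orientation_deg': 45, 'location': (122.4, 45.1)}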

Book
01 Jan 2002
TL;DR: This chapter will review the main types of multimodal interfaces, their advantages and cognitive science underpinnings, primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.
Abstract: Multimodal systems process two or more combined user input modes— such as speech, pen, touch, manual gestures, gaze, and head and body movements— in a coordinated manner with multimedia system output. This class of systems represents a new direction for computing, and a paradigm shift away from conventional WIMP interfaces. Since the appearance of Bolt’s (1980) “Put That There” demonstration system, which processed speech in parallel with touch-pad pointing, a variety of new multimodal systems has emerged. This new class of interfaces aims to recognize naturally occurring forms of human language and behavior, which incorporate at least one recognition-based technology (e.g., speech, pen, vision). The development of novel multimodal systems has been enabled by the myriad input and output technologies currently becoming available, including new devices and improvements in recognition-based technologies. This chapter will review the main types of multimodal interfaces, their advantages and cognitive science underpinnings, primary features and architectural characteristics, and general research in the field of multimodal interaction and interface design.

528 citations

Proceedings ArticleDOI
01 Nov 1997
Abstract: QuickSet: Multimodal Interaction for Distributed Applications. Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen, and Josh Clow, Center for Human Computer Communication, Oregon Graduate Institute of Science and Technology.

516 citations

Proceedings ArticleDOI
27 Mar 1997
TL;DR: The present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system and revealed that the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence.
Abstract: Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary rather than redundant semantic information. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.

441 citations
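
The integration patterns identified in the paper above can be made concrete with a small sketch. Given onset and offset times for a pen stroke and a spoken utterance, the function below distinguishes sequential from simultaneous integration and reports the onset-to-onset lag; the timing representation and the simple overlap rule are assumptions for illustration, not the paper's coding scheme.

def classify_integration(pen, speech):
    """pen and speech are (onset, offset) times in seconds."""
    pen_on, pen_off = pen
    sp_on, sp_off = speech
    if pen_off < sp_on or sp_off < pen_on:
        pattern = "sequential"    # no temporal overlap between the signals
    else:
        pattern = "simultaneous"  # the signals overlap in time
    lag = sp_on - pen_on          # positive lag: pen input preceded speech
    return pattern, lag

# Consistent with the finding that writing tends to precede speech:
print(classify_integration(pen=(0.0, 1.2), speech=(1.6, 3.0)))  # ('sequential', 1.6)
print(classify_integration(pen=(0.0, 1.2), speech=(0.9, 2.5)))  # ('simultaneous', 0.9)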

Journal ArticleDOI
TL;DR: The emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner are summarized, including early and late fusion approaches and the new hybrid symbolic-statistical approach.
Abstract: The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits will also be needed to promote software development for both simulated and functioning systems.

381 citations
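
The late-fusion approach summarized above can be sketched as rescoring over recognizer n-best lists: each modality returns ranked (interpretation, probability) pairs, and semantically compatible pairs are combined with a weighted score. The weights, labels, and compatibility table below are illustrative assumptions, not the article's architecture.

def late_fusion(speech_nbest, gesture_nbest, w_speech=0.6, w_gesture=0.4):
    """Rank semantically compatible (speech, gesture) pairs by combined score."""
    joint = []
    for s_label, s_prob in speech_nbest:
        for g_label, g_prob in gesture_nbest:
            if compatible(s_label, g_label):
                score = w_speech * s_prob + w_gesture * g_prob
                joint.append(((s_label, g_label), score))
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

def compatible(s_label, g_label):
    # Placeholder semantic filter; a real system consults a grammar or ontology.
    allowed = {("move unit", "arrow"), ("select unit", "point")}
    return (s_label, g_label) in allowed

speech_nbest = [("move unit", 0.7), ("select unit", 0.2)]
gesture_nbest = [("arrow", 0.8), ("point", 0.15)]
print(late_fusion(speech_nbest, gesture_nbest)[0])
# best joint hypothesis: ('move unit', 'arrow') with score ~0.74

Early fusion would instead combine the signals at the feature level before recognition; late fusion as sketched here keeps the recognizers independent and merges their semantic outputs.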


Cited by
Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter, and each chapter is built around one or more worked examples that demonstrate its main idea. The book covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for both speech recognition and word-sense disambiguation. It emphasizes web and other practical applications as well as scientific evaluation, and is useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations
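
The blurb's claim that one statistical algorithm serves several tasks can be illustrated with a naive Bayes classifier: the same argmax-over-classes decision rule used below for word-sense disambiguation also underlies the noisy-channel formulation of speech recognition. The toy counts, vocabulary size, and smoothing choice are assumptions for illustration, not examples from the book.

import math
from collections import defaultdict

def train(labeled_contexts):
    """labeled_contexts: list of (sense, [context words]) pairs."""
    sense_counts = defaultdict(int)
    word_counts = defaultdict(lambda: defaultdict(int))
    for sense, words in labeled_contexts:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
    return sense_counts, word_counts

def disambiguate(context, sense_counts, word_counts, vocab_size=1000):
    """Pick argmax over senses of P(sense) * prod_w P(w | sense), in log space."""
    total = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense, n in sense_counts.items():
        lp = math.log(n / total)  # prior P(sense)
        denom = sum(word_counts[sense].values()) + vocab_size
        for w in context:         # add-one smoothed likelihood P(w | sense)
            lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

data = [("finance", ["money", "deposit", "loan"]),
        ("river", ["water", "shore", "fish"])]
sense_counts, word_counts = train(data)
print(disambiguate(["loan", "money"], sense_counts, word_counts))  # 'finance'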

01 Aug 2001
TL;DR: This work concentrates on the study of distributed systems that bring to life the vision of ubiquitous computing, also known as ambient intelligence.
Abstract: With digital equipment becoming increasingly networked, whether on wired or wireless networks, for personal and professional use alike, distributed software systems have become a crucial element in information and communications technologies. The study of these systems forms the core of ARLES's work, which is specifically concerned with defining new system software architectures based on the use of emerging networking technologies. In this context, we concentrate on the study of distributed systems that bring to life the vision of ubiquitous computing, also known as ambient intelligence.

2,774 citations

01 Nov 2008

2,686 citations

Journal ArticleDOI
TL;DR: A comprehensive review of the data fusion state of the art is proposed, exploring its conceptualizations, benefits, and challenging aspects, as well as existing methodologies.

1,684 citations