Perception of Synthetic Visual Speech

doi:10.1007/978-3-662-13015-5_11

Book Chapter

Michael M. Cohen, +2 more

- pp 153-168

TLDR

Recognition of the synthetic talker is reasonably close to that of the human talker, but a significant distance remains to be covered and improvements to the synthetic phoneme specifications are discussed.

Abstract:

We report here on an experiment comparing visual recognition of monosyllabic words produced either by our computer-animated talker or a human talker. Recognition of the synthetic talker is reasonably close to that of the human talker, but a significant distance remains to be covered and we discuss improvements to the synthetic phoneme specifications. In an additional experiment using the same paradigm, we compare perception of our animated talker with a similarly generated point-light display, finding significantly worse performance for the latter for a number of viseme classes. We conclude with some ideas for future progress and briefly describe our new animated tongue.

Citations

Journal Article

Development and evaluation of a computer-animated tutor for vocabulary and language learning in children with autism.

Alexis Bosseler, +1 more

- 01 Dec 2003 -

Journal of Autism and Developmental Diso...

TL;DR: The research indicates that children with autism are capable of learning new language within an automated program centered around a computer-animated agent, multimedia, and active participation and can transfer and use the language in a natural, untrained environment.

Journal Article

Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images

Pierre Badin, +5 more

- 01 Jul 2002 -

Journal of Phonetics

TL;DR: The geometry of the tongue, lips and face is measured on one subject uttering a corpus of sustained articulations in French; the results imply that most 3D features, such as the tongue groove or lateral channels, can be controlled by articulatory parameters defined for the midsagittal model.

Book Chapter

Developing and evaluating conversational agents

Dominic W. Massaro, +3 more

TL;DR: The use of the agent is expanded to educational and therapeutic environments, such as learning non-native languages and learning to read, with the aim of creating a human–computer interface centered on a virtual, conversational agent.

Journal Article

Attention to Facial Regions in Segmental and Prosodic Visual Speech Perception Tasks

Charissa R. Lansing, +1 more

- 01 Jun 1999 -

Journal of Speech Language and Hearing R...

TL;DR: The results indicate that information in the upper part of the talker's face is more critical for intonation pattern decisions than for decisions about word segments or primary sentence stress, thus supporting the Gaze Direction Assumption.

Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks

Dominic W. Massaro, +4 more

TL;DR: Through a series of audiovisual perceptual experiments with noise-degraded audio, it is demonstrated that the animated talking head provides significantly increased intelligibility over the audio-only case, in some cases not significantly below that provided by a natural face.

Related Papers (5)

Perceiving talking faces: from speech perception to a behavioral principle

Visual contribution to speech intelligibility in noise

Hearing lips and seeing voices

Modeling Coarticulation in Synthetic Visual Speech

Video Rewrite: driving visual speech with audio