Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published on this topic, receiving 17,889 citations.


Papers
Book
01 Jun 1986

385 citations

Journal ArticleDOI
TL;DR: In this paper, a model for the identification of speech sounds is proposed that assumes that the acoustic cues are perceived independently, feature evaluation provides information about the degree to which each quality is present in the speech sound, and each speech sound is defined by a propositional prototype in long-term memory that determines how the featural information is integrated.
Abstract: A model for the identification of speech sounds is proposed that assumes that (a) the acoustic cues are perceived independently, (b) feature evaluation provides information about the degree to which each quality is present in the speech sound, (c) each speech sound is defined by a propositional prototype in long-term memory that determines how the featural information is integrated, and (d) the speech sound is identified on the basis of the relative degree to which it matches the various alternative prototypes. The model was supported by the results of an experiment in which subjects identified stop-consonant-vowel syllables that were factorially generated by independently varying acoustic cues for voicing and for place of articulation. This experiment also replicated previous findings of changes in the identification boundary of one acoustic dimension as a function of the level of another dimension. These results have previously been interpreted as evidence for the interaction of the perceptions of the acoustic features themselves. In contrast, the present model provides a good description of the data, including these boundary changes, while still maintaining complete noninteraction at the feature evaluation stage of processing.

Although considerable progress has been made in the field of speech perception in recent years, there is still much that is unknown about the details of how speech sounds are perceived and discriminated. In particular, while there has been considerable success in isolating the dimensions of acoustic information that are important in perceiving and identifying speech sounds, very little is known about how the information from the various acoustic dimensions is put together in order to actually accomplish identification. The present article proposes and tests a model of these fundamental integration processes that take place during speech perception. Much of the study of features in speech has focused on the stop consonants of English. The stop consonants are a set of speech sounds

330 citations
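To make the model's integration and decision stages concrete, here is a minimal Python sketch of the kind of computation the abstract describes: independent cue evaluation, multiplicative integration against stored prototypes, and identification by relative goodness of match. The prototype definitions, cue names, and matching function below are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of the proposed identification model (illustrative only):
# (a) each acoustic cue is evaluated independently to a degree in [0, 1],
# (b)-(c) a prototype integrates the cue matches multiplicatively, and
# (d) identification follows the relative goodness of match across prototypes.

from math import prod

# Hypothetical prototypes for four stop-consonant-vowel syllables, defined by
# ideal values of two independent cues: voicing and place of articulation.
PROTOTYPES = {
    "ba": {"voiced": 1.0, "labial": 1.0},
    "pa": {"voiced": 0.0, "labial": 1.0},
    "da": {"voiced": 1.0, "labial": 0.0},
    "ta": {"voiced": 0.0, "labial": 0.0},
}

def match(cue_value: float, ideal: float) -> float:
    """Degree to which a perceived cue value matches a prototype's ideal value."""
    return 1.0 - abs(cue_value - ideal)

def identify(percept: dict) -> dict:
    """Return the relative goodness of match of each alternative prototype."""
    goodness = {
        name: prod(match(percept[cue], ideal) for cue, ideal in proto.items())
        for name, proto in PROTOTYPES.items()
    }
    total = sum(goodness.values())
    return {name: g / total for name, g in goodness.items()}

# An ambiguous stimulus: strongly voiced, place cue near the category boundary.
print(identify({"voiced": 0.8, "labial": 0.4}))
```

Note how the normalization in the decision stage means that a change along one cue dimension can shift the identification boundary on another, even though the cues are evaluated independently; this is how the model accounts for the boundary changes while keeping the feature evaluation stage noninteractive.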

Journal ArticleDOI
TL;DR: It is suggested that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
Abstract: This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusion data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups, in combination with visually distinctive speech feature groupings ("visemes"), can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.

313 citations
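The stimulus construction in this study lends itself to a short sketch: extract the time-intensity envelope of a syllable at a given low-pass cutoff, then multiply noise by that envelope, so all three conditions stay spectrally identical. The sketch below assumes numpy/scipy and uses Hilbert-envelope extraction as one plausible method; the paper's exact envelope-extraction procedure may differ, and the sampling rate and variable names are illustrative.

```python
# Sketch of envelope-modulated noise stimuli: spectrally identical noise
# carriers modulated by speech envelopes extracted at different low-pass
# cutoffs (20, 200, and 2000 Hz in the study above).

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 16000  # sampling rate in Hz (assumed)

def speech_envelope(speech: np.ndarray, cutoff_hz: float) -> np.ndarray:
    """Extract the time-intensity envelope, smoothed by a low-pass filter."""
    envelope = np.abs(hilbert(speech))      # instantaneous amplitude
    b, a = butter(4, cutoff_hz / (FS / 2))  # 4th-order low-pass filter
    return np.maximum(filtfilt(b, a, envelope), 0.0)

def envelope_noise(speech: np.ndarray, cutoff_hz: float) -> np.ndarray:
    """Multiply white noise by the speech envelope, as in the study's stimuli."""
    noise = np.random.default_rng(0).standard_normal(len(speech))
    return speech_envelope(speech, cutoff_hz) * noise

# Three conditions matching the paper's envelope bandwidths:
# speech = ...  # an /aCa/ nonsense-syllable waveform sampled at FS Hz
# stimuli = {fc: envelope_noise(speech, fc) for fc in (20, 200, 2000)}
```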

Journal ArticleDOI
TL;DR: Results in audiovisual speech processing have shown that lip reading can enhance the reliability of audio speech recognition, which may lead to computers that truly understand the user via hands-free natural spoken language, even in very noisy environments.
Abstract: We have reported activities in audiovisual speech processing, with emphasis on lip reading and lip synchronization. These research results have shown that, with lip reading, it is possible to enhance the reliability of audio speech recognition, which may result in a computer that can truly understand the user via hands-free natural spoken language, even in very noisy environments. Similarly, with lip synchronization, it is possible to render realistic talking heads whose lip movements are synchronized with the voice, which is very useful for human-computer interaction. We envision that in the near future, advances in audiovisual speech processing will greatly increase the usability of computers. Once that happens, cameras and microphones may replace the keyboard and the mouse as better mechanisms for human-computer interaction.

244 citations
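The claim that lip reading can improve recognition in noise is often realized as late fusion of the audio and visual streams. The sketch below shows one common weighted log-likelihood fusion scheme, with the weight shifted toward the visual (viseme) stream as acoustic conditions degrade; it is illustrative and not necessarily the method used in the paper, and all scores and weights are hypothetical.

```python
# Minimal late-fusion sketch: per-class log-likelihoods from an audio model
# and a visual (lip) model are combined with a reliability weight before the
# final decision. One common fusion scheme among several.

import numpy as np

def fuse(audio_loglik: np.ndarray, visual_loglik: np.ndarray,
         audio_weight: float) -> int:
    """Pick the class with the best weighted audio-visual score.

    audio_weight lies in [0, 1]; lower it as the acoustic SNR drops so the
    visually distinctive cues (visemes) carry more of the decision.
    """
    score = audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik
    return int(np.argmax(score))

# Hypothetical scores over three candidate words: in clean audio, trust the
# audio stream; in heavy noise, lean on the lips.
audio = np.array([-2.0, -1.0, -3.0])
visual = np.array([-1.5, -2.5, -0.5])
print(fuse(audio, visual, audio_weight=0.9))  # quiet room: picks class 1
print(fuse(audio, visual, audio_weight=0.2))  # noisy room: picks class 2
```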


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22