
Showing papers on "Viseme published in 1987"


Journal ArticleDOI
TL;DR: It is suggested that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
Abstract: This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusion data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
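
The stimulus construction described in the abstract lends itself to a brief sketch. The Python code below is a hypothetical illustration, not the authors' procedure: the abstract specifies only the low-pass cutoffs (20, 200, and 2000 Hz), so envelope extraction by full-wave rectification followed by low-pass filtering, the fourth-order Butterworth filter, and the RMS matching are all assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_noise_stimulus(speech, fs, cutoff_hz, rng=None):
    """Replace the spectral content of a speech token with noise while
    preserving its time-intensity envelope, as in the study's three
    noise conditions (20, 200, and 2000 Hz envelope cutoffs).

    Envelope extraction here is full-wave rectification followed by
    low-pass filtering -- an assumed implementation; the abstract
    specifies only the cutoff frequency, not the extraction method.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Full-wave rectify, then low-pass filter to get the envelope.
    b, a = butter(4, cutoff_hz, btype="low", fs=fs)
    envelope = np.clip(filtfilt(b, a, np.abs(speech)), 0.0, None)
    # Multiply wideband noise by the envelope: the carrier spectrum is
    # the same across conditions; only envelope bandwidth differs.
    stimulus = rng.standard_normal(len(speech)) * envelope
    # Match the RMS level of the original token.
    stimulus *= np.sqrt(np.mean(speech**2) / np.mean(stimulus**2))
    return stimulus
```

Calling envelope_noise_stimulus(token, fs, 200.0) on each /aCa/ token would produce something like the middle (200 Hz) noise condition.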

313 citations


Journal ArticleDOI
TL;DR: Fifteen hours of training with each of these skilled lipreaders in the lipreading alone (LA) condition suggest that the consonants /s,l,TH/ are quite reliably identified through lipreading alone, and a processor optimized to enhance these contrasts specifically might prove to be the better speechreading aid.
Abstract: The homorganic obstruent pairs /p-b, t-d, k-g, ch-j, f-v, th-TH, s-z, sh-zh/ are notoriously confusable in the lipreading alone (LA) condition, and the nasal consonants /m/ and /n/ are often mistaken for their homorganic oral counterparts. Implant users' speechreading of these viseme group members is improved by the addition of either multiple-channel or single-channel electrical stimulation. The palatal obstruent distinctions /sh, zh, ch, j/ have been targeted for remediation via other speechreading aids. One approach to implant sound processor setting involves optimizing to distinguish the nonvisible, frequently occurring consonants /t,d,k,s,z,n,l,TH/ (E. Schubert, personal communication). This optimization method resulted in significant speechreading improvement and even some open speech comprehension without lipreading for one deaf patient (M. White, personal communication). During a 50-week consonant training program conducted in our laboratory, two experimental subjects spent seven sessions identifying consonants in the LA condition, and five subsequent sessions identifying the same consonants in the stimulation plus lipreading condition, aided by a single-channel sound processor. Our 15 hours of training with each of these skilled lipreaders in the LA condition suggest that the consonants /s,l,TH/ are quite reliably identified through lipreading alone. Of the consonants not visible on the lips, /t,d,k,g,n,y/ are the troublesome contrasts. The single-channel sound processor provides some help in disambiguating these six consonants for deaf speechreaders, although a processor optimized to enhance these contrasts specifically might prove to be the better speechreading aid.

1 citation


Journal ArticleDOI
TL;DR: In this article, various aspects of the mechanisms of speech are studied: one line of work concerns the perception of speech, the sounds of speech, and word recognition; another concerns articulation, the pronunciation of words and of speech sounds.
Abstract: Various aspects of the mechanisms of speech are studied. One series of studies has concentrated on the perception of speech, the sounds of speech, and word recognition. Various models for speech recognition have been created. Another set of studies has focused on articulation, the pronunciation of words and the sounds of speech. This area has also been explored in considerable detail.

01 Jan 1987
TL;DR: This work takes into account the two nearest neighbours and defines a "belonging degree" calculated from the distances between the vector and the two centroids, which gives better results in a speaker-independent speech recognition system.
Abstract: In a classical quantization system, each vector is represented by the nearest centroid; two vectors belonging to the same class are then indistinguishable. To mitigate this, we take into account the two nearest neighbours and define a "belonging degree" calculated from the distances between the vector and the two centroids. In a speaker-independent speech recognition system, this "fuzzy" quantization gives better results.
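
The abstract does not give the exact formula for the "belonging degree", so the sketch below assumes a simple normalized-distance form in which the two degrees sum to one, the closer centroid receiving the larger degree. The function name fuzzy_quantize and the Euclidean distance are likewise illustrative assumptions.

```python
import numpy as np

def fuzzy_quantize(vector, codebook):
    """Soft vector quantization over the two nearest centroids.

    Returns the indices of the two nearest centroids and a "belonging
    degree" for each, computed from the two distances. The weighting
    d2/(d1+d2) for the nearest centroid is an assumed form; the
    abstract states only that the degree is calculated from the
    distances to the two centroids.
    """
    codebook = np.asarray(codebook)
    dists = np.linalg.norm(codebook - vector, axis=1)
    i1, i2 = np.argsort(dists)[:2]      # two nearest centroids
    d1, d2 = dists[i1], dists[i2]
    if d1 + d2 == 0.0:                  # degenerate: zero distance to both
        return (i1, i2), (0.5, 0.5)
    mu1 = d2 / (d1 + d2)                # closer centroid gets the larger
    mu2 = d1 / (d1 + d2)                # degree; mu1 + mu2 == 1
    return (i1, i2), (mu1, mu2)
```

A hard quantizer would keep only i1; returning both indices with their degrees is what lets two vectors that fall in the same Voronoi cell remain distinguishable downstream.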