
Showing papers on "Viseme published in 1991"


Journal ArticleDOI
TL;DR: Automatic derivation of mouth movement from a speech soundtrack is shown to be a tractable problem; a common speech synthesis method, linear prediction, is adapted to provide simple and accurate phoneme recognition.
Abstract: SUMMARY The problem of creating mouth animation synchronized to recorded speech is discussed. Review of a model of speech sound generation indicates that the automatic derivation of mouth movement from a speech soundtrack is a tractable problem. Several automatic lip-sync techniques are compared, and one method is described in detail. In this method a common speech synthesis method, linear prediction, is adapted to provide simple and accurate phoneme recognition. The recognized phonemes are associated with mouth positions to provide keyframes for computer animation of speech. Experience with this technique indicates that automatic lipsync can produce useful results.
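The paper's pipeline (LPC analysis of each audio frame, phoneme recognition, then phoneme-to-mouth-shape keyframes) can be sketched roughly as below. This is an illustrative reconstruction, not the paper's implementation: the frame size, LPC order, template-matching rule, and the synthetic "phoneme" signals are all assumptions made for the example.

```python
import numpy as np

def lpc_coefficients(frame, order=8):
    """Estimate LPC coefficients for one audio frame using the
    autocorrelation method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for this step of the recursion
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        for j in range(i):
            a_new[j] = a[j] - k * a[i - 1 - j]
        a = a_new
        err *= 1.0 - k * k
    return a

def nearest_viseme(frame, templates, order=8):
    """Classify a frame by the reference LPC vector it is closest to,
    returning the associated mouth-shape label (a keyframe name)."""
    coeffs = lpc_coefficients(frame, order)
    return min(templates, key=lambda name: np.linalg.norm(coeffs - templates[name]))

# Two synthetic "reference phonemes": a low hum and a brighter vowel-like tone,
# lightly noised so the recursion stays well conditioned.
rng = np.random.default_rng(0)
t = np.arange(400) / 8000.0
closed = np.sin(2 * np.pi * 150 * t) + 0.05 * rng.standard_normal(400)
open_v = np.sin(2 * np.pi * 700 * t) + 0.05 * rng.standard_normal(400)
templates = {"closed": lpc_coefficients(closed), "open": lpc_coefficients(open_v)}

# A fresh noisy frame of the 700 Hz tone matches the "open" mouth shape.
test_frame = np.sin(2 * np.pi * 700 * t) + 0.05 * rng.standard_normal(400)
print(nearest_viseme(test_frame, templates))
```

Running this classifier frame by frame over a soundtrack and emitting the matched mouth shape per frame gives the keyframe stream the abstract describes.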

101 citations



Book ChapterDOI
01 Jan 1991
TL;DR: This article found that the utterances speakers produce and hear are in fact not divided by short pauses into words, and that this impression of hearing and speaking words persists even in experienced speech researchers who are well aware that utterances are not acoustically segmented into words.
Abstract: Speakers have the clear intuition that when speaking they say words and when spoken to they hear words. It comes as a considerable surprise to naive speakers to discover that the utterances that they produce and hear are in fact not divided by short pauses into words. And this impression of hearing and speaking words is not lost even by experienced speech researchers who are well aware of the fact that utterances are not acoustically segmented into words.

40 citations


Journal ArticleDOI
TL;DR: It is clear that different processes are operating at different ages; however, more complex processes may come into play around the ages of 6 to 10 years; boys may use different strategies than girls, and with age, a multiplicity of processes may be concurrently active.
Abstract: Using a developmental approach, two aspects of debate in the speech perception literature were tested: (a) the nature of adult speech processing, the dichotomy being along nonlinguistic versus linguistic lines, and (b) the nature of speech processing by children of different ages, the hypotheses here implying detector-like processes in infancy and "adult-like" speech perception reorganizations at age four. Children ranging in age from 4 up to 18 years discriminated native and foreign speech contrasts. Results confirm the hypotheses for adults. It is clear that different processes are operating at different ages; however, more complex processes may come into play around the ages of 6 to 10 years; boys may use different strategies than girls; and with age, a multiplicity of processes may be concurrently active.

1 citation


Proceedings ArticleDOI
30 Aug 1991
TL;DR: A new solution for the acoustic-to-articulatory inversion problem, based on distinctive region and mode theory, is suggested, to study the phonetic role of coarticulation.
Abstract: A new solution for the acoustic-to-articulatory inversion problem, based on distinctive region and mode theory, is suggested. Vowel sounds are coded in terms of the distinctive region synergy ratios which relate to the position of the body of the tongue in the front/back and high/low dimensions. This feature is used to study the phonetic role of coarticulation.

Journal Article
TL;DR: This paper proposes a new strategy of speech synthesis that uses intermediate-sized units corresponding to half syllables, called 'demisyllables', in order to produce computer-generated speech.
Abstract: Synthesis of English speech by computer can be accomplished in several different ways, depending on the size of the speech units that are used to produce voice output. The most widely used units for speech synthesis are phonemes (i.e., small speech units corresponding to individual phonetic items). An alternate method of producing computer-generated speech is to concatenate entire words of English in a method called 'word-concatenation' synthesis. A third strategy, the one described in this paper, is to use intermediate-sized units corresponding to half syllables, called 'demisyllables'.
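The concatenation step of demisyllable synthesis can be sketched as below. This is a minimal illustration, not the paper's system: the unit names, the placeholder ramps standing in for recorded audio, and the cross-fade at the joint are all assumptions made for the example.

```python
import numpy as np

# Hypothetical demisyllable inventory: in a real system each entry would be a
# recorded half-syllable waveform; short placeholder ramps stand in for audio.
inventory = {
    "ka-": np.linspace(0.0, 1.0, 80),   # onset consonant + first half of the vowel
    "-at": np.linspace(1.0, 0.0, 120),  # second half of the vowel + coda consonant
}

def synthesize(units, overlap=20):
    """Concatenate demisyllable units, cross-fading over `overlap` samples at
    each joint; splices land mid-vowel, where the spectrum changes slowly."""
    out = inventory[units[0]].copy()
    for name in units[1:]:
        unit = inventory[name]
        fade = np.linspace(1.0, 0.0, overlap)
        out[-overlap:] = out[-overlap:] * fade + unit[:overlap] * (1.0 - fade)
        out = np.concatenate([out, unit[overlap:]])
    return out

speech = synthesize(["ka-", "-at"])
print(speech.shape)  # (180,): 80 + 120 samples minus the 20-sample overlap
```

Splicing units at syllable midpoints rather than phoneme boundaries is the main appeal of demisyllables: each unit already contains the consonant-vowel transition, so the joins fall where coarticulation effects are weakest.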