
Showing papers on "Viseme published in 1986"


Book
01 Jun 1986

385 citations



Proceedings ArticleDOI
01 May 1986
TL;DR: An automated method of synchronizing facial animation to recorded speech is described, which retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation.
Abstract: An automated method of synchronizing facial animation to recorded speech is described. In this method, a common speech synthesis method (linear prediction) is adapted to provide simple and accurate phoneme recognition. The recognized phonemes are then associated with mouth positions to provide keyframes for computer animation of speech using a parametric model of the human face. The linear prediction software, once implemented, can also be used for speech resynthesis. The synthesis retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation. Speech synthesis also enables certain useful manipulations for the purpose of computer character animation.
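
A minimal sketch of the pipeline this abstract describes (LPC analysis, phoneme matching, mouth-position keyframes), assuming librosa for the linear-prediction step; the phoneme templates and the phoneme-to-viseme table below are hypothetical placeholders that would in practice be estimated from labeled speech:

```python
import numpy as np
import librosa

SR, FRAME, ORDER = 16000, 0.02, 12                 # 20 ms frames, 12th-order LPC

rng = np.random.default_rng(0)
# Hypothetical per-phoneme LPC templates (would be trained on labeled speech).
TEMPLATES = {p: rng.normal(size=ORDER) for p in ("AA", "IY", "M")}
# Hypothetical phoneme -> mouth-opening parameter for the parametric face model.
VISEME = {"AA": 1.0, "IY": 0.3, "M": 0.0}

def keyframes(wav_path):
    y, sr = librosa.load(wav_path, sr=SR)
    hop = int(FRAME * sr)
    frames = []
    for t in range(0, len(y) - hop, hop):
        a = librosa.lpc(y[t:t + hop], order=ORDER)[1:]   # drop the leading 1.0
        # Nearest-template match in LPC-coefficient space (a stand-in for the
        # paper's spectral matching).
        ph = min(TEMPLATES, key=lambda p: np.sum((a - TEMPLATES[p]) ** 2))
        frames.append((t / sr, ph, VISEME[ph]))          # (time, phoneme, mouth)
    return frames
```

Each keyframe carries a time stamp, the matched phoneme, and a mouth parameter, which is the form of input a parametric face model needs for keyframe animation.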

104 citations


Journal ArticleDOI
TL;DR: This article found that the acoustic representation of speech segments is heavily context-dependent; in particular, the durations of speech sounds depend on the context in which they are uttered.
Abstract: Numerous studies, both in speech production and perception, have found that the acoustic representation of speech segments is heavily context-dependent. In particular, the durations of speech sounds do…

31 citations


Journal ArticleDOI
TL;DR: An attempt to realize lip reading using techniques of image processing and pattern recognition; the results show a remarkable lip-reading capability for a small number of words.
Abstract: Lip reading is an established method that enables the deaf to understand other people's speech through visual information. Acquiring the technique, however, requires great effort and a long time, and an educational system for teaching it has not yet been established. This paper describes an attempt to realize lip reading using techniques of image processing and pattern recognition, with the aim of clarifying the possibilities and limitations inherent in lip reading. Although our final goal is the realization of speech understanding, this paper deals only with recognition of vowels and words of the Japanese language as a first step. The front or side view of the mouth is taken with a TV camera, and feature values of the lip shape are extracted. Discrimination of the five vowels is performed by the maximum-likelihood method, and the vowels are correctly discriminated more than about 80% of the time. Moreover, word recognition based upon the vowel discrimination is performed. The results show a remarkable lip-reading capability for a small number of words. Finally, several problems are discussed in relation to the actual lip reading of the deaf.
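
A minimal sketch of the maximum-likelihood vowel discrimination described above: each of the five Japanese vowels is modeled as a Gaussian over lip-shape features, and a sample is assigned to the vowel whose model gives it the highest log-likelihood. The feature layout is an assumption; the paper's exact lip-shape measurements are not reproduced here:

```python
import numpy as np
from scipy.stats import multivariate_normal

VOWELS = ("a", "i", "u", "e", "o")

def train(features_by_vowel):
    """features_by_vowel: vowel -> (n_samples, n_features) array of lip-shape
    features (e.g. mouth width, height, aperture area -- illustrative choices)."""
    return {v: multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False))
            for v, X in features_by_vowel.items()}

def classify(models, x):
    # Maximum-likelihood decision: argmax of the per-vowel log densities.
    return max(VOWELS, key=lambda v: models[v].logpdf(x))
```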

18 citations


Journal ArticleDOI
01 Apr 1986
TL;DR: The algorithms proposed here are composed of simple image-processing operations; experiments show that they work well, and future progress in image processing will make it possible to realize them in real time.
Abstract: Though technology in speech recognition has progressed recently, Automatic Speech Recognition (ASR) is vulnerable to noise. Lip information is thought to be useful for speech recognition in noisy situations, such as in a factory or in a car. This paper describes speech-recognition enhancement by lip information. Two types of usage are dealt with. One is the detection of the start and stop of speech from lip information; this is the simplest usage. The other is lip-pattern recognition, which is used for speech recognition together with sound information. Algorithms for both usages are proposed, and the experimental system shows that they work well. The algorithms proposed here are composed of simple image-processing operations, and future progress in image processing will make it possible to realize them in real time.
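
A minimal sketch of the two usages, under assumed inputs: a per-frame mouth-opening measurement for endpoint detection, and per-word scores from separate acoustic and lip recognizers fused by a weighted sum. The threshold and weight values are illustrative, not from the paper:

```python
import numpy as np

def speech_endpoints(mouth_opening, thresh=0.2, min_frames=3):
    """Detect the start and stop of speech as the span where the lips stay open."""
    idx = np.flatnonzero(np.asarray(mouth_opening) > thresh)
    if idx.size < min_frames:
        return None                       # no speech detected
    return int(idx[0]), int(idx[-1])      # (start frame, stop frame)

def fuse(audio_scores, lip_scores, w=0.7):
    """Pick the word with the best weighted combination of sound and lip scores."""
    return max(audio_scores, key=lambda word:
               w * audio_scores[word] + (1 - w) * lip_scores[word])
```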

17 citations


Book ChapterDOI
31 Jan 1986

4 citations


Book
01 Nov 1986

3 citations


Journal ArticleDOI
TL;DR: It will be shown that it is possible to decompose the speech signal into overlapping “temporal transition functions” using techniques which make no assumptions about the phonetic structure of the signal or the articulatory constraints used in speech production.
Abstract: Articulatory phonetics describes speech as a sequence of overlapping articulatory gestures, each of which may be associated with a characteristic ideal target spectrum. In normal speech, the idealized target gestures for each speech sound are often not attained, and the speech signal exhibits only transitions between such (implicit) targets. It has been suggested that the underlying speech sounds can only be recovered by reference to detailed knowledge of the gestures by which individual speech sounds are produced. It will be shown that it is possible to decompose the speech signal into overlapping “temporal transition functions” using techniques which make no assumptions about the phonetic structure of the signal or the articulatory constraints used in speech production. Previous work has shown that these techniques can produce a large reduction in the information rate needed to represent the spectral information in speech signals [B. S. Atal, Proc. ICASSP 83, 2.6, 81–84 (1983)]. It will be shown that these methods are able to derive speech components of low bandwidth that vary on a time scale closely related to traditional phonetic events. Implications for perception and the application of such techniques both for speech coding and as a possible front end for speech recognition will be discussed.
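
A sketch of the temporal-decomposition idea, for illustration only: a trajectory of spectral parameters Y (frames × dimensions) is approximated as overlapping temporal functions Phi weighted by target vectors A, i.e. Y ≈ Phi·A. Atal's method derives the transition functions from the data itself; this sketch instead fixes Phi to overlapping raised-cosine windows and fits the targets by least squares:

```python
import numpy as np

def raised_cosine_basis(n_frames, n_events):
    """Overlapping bell-shaped windows standing in for derived transition functions."""
    centers = np.linspace(0, n_frames - 1, n_events)
    width = (n_frames - 1) / max(n_events - 1, 1)
    t = np.arange(n_frames)[:, None]
    d = np.clip(np.abs(t - centers) / width, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * d))             # (n_frames, n_events)

def decompose(Y, n_events):
    """Approximate Y (n_frames, n_dims) as Phi @ A with one target per event."""
    Phi = raised_cosine_basis(len(Y), n_events)
    A, *_ = np.linalg.lstsq(Phi, Y, rcond=None)        # least-squares targets
    return Phi, A, Phi @ A                             # basis, targets, reconstruction
```

Because only a few targets per second are needed, the reconstruction Phi @ A carries far less information than the frame-by-frame spectra, which is the rate reduction the abstract refers to.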

2 citations


Book
01 Nov 1986

1 citation