
Showing papers on "Viseme published in 1986"


Book
01 Jun 1986

385 citations



Proceedings ArticleDOI
01 May 1986
TL;DR: An automated method of synchronizing facial animation to recorded speech is described, which retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation.
Abstract: An automated method of synchronizing facial animation to recorded speech is described. In this method, a common speech synthesis method (linear prediction) is adapted to provide simple and accurate phoneme recognition. The recognized phonemes are then associated with mouth positions to provide keyframes for computer animation of speech using a parametric model of the human face. The linear prediction software, once implemented, can also be used for speech resynthesis. The synthesis retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation. Speech synthesis also enables certain useful manipulations for the purpose of computer character animation.
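
A minimal sketch of the pipeline this abstract describes (LPC analysis, phoneme matching, mouth-position keyframes), assuming librosa for the linear-prediction step; the phoneme templates and the phoneme-to-viseme table below are hypothetical placeholders that would in practice be estimated from labeled speech:

```python
import numpy as np
import librosa

SR, FRAME, ORDER = 16000, 0.02, 12                 # 20 ms frames, 12th-order LPC

rng = np.random.default_rng(0)
# Hypothetical per-phoneme LPC templates (would be trained on labeled speech).
TEMPLATES = {p: rng.normal(size=ORDER) for p in ("AA", "IY", "M")}
# Hypothetical phoneme -> mouth-opening parameter for the parametric face model.
VISEME = {"AA": 1.0, "IY": 0.3, "M": 0.0}

def keyframes(wav_path):
    y, sr = librosa.load(wav_path, sr=SR)
    hop = int(FRAME * sr)
    frames = []
    for t in range(0, len(y) - hop, hop):
        a = librosa.lpc(y[t:t + hop], order=ORDER)[1:]   # drop the leading 1.0
        # Nearest-template match in LPC-coefficient space (a stand-in for the
        # paper's spectral matching).
        ph = min(TEMPLATES, key=lambda p: np.sum((a - TEMPLATES[p]) ** 2))
        frames.append((t / sr, ph, VISEME[ph]))          # (time, phoneme, mouth)
    return frames
```

Each keyframe carries a time stamp, the matched phoneme, and a mouth parameter, which is the form of input a parametric face model needs for keyframe animation.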

104 citations


Journal ArticleDOI
TL;DR: This article found that the acoustic representation of speech segments is heavily context-dependent; in particular, the durations of speech sounds depend on the context in which they are uttered.
Abstract: Numerous studies, both in speech production and perception, have found that the acoustic representation of speech segments is heavily context-dependent. In particular, the durations of speech sounds do…

31 citations


Journal ArticleDOI
TL;DR: An attempt to realize lip reading using techniques of image processing and pattern recognition; the results show a remarkable lip-reading capability for a small number of words.
Abstract: Lip reading is an established method that enables the deaf to understand other people's speech through visual information. Acquiring the technique, however, requires great effort and a long time, and an educational system for teaching it has not yet been established. This paper describes an attempt to realize lip reading using techniques of image processing and pattern recognition, with the aim of clarifying the possibilities and limitations inherent in lip reading. Although our final goal is the realization of speech understanding, this paper deals only with recognition of vowels and words of the Japanese language as a first step. The front or side view of the mouth is taken with a TV camera, and feature values of the lip shape are extracted. Discrimination of the five vowels is performed by the maximum-likelihood method, and the vowels are correctly discriminated more than about 80% of the time. Moreover, word recognition based upon the vowel discrimination is performed. The results show a remarkable lip-reading capability for a small number of words. Finally, several problems are discussed in relation to the actual lip reading of the deaf.
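
A minimal sketch of the maximum-likelihood vowel discrimination described above: each of the five Japanese vowels is modeled as a Gaussian over lip-shape features, and a sample is assigned to the vowel whose model gives it the highest log-likelihood. The feature layout is an assumption; the paper's exact lip-shape measurements are not reproduced here:

```python
import numpy as np
from scipy.stats import multivariate_normal

VOWELS = ("a", "i", "u", "e", "o")

def train(features_by_vowel):
    """features_by_vowel: vowel -> (n_samples, n_features) array of lip-shape
    features (e.g. mouth width, height, aperture area -- illustrative choices)."""
    return {v: multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False))
            for v, X in features_by_vowel.items()}

def classify(models, x):
    # Maximum-likelihood decision: argmax of the per-vowel log densities.
    return max(VOWELS, key=lambda v: models[v].logpdf(x))
```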

18 citations


Journal ArticleDOI
01 Apr 1986
TL;DR: The algorithms proposed here are composed of simple image-processing operations; experiments show that they work well, and future progress in image processing will make it possible to realize them in real time.
Abstract: Though technology in speech recognition has progressed recently, Automatic Speech Recognition (ASR) is vulnerable to noise. Lip information is thought to be useful for speech recognition in noisy situations, such as in a factory or in a car. This paper describes speech-recognition enhancement by lip information. Two types of usage are dealt with. One is the detection of the start and stop of speech from lip information; this is the simplest usage. The other is lip-pattern recognition, which is used for speech recognition together with sound information. Algorithms for both usages are proposed, and the experimental system shows that they work well. The algorithms proposed here are composed of simple image-processing operations, and future progress in image processing will make it possible to realize them in real time.
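
A minimal sketch of the two usages, under assumed inputs: a per-frame mouth-opening measurement for endpoint detection, and per-word scores from separate acoustic and lip recognizers fused by a weighted sum. The threshold and weight values are illustrative, not from the paper:

```python
import numpy as np

def speech_endpoints(mouth_opening, thresh=0.2, min_frames=3):
    """Detect the start and stop of speech as the span where the lips stay open."""
    idx = np.flatnonzero(np.asarray(mouth_opening) > thresh)
    if idx.size < min_frames:
        return None                       # no speech detected
    return int(idx[0]), int(idx[-1])      # (start frame, stop frame)

def fuse(audio_scores, lip_scores, w=0.7):
    """Pick the word with the best weighted combination of sound and lip scores."""
    return max(audio_scores, key=lambda word:
               w * audio_scores[word] + (1 - w) * lip_scores[word])
```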

17 citations


Book ChapterDOI
31 Jan 1986

4 citations


Book
01 Nov 1986

3 citations


Journal ArticleDOI
TL;DR: It will be shown that it is possible to decompose the speech signal into overlapping “temporal transition functions” using techniques which make no assumptions about the phonetic structure of the signal or the articulatory constraints used in speech production.
Abstract: Articulatory phonetics describes speech as a sequence of overlapping articulatory gestures, each of which may be associated with a characteristic ideal target spectrum. In normal speech, the idealized target gestures for each speech sound are often not attained, and the speech signal exhibits only transitions between such (implicit) targets. It has been suggested that the underlying speech sounds can only be recovered by reference to detailed knowledge of the gestures by which individual speech sounds are produced. It will be shown that it is possible to decompose the speech signal into overlapping “temporal transition functions” using techniques which make no assumptions about the phonetic structure of the signal or the articulatory constraints used in speech production. Previous work has shown that these techniques can produce a large reduction in the information rate needed to represent the spectral information in speech signals [B. S. Atal, Proc. ICASSP 83, 2.6, 81–84 (1983)]. It will be shown that these methods are able to derive speech components of low bandwidth that vary on a time scale closely related to traditional phonetic events. Implications for perception and the application of such techniques both for speech coding and as a possible front end for speech recognition will be discussed.
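
A sketch of the temporal-decomposition idea, for illustration only: a trajectory of spectral parameters Y (frames × dimensions) is approximated as overlapping temporal functions Phi weighted by target vectors A, i.e. Y ≈ Phi·A. Atal's method derives the transition functions from the data itself; this sketch instead fixes Phi to overlapping raised-cosine windows and fits the targets by least squares:

```python
import numpy as np

def raised_cosine_basis(n_frames, n_events):
    """Overlapping bell-shaped windows standing in for derived transition functions."""
    centers = np.linspace(0, n_frames - 1, n_events)
    width = (n_frames - 1) / max(n_events - 1, 1)
    t = np.arange(n_frames)[:, None]
    d = np.clip(np.abs(t - centers) / width, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * d))             # (n_frames, n_events)

def decompose(Y, n_events):
    """Approximate Y (n_frames, n_dims) as Phi @ A with one target per event."""
    Phi = raised_cosine_basis(len(Y), n_events)
    A, *_ = np.linalg.lstsq(Phi, Y, rcond=None)        # least-squares targets
    return Phi, A, Phi @ A                             # basis, targets, reconstruction
```

Because only a few targets per second are needed, the reconstruction Phi @ A carries far less information than the frame-by-frame spectra, which is the rate reduction the abstract refers to.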

2 citations


Book
01 Nov 1986

1 citation