
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications on this topic have received 17,889 citations.


Papers
Proceedings ArticleDOI
01 May 1986
TL;DR: An automated method of synchronizing facial animation to recorded speech is described; it retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation.
Abstract: An automated method of synchronizing facial animation to recorded speech is described. In this method, a common speech synthesis method (linear prediction) is adapted to provide simple and accurate phoneme recognition. The recognized phonemes are then associated with mouth positions to provide keyframes for computer animation of speech using a parametric model of the human face. The linear prediction software, once implemented, can also be used for speech resynthesis. The synthesis retains intelligibility and natural speech rhythm while achieving a “synthetic realism” consistent with computer animation. Speech synthesis also enables certain useful manipulations for the purpose of computer character animation.

104 citations
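
The pipeline above reduces to a phoneme-recognition stage followed by a phoneme-to-mouth-shape lookup. Below is a minimal sketch of that second stage; the phoneme labels, viseme grouping, and keyframe format are illustrative assumptions, not the paper's actual tables.

```python
# Hypothetical sketch of the phoneme-to-keyframe stage: a many-to-one
# phoneme-to-viseme table and a function that turns timed phoneme
# segments into mouth-pose keyframes. All labels and groupings here
# are assumptions for illustration.

PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",   # bilabials share a pose
    "f": "lip_teeth", "v": "lip_teeth",            # labiodentals
    "aa": "open_wide", "ae": "open_wide",          # open vowels
    "iy": "spread", "ih": "spread",                # spread-lip vowels
    "uw": "rounded", "ow": "rounded",              # rounded vowels
    "sil": "neutral",                              # silence
}

def phonemes_to_keyframes(segments):
    """Map (phoneme, start_sec, end_sec) segments to animation keyframes,
    placing one viseme pose at each segment's midpoint."""
    return [
        {"time": (start + end) / 2.0,
         "pose": PHONEME_TO_VISEME.get(phoneme, "neutral")}
        for phoneme, start, end in segments
    ]

# Example with made-up timings for the word "mow":
print(phonemes_to_keyframes([("m", 0.00, 0.12), ("ow", 0.12, 0.40)]))
# [{'time': 0.06, 'pose': 'closed'}, {'time': 0.26, 'pose': 'rounded'}]
```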

Journal ArticleDOI
TL;DR: The automatic derivation of mouth movement from a speech soundtrack is shown to be a tractable problem; a common speech synthesis method, linear prediction, is adapted to provide simple and accurate phoneme recognition.
Abstract: The problem of creating mouth animation synchronized to recorded speech is discussed. Review of a model of speech sound generation indicates that the automatic derivation of mouth movement from a speech soundtrack is a tractable problem. Several automatic lip-sync techniques are compared, and one method is described in detail. In this method a common speech synthesis method, linear prediction, is adapted to provide simple and accurate phoneme recognition. The recognized phonemes are associated with mouth positions to provide keyframes for computer animation of speech. Experience with this technique indicates that automatic lip-sync can produce useful results.

101 citations
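
For the linear-prediction front end both papers above rely on, a minimal sketch follows, assuming the standard autocorrelation (Yule-Walker) formulation and nearest-template classification. The frame size, LPC order, and reference templates are placeholder choices, not values from the papers.

```python
# A minimal sketch of an LPC front end: estimate prediction coefficients
# per frame and classify each frame by its nearest stored template.
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate linear prediction coefficients for one windowed frame."""
    frame = frame * np.hamming(len(frame))        # taper frame edges
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz autocorrelation system R a = r[1..order] (Yule-Walker).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])

def classify_frame(frame, templates, order=10):
    """Label a frame with the phoneme whose stored LPC template is closest
    in Euclidean distance. `templates` maps phoneme -> coefficient vector."""
    coeffs = lpc_coefficients(frame, order)
    return min(templates, key=lambda ph: np.linalg.norm(coeffs - templates[ph]))
```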

Patent
TL;DR: A method and apparatus for converting input text into an audio-visual speech stream is presented, producing a talking face image that enunciates the text; the stream is rendered in real time, displaying a photo-realistic talking face.
Abstract: A method and apparatus of converting input text into an audio-visual speech stream resulting in a talking face image enunciating the text. This method of converting input text into an audio-visual speech stream comprises the steps of: recording a visual corpus of a human-subject, building a viseme interpolation database, and synchronizing the talking face image with the text stream. In a preferred embodiment, viseme transitions are automatically calculated using optical flow methods, and morphing techniques are employed to result in smooth viseme transitions. The viseme transitions are concatenated together and synchronized with the phonemes according to the timing information. The audio-visual speech stream is then displayed in real time, thereby displaying a photo-realistic talking face.

98 citations
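
The optical-flow-plus-morphing step this patent describes can be approximated with off-the-shelf tools. The sketch below uses OpenCV's Farneback flow and a backward-sampling warp as assumed stand-ins for the patent's unspecified flow and morphing methods.

```python
# Hypothetical approximation of a flow-based viseme morph: estimate dense
# optical flow from viseme image A to viseme image B, warp A partway
# along the flow, and cross-dissolve with B.
import cv2
import numpy as np

def morph_visemes(img_a, img_b, t):
    """Return an in-between mouth image at blend factor t in [0, 1]."""
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward-sampling approximation: the output pixel at q takes A's
    # value at q - t * flow(q), moving A a fraction t toward B.
    map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
    warped = cv2.remap(img_a, map_x, map_y, cv2.INTER_LINEAR)
    return cv2.addWeighted(warped, 1.0 - t, img_b, t, 0)
```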

Patent
TL;DR: A system for learning a mapping between time-varying signals drives facial animation directly from speech, without laborious voice track analysis. Its output is a sequence of facial control parameters suitable for driving animation ranging from warped photorealistic images to 3D cartoon characters.
Abstract: A system for learning a mapping between time-varying signals is used to drive facial animation directly from speech, without laborious voice track analysis. The system learns dynamical models of facial and vocal action from observations of a face and the facial gestures made while speaking. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system trains hidden Markov models to obtain its own optimal representation of vocal and facial action. An entropy-minimizing training technique using an entropic prior ensures that these models contain sufficient dynamical information to synthesize realistic facial motion to accompany new vocal performances. In addition, they can make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects. The output of the system is a sequence of facial control parameters suitable for driving a variety of different kinds of animation ranging from warped photorealistic images to 3D cartoon characters.

96 citations
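
As a simplified stand-in for the patent's learned mapping, the sketch below fits a Gaussian HMM on acoustic features, attaches mean facial control parameters to each hidden state from aligned training data, and Viterbi-decodes new audio into a facial-parameter sequence. The entropic prior and joint audio-visual training described in the abstract are not reproduced; hmmlearn's GaussianHMM is an assumed substitute.

```python
# Simplified audio-to-face mapping via a Gaussian HMM over vocal features.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_audio_to_face(audio_feats, face_params, n_states=20):
    """audio_feats: (T, d_a) acoustic frames; face_params: (T, d_f) facial
    control parameters aligned frame-by-frame with the audio."""
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    hmm.fit(audio_feats)
    states = hmm.predict(audio_feats)
    # Average facial parameters per state; unused states stay at zero.
    state_faces = np.zeros((n_states, face_params.shape[1]))
    for s in range(n_states):
        mask = states == s
        if mask.any():
            state_faces[s] = face_params[mask].mean(axis=0)
    return hmm, state_faces

def synthesize_face(hmm, state_faces, new_audio_feats):
    """Decode new audio and look up facial parameters per hidden state."""
    return state_faces[hmm.predict(new_audio_feats)]
```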

PatentDOI
TL;DR: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned againstspeech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group.
Abstract: A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group. A decision tree classifies speech sounds into such groups, and related speech sound groups descend from common tree nodes. New speech samples time aligned against a given speech sound group's model update models of related speech sound groups, decreasing the training data required to adapt the system. The phonetic context classifications can be based on knowledge of which contextual features are associated with acoustic similarity. The computerized system samples speech sounds using a first, larger, parameter set; automatically selects combinations of phonetic context classifications which divide the speech sounds into groups whose frames are acoustically similar, such as by use of a decision tree; selects a second, smaller, set of parameters based on that set's ability to separate the frames aligned with each speech sound group, such as by use of linear discriminant analysis; and then uses these new parameters to represent frames and speech sound models. Then, using the new parameters, a decision tree classifier can be used to re-classify the speech sounds and to calculate new acoustic models for the resulting groups of speech sounds.

95 citations
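
One concrete step from the abstract above, selecting a second, smaller parameter set by its ability to separate frames aligned with each speech-sound group, can be sketched with linear discriminant analysis. The group labels are assumed to come from the decision-tree context clustering; the tree itself is not reimplemented here, and scikit-learn's LDA is an assumed stand-in.

```python
# Rough sketch of LDA-based parameter-set reduction for speech-sound groups.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def reduce_parameters(frames, group_labels, n_dims=13):
    """frames: (N, D) vectors in the first, larger parameter set;
    group_labels: (N,) speech-sound-group index per frame.
    n_dims must satisfy n_dims <= min(n_groups - 1, D)."""
    lda = LinearDiscriminantAnalysis(n_components=n_dims)
    reduced = lda.fit_transform(frames, group_labels)   # (N, n_dims)
    return reduced, lda   # lda.transform() re-parameterizes new frames
```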


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22