Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
31 Oct 1994
TL;DR: A continuous optical automatic speech recognizer that uses optical information from the oral-cavity shadow of a speaker is described; it achieves 25.3 percent recognition on sentences with a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides.
Abstract: We describe a continuous optical automatic speech recognizer (OASR) that uses optical information from the oral-cavity shadow of a speaker. The system achieves 25.3 percent recognition on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides. We introduce 13, mostly dynamic, oral-cavity features used for optical recognition, present phones that appear optically similar (visemes) for our speaker, and present the recognition results for our hidden Markov models (HMMs) using visemes, trisemes, and generalized trisemes. We conclude that future research is warranted for optical recognition, especially when combined with other input modalities.
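The viseme and triseme units this paper builds its HMMs on can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the paper's code: the phone-to-viseme grouping is a hypothetical example mapping (the paper derives speaker-specific clusters from data), and the triseme labels are formed by analogy with triphones in acoustic modeling.

```python
# Illustrative sketch: collapse phones into viseme classes, then build
# context-dependent "triseme" labels. The mapping below is a toy example,
# not the speaker-specific viseme clusters reported in the paper.

PHONE_TO_VISEME = {
    # bilabials look identical on the lips
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    # labiodentals
    "f": "V_labiodental", "v": "V_labiodental",
    # rounded vowels
    "uw": "V_round", "ow": "V_round",
    # open vowels
    "aa": "V_open", "ae": "V_open",
}

def to_visemes(phones):
    """Map a phone sequence to its viseme sequence."""
    return [PHONE_TO_VISEME.get(p, "V_other") for p in phones]

def to_trisemes(visemes):
    """Form left-context/center/right-context triseme labels."""
    padded = ["sil"] + visemes + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

if __name__ == "__main__":
    phones = ["m", "aa", "p"]        # e.g. the word "mop"
    visemes = to_visemes(phones)
    print(visemes)                   # ['V_bilabial', 'V_open', 'V_bilabial']
    print(to_trisemes(visemes))      # context-dependent units for HMM training
```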

80 citations

Proceedings ArticleDOI
12 May 1998
TL;DR: The proposed technique can generate lip movements synchronized with speech in a unified framework, and coarticulation is implicitly incorporated into the generated mouth shapes, so that the synthetic lip motion becomes smooth and realistic.
Abstract: This paper presents a new technique for synthesizing visual speech from arbitrarily given text. The technique is based on an algorithm for parameter generation from HMM with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained with visual speech parameter sequences that represent lip movements. In the synthesis phase, a sentence HMM is constructed by concatenating syllable HMMs corresponding to the phonetic transcription for the input text. Then an optimum visual speech parameter sequence is generated from the sentence HMM in an ML sense. The proposed technique can generate synchronized lip movements with speech in a unified framework. Furthermore, coarticulation is implicitly incorporated into the generated mouth shapes. As a result, synthetic lip motion becomes smooth and realistic.
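The core step here is ML parameter generation from an HMM with dynamic features: given the state-aligned means and variances of the static and delta lip parameters, the optimal static trajectory c solves the normal equations (Wᵀ Σ⁻¹ W) c = Wᵀ Σ⁻¹ μ. The numpy sketch below is our reconstruction under simplifying assumptions (a single lip parameter, a first-difference delta window, diagonal covariances); the toy means and variances in the example are made up, whereas a real system would obtain them from trained syllable HMMs.

```python
# Sketch of ML parameter generation from an HMM with dynamic (delta)
# features, under simplifying assumptions (1-D parameter, first-difference
# delta window, diagonal covariances).
import numpy as np

def generate_trajectory(mu, var):
    """ML static trajectory from per-frame [static, delta] means/variances.

    Solves (W^T P W) c = W^T P mu, where W stacks one static row (c_t) and
    one delta row (c_t - c_{t-1}) per frame, and P is the diagonal
    precision matrix built from `var`.
    """
    T = mu.shape[0]
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0              # static row: c_t
        if t > 0:                      # delta row: c_t - c_{t-1}
            W[2 * t + 1, t] = 1.0
            W[2 * t + 1, t - 1] = -1.0
    P = np.diag(1.0 / var.reshape(-1))       # diagonal precision matrix
    A = W.T @ P @ W
    b = W.T @ P @ mu.reshape(-1)
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    # Toy example: two HMM states, three frames each, one lip parameter.
    mu_static = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # target openings
    mu_delta = np.zeros(6)                                 # prefer smoothness
    mu = np.stack([mu_static, mu_delta], axis=1)
    var = np.full((6, 2), 0.1)
    print(generate_trajectory(mu, var))  # smooth transition between states
```

Because the delta means pull the trajectory toward zero velocity while the static means pull it toward each state's target, the solved trajectory transitions smoothly between states, which is how coarticulation emerges implicitly in this framework.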

79 citations

Journal ArticleDOI
TL;DR: This paper takes a novel approach to speech animation, using visyllables, the visual counterpart of syllables, in a concatenative visyllable-based speech animation system that is easy to implement, effective for real-time as well as non-real-time applications, and produces realistic speech animation.
Abstract: Visemes are the visual counterparts of phonemes. Traditionally, the speech animation of 3D synthetic faces involves extraction of visemes from input speech followed by the application of co-articulation rules to generate realistic animation. In this paper, we take a novel approach to speech animation, using visyllables, the visual counterpart of syllables. The approach results in a concatenative visyllable-based speech animation system. The key contribution of this paper lies in two main areas. Firstly, we define a set of visyllable units for spoken English along with the associated phonological rules for valid syllables. Based on these rules, we have implemented a syllabification algorithm that allows segmentation of a given phoneme stream into syllables and subsequently visyllables. Secondly, we have recorded a database of visyllables using a facial motion capture system. The recorded visyllable units are post-processed semi-automatically to ensure continuity at the vowel boundaries of the visyllables. We define each visyllable in terms of the Facial Movement Parameters (FMP). The FMPs are obtained as a result of the statistical analysis of the facial motion capture data. The FMPs allow a compact representation of the visyllables. Further, the FMPs also facilitate the formulation of rules for boundary matching and smoothing after concatenating the visyllable units. Ours is the first visyllable-based speech animation system. The proposed technique is easy to implement, effective for real-time as well as non-real-time applications, and results in realistic speech animation. Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism
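The concatenation-and-smoothing step can be sketched briefly. The snippet below is a hypothetical reconstruction, not the authors' system: it treats each visyllable as a frames-by-parameters array of FMP trajectories and crossfades a fixed number of frames at each join, standing in for the paper's rule-based boundary matching and smoothing at vowel boundaries.

```python
# Hypothetical sketch of concatenative visyllable animation: each unit is a
# (frames x FMP-parameters) trajectory array; `overlap` frames are linearly
# crossfaded at each join to keep the motion continuous.
import numpy as np

def concatenate_visyllables(units, overlap=5):
    """Join FMP trajectories, linearly crossfading `overlap` frames."""
    out = units[0]
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        w = np.linspace(0.0, 1.0, n)[:, None]          # fade-in weights
        blended = (1.0 - w) * out[-n:] + w * unit[:n]  # crossfade the join
        out = np.concatenate([out[:-n], blended, unit[n:]], axis=0)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two made-up visyllable units, 30 frames x 4 FMPs each.
    a = rng.normal(size=(30, 4)).cumsum(axis=0)
    b = rng.normal(size=(30, 4)).cumsum(axis=0)
    seq = concatenate_visyllables([a, b], overlap=8)
    print(seq.shape)  # (52, 4): 30 + 30 - 8 frames after blending
```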

78 citations

Journal ArticleDOI
TL;DR: The intended audience for the book Fundamentals of Speech Synthesis and Recognition, edited by Eric Keller, is the "whole new generation of computer scientists" who will wonder "what speech is all about" in the context of building or deploying computer-based speech technology.
Abstract: The intended audience for the book Fundamentals of Speech Synthesis and Recognition, edited by Eric Keller, is the "whole new generation of computer scientists" who will wonder "what speech is all about" in the context of building or deploying computer-based speech technology. Apart from the reader's being a "well motivated" "computer scientist," there are no specific prerequisites mentioned. Given the title, one would thus hope for a broad, balanced introduction to speech science and technology. This, the book is not. It does, however, contain some excellent material, much of which is accessible to the lay reader in speech. The book is divided into three sections: "Background," aimed at introducing basic speech science and technology concepts; "State of the Art," which, as the name implies, provides a window on current practical consequences of speech research and development; and "Challenges," presenting research questions in the areas of speech production, perception, synthesis, and human-machine interaction. Because the chapters are written by several contributors, it seems most appropriate to review them individually, rather than to attempt generalizations that cover the book as a whole. After glancing through the introduction to the "Background" section and through Chapter I, "Fundamentals of Phonetic Science," I was ready to put the book down permanently. These are rife with colloquialisms, nonstandard terminology, inaccuracies, and misleading material. The introduction to phonetics is primarily contained in one footnote. The discussion of speech acoustics suggests a general lack of understanding of the area. The terminology is nonstandard without having the redeeming value of clarifying issues. For instance, all potential points of constriction of the vocal tract are termed "ports," with the "linguo-palatal port" having the same valence as the velar port. The introduction to the voicing mechanism and the accompanying illustration convey no insight regarding the self-oscillatory nature of the process. No introduction is provided for basic measurement concepts (frequency, intensity, spectrum, spectrogram). Basic processing methods are incompletely or inaccurately described: "[LPC] which is often calculated by taking a spectrum of an autocorrelation." The only redeeming aspects of the chapter are its brevity and the references contained at the end.

77 citations


Network Information
Related Topics (5)
- Vocabulary: 44.6K papers, 941.5K citations (78% related)
- Feature vector: 48.8K papers, 954.4K citations (76% related)
- Feature extraction: 111.8K papers, 2.1M citations (75% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
- Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year | Papers
2023 | 7
2022 | 12
2021 | 13
2020 | 39
2019 | 19
2018 | 22