Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
31 Oct 1994
TL;DR: A continuous optical automatic speech recognizer that uses optical information from the oral-cavity shadow of a speaker is described; it achieves 25.3 percent recognition on sentences with a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides.
Abstract: We describe a continuous optical automatic speech recognizer (OASR) that uses optical information from the oral-cavity shadow of a speaker. The system achieves 25.3 percent recognition on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides. We introduce 13, mostly dynamic, oral-cavity features used for optical recognition, present phones that appear optically similar (visemes) for our speaker, and present the recognition results for our hidden Markov models (HMMs) using visemes, trisemes, and generalized trisemes. We conclude that future research is warranted for optical recognition, especially when combined with other input modalities.
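The viseme and triseme units this paper builds its HMMs on can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the paper's code: the phone-to-viseme grouping is a hypothetical example mapping (the paper derives speaker-specific clusters from data), and the triseme labels are formed by analogy with triphones in acoustic modeling.

```python
# Illustrative sketch: collapse phones into viseme classes, then build
# context-dependent "triseme" labels. The mapping below is a toy example,
# not the speaker-specific viseme clusters reported in the paper.

PHONE_TO_VISEME = {
    # bilabials look identical on the lips
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    # labiodentals
    "f": "V_labiodental", "v": "V_labiodental",
    # rounded vowels
    "uw": "V_round", "ow": "V_round",
    # open vowels
    "aa": "V_open", "ae": "V_open",
}

def to_visemes(phones):
    """Map a phone sequence to its viseme sequence."""
    return [PHONE_TO_VISEME.get(p, "V_other") for p in phones]

def to_trisemes(visemes):
    """Form left-context/center/right-context triseme labels."""
    padded = ["sil"] + visemes + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

if __name__ == "__main__":
    phones = ["m", "aa", "p"]        # e.g. the word "mop"
    visemes = to_visemes(phones)
    print(visemes)                   # ['V_bilabial', 'V_open', 'V_bilabial']
    print(to_trisemes(visemes))      # context-dependent units for HMM training
```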

80 citations

Proceedings ArticleDOI
12 May 1998
TL;DR: The proposed technique can generate lip movements synchronized with speech in a unified framework, and coarticulation is implicitly incorporated into the generated mouth shapes, so that the synthetic lip motion becomes smooth and realistic.
Abstract: This paper presents a new technique for synthesizing visual speech from arbitrarily given text. The technique is based on an algorithm for parameter generation from HMM with dynamic features, which has been successfully applied to text-to-speech synthesis. In the training phase, syllable HMMs are trained with visual speech parameter sequences that represent lip movements. In the synthesis phase, a sentence HMM is constructed by concatenating syllable HMMs corresponding to the phonetic transcription for the input text. Then an optimum visual speech parameter sequence is generated from the sentence HMM in an ML sense. The proposed technique can generate synchronized lip movements with speech in a unified framework. Furthermore, coarticulation is implicitly incorporated into the generated mouth shapes. As a result, synthetic lip motion becomes smooth and realistic.
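The core step here is ML parameter generation from an HMM with dynamic features: given the state-aligned means and variances of the static and delta lip parameters, the optimal static trajectory c solves the normal equations (Wᵀ Σ⁻¹ W) c = Wᵀ Σ⁻¹ μ. The numpy sketch below is our reconstruction under simplifying assumptions (a single lip parameter, a first-difference delta window, diagonal covariances); the toy means and variances in the example are made up, whereas a real system would obtain them from trained syllable HMMs.

```python
# Sketch of ML parameter generation from an HMM with dynamic (delta)
# features, under simplifying assumptions (1-D parameter, first-difference
# delta window, diagonal covariances).
import numpy as np

def generate_trajectory(mu, var):
    """ML static trajectory from per-frame [static, delta] means/variances.

    Solves (W^T P W) c = W^T P mu, where W stacks one static row (c_t) and
    one delta row (c_t - c_{t-1}) per frame, and P is the diagonal
    precision matrix built from `var`.
    """
    T = mu.shape[0]
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0              # static row: c_t
        if t > 0:                      # delta row: c_t - c_{t-1}
            W[2 * t + 1, t] = 1.0
            W[2 * t + 1, t - 1] = -1.0
    P = np.diag(1.0 / var.reshape(-1))       # diagonal precision matrix
    A = W.T @ P @ W
    b = W.T @ P @ mu.reshape(-1)
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    # Toy example: two HMM states, three frames each, one lip parameter.
    mu_static = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # target openings
    mu_delta = np.zeros(6)                                 # prefer smoothness
    mu = np.stack([mu_static, mu_delta], axis=1)
    var = np.full((6, 2), 0.1)
    print(generate_trajectory(mu, var))  # smooth transition between states
```

Because the delta means pull the trajectory toward zero velocity while the static means pull it toward each state's target, the solved trajectory transitions smoothly between states, which is how coarticulation emerges implicitly in this framework.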

79 citations

Journal ArticleDOI
TL;DR: This paper takes a novel approach to speech animation, using visyllables, the visual counterpart of syllables, in a concatenative visyllable-based speech animation system that is easy to implement, effective for real-time as well as non-real-time applications, and produces realistic speech animation.
Abstract: Visemes are the visual counterparts of phonemes. Traditionally, the speech animation of 3D synthetic faces involves extraction of visemes from input speech followed by the application of co-articulation rules to generate realistic animation. In this paper, we take a novel approach to speech animation, using visyllables, the visual counterpart of syllables. The approach results in a concatenative visyllable-based speech animation system. The key contribution of this paper lies in two main areas. Firstly, we define a set of visyllable units for spoken English along with the associated phonological rules for valid syllables. Based on these rules, we have implemented a syllabification algorithm that allows segmentation of a given phoneme stream into syllables and subsequently visyllables. Secondly, we have recorded a database of visyllables using a facial motion capture system. The recorded visyllable units are post-processed semi-automatically to ensure continuity at the vowel boundaries of the visyllables. We define each visyllable in terms of the Facial Movement Parameters (FMP). The FMPs are obtained as a result of the statistical analysis of the facial motion capture data. The FMPs allow a compact representation of the visyllables. Further, the FMPs also facilitate the formulation of rules for boundary matching and smoothing after concatenating the visyllable units. Ours is the first visyllable-based speech animation system. The proposed technique is easy to implement, effective for real-time as well as non-real-time applications, and results in realistic speech animation. Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism
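The concatenation-and-smoothing step can be sketched briefly. The snippet below is a hypothetical reconstruction, not the authors' system: it treats each visyllable as a frames-by-parameters array of FMP trajectories and crossfades a fixed number of frames at each join, standing in for the paper's rule-based boundary matching and smoothing at vowel boundaries.

```python
# Hypothetical sketch of concatenative visyllable animation: each unit is a
# (frames x FMP-parameters) trajectory array; `overlap` frames are linearly
# crossfaded at each join to keep the motion continuous.
import numpy as np

def concatenate_visyllables(units, overlap=5):
    """Join FMP trajectories, linearly crossfading `overlap` frames."""
    out = units[0]
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        w = np.linspace(0.0, 1.0, n)[:, None]          # fade-in weights
        blended = (1.0 - w) * out[-n:] + w * unit[:n]  # crossfade the join
        out = np.concatenate([out[:-n], blended, unit[n:]], axis=0)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two made-up visyllable units, 30 frames x 4 FMPs each.
    a = rng.normal(size=(30, 4)).cumsum(axis=0)
    b = rng.normal(size=(30, 4)).cumsum(axis=0)
    seq = concatenate_visyllables([a, b], overlap=8)
    print(seq.shape)  # (52, 4): 30 + 30 - 8 frames after blending
```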

78 citations

Journal ArticleDOI
TL;DR: The intended audience for the book Fundamentals of Speech Synthesis and Recognition, edited by Eric Keller, is the "whole new generation of computer scientists" who will wonder "what speech is all about" in the context of building or deploying computer-based speech technology.
Abstract: The intended audience for the book Fundamentals of Speech Synthesis and Recognition, edited by Eric Keller, is the "whole new generation of computer scientists" who will wonder "what speech is all about" in the context of building or deploying computer-based speech technology. Apart from the reader's being a "well motivated" "computer scientist," there are no specific prerequisites mentioned. Given the title, one would thus hope for a broad, balanced introduction to speech science and technology. This, the book is not. It does, however, contain some excellent material, much of which is accessible to the lay reader in speech. The book is divided into three sections: "Background," aimed at introducing basic speech science and technology concepts; "State of the Art," which, as the name implies, provides a window on current practical consequences of speech research and development; and "Challenges," presenting research questions in the areas of speech production, perception, synthesis, and human-machine interaction. Because the chapters are written by several contributors, it seems most appropriate to review them individually, rather than to attempt generalizations that cover the book as a whole. After glancing through the introduction to the "Background" section and through Chapter I, "Fundamentals of Phonetic Science," I was ready to put the book down permanently. These are rife with colloquialisms, nonstandard terminology, inaccuracies, and misleading material. The introduction to phonetics is primarily contained in one footnote. The discussion of speech acoustics suggests a general lack of understanding of the area. The terminology is nonstandard without having the redeeming value of clarifying issues. For instance, all potential points of constriction of the vocal tract are termed "ports," with the "linguo-palatal port" having the same valence as the velar port. The introduction to the voicing mechanism and the accompanying illustration convey no insight regarding the self-oscillatory nature of the process. No introduction is provided for basic measurement concepts (frequency, intensity, spectrum, spectrogram). Basic processing methods are incompletely or inaccurately described: "[LPC] which is often calculated by taking a spectrum of an autocorrelation." The only redeeming aspects of the chapter are its brevity and the references contained at the end.

77 citations


Network Information
Related Topics (5)
- Vocabulary: 44.6K papers, 941.5K citations (78% related)
- Feature vector: 48.8K papers, 954.4K citations (76% related)
- Feature extraction: 111.8K papers, 2.1M citations (75% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
- Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year | Papers
2023 | 7
2022 | 12
2021 | 13
2020 | 39
2019 | 19
2018 | 22