
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: A main effect of pseudo-homovisemy is found, suggesting that at least some deaf individuals do automatically access sublexical structure during single-word reading, and a working model of single-word reading by deaf adults based on the dual-route cascaded model of reading aloud is proposed.
Abstract: There is an ongoing debate whether deaf individuals access phonology when reading, and if so, what impact the ability to access phonology might have on reading achievement. However, the debate so far has been theoretically unspecific on two accounts: (a) the phonological units deaf individuals may have of oral language have not been specified and (b) there seem to be no explicit cognitive models specifying how phonology and other factors operate in reading by deaf individuals. We propose that deaf individuals have representations of the sublexical structure of oral-aural language which are based on mouth shapes and that these sublexical units are activated during reading by deaf individuals. We specify the sublexical units of deaf German readers as 11 "visemes" and incorporate the viseme set into a working model of single-word reading by deaf adults based on the dual-route cascaded model of reading aloud by Coltheart, Rastle, Perry, Langdon, and Ziegler (2001. DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204-256. doi: 10.1037//0033-295x.108.1.204). We assessed the indirect route of this model by investigating the "pseudo-homoviseme" effect using a lexical decision task in deaf German reading adults. We found a main effect of pseudo-homovisemy, suggesting that at least some deaf individuals do automatically access sublexical structure during single-word reading.
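The pseudo-homoviseme manipulation is easiest to see in code. Below is a minimal Python sketch of the idea, assuming a hypothetical grapheme-to-viseme table (the paper's actual 11 German viseme classes are not reproduced here): a pseudoword counts as a pseudo-homoviseme when it maps onto the same viseme sequence as a real word.

```python
# Hypothetical viseme classes (mouth-shape groups of graphemes).
# These groupings are illustrative only, not the paper's viseme set.
GRAPHEME_TO_VISEME = {
    "b": "V1", "p": "V1", "m": "V1",              # bilabial closure
    "f": "V2", "w": "V2", "v": "V2",              # labiodental
    "t": "V3", "d": "V3", "n": "V3", "s": "V3",   # alveolar
    "a": "V4", "o": "V5", "u": "V5",              # vowel shapes
}

def viseme_sequence(word: str) -> tuple:
    """Map a word to its viseme sequence, collapsing adjacent repeats."""
    seq = [GRAPHEME_TO_VISEME.get(ch, ch) for ch in word.lower()]
    return tuple(v for i, v in enumerate(seq) if i == 0 or v != seq[i - 1])

def is_pseudo_homoviseme(pseudoword: str, lexicon: set) -> bool:
    """True if the pseudoword 'looks like' a real word on the lips."""
    target = viseme_sequence(pseudoword)
    return any(viseme_sequence(w) == target for w in lexicon)

# "pand" is not a word, but under this toy mapping it shares a viseme
# sequence with "band", because p and b share the bilabial viseme V1.
print(is_pseudo_homoviseme("pand", {"band", "sand"}))  # True
```

In the lexical decision task described above, such items should be harder to reject than control pseudowords if readers activate viseme-level sublexical structure.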

12 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: The research shows that the use of viseme-dependent significance weights improves the performance of a state-asynchronous CHMM-based speech recognizer, and that, for a state-synchronous MSHMM-based recognizer, fewer errors can be achieved using stationary time delays of the visual data with respect to the corresponding audio signal.
Abstract: The aim of the present study is to investigate some key challenges of audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, and stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of a state-asynchronous CHMM-based speech recognizer. In addition, for a state-synchronous MSHMM-based recognizer, fewer errors can be achieved using stationary time delays of the visual data with respect to the corresponding audio signal. Evaluation experiments showed that individual audio-visual stream weights for each viseme-phoneme pair lead to a relative reduction of WER by 20%. Index Terms: multimodal speech, audio-visual processing, Hidden Markov Models, asynchrony, significance weights
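As a rough illustration of what viseme-dependent stream weights mean operationally, here is a minimal sketch (not the authors' implementation) of log-linear stream fusion in which the audio/visual trade-off is chosen per viseme-phoneme unit; the unit names and weight values below are invented for illustration, whereas the paper estimates such weights from data.

```python
import math

# Hypothetical per-unit stream weights: how much to trust the audio
# stream vs. the visual stream for each viseme-phoneme pair.
STREAM_WEIGHT = {
    ("V_bilabial", "p"): 0.55,  # lips carry much of the information
    ("V_open", "a"): 0.80,      # audio dominates for open vowels
}
DEFAULT_WEIGHT = 0.7

def combined_log_likelihood(unit, logp_audio, logp_video):
    """Log-linear fusion of audio and visual stream scores:
    lam * logP(audio|unit) + (1 - lam) * logP(video|unit),
    with lam chosen per viseme-phoneme unit rather than globally."""
    lam = STREAM_WEIGHT.get(unit, DEFAULT_WEIGHT)
    return lam * logp_audio + (1.0 - lam) * logp_video

# Example: noisy audio (low audio likelihood) for a bilabial unit,
# where the more reliable visual stream partly compensates.
print(combined_log_likelihood(("V_bilabial", "p"),
                              math.log(0.1), math.log(0.6)))
```

The design choice this sketches is the one the abstract argues for: a single global stream weight ignores the fact that some units are far more visible on the lips than others.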

12 citations

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work extends and improves a recently introduced dynamic Bayesian network based audio-visual automatic speech recognition (AV-ASR) system to model the audio and visual streams as being composed of separate, yet related, sub-word units.
Abstract: This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AV-ASR) system. That system models the audio and visual components of speech as being composed of the same sub-word units when, in fact, this is not psycholinguistically true. We extend the system to model the audio and visual streams as being composed of separate, yet related, sub-word units. We also introduce a novel stream weighting structure incorporated into the model itself. In doing so, our system makes improvements in word error rate (WER) and overall recognition accuracy in a large vocabulary continuous speech recognition (LVCSR) task. The "best"-performing proposed system attains a WER of 66.71%, whereas the "best" baseline system performs at a WER of 64.30%. The proposed system also improves accuracy to 45.95% from 39.40%.
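Since the comparison above is made in terms of word error rate, a short reference implementation of the standard WER metric (edit distance over words; this is the conventional definition, not code from the paper) may be useful:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # 1/3 ~ 0.333
```

Note that WER can exceed 100% when the hypothesis contains many insertions, and that accuracy and WER need not move in lockstep, which is consistent with the mixed WER/accuracy results reported above.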

12 citations

Proceedings ArticleDOI
01 Oct 2004
TL;DR: This work re-synthesizes speech from speech recognition features to gain more insight into the preprocessing, and is thereby able to pinpoint some deficiencies in the current preprocessing scheme.
Abstract: The merits of different signal preprocessing schemes for speech recognizers are usually assessed purely on the basis of the resulting recognition accuracy. Such benchmarks give a good indication as to whether one preprocessing scheme is better than another, but little knowledge is acquired about why it is better or how it could be further improved. In order to gain more insight into the preprocessing, we seek to re-synthesize speech from speech recognition features. This way, we are able to pinpoint some deficiencies in our current preprocessing scheme. Additional analysis of successful new preprocessing schemes may one day allow us to identify precisely those properties that are desirable in a feature set. Beyond these purely scientific aims, the re-synthesis of speech from recognition features is of interest for thin-client speech applications, and as an alternative to the classical LPC source-filter model for speech manipulation.
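The re-synthesis idea can be approximated with off-the-shelf tools. The sketch below is not the authors' pipeline; it uses librosa's built-in MFCC inversion (available in librosa 0.7+) as an accessible stand-in, with illustrative file names and parameter choices. Listening to the reconstruction reveals what information a typical MFCC front end discards.

```python
import librosa
import soundfile as sf

# Load an example recording (librosa.ex downloads a bundled sample).
y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)

# A typical ASR front end: 13 MFCCs per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Invert the MFCCs back to a waveform (mel-spectrogram inversion plus a
# Griffin-Lim phase estimate). The result is intelligible but audibly
# degraded, which is exactly the diagnostic signal exploited above.
y_hat = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr, n_mels=128)
sf.write("resynthesized.wav", y_hat, sr)
```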

12 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year   Papers
2023   7
2022   12
2021   13
2020   39
2019   19
2018   22