
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: A main effect of pseudo-homovisemy is found, suggesting that at least some deaf individuals do automatically access sublexical structure during single-word reading, and a working model of single-word reading by deaf adults based on the dual-route cascaded model of reading aloud is proposed.
Abstract: There is an ongoing debate whether deaf individuals access phonology when reading, and if so, what impact the ability to access phonology might have on reading achievement. However, the debate so far has been theoretically unspecific on two accounts: (a) the phonological units deaf individuals may have of oral language have not been specified and (b) there seem to be no explicit cognitive models specifying how phonology and other factors operate in reading by deaf individuals. We propose that deaf individuals have representations of the sublexical structure of oral-aural language which are based on mouth shapes and that these sublexical units are activated during reading by deaf individuals. We specify the sublexical units of deaf German readers as 11 "visemes" and incorporate the viseme set into a working model of single-word reading by deaf adults based on the dual-route cascaded model of reading aloud by Coltheart, Rastle, Perry, Langdon, and Ziegler (2001. DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204-256. doi: 10.1037//0033-295x.108.1.204). We assessed the indirect route of this model by investigating the "pseudo-homoviseme" effect using a lexical decision task in deaf German reading adults. We found a main effect of pseudo-homovisemy, suggesting that at least some deaf individuals do automatically access sublexical structure during single-word reading.
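The pseudo-homoviseme manipulation is easiest to see in code. Below is a minimal Python sketch of the idea, assuming a hypothetical grapheme-to-viseme table (the paper's actual 11 German viseme classes are not reproduced here): a pseudoword counts as a pseudo-homoviseme when it maps onto the same viseme sequence as a real word.

```python
# Hypothetical viseme classes (mouth-shape groups of graphemes).
# These groupings are illustrative only, not the paper's viseme set.
GRAPHEME_TO_VISEME = {
    "b": "V1", "p": "V1", "m": "V1",              # bilabial closure
    "f": "V2", "w": "V2", "v": "V2",              # labiodental
    "t": "V3", "d": "V3", "n": "V3", "s": "V3",   # alveolar
    "a": "V4", "o": "V5", "u": "V5",              # vowel shapes
}

def viseme_sequence(word: str) -> tuple:
    """Map a word to its viseme sequence, collapsing adjacent repeats."""
    seq = [GRAPHEME_TO_VISEME.get(ch, ch) for ch in word.lower()]
    return tuple(v for i, v in enumerate(seq) if i == 0 or v != seq[i - 1])

def is_pseudo_homoviseme(pseudoword: str, lexicon: set) -> bool:
    """True if the pseudoword 'looks like' a real word on the lips."""
    target = viseme_sequence(pseudoword)
    return any(viseme_sequence(w) == target for w in lexicon)

# "pand" is not a word, but under this toy mapping it shares a viseme
# sequence with "band", because p and b share the bilabial viseme V1.
print(is_pseudo_homoviseme("pand", {"band", "sand"}))  # True
```

In the lexical decision task described above, such items should be harder to reject than control pseudowords if readers activate viseme-level sublexical structure.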

12 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: The research shows that the use of viseme-dependent significance weights improves the performance of a state-asynchronous CHMM-based speech recognizer, and that, for a state-synchronous MSHMM-based recognizer, fewer errors can be achieved using stationary time delays of the visual data with respect to the corresponding audio signal.
Abstract: The aim of the present study is to investigate some key challenges of audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, and stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of a state-asynchronous CHMM-based speech recognizer. In addition, for a state-synchronous MSHMM-based recognizer, fewer errors can be achieved using stationary time delays of the visual data with respect to the corresponding audio signal. Evaluation experiments showed that individual audio-visual stream weights for each viseme-phoneme pair lead to a relative reduction of WER by 20%. Index Terms: multimodal speech, audio-visual processing, Hidden Markov Models, asynchrony, significance weights
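As a rough illustration of what viseme-dependent stream weights mean operationally, here is a minimal sketch (not the authors' implementation) of log-linear stream fusion in which the audio/visual trade-off is chosen per viseme-phoneme unit; the unit names and weight values below are invented for illustration, whereas the paper estimates such weights from data.

```python
import math

# Hypothetical per-unit stream weights: how much to trust the audio
# stream vs. the visual stream for each viseme-phoneme pair.
STREAM_WEIGHT = {
    ("V_bilabial", "p"): 0.55,  # lips carry much of the information
    ("V_open", "a"): 0.80,      # audio dominates for open vowels
}
DEFAULT_WEIGHT = 0.7

def combined_log_likelihood(unit, logp_audio, logp_video):
    """Log-linear fusion of audio and visual stream scores:
    lam * logP(audio|unit) + (1 - lam) * logP(video|unit),
    with lam chosen per viseme-phoneme unit rather than globally."""
    lam = STREAM_WEIGHT.get(unit, DEFAULT_WEIGHT)
    return lam * logp_audio + (1.0 - lam) * logp_video

# Example: noisy audio (low audio likelihood) for a bilabial unit,
# where the more reliable visual stream partly compensates.
print(combined_log_likelihood(("V_bilabial", "p"),
                              math.log(0.1), math.log(0.6)))
```

The design choice this sketches is the one the abstract argues for: a single global stream weight ignores the fact that some units are far more visible on the lips than others.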

12 citations

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work extends and improves a recently introduced dynamic Bayesian network based audio-visual automatic speech recognition (AV-ASR) system to model the audio and visual streams as being composed of separate, yet related, sub-word units.
Abstract: This work extends and improves a recently introduced (Dec. 2007) dynamic Bayesian network (DBN) based audio-visual automatic speech recognition (AV-ASR) system. That system models the audio and visual components of speech as being composed of the same sub-word units when, in fact, this is not psycholinguistically true. We extend the system to model the audio and visual streams as being composed of separate, yet related, sub-word units. We also introduce a novel stream weighting structure incorporated into the model itself. In doing so, our system makes improvements in word error rate (WER) and overall recognition accuracy in a large vocabulary continuous speech recognition (LVCSR) task. The "best"-performing proposed system attains a WER of 66.71%, whereas the "best" baseline system performs at a WER of 64.30%. The proposed system also improves accuracy to 45.95% from 39.40%.
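Since the comparison above is made in terms of word error rate, a short reference implementation of the standard WER metric (edit distance over words; this is the conventional definition, not code from the paper) may be useful:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # 1/3 ~ 0.333
```

Note that WER can exceed 100% when the hypothesis contains many insertions, and that accuracy and WER need not move in lockstep, which is consistent with the mixed WER/accuracy results reported above.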

12 citations

Proceedings ArticleDOI
01 Oct 2004
TL;DR: This work re-synthesizes speech from speech recognition features to gain more insight into the preprocessing, and is thereby able to pinpoint some deficiencies in the current preprocessing scheme.
Abstract: The merits of different signal preprocessing schemes for speech recognizers are usually assessed purely on the basis of the resulting recognition accuracy. Such benchmarks give a good indication as to whether one preprocessing scheme is better than another, but little knowledge is acquired about why it is better or how it could be further improved. In order to gain more insight into the preprocessing, we seek to re-synthesize speech from speech recognition features. This way, we are able to pinpoint some deficiencies in our current preprocessing scheme. Additional analysis of successful new preprocessing schemes may one day allow us to identify precisely those properties that are desirable in a feature set. Beyond these purely scientific aims, the re-synthesis of speech from recognition features is of interest for thin-client speech applications, and as an alternative to the classical LPC source-filter model for speech manipulation.
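The re-synthesis idea can be approximated with off-the-shelf tools. The sketch below is not the authors' pipeline; it uses librosa's built-in MFCC inversion (available in librosa 0.7+) as an accessible stand-in, with illustrative file names and parameter choices. Listening to the reconstruction reveals what information a typical MFCC front end discards.

```python
import librosa
import soundfile as sf

# Load an example recording (librosa.ex downloads a bundled sample).
y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)

# A typical ASR front end: 13 MFCCs per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Invert the MFCCs back to a waveform (mel-spectrogram inversion plus a
# Griffin-Lim phase estimate). The result is intelligible but audibly
# degraded, which is exactly the diagnostic signal exploited above.
y_hat = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr, n_mels=128)
sf.write("resynthesized.wav", y_hat, sr)
```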

12 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year   Papers
2023   7
2022   12
2021   13
2020   39
2019   19
2018   22