Stream Weight Optimization of Speech and Lip Image Sequence for Audio-Visual Speech Recognition

Open Access Proceedings Article

Satoshi Nakamura, +2 more

- Vol. 3, pp 20-24

ICSLP2000: the 6th International Conference on Spoken Language Processing, October 16-20, 2000, Beijing, China.

Citations

Journal Article

Recent advances in the automatic recognition of audiovisual speech

Gerasimos Potamianos, +4 more

TL;DR: The main components of audiovisual automatic speech recognition (ASR) are reviewed and novel contributions in two main areas are presented: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration.

Audio-Visual Automatic Speech Recognition: An Overview

Gerasimos Potamianos, +3 more

TL;DR: Novel, non-traditional approaches that use sources of information orthogonal to the acoustic input are needed to achieve ASR performance that is closer to the human speech perception level and robust enough to be deployable in field applications.

Journal Article

Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey

Shankar T. Shivappa, +2 more

TL;DR: The fusion strategies and the corresponding models used in audiovisual tasks such as speech recognition, tracking, biometrics, affective state recognition, and meeting scene analysis are described.

Journal Article

User-centered modeling and evaluation of multimodal interfaces

Sharon Oviatt

TL;DR: This paper summarizes research on the cognitive science foundations of multimodal interaction and on the essential role that user-centered modeling has played in prototyping, guiding, and evaluating the design of next-generation multimodal interfaces, and describes the important role that selective methodologies and evaluation metrics have played in shaping next-generation multimodal systems.

Journal Article

On Dynamic Stream Weighting for Audio-Visual Speech Recognition

Virginia Estellers, +2 more

- 01 May 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: The inclusion of a voice activity detector in the weighting scheme improves speech recognition over different system architectures and confidence measures, leading to an increase in performance more relevant than any difference between the proposed confidence measures.

References

Journal Article

Signal bias removal by maximum likelihood estimation for robust telephone speech recognition

Biing-Hwang Juang, +1 more

- 01 Jan 1996 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: The SBR method, integrated into a discrete density HMM, is applied to telephone speech recognition where the contamination due to extraneous signal components is assumed to be unknown. To enable real-time implementation, a sequential method for estimating the bias is presented.

Book Chapter

On the Integration of Auditory and Visual Parameters in an HMM-based ASR

Ali Adjoudani, +1 more

TL;DR: A model that can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent setting is proposed, using a hybrid system based on two HMMs trained on acoustic and optic data, respectively.

Proceedings Article

Discriminative training of HMM stream exponents for audio-visual speech recognition

Gerasimos Potamianos, +1 more

TL;DR: The use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition is proposed.

Proceedings Article

Adaptive bimodal sensor fusion for automatic speechreading

Uwe Meier, +2 more

TL;DR: Different methods of combining the visual and acoustic data to improve the recognition performance of automated speech recognizers by using additional visual information are presented, achieving error reduction of up to 50%.

Proceedings Article

Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition

Javier Hernando

TL;DR: This paper proposes a method to automatically estimate an optimum state-dependent stream weighting in a continuous density hidden Markov model (CDHMM) recognition system by means of a maximum-likelihood based training algorithm.

Related Papers (5)

Audio-visual speech modeling for continuous speech recognition

Recent advances in the automatic recognition of audiovisual speech

On the Integration of Auditory and Visual Parameters in an HMM-based ASR

Hearing lips and seeing voices

Automatic lipreading to enhance speech recognition (speech reading)