Open Access, Proceedings Article

Stream Weight Optimization of Speech and Lip Image Sequence for Audio-Visual Speech Recognition

Satoshi Nakamura, +2 more
- Vol. 3, pp 20-24
TLDR
ICSLP2000: the 6th International Conference on Spoken Language Processing, October 16-20, 2000, Beijing, China.


Citations
Journal Article

Recent advances in the automatic recognition of audiovisual speech

TL;DR: The main components of audiovisual automatic speech recognition (ASR) are reviewed and novel contributions in two main areas are presented: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration.

Audio-Visual Automatic Speech Recognition: An Overview

TL;DR: Novel, non-traditional approaches that use sources of information orthogonal to the acoustic input are needed to achieve ASR performance that is closer to the human speech perception level and robust enough to be deployable in field applications.
Journal Article

Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey

TL;DR: The fusion strategies and the corresponding models used in audiovisual tasks such as speech recognition, tracking, biometrics, affective state recognition, and meeting scene analysis are described.
Journal Article

User-centered modeling and evaluation of multimodal interfaces

TL;DR: This paper summarizes research on the cognitive science foundations of multimodal interaction, and on the essential role that user-centered modeling has played in prototyping, guiding, and evaluating the design of next-generation multimodal interfaces, and describes the important role that selective methodologies and evaluation metrics have played in shaping next-generation multimodal systems.
Journal Article

On Dynamic Stream Weighting for Audio-Visual Speech Recognition

TL;DR: The inclusion of a voice activity detector in the weighting scheme improves speech recognition over different system architectures and confidence measures, leading to an increase in performance more relevant than any difference between the proposed confidence measures.
References
Journal Article

Signal bias removal by maximum likelihood estimation for robust telephone speech recognition

TL;DR: The SBR method, integrated into a discrete density HMM, is applied to telephone speech recognition, where the contamination due to extraneous signal components is assumed to be unknown. To enable real-time implementation, a sequential method for estimating the bias is presented.
Book Chapter

On the Integration of Auditory and Visual Parameters in an HMM-based ASR

TL;DR: A model is proposed that can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent setting by using a hybrid system based on two HMMs trained on acoustic and optic data, respectively.
Proceedings Article

Discriminative training of HMM stream exponents for audio-visual speech recognition

TL;DR: The use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition is proposed.
Proceedings Article

Adaptive bimodal sensor fusion for automatic speechreading

TL;DR: Different methods of combining the visual and acoustic data to improve the recognition performance of automated speech recognizers by using additional visual information are presented, achieving error reduction of up to 50%.
Proceedings Article

Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition

TL;DR: This paper proposes a method to automatically estimate an optimum state-dependent stream weighting in a continuous density hidden Markov model (CDHMM) recognition system by means of a maximum-likelihood based training algorithm.