Open Access Proceedings Article
Stream Weight Optimization of Speech and Lip Image Sequence for Audio-Visual Speech Recognition
Satoshi Nakamura,Hidetoshi Ito,Kiyohiro Shikano +2 more
- Vol. 3, pp 20-24
ICSLP2000: the 6th International Conference on Spoken Language Processing, October 16-20, 2000, Beijing, China.
Citations
Journal Article
Recent advances in the automatic recognition of audiovisual speech
TL;DR: The main components of audiovisual automatic speech recognition (ASR) are reviewed and novel contributions in two main areas are presented: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and second, audiovisual speech integration.
Audio-Visual Automatic Speech Recognition: An Overview
TL;DR: Novel, non-traditional approaches that use sources of information orthogonal to the acoustic input are needed to achieve ASR performance that is closer to the human speech perception level and robust enough to be deployable in field applications.
Journal Article
Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey
TL;DR: The fusion strategies and the corresponding models used in audiovisual tasks such as speech recognition, tracking, biometrics, affective state recognition, and meeting scene analysis are described.
Journal Article
User-centered modeling and evaluation of multimodal interfaces
TL;DR: This paper summarizes research on the cognitive science foundations of multimodal interaction and on the essential role that user-centered modeling has played in prototyping, guiding, and evaluating the design of next-generation multimodal interfaces, and describes the important role that selective methodologies and evaluation metrics have played in shaping next-generation multimodal systems.
Journal Article
On Dynamic Stream Weighting for Audio-Visual Speech Recognition
TL;DR: Including a voice activity detector in the weighting scheme improves speech recognition across different system architectures and confidence measures, and the resulting performance gain is larger than any difference between the proposed confidence measures.
References
Journal Article
Signal bias removal by maximum likelihood estimation for robust telephone speech recognition
Biing-Hwang Juang,M. Rahim +1 more
TL;DR: The signal bias removal (SBR) method, integrated into a discrete density HMM, is applied to telephone speech recognition where the contamination due to extraneous signal components is assumed to be unknown. To enable real-time implementation, a sequential method for estimating the bias is presented.
Book Chapter
On the Integration of Auditory and Visual Parameters in an HMM-based ASR
Ali Adjoudani,Christian Benoît +1 more
TL;DR: A model is proposed that can improve the performance of an audio-visual speech recognizer in an isolated-word, speaker-dependent setting by using a hybrid system based on two HMMs trained respectively with acoustic and optic data.
Proceedings Article
Discriminative training of HMM stream exponents for audio-visual speech recognition
TL;DR: The use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition is proposed.
Proceedings Article
Adaptive bimodal sensor fusion for automatic speechreading
TL;DR: Different methods of combining visual and acoustic data to improve the recognition performance of automatic speech recognizers are presented, achieving error reductions of up to 50%.
Proceedings Article
Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition
TL;DR: This paper proposes a method to automatically estimate an optimum state-dependent stream weighting in a continuous density hidden Markov model (CDHMM) recognition system by means of a maximum-likelihood-based training algorithm.
Related Papers (5)
Audio-visual speech modeling for continuous speech recognition
Stéphane Dupont,Juergen Luettin +1 more
On the Integration of Auditory and Visual Parameters in an HMM-based ASR
Ali Adjoudani,Christian Benoît +1 more