Open Access

Feature-fusion based audio-visual speech recognition using lip geometry features in noisy environment

TLDR
A feature-fusion audio-visual speech recognition system is described that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification using a Hidden Markov Model.
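
The lip-geometry front end summarized above (color filter, border following, convex hull) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the red-dominance threshold in `lip_mask` is a hypothetical stand-in for the paper's skin color filter with made-up constants, `boundary_points` simply collects mask pixels that touch the background instead of performing true border following (for example the Suzuki-Abe algorithm behind OpenCV's `cv2.findContours`), and the hull uses Andrew's monotone-chain algorithm. Using the hull's width, height and area as the lip-geometry features is also an assumption.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)
Point = Tuple[int, int]       # (x, y) in pixel coordinates


def lip_mask(image: List[List[Pixel]], ratio: float = 1.6, min_red: int = 60) -> List[List[bool]]:
    """Hypothetical color filter: keep pixels whose red channel clearly
    dominates green. A stand-in for the paper's skin color filter."""
    return [[r > ratio * g and r > min_red for (r, g, _b) in row] for row in image]


def boundary_points(mask: List[List[bool]]) -> List[Point]:
    """Collect mask pixels that touch the background or the image edge
    (a simplification of true border following)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


def _cross(o: Point, a: Point, b: Point) -> int:
    # z-component of (a - o) x (b - o); its sign gives the turn direction
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def lip_geometry(image: List[List[Pixel]]) -> Tuple[float, float, float]:
    """Return (width, height, area) of the convex hull around the lip region."""
    hull = convex_hull(boundary_points(lip_mask(image)))
    if not hull:
        return (0.0, 0.0, 0.0)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    # Shoelace formula for the polygon area enclosed by the hull
    n = len(hull)
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return (float(max(xs) - min(xs)), float(max(ys) - min(ys)), abs(twice_area) / 2.0)
```

Calling `lip_geometry` on each cropped mouth-region frame yields one small visual feature vector per video frame.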
Abstract
Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification using a Hidden Markov Model (HMM). The new approach is compared with a conventional audio-only system under simulated ambient noise conditions applied to the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
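
Feature fusion followed by HMM classification can be sketched as follows, under assumptions the abstract does not confirm: visual frames are repeated to match the audio frame rate and concatenated with the audio features (early, feature-level fusion); each word has its own HMM with one diagonal Gaussian per state; and recognition picks the word model with the highest forward-algorithm log-likelihood. The names `fuse`, `GaussianHMM` and `classify` are hypothetical, and any model parameters used with them are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def fuse(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion: repeat each visual frame to the audio frame rate
    (nearest-neighbour upsampling), then concatenate per frame."""
    n_a, n_v = len(audio), len(visual)
    return [list(audio[t]) + list(visual[t * n_v // n_a]) for t in range(n_a)]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def _log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


class GaussianHMM:
    """Word model with one diagonal Gaussian per state (an assumption)."""

    def __init__(self, start: Vector, trans: List[Vector],
                 means: List[Vector], variances: List[Vector]):
        self.log_start = [_log(p) for p in start]
        self.log_trans = [[_log(p) for p in row] for row in trans]
        self.means = means
        self.variances = variances

    def _emit(self, j: int, x: Vector) -> float:
        return _log_gauss(x, self.means[j], self.variances[j])

    def log_likelihood(self, obs: Sequence[Vector]) -> float:
        """Forward algorithm in the log domain: log P(obs | model)."""
        n = len(self.log_start)
        alpha = [self.log_start[j] + self._emit(j, obs[0]) for j in range(n)]
        for x in obs[1:]:
            alpha = [
                _logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)]) + self._emit(j, x)
                for j in range(n)
            ]
        return _logsumexp(alpha)


def classify(obs: Sequence[Vector], models: Dict[str, GaussianHMM]) -> str:
    """Return the word whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda word: models[word].log_likelihood(obs))
```

Working in the log domain keeps the forward recursion from underflowing on long utterances. A practical system would typically use Gaussian mixtures and estimate the parameters with Baum-Welch training, which this sketch omits.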

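The abstract states only that ambient noise was simulated. One common way to do this, assumed here rather than taken from the paper, is to add a noise recording scaled to a target signal-to-noise ratio, where SNR_dB = 10·log10(P_s / P_n) and P denotes mean power. Solving for the noise gain g gives g = sqrt(P_s / (P_n · 10^(SNR_dB / 10))). The function names `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import math
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    segment = list(noise[: len(speech)])
    p_noise = mean_power(segment)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    gain = math.sqrt(mean_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]


def measured_snr_db(speech: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of a mixture, treating (noisy - speech) as the added noise."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))
```

Computing the noise power over exactly the segment that is added makes the achieved SNR match the target for each utterance, which keeps audio-only and audio-visual comparisons at a given noise level consistent.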

Citations
Journal Article

Introduction to multivariate analysis, by C. Chatfield and A. J. Collins. Pp 246. £13 hardcover, £7·50 paperback. 1980. ISBN 0-412-16030-7/4 (Chapman and Hall)

TL;DR: In this paper, the multivariate normal distribution is used for principal component analysis and multivariate analysis of covariance and related topics, as well as multi-dimensional scaling and cluster analysis.
Journal Article

A Review of Audio-Visual Speech Recognition

TL;DR: The aim of this paper is to discuss AVSR structures, including the front-end processes, the audio-visual data corpora used, recent works and accuracy estimation methods.
Book Chapter

A comparison of model validation techniques for audio-visual speech recognition

TL;DR: Model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented to validate the performance of an AVSR system as well as to provide a comparison of the performances of the validation techniques themselves.
Book Chapter

A Follow-Up Survey of Audiovisual Speech Integration Strategies

TL;DR: This paper presents a review of various existing and recent techniques for AVSR, with special emphasis on recent AVSR fusion techniques, discussing the fusion stages (early, intermediate and late integration) together with their corresponding models.
Dissertation

Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis

TL;DR: The experimental results show that, by referring to the WADA-W look-up table, the estimator performs consistent SNR estimation with more accurate and less biased results than the original WADA technique under four types of noise.