Feature-fusion based audio-visual speech recognition using lip geometry features in noisy enviroment

Open Access

TLDR

This paper describes a feature-fusion audio-visual speech recognition system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification with a Hidden Markov Model.

Abstract:

Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, because the lip movements involved in articulation vary widely, audio-visual integration does not substantially improve recognition for all speech. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification with a Hidden Markov Model. The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
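
The convex hull and feature-fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the lip boundary points have already been obtained (the skin color filter and border following stages are not shown), it uses width, height and area as hypothetical geometry features that the abstract does not specify, and it assumes audio and visual frames are already aligned one to one. The function names `convex_hull`, `lip_geometry` and `fuse_features` are illustrative only, and the Hidden Markov Model classifier is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def lip_geometry(hull: Sequence[Point]) -> List[float]:
    """Hypothetical geometry features: [width, height, area] of the hull."""
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Shoelace formula over consecutive hull vertices (wrapping around).
    ring = list(hull[1:]) + list(hull[:1])
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(hull, ring)))
    return [float(width), float(height), float(area)]


def fuse_features(audio_frames: Sequence[Sequence[float]],
                  visual_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature fusion: concatenate audio and visual vectors frame by frame."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have equal frame counts")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Toy lip boundary: four corner points plus one interior point.
boundary = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
hull = convex_hull(boundary)
visual = [lip_geometry(hull)]
audio = [[1.0, 2.0]]  # placeholder audio feature vector for one frame
fused = fuse_features(audio, visual)
```

In a feature-fusion system, the fused per-frame vectors would then form the observation sequence passed to the classifier, so a single model sees both modalities at once.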

Citations

Journal ArticleDOI

Introduction to multivariate analysis, by C. Chatfield and A. J. Collins. Pp 246. £13 hardcover, £7·50 paperback. 1980. ISBN 0-412-16030-7/4 (Chapman and Hall)

Dennis Cooke

- 01 Dec 1981 -

The Mathematical Gazette

TL;DR: In this paper, the multivariate normal distribution is used for principal component analysis and multivariate analysis of covariance and related topics, as well as multi-dimensional scaling and cluster analysis.

Journal Article

A Review of Audio-Visual Speech Recognition

Thum Wei Seong, +1 more

- 29 Jan 2018 -

Journal of Telecommunication, Electronic...

TL;DR: The aim of this paper is to discuss AVSR structures, including the front-end processes, the audio-visual data corpora used, recent works and accuracy estimation methods.

Book ChapterDOI

A comparison of model validation techniques for audio-visual speech recognition

Thum Wei Seong, +3 more

TL;DR: Model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented to validate the performance of an AVSR system as well as to provide a comparison of the performances of the validation techniques themselves.

Book ChapterDOI

A Follow-Up Survey of Audiovisual Speech Integration Strategies

Ilham Addarrazi, +2 more

TL;DR: This paper reviews existing and recent techniques for AVSR, with a special emphasis on recent fusion techniques, discussing the fusion stages (early, intermediate and late integration) together with their corresponding models.

Dissertation

Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis

Wei Seong Thum

TL;DR: The experimental results show that, by referring to the WADA-W look-up table, the estimator performs consistent SNR estimation with more accurate and less biased results than the original WADA technique under four types of noise.

References

Journal ArticleDOI

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

Paul A. Viola, +1 more

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.

Journal ArticleDOI

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

- 01 Aug 1980 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.

Journal ArticleDOI

Assessment for automatic speech recognition II: NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems

Andrew Varga, +1 more

- 01 Jul 1993 -

Speech Communication

TL;DR: NOISEX-92 specifies a carefully controlled experiment on artificially noisy speech data, examining performance for a limited digit recognition task but with a relatively wide range of noises and signal-to-noise ratios.

Book

Introduction to multivariate analysis

Chris Chatfield, +1 more

TL;DR: In this article, the multivariate normal distribution is used for principal component analysis and multivariate analysis of covariance and related topics, as well as multi-dimensional scaling and cluster analysis.

Journal on Multimodal User Interfaces

Audio-visual speech recognition using lip information extracted from side-face images

Koji Iwano, +3 more

- 01 Jan 2007 -

Eurasip Journal on Audio, Speech, and Mu...

Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models

Mariusz Kubanek

Related Papers (5)

A lip geometry approach for feature-fusion based audio-visual speech recognition

Analysis of lip geometric features for audio-visual speech recognition

Comparison between different feature extraction techniques for audio-visual speech recognition

Audio-visual speech recognition using lip information extracted from side-face images

Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models