Open Access Proceedings Article

Analysis of Correlation between Audio and Visual Speech Features for Clean Audio Feature Prediction in Noise

TL;DR
Experiments reveal that features representing broad spectral information have higher correlation to visual features than those representing finer spectral detail.
Abstract
The aim of this work is to examine the correlation between audio and visual speech features. The motivation is to find visual features that can provide clean audio feature estimates which can be used for speech enhancement when the original audio signal is corrupted by noise. Two audio features (MFCCs and formants) and three visual features (active appearance model, 2-D DCT and cross-DCT) are considered with correlation measured using multiple linear regression. The correlation is then exploited through the development of a maximum a posteriori (MAP) prediction of audio features solely from the visual features. Experiments reveal that features representing broad spectral information have higher correlation to visual features than those representing finer spectral detail. The accuracy of prediction follows the results found in the correlation measurements.
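To make the two measures concrete, here is a minimal sketch that fits a multiple linear regression from visual to audio features, reports the multiple correlation coefficient R per audio dimension, and then forms a MAP prediction of the audio features from the visual ones under a single joint Gaussian. Everything in it is an illustrative assumption: the synthetic feature matrices, the dimensionalities, and the single-Gaussian density stand in for the paper's actual AAM/DCT visual features, MFCC/formant audio features, and whatever density its MAP estimator uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-frame features (assumed shapes, not the
# paper's data): V holds visual features (e.g. AAM or 2-D DCT
# coefficients), A holds audio features (e.g. MFCCs). Rows are frames.
n_frames, n_vis, n_aud = 5000, 20, 13
V = rng.standard_normal((n_frames, n_vis))
A = (V @ rng.standard_normal((n_vis, n_aud))
     + 0.5 * rng.standard_normal((n_frames, n_aud)))

# Multiple linear regression: predict each audio dimension from all
# visual dimensions, then take the multiple correlation coefficient R
# between prediction and truth as the audio-visual correlation measure.
V1 = np.hstack([V, np.ones((n_frames, 1))])      # add an intercept column
W, *_ = np.linalg.lstsq(V1, A, rcond=None)       # least-squares fit
A_hat = V1 @ W
R = np.array([np.corrcoef(A[:, i], A_hat[:, i])[0, 1]
              for i in range(n_aud)])
print("multiple correlation R per audio feature:", np.round(R, 3))

# MAP prediction of audio from visual under a single joint Gaussian over
# [audio; visual]: the MAP estimate is the conditional mean
#   a_hat = mu_a + S_av S_vv^{-1} (v - mu_v).
# (A mixture density would combine one such term per component; a single
# Gaussian keeps the sketch short.)
X = np.hstack([A, V])
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_av, S_vv = S[:n_aud, n_aud:], S[n_aud:, n_aud:]

def map_audio(v):
    """Gaussian MAP (= conditional mean) estimate of audio given visual."""
    return mu[:n_aud] + S_av @ np.linalg.solve(S_vv, v - mu[n_aud:])

print("predicted audio features for frame 0:", np.round(map_audio(V[0]), 3))
```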


Citations
Posted Content

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

TL;DR: This paper provides a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets; and objective functions.

The Challenge of Multispeaker Lip-Reading

TL;DR: This paper shows the danger of not using different speakers in the training and test sets and demonstrates that lip-reading visual features, when compared with the MFCCs commonly used for audio speech recognition, have inherently small variation within a single speaker across all classes spoken.
Journal ArticleDOI

Visually-Derived Wiener Filters for Speech Enhancement

TL;DR: In this paper, clean speech and noise power spectrum statistics are estimated from visual speech features and used to build a visually derived Wiener filter, which enhances audio speech that has been contaminated by noise.
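For context on how such a filter is applied, below is a minimal sketch of the standard Wiener gain H = S / (S + N). The function name and array shapes are illustrative placeholders, and it assumes the clean-speech and noise power spectra have already been estimated; obtaining those estimates from visual features is the cited paper's contribution and is not reproduced here.

```python
import numpy as np

def wiener_enhance(noisy_stft, clean_psd_est, noise_psd_est, eps=1e-10):
    """Apply the standard Wiener gain H = S / (S + N) per time-frequency bin.

    noisy_stft    : complex STFT of the noisy speech, shape (freq, frames)
    clean_psd_est : estimated clean-speech power spectrum, same shape
                    (in the cited work this estimate is visually derived)
    noise_psd_est : estimated noise power spectrum, same shape
    """
    gain = clean_psd_est / (clean_psd_est + noise_psd_est + eps)
    return gain * noisy_stft  # enhanced STFT; invert with an ISTFT for audio
```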
References
Book

Regression Analysis by Example

TL;DR: The book covers simple and multiple linear regression, regression diagnostics (detection of model violations), qualitative variables as predictors, transformation of variables, weighted least squares, the problem of correlated errors, analysis of collinear data, biased estimation of regression coefficients, variable selection procedures, and logistic regression.
Journal ArticleDOI

Visual contribution to speech intelligibility in noise

TL;DR: In this article, the visual contribution to oral speech intelligibility was examined as a function of the speech-to-noise ratio and of the size of the vocabulary under test.
Journal ArticleDOI

Regression Analysis by Example

Terri L. Moore
01 May 2001
TL;DR: This book serves well as an introduction to the specific area of methods for detecting and correcting model violations in the standard linear regression model, and provides a general overview of transformations of variables, focusing on three traditional situations where transformations can be applied.
Proceedings ArticleDOI

Statistical models of appearance for medical image analysis and computer vision

TL;DR: The Active Shape Model matches a model to boundaries in an image, while the Active Appearance Model finds model parameters that synthesize a complete image as similar as possible to the target image.
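As a rough sketch of the fitting objective this summary describes, the snippet below writes out the sum-of-squares image residual an Active Appearance Model search drives down. The `synthesize` callable and the image representation are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def aam_cost(params, target_image, synthesize):
    """Sum-of-squares residual minimised by an AAM search: distance
    between the model-synthesised image and the target image.

    params       : model (shape + appearance) parameter vector
    target_image : the image being matched, as a float array
    synthesize   : hypothetical callable mapping params -> model image
    """
    residual = target_image - synthesize(params)
    return float(np.sum(residual ** 2))
```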
Journal ArticleDOI

Quantitative association of vocal-tract and facial behavior

TL;DR: Multilinear techniques are applied to support the claims that facial motion during speech is largely a by-product of producing the speech acoustics, and that the acoustics are better estimated by the 3D motion of the face than by the midsagittal motion of the anterior vocal tract.