Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)

doi:10.1109/SAM.2002.1191001

Proceedings ArticleDOI

- pp 68-71

TLDR

This paper proposes a non-linear enhancement technique called audio-visual codebook dependent cepstral normalization (AVCDCN) and considers its use with both audio-only and audio-visual speech recognition.

Abstract:

We introduce a non-linear enhancement technique called audio-visual codebook dependent cepstral normalization (AVCDCN) and we consider its use with both audio-only and audio-visual speech recognition. AVCDCN is inspired by CDCN, an audio-only enhancement technique that approximates the non-linear effect of noise on speech with a piecewise constant function. Our experiments show that the use of visual information in AVCDCN allows significant performance gains over CDCN.
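The piecewise constant approximation mentioned in the abstract can be illustrated with a small sketch. It assumes the common codebook-based formulation in which each codeword k carries a constant correction vector r_k, and the enhanced cepstrum is the noisy cepstrum minus the posterior-weighted sum of those corrections. The only difference between the two variants below is which features the codeword posteriors are computed from: audio alone (CDCN-like) or concatenated audio and visual features (AVCDCN-like). All function names, codebooks, and numbers are hypothetical and chosen for illustration; this is not the authors' implementation, and the paper's exact estimator may differ.

```python
import math

def gaussian_log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def codeword_posteriors(features, codebook):
    """Posterior p(k | features) over codewords, assuming equal priors."""
    log_p = [gaussian_log_density(features, c["mean"], c["var"]) for c in codebook]
    peak = max(log_p)  # subtract the max for numerical stability
    weights = [math.exp(lp - peak) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]

def enhance(noisy_audio, posteriors, corrections):
    """Subtract the posterior-weighted correction vector from the noisy cepstrum."""
    return [
        y - sum(p * r[d] for p, r in zip(posteriors, corrections))
        for d, y in enumerate(noisy_audio)
    ]

# Toy, made-up numbers: 2 codewords, 2 cepstral dimensions, 1 visual dimension.
audio_codebook = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"mean": [2.0, 2.0], "var": [1.0, 1.0]},
]
av_codebook = [
    {"mean": [0.0, 0.0, -1.0], "var": [1.0, 1.0, 0.25]},
    {"mean": [2.0, 2.0, 1.0], "var": [1.0, 1.0, 0.25]},
]
corrections = [[0.5, 0.5], [-0.5, -0.5]]  # one constant correction per codeword

noisy_audio = [1.0, 1.0]  # acoustically ambiguous between the two codewords
visual = [1.0]            # visual feature favors the second codeword

p_audio = codeword_posteriors(noisy_audio, audio_codebook)
p_av = codeword_posteriors(noisy_audio + visual, av_codebook)

cdcn_like = enhance(noisy_audio, p_audio, corrections)
avcdcn_like = enhance(noisy_audio, p_av, corrections)
print("audio-only posteriors:", p_audio, "->", cdcn_like)
print("audio-visual posteriors:", p_av, "->", avcdcn_like)
```

In this toy setup the audio frame sits midway between the two codewords, so the audio-only posteriors cannot choose a correction, while the visual feature resolves the ambiguity. That is one plausible reading of why visual information could help a codebook-dependent scheme in noise, not a reproduction of the reported experiments.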

Citations

Journal ArticleDOI

Recent advances in the automatic recognition of audiovisual speech

Gerasimos Potamianos, +4 more

TL;DR: The main components of audiovisual automatic speech recognition (ASR) are reviewed and novel contributions in two main areas are presented: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration.

Audio-Visual Automatic Speech Recognition: An Overview

Gerasimos Potamianos, +3 more

TL;DR: Novel, non-traditional approaches that use sources of information orthogonal to the acoustic input are needed to achieve ASR performance that is closer to the human speech perception level and robust enough to be deployable in field applications.

Proceedings ArticleDOI

Pixels that sound

E. Kidron, +2 more

TL;DR: This work presents a stable and robust algorithm which grasps dynamic audio-visual events with high spatial resolution, and derives a unique solution based on canonical correlation analysis (CCA), which effectively detects pixels that are associated with the sound, while filtering out other dynamic pixels.

Journal ArticleDOI

Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures

Bertrand Rivet, +2 more

- 01 Jan 2007 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A novel algorithm is presented that plugs the audiovisual coherence of speech signals, estimated by statistical tools, into audio blind source separation (BSS) techniques, and it is applied to the difficult and realistic case of convolutive mixtures.

Journal ArticleDOI

Blind Audiovisual Source Separation Based on Sparse Redundant Representations

Anna Llagostera Casanovas, +3 more

- 01 Aug 2010 -

IEEE Transactions on Multimedia

TL;DR: A novel method is proposed which exploits the correlation between the video signal captured with a camera and a synchronously recorded one-microphone audio track to detect and separate audiovisual sources present in a scene.

References

Proceedings ArticleDOI

Environmental robustness in automatic speech recognition

Alejandro Acero, +1 more

TL;DR: Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported, and two novel methods based on additive corrections in the cepstral domain are proposed.

Audio-visual speech recognition

Chalapathy Neti, +7 more

TL;DR: Speech Reference EPFL-CONF-82637 is presented, which describes the development of a framework for future generations of interpreters to understand and respond to audible language barriers.

Proceedings ArticleDOI

High-performance robust speech recognition using stereo training data

Li Deng, +4 more

TL;DR: A novel technique of SPLICE (Stereo-based Piecewise Linear Compensation for Environments) for high performance robust speech recognition is described, an efficient noise reduction and channel distortion compensation technique that makes effective use of stereo training data.

Journal ArticleDOI

Audio-visual enhancement of speech in noise.

Laurent Girin, +2 more

- 06 Jun 2001 -

Journal of the Acoustical Society of Ame...

TL;DR: An audio-visual approach to the problem of speech enhancement in noise is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments.

Proceedings ArticleDOI

Hierarchical discriminant features for audio-visual LVCSR

Gerasimos Potamianos, +2 more

TL;DR: Experiments demonstrate that the proposed feature fusion method improves speaker-independent, large vocabulary, continuous speech recognition (LVCSR) for both clean and noisy audio conditions considered.
