An analytic derivation of a phase-sensitive observation model for noise robust speech recognition

Open Access, Proceedings Article

Volker Leutnant, +1 more

- pp 2395-2398

TLDR

The paper presents an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors; the resulting phase-sensitive observation model leads to significant improvements in word accuracy on the AURORA2 database.

Abstract:

In this paper we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The derivation shows, among other things, that the probability density of the phase factor is sub-Gaussian and that it is independent of the noise type and the signal-to-noise ratio, but dependent on the mel filter bank index. Further, we show how to compute the contribution of the phase factor to both the mean and the variance of the noisy speech observation likelihood, which relates the speech and noise feature vectors to those of noisy speech. The resulting phase-sensitive observation model is then used in model-based speech feature enhancement, leading to significant improvements in word accuracy on the AURORA2 database.

Index Terms: model-based feature enhancement, phase-sensitive observation model, phase factor distribution
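The properties listed in the abstract can be illustrated with a small simulation. The sketch below is not the paper's derivation. It assumes, purely for illustration, that within one mel channel the speech and noise magnitudes are constant across DFT bins and that the phase differences are independent and uniformly distributed, so the phase factor reduces to a weighted average of cosines. Under these assumptions its mean is zero and its variance is `sum(w**2) / (2 * sum(w)**2)`, which depends only on the filter weights (and hence on the filter bank index), not on the noise type or SNR. The triangular filter shape and all function names are hypothetical. `noisy_log_mel` uses the commonly cited phase-sensitive log-mel relation `y = x + log(1 + exp(n - x) + 2*alpha*exp((n - x)/2))`.

```python
import numpy as np


def triangular_weights(num_bins):
    """Hypothetical triangular mel filter spanning `num_bins` DFT bins."""
    ramp = np.arange(1, num_bins + 1, dtype=float)
    return np.minimum(ramp, ramp[::-1])


def analytic_variance(weights):
    """Variance of the phase factor under the simplifying assumptions above.

    Each cos(theta) with uniform theta has mean 0 and variance 1/2, and the
    bins are assumed independent, so the variances add with squared weights.
    """
    return np.sum(weights ** 2) / (2.0 * np.sum(weights) ** 2)


def sample_phase_factor(weights, num_samples, rng):
    """Draw phase-factor samples as a weighted average of cos(theta_k)."""
    theta = rng.uniform(-np.pi, np.pi, size=(num_samples, weights.size))
    return np.cos(theta) @ weights / weights.sum()


def noisy_log_mel(x, n, alpha):
    """Noisy log-mel power from clean speech x, noise n and phase factor alpha."""
    d = n - x
    return x + np.log1p(np.exp(d) + 2.0 * alpha * np.exp(d / 2.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for num_bins in (3, 9, 27):  # wider filters stand in for higher mel indices
        w = triangular_weights(num_bins)
        alpha = sample_phase_factor(w, 100_000, rng)
        # Kurtosis below 3 indicates lighter-than-Gaussian tails.
        kurtosis = np.mean(alpha ** 4) / np.mean(alpha ** 2) ** 2
        print(num_bins, analytic_variance(w), alpha.var(), kurtosis)
```

Running the script prints, for each filter width, the analytic variance next to the empirical variance and kurtosis, so the dependence on filter width can be inspected directly. Setting `alpha = 0` in `noisy_log_mel` recovers the phase-insensitive model.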

Citations

Journal Article

An overview of noise-robust automatic speech recognition

Jinyu Li, +3 more

- 01 Apr 2014 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A thorough overview of modern noise-robust techniques for ASR developed over the past 30 years is provided and methods that are proven to be successful and that are likely to sustain or expand their future applicability are emphasized.

Book Chapter

Front-End, Back-End, and Hybrid Techniques for Noise-Robust Speech Recognition

Li Deng

TL;DR: The Bayesian framework is used as a common thread for connecting, analyzing, and categorizing a number of popular approaches to the solutions pursued in the recent past on the problem of uncertainty handling in robust speech recognition.

Book Chapter

Model-Based Approaches to Handling Uncertainty

M. J. F. Gales

TL;DR: This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how it can be applied to standard systems and considers important practical issues.

Journal Article

Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering

Nikolaos Dionelis, +1 more

- 01 May 2018 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: In this paper, a modulation-domain Kalman filter was proposed to track the speech phase using circular statistics, along with the spectral log-amplitudes of speech and noise.

Journal Article

Monaural multi-talker speech recognition using factorial speech processing models

Mahdi Khademian, +1 more

- 01 Apr 2018 -

Speech Communication

TL;DR: In this article, a joint token passing algorithm was proposed for direct joint decoding of target and masker speakers' mixed-signals, which achieved 5.3% absolute task performance improvement compared to the first super-human system.

References

Book

Time Series: Data Analysis and Theory

David R. Brillinger

TL;DR: This book will be most useful to applied mathematicians, communication engineers, signal processors, statisticians, and time series researchers, both applied and theoretical.

Journal Article

Time Series: Data Analysis and Theory

W. D. Ray, +1 more

Journal Article

Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion

Li Deng, +2 more

- 18 Apr 2005 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: Presents a new technique for dynamic, frame-by-frame compensation of the Gaussian variances in the hidden Markov model (HMM) that exploits the feature variance, or uncertainty, estimated during speech feature enhancement to improve noise-robust speech recognition.

Proceedings Article

A comparison of three non-linear observation models for noisy speech features.

Jasha Droppo, +2 more

TL;DR: It is shown that the new approximation uses half the calculation, and produces equivalent or improved word accuracy scores, when compared to previous techniques.

Journal Article

A Novel Uncertainty Decoding Rule With Applications to Transmission Error Robust Speech Recognition

Valentin Ion, +1 more

- 01 Jul 2008 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: An uncertainty decoding rule is derived for automatic speech recognition (ASR) that accounts for both corrupted observations and inter-frame correlation, and it is shown how the clean speech posterior can be computed for communication links characterized by either bit errors or packet loss.

Related Papers (5)

Enhancement of log Mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise

Model-based techniques for noise robust speech recognition

ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition.

HMM adaptation using vector Taylor series for noisy speech recognition.

Speech recognition in noisy environments