Open Access Proceedings Article

An analytic derivation of a phase-sensitive observation model for noise robust speech recognition

TLDR
An analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors is presented, leading to significant improvements in word accuracy on the AURORA2 database.
Abstract
In this paper, we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The derivation shows, among other things, that the probability density of the phase factor is sub-Gaussian and that it is independent of the noise type and the signal-to-noise ratio but depends on the mel filter bank index. Furthermore, we show how to compute the contribution of the phase factor to both the mean and the variance of the noisy speech observation likelihood, which relates the speech and noise feature vectors to those of noisy speech. The resulting phase-sensitive observation model is then used in model-based speech feature enhancement, leading to significant improvements in word accuracy on the AURORA2 database.
Index Terms: model-based feature enhancement, phase-sensitive observation model, phase factor distribution
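The abstract does not state the model equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the commonly used phase-sensitive relation between log-mel powers, y = x + log(1 + e^(n-x) + 2*alpha*e^((n-x)/2)), and it assumes that speech and noise magnitudes are flat within a mel band, so the phase factor alpha reduces to a weighted sum of cosines of independent, uniformly distributed phase differences. The triangular filter shape and all function names are hypothetical and chosen for illustration only.

```python
import math
import random


def noisy_log_mel(x, n, alpha):
    """Noisy log-mel power from clean speech x, noise n and phase factor alpha.

    Assumed form of the phase-sensitive relation; alpha = 0 recovers the
    phase-insensitive model y = x + log(1 + exp(n - x)).
    """
    return x + math.log(1.0 + math.exp(n - x) + 2.0 * alpha * math.exp(0.5 * (n - x)))


def triangular_weights(num_bins):
    """Hypothetical triangular mel filter over num_bins bins, normalized to sum to 1."""
    half = (num_bins + 1) / 2.0
    raw = [1.0 - abs(k + 1 - half) / half for k in range(num_bins)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_alpha(weights, rng):
    """Draw one phase factor, assuming flat magnitudes inside the band."""
    return sum(w * math.cos(rng.uniform(-math.pi, math.pi)) for w in weights)


def alpha_moments(weights, num_samples=20000, seed=0):
    """Monte Carlo estimate of mean, variance and kurtosis of the phase factor."""
    rng = random.Random(seed)
    samples = [sample_alpha(weights, rng) for _ in range(num_samples)]
    mean = sum(samples) / num_samples
    var = sum((s - mean) ** 2 for s in samples) / num_samples
    fourth = sum((s - mean) ** 4 for s in samples) / num_samples
    return mean, var, fourth / var ** 2


if __name__ == "__main__":
    # Wider filters (higher mel indices) average over more bins.
    for num_bins in (2, 8, 32):
        mean, var, kurt = alpha_moments(triangular_weights(num_bins))
        print(num_bins, round(mean, 3), round(var, 4), round(kurt, 2))
```

Under these assumptions the simulated variance shrinks as the filter covers more bins, and the kurtosis stays below the Gaussian value of 3. Speech and noise levels never enter `sample_alpha`, which is in line with the abstract's statement that the distribution depends on the mel filter bank index but not on the noise type or the signal-to-noise ratio.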


Citations
Journal Article

An overview of noise-robust automatic speech recognition

TL;DR: A thorough overview of modern noise-robust techniques for ASR developed over the past 30 years is provided and methods that are proven to be successful and that are likely to sustain or expand their future applicability are emphasized.
Book Chapter

Front-End, Back-End, and Hybrid Techniques for Noise-Robust Speech Recognition

TL;DR: The Bayesian framework is used as a common thread for connecting, analyzing, and categorizing a number of popular approaches to the solutions pursued in the recent past on the problem of uncertainty handling in robust speech recognition.
Book Chapter

Model-Based Approaches to Handling Uncertainty

TL;DR: This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how it can be applied to standard systems and considers important practical issues.
Journal Article

Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering

TL;DR: In this paper, a modulation-domain Kalman filter was proposed to track the speech phase using circular statistics, along with the spectral log-amplitudes of speech and noise.
Journal Article

Monaural multi-talker speech recognition using factorial speech processing models

TL;DR: In this article, a joint token passing algorithm was proposed for direct joint decoding of target and masker speakers' mixed-signals, which achieved 5.3% absolute task performance improvement compared to the first super-human system.
References
Book

Time Series: Data Analysis and Theory

TL;DR: This book will be most useful to applied mathematicians, communication engineers, signal processors, statisticians, and time series researchers, both applied and theoretical.
Journal Article

Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion

TL;DR: A new technique for dynamic, frame-by-frame compensation of the Gaussian variances in the hidden Markov model (HMM), exploiting the feature variance or uncertainty estimated during the speech feature enhancement process, to improve noise-robust speech recognition.
Proceedings Article

A comparison of three non-linear observation models for noisy speech features.

TL;DR: It is shown that the new approximation uses half the calculation, and produces equivalent or improved word accuracy scores, when compared to previous techniques.
Journal Article

A Novel Uncertainty Decoding Rule With Applications to Transmission Error Robust Speech Recognition

TL;DR: An uncertainty decoding rule is derived for automatic speech recognition (ASR), which accounts for both corrupted observations and inter-frame correlation and shows how the clean speech posterior can be computed for communication links being characterized by either bit errors or packet loss.