Open Access Proceedings Article
An analytic derivation of a phase-sensitive observation model for noise robust speech recognition
Volker Leutnant, Reinhold Haeb-Umbach +1 more
- pp 2395-2398
TL;DR: An analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors is presented; the resulting phase-sensitive observation model leads to significant improvements in word accuracy on the AURORA2 database.
Abstract:
In this paper we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The derivation shows, among other things, that the probability density of the phase factor is of sub-Gaussian nature and that it is independent of the noise type and the signal-to-noise ratio but dependent on the mel filter bank index. Furthermore, we show how to compute the contribution of the phase factor to both the mean and the variance of the noisy speech observation likelihood, which relates the speech and noise feature vectors to those of noisy speech. The resulting phase-sensitive observation model is then used in model-based speech feature enhancement, leading to significant improvements in word accuracy on the AURORA2 database.
Index Terms: model-based feature enhancement, phase-sensitive observation model, phase factor distribution
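To make the role of the phase factor concrete, the following is a minimal sketch and not the authors' derivation. It assumes the commonly used log-mel form of the phase-sensitive model, y = x + log(1 + e^(n-x) + 2·alpha·e^((n-x)/2)), and a per-band phase factor alpha defined as the filter-weighted, normalized sum of cosines of the speech/noise phase differences. The triangular filter shape, the equal per-bin magnitudes, the uniform phase differences, and all function names are illustrative assumptions that do not appear in the abstract.

```python
import math
import random


def phase_factor(weights, speech_mag, noise_mag, phase_diff):
    """Phase factor of one mel band (assumed definition): weighted sum of
    |X_k||N_k|cos(theta_k), normalized by the band powers of speech and noise."""
    num = sum(w * x * n * math.cos(t)
              for w, x, n, t in zip(weights, speech_mag, noise_mag, phase_diff))
    den = math.sqrt(sum(w * x * x for w, x in zip(weights, speech_mag))
                    * sum(w * n * n for w, n in zip(weights, noise_mag)))
    return num / den


def noisy_log_mel(x, n, alpha):
    """Noisy log-mel power from speech (x) and noise (n) log-mel powers.
    Undefined for alpha == -1 with x == n, where the log argument is zero."""
    d = n - x
    return x + math.log(1.0 + math.exp(d) + 2.0 * alpha * math.exp(0.5 * d))


def triangular_weights(num_bins):
    """Hypothetical triangular mel filter spanning num_bins DFT bins."""
    half = (num_bins + 1) / 2.0
    return [1.0 - abs(k + 1 - half) / half for k in range(num_bins)]


def sample_phase_factors(num_bins, num_draws, seed=0):
    """Monte Carlo draws of alpha, assuming equal magnitudes in every bin and
    independent phase differences uniform on [-pi, pi]."""
    rng = random.Random(seed)
    weights = triangular_weights(num_bins)
    ones = [1.0] * num_bins
    draws = []
    for _ in range(num_draws):
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(num_bins)]
        draws.append(phase_factor(weights, ones, ones, phases))
    return draws


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    # A narrow filter (few bins) versus a wide filter (many bins).
    for bins in (3, 30):
        print(bins, variance(sample_phase_factors(bins, 2000, seed=1)))
```

Under these simplifying assumptions the spread of alpha is set only by the filter weights, so wider filters give a more concentrated phase factor. This illustrates why a dependence on the mel filter bank index is plausible, but it does not reproduce the analytic moments derived in the paper.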
Citations
Journal Article
An overview of noise-robust automatic speech recognition
TL;DR: A thorough overview of modern noise-robust techniques for ASR developed over the past 30 years is provided and methods that are proven to be successful and that are likely to sustain or expand their future applicability are emphasized.
Book Chapter
Front-End, Back-End, and Hybrid Techniques for Noise-Robust Speech Recognition
TL;DR: The Bayesian framework is used as a common thread for connecting, analyzing, and categorizing a number of popular approaches to the solutions pursued in the recent past on the problem of uncertainty handling in robust speech recognition.
Book Chapter
Model-Based Approaches to Handling Uncertainty
TL;DR: This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how it can be applied to standard systems and considers important practical issues.
Journal Article
Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering
Nikolaos Dionelis, Mike Brookes +1 more
TL;DR: In this paper, a modulation-domain Kalman filter was proposed to track the speech phase using circular statistics, along with the spectral log-amplitudes of speech and noise.
Journal Article
Monaural multi-talker speech recognition using factorial speech processing models
TL;DR: In this article, a joint token passing algorithm was proposed for direct joint decoding of the mixed signals of target and masker speakers, which achieved a 5.3% absolute task performance improvement compared to the first super-human system.
References
Book
Time Series: Data Analysis and Theory
TL;DR: This book will be most useful to applied mathematicians, communication engineers, signal processors, statisticians, and time series researchers, both applied and theoretical.
Journal Article
Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
TL;DR: A new technique for dynamic, frame-by-frame compensation of the Gaussian variances in the hidden Markov model (HMM), exploiting the feature variance or uncertainty estimated during the speech feature enhancement process, to improve noise-robust speech recognition.
Proceedings Article
A comparison of three non-linear observation models for noisy speech features.
Jasha Droppo, Li Deng, Alex Acero +2 more
TL;DR: It is shown that the new approximation uses half the calculation, and produces equivalent or improved word accuracy scores, when compared to previous techniques.
Journal Article
A Novel Uncertainty Decoding Rule With Applications to Transmission Error Robust Speech Recognition
TL;DR: An uncertainty decoding rule is derived for automatic speech recognition (ASR) that accounts for both corrupted observations and inter-frame correlation, and it is shown how the clean speech posterior can be computed for communication links characterized by either bit errors or packet loss.