Journal Article
RASTA processing of speech
Hynek Hermansky, Nelson Morgan
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
Abstract:
Performance of even the best current stochastic recognizers severely degrades in an unexpected communications environment. In some cases, the environmental effect can be modeled by a set of simple transformations and, in particular, by convolution with an environmental impulse response and the addition of some environmental noise. Often, the temporal properties of these environmental effects are quite different from the temporal properties of speech. We have been experimenting with filtering approaches that attempt to exploit these differences to produce robust representations for speech recognition and enhancement and have called this class of representations relative spectra (RASTA). In this paper, we review the theoretical and experimental foundations of the method, discuss the relationship with human auditory perception, and extend the original method to combinations of additive noise and convolutional noise. We discuss the relationship between RASTA features and the nature of the recognition models that are required and the relationship of these features to delta features and to cepstral mean subtraction. Finally, we show an application of the RASTA technique to speech enhancement.
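The core idea in the abstract, that a fixed convolutional channel changes slowly compared with speech, can be illustrated with a small sketch. A stationary channel multiplies each spectral band by a constant gain, which becomes a constant offset in the log spectrum; a band-pass filter with zero gain at DC, applied along time to each band, suppresses that offset. The coefficients below are an assumption taken from commonly cited RASTA implementations (they do not appear in the abstract), and `rasta_filter`, the pole value 0.94, the 100 Hz frame rate, and the synthetic 4 Hz trajectory are all illustrative choices.

```python
import numpy as np

# Assumed numerator of a RASTA-style band-pass filter. It sums to zero,
# so the filter has no gain at DC (constant offsets are removed).
RASTA_NUMER = np.array([0.2, 0.1, 0.0, -0.1, -0.2])

def rasta_filter(log_traj, pole=0.94):
    """Filter log-spectral trajectories along time (axis 0).

    Implements y[n] = sum_k b[k] * x[n-k] + pole * y[n-1]
    with b = RASTA_NUMER. The pole value is an assumption.
    """
    x = np.asarray(log_traj, dtype=float)
    y = np.zeros_like(x)
    for n in range(x.shape[0]):
        acc = np.zeros(x.shape[1:])
        for k, b in enumerate(RASTA_NUMER):
            if n - k >= 0:
                acc = acc + b * x[n - k]
        if n > 0:
            acc = acc + pole * y[n - 1]
        y[n] = acc
    return y

# Synthetic log-spectral trajectories: 300 frames, 4 bands,
# modulated at 4 Hz assuming a 100 Hz frame rate.
frames = np.arange(300)
speech_like = np.sin(2 * np.pi * 4.0 * frames / 100.0)[:, None] * np.ones((1, 4))

# A fixed channel shows up as a constant per-band offset in the log domain.
channel_offset = np.array([3.0, -2.0, 0.5, 1.0])

clean = rasta_filter(speech_like)
shifted = rasta_filter(speech_like + channel_offset)

# After the start-up transient, the two outputs should agree closely.
residual = np.max(np.abs(shifted[250:] - clean[250:]))
print("max residual after transient:", residual)
```

Because the filter is linear, the channel offset contributes only a decaying start-up transient, which is why the comparison skips the first frames. Additive noise does not reduce to a constant offset in the log domain, so this sketch covers only the convolutional case.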
Citations
Journal Article
An overview of text-independent speaker recognition: From features to supervectors
Tomi Kinnunen, Haizhou Li
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques that address robustness and session variability.
Journal Article
Supervised Speech Separation Based on Deep Learning: An Overview
DeLiang Wang, Jitong Chen
TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.
Book
Application of Hidden Markov Models in Speech Recognition
Mark J. F. Gales, Steve Young
TL;DR: The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Journal Article
Multiresolution spectrotemporal analysis of complex sounds
TL;DR: A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system and provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound.
Journal Article
Audio-visual speech modeling for continuous speech recognition
Stéphane Dupont, Juergen Luettin
TL;DR: A speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments is demonstrated on a large multispeaker database of continuously spoken digits.
References
Journal Article
Suppression of acoustic noise in speech using spectral subtraction
TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Journal Article
Perceptual linear predictive (PLP) analysis of speech
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Journal Article
Blind deconvolution through digital signal processing
TL;DR: In this paper, the blind deconvolution of two signals when both are unknown is addressed, and two related solutions that can be applied through digital signal processing in certain practical cases are discussed.
Journal Article
Differential Intensity Sensitivity of the Ear for Pure Tones
TL;DR: In this article, the authors measured the differential sensitivity of the ear as a function of frequency and intensity and found that the ear can distinguish 370 separate tones between the threshold of audition and threshold of feeling at about 1300 c.p.s.
Proceedings Article
RASTA-PLP speech analysis technique
TL;DR: The authors have developed a speech analysis technique that is more robust to steady-state spectral factors in speech and that is conceptually simple and computationally efficient.
Related Papers (5)
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein