Journal Article (DOI)

RASTA processing of speech

TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, its relationship to human auditory perception is discussed, the original method is extended to combinations of additive and convolutional noise, and an application to speech enhancement is shown.
Abstract
Performance of even the best current stochastic recognizers severely degrades in an unexpected communications environment. In some cases, the environmental effect can be modeled by a set of simple transformations and, in particular, by convolution with an environmental impulse response and the addition of some environmental noise. Often, the temporal properties of these environmental effects are quite different from the temporal properties of speech. We have been experimenting with filtering approaches that attempt to exploit these differences to produce robust representations for speech recognition and enhancement and have called this class of representations relative spectra (RASTA). In this paper, we review the theoretical and experimental foundations of the method, discuss the relationship with human auditory perception, and extend the original method to combinations of additive noise and convolutional noise. We discuss the relationship between RASTA features and the nature of the recognition models that are required and the relationship of these features to delta features and to cepstral mean subtraction. Finally, we show an application of the RASTA technique to speech enhancement.
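The filtering idea described in the abstract — band-pass filtering the time trajectory of each log spectral band so that slowly varying convolutional effects (and fast frame-to-frame noise) are suppressed — can be sketched as follows. This is a minimal illustration using the commonly published RASTA transfer function H(z) = 0.1(2 + z^-1 - z^-3 - 2z^-4)/(1 - 0.98z^-1); the function name and array layout are our assumptions, not the paper's code.

```python
import numpy as np

def rasta_filter(log_spec):
    """Apply the classic RASTA band-pass filter along the time axis of a
    (frames x bands) array of log spectral values.

    The FIR part is a smoothed discrete derivative whose taps sum to zero,
    so the constant (DC) component of each band trajectory is removed;
    the single pole at 0.98 re-integrates, yielding a band-pass response.
    """
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR taps (sum to zero)
    pole = 0.98                                       # slow IIR integrator
    n_frames, _ = log_spec.shape
    out = np.zeros_like(log_spec, dtype=float)
    for t in range(n_frames):
        acc = pole * out[t - 1] if t > 0 else 0.0
        for k in range(len(b)):
            if t - k >= 0:
                acc = acc + b[k] * log_spec[t - k]
        out[t] = acc
    return out
```

Because the filter has zero gain at DC, a constant offset in the log spectrum — exactly what convolution with a fixed channel contributes — is filtered out, leaving only the speech-rate modulations.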


Citations
Journal Article (DOI)

An overview of text-independent speaker recognition: From features to supervectors

TL;DR: This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques that address robustness and session variability.
Journal Article (DOI)

Supervised Speech Separation Based on Deep Learning: An Overview

TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.
Book

Application of Hidden Markov Models in Speech Recognition

TL;DR: The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Journal Article (DOI)

Multiresolution spectrotemporal analysis of complex sounds

TL;DR: A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system and provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound.
Journal Article (DOI)

Audio-visual speech modeling for continuous speech recognition

TL;DR: A speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments and is demonstrated on a large multispeaker database of continuously spoken digits.
References
Journal Article (DOI)

Suppression of acoustic noise in speech using spectral subtraction

TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
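The noise suppression approach cited here is classically implemented by subtracting an estimated noise magnitude spectrum from each frame's magnitude spectrum, flooring any negative results. A minimal magnitude-domain sketch follows; the function name, floor parameter, and array shapes are our illustrative assumptions, not the cited paper's implementation.

```python
import numpy as np

def spectral_subtract(frames_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from each frame's magnitude
    spectrum (frames x frequency bins), flooring negative results at a
    small fraction of the noise estimate to avoid negative magnitudes."""
    cleaned = frames_mag - noise_mag          # broadcast over frames
    return np.maximum(cleaned, floor * noise_mag)
```

In practice the noise estimate would be averaged over speech-free frames of an STFT, and the cleaned magnitudes recombined with the noisy phase before inverse transforming to resynthesize a waveform.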
Journal Article (DOI)

Perceptual linear predictive (PLP) analysis of speech

TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Journal Article (DOI)

Blind deconvolution through digital signal processing

TL;DR: In this paper, the authors address the problem of blind deconvolution of two convolved signals when both are unknown, and discuss two related solutions which can be applied through digital signal processing in certain practical cases.
Journal Article (DOI)

Differential Intensity Sensitivity of the Ear for Pure Tones

TL;DR: In this article, the authors measured the differential sensitivity of the ear as a function of frequency and intensity and found that the ear can distinguish 370 separate tones between the threshold of audition and threshold of feeling at about 1300 c.p.s.
Proceedings Article (DOI)

RASTA-PLP speech analysis technique

TL;DR: The authors have developed a conceptually simple and computationally efficient technique that is more robust to steady-state spectral factors in speech.