Author

R. Zelinski

Bio: R. Zelinski is an academic researcher. The author has contributed to research in topics: Background noise & Noise figure. The author has an h-index of 1, co-authored 1 publication receiving 364 citations.

Papers
Proceedings ArticleDOI
11 Apr 1988
TL;DR: The author presents a self-adapting noise reduction system based on a four-microphone array combined with an adaptive postfiltering scheme; it produces an enhanced speech signal with barely noticeable residual noise when the input SNR is greater than 0 dB.
Abstract: The author presents a self-adapting noise reduction system which is based on a four-microphone array combined with an adaptive postfiltering scheme. Noise reduction is achieved by utilizing the directivity gain of the array and by reducing the residual noise through postfiltering of the received microphone signals. The postfiltering scheme is based on a Wiener filter that estimates the desired speech signal; the filter is computed from short-term measurements of the autocorrelation and cross-correlation functions of the microphone signals. The noise reduction system has been tested experimentally in a typical office room. The system produces an enhanced speech signal with barely noticeable residual noise if the input SNR is greater than 0 dB. The received noise power, measured in the absence of the speech signal, can be reduced by 28 dB.
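
For illustration only, the following NumPy sketch shows a correlation-based Wiener post-filter gain of the kind the abstract describes, assuming time-aligned microphone channels and spatially uncorrelated noise; the function name, array shapes, and single-frame interface are assumptions rather than details from the paper.

```python
import numpy as np

def wiener_postfilter_gain(X, eps=1e-12):
    """Per-frequency Wiener post-filter gain estimated from short-term
    auto- and cross-spectra of M time-aligned microphone channels.
    X: complex STFT coefficients of one frame, shape (M, F)."""
    M, F = X.shape
    # Average auto-spectral density across microphones (speech + noise power).
    auto = np.mean(np.abs(X) ** 2, axis=0)
    # Average real part of the cross-spectral densities over all mic pairs;
    # spatially uncorrelated noise cancels here, leaving a speech-power estimate.
    cross = np.zeros(F)
    pairs = 0
    for i in range(M):
        for j in range(i + 1, M):
            cross += np.real(X[i] * np.conj(X[j]))
            pairs += 1
    cross /= pairs
    # Wiener-type gain: estimated speech power over total power, clipped to [0, 1].
    return np.clip(cross / (auto + eps), 0.0, 1.0)
```

In a practical system the auto- and cross-spectra would be recursively smoothed over frames, and the resulting gain would be applied to the beamformer output.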

370 citations


Cited by
Journal ArticleDOI
TL;DR: This paper considers a sensor array located in an enclosure, where arbitrary transfer functions (TFs) relate the source signal and the sensors, and derives a suboptimal algorithm that can be implemented by estimating the TF ratios instead of estimating the TFs.
Abstract: We consider a sensor array located in an enclosure, where arbitrary transfer functions (TFs) relate the source signal and the sensors. The array is used for enhancing a signal contaminated by interference. Constrained minimum-power adaptive beamforming, suggested by Frost (1972), and in particular the generalized sidelobe canceler (GSC) version developed by Griffiths and Jim (1982), are the most widely used beamforming techniques. These methods rely on the assumption that the received signals are simple delayed versions of the source signal. The good interference suppression attained under this assumption is severely impaired in complicated acoustic environments, where arbitrary TFs may be encountered. In this paper, we consider the arbitrary TF case. We propose a GSC solution adapted to the general TF case. We derive a suboptimal algorithm that can be implemented by estimating the TF ratios instead of estimating the TFs themselves. The TF ratios are estimated by exploiting the nonstationarity characteristics of the desired signal. The algorithm is applied to the problem of speech enhancement in a reverberating room. The discussion is supported by an experimental study using speech and noise signals recorded in an actual room acoustics environment.
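
As a rough sketch of the GSC structure in the general TF case, the following NumPy code implements a two-microphone, per-bin GSC driven by a given TF-ratio estimate a[f]; the paper's key step, estimating the ratios from the desired signal's nonstationarity, is not reproduced here, and all names and shapes are assumptions.

```python
import numpy as np

def tf_gsc_two_mic(X1, X2, a, mu=0.05, eps=1e-8):
    """Per-bin generalized sidelobe canceller for two microphones given a
    transfer-function ratio estimate a[f] (hypothetical interface).
    X1, X2: complex STFTs of the two channels, shape (T, F)."""
    T, F = X1.shape
    g = np.zeros(F, dtype=complex)        # adaptive noise-canceller weights
    out = np.empty((T, F), dtype=complex)
    norm = 1.0 + np.abs(a) ** 2           # ||h||^2 for steering vector h = [1, a]
    for t in range(T):
        # Fixed beamformer: distortionless match to h = [1, a].
        y_fbf = (X1[t] + np.conj(a) * X2[t]) / norm
        # Blocking branch: a*X1 - X2 cancels the desired signal exactly
        # (for which X2 = a * X1), so only noise leaks through.
        u = a * X1[t] - X2[t]
        # Subtract the filtered noise reference, then NLMS-update the weights.
        e = y_fbf - g * u
        g = g + mu * np.conj(u) * e / (np.abs(u) ** 2 + eps)
        out[t] = e
    return out
```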

708 citations

BookDOI
01 May 1991
TL;DR: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN).
Abstract: This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of environmental variability are introduced by the use of desk-top microphones and different training and testing conditions: additive noise and spectral tilt introduced by linear filtering. An important attribute of the novel compensation algorithms described in this thesis is that they provide joint rather than independent compensation for these two types of degradation. Acoustical compensation is applied in our algorithms as an additive correction in the cepstral domain. This allows a higher degree of integration within SPHINX, the Carnegie Mellon speech recognition system, which uses the cepstrum as its feature vector; therefore, these algorithms can be implemented very efficiently. Processing in many of these algorithms is based on the instantaneous signal-to-noise ratio (SNR), as the appropriate compensation represents a form of noise suppression at low SNRs and spectral equalization at high SNRs. The compensation vectors for additive noise and spectral transformations are estimated by minimizing the differences between speech feature vectors obtained from a "standard" training corpus of speech and feature vectors that represent the current acoustical environment. In our work this is accomplished by minimizing the distortion of vector-quantized cepstra produced by the feature extraction module in SPHINX. In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization (SDCN) and Codeword-Dependent Cepstral Normalization (CDCN). With CDCN, the accuracy of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded with a desk-top microphone is essentially the same as that obtained when the system is trained and tested on speech from the desk-top microphone. An algorithm for frequency normalization has also been proposed, in which the parameter of the bilinear transformation used by the signal-processing stage to produce frequency warping is adjusted for each new speaker and acoustical environment. The optimum value of this parameter is again chosen to minimize the vector-quantization distortion between the standard environment and the current one. In preliminary studies, use of this frequency normalization produced a moderate additional decrease in the observed error rate.
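
As a loose illustration of the additive cepstral correction idea (in the spirit of SDCN), here is a minimal NumPy sketch assuming the SNR-dependent correction vectors have already been estimated offline; the function name, interface, and binning scheme are invented for this example.

```python
import numpy as np

def sdcn_compensate(cepstra, frame_snr_db, snr_bins, corrections):
    """Additive cepstral compensation selected by instantaneous SNR.
    cepstra: (T, D) cepstral frames; frame_snr_db: (T,) SNR per frame;
    snr_bins: (K,) ascending SNR breakpoints in dB;
    corrections: (K, D) correction vectors learned offline, e.g. as average
    differences between 'standard' and current-environment cepstra."""
    cepstra = np.asarray(cepstra)
    idx = np.searchsorted(snr_bins, frame_snr_db)   # pick an SNR bin per frame
    idx = np.clip(idx, 0, len(corrections) - 1)
    # Low-SNR corrections act as noise suppression, high-SNR ones as
    # spectral equalization, matching the description above.
    return cepstra + np.asarray(corrections)[idx]
```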

474 citations

Journal ArticleDOI
TL;DR: This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Abstract: Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
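
As one concrete instance of the second axis (the spatial filter design criterion), a minimal per-bin MVDR beamformer can be sketched as follows; this is a standard textbook formula, not code from the paper, and the names are assumptions.

```python
import numpy as np

def mvdr_weights(Rn, d, loading=1e-9):
    """MVDR spatial filter for one frequency bin: minimize the noise output
    power w^H Rn w subject to the distortionless constraint w^H d = 1.
    Rn: (M, M) noise covariance; d: (M,) steering / relative-TF vector."""
    M = Rn.shape[0]
    Rn = Rn + loading * np.eye(M)        # diagonal loading for stability
    num = np.linalg.solve(Rn, d)         # Rn^{-1} d
    return num / (np.conj(d) @ num)      # w = Rn^{-1} d / (d^H Rn^{-1} d)
```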

452 citations

01 Aug 2000
TL;DR: He went on to Brown University in Providence, Rhode Island, to study signal processing, where he began research on microphone arrays; he received a Master of Science degree in Electrical Engineering in 1993 and continued working toward a Doctor of Philosophy degree.

403 citations

Journal ArticleDOI
TL;DR: It is found that training on different noise environments and different microphones barely affects ASR performance, especially when several environments are present in the training data; only the number of microphones has a significant impact.

345 citations