Author

Jont B. Allen

Other affiliations: AT&T, Alcatel-Lucent, Urbana University
Bio: Jont B. Allen is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Speech perception & Consonant. The author has an h-index of 41 and has co-authored 165 publications, which have received 10,516 citations. Previous affiliations of Jont B. Allen include AT&T & Alcatel-Lucent.


Papers
Journal ArticleDOI
TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.
Abstract: Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A Fortran implementation of this model has been included.

3,720 citations
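
The idea in this abstract can be illustrated with a minimal sketch. It is not the paper's Fortran program: it assumes a single reflection coefficient `beta` shared by all six walls, rounds each arrival to the nearest sample, and uses hypothetical names (`image_method_rir`, `convolve`, `n_max`) and made-up room dimensions. Each image source contributes an impulse delayed by its distance to the microphone and attenuated by distance and by the number of wall reflections.

```python
import math
from itertools import product


def image_method_rir(room, src, mic, beta=0.8, n_max=2,
                     fs=8000, c=343.0, length=2048):
    """Toy image-method impulse response for a rectangular room.

    Assumption: every wall shares one reflection coefficient `beta`.
    """
    # Per axis, list each image coordinate with its number of wall reflections.
    axes = []
    for size, s in zip(room, src):
        images = []
        for n in range(-n_max, n_max + 1):
            images.append((2 * n * size + s, abs(2 * n)))
            images.append((2 * n * size - s, abs(2 * n - 1)))
        axes.append(images)

    h = [0.0] * length
    for (x, rx), (y, ry), (z, rz) in product(*axes):
        d = math.dist((x, y, z), mic)      # image-to-microphone distance (m)
        k = round(d * fs / c)              # arrival time in samples
        if k < length:
            h[k] += beta ** (rx + ry + rz) / (4 * math.pi * d)
    return h


def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical room (metres), source and microphone positions.
room, src, mic = (4.0, 3.0, 2.5), (1.0, 1.0, 1.0), (3.0, 2.0, 1.5)
h = image_method_rir(room, src, mic)
dry = [1.0, 0.0, -0.5, 0.25]               # stand-in for a speech signal
wet = convolve(dry, h)                     # reverberant version of `dry`
```

Raising `n_max` adds more distant images and therefore a longer reverberant tail, at the cost of more image sources to sum.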

Journal ArticleDOI
01 Nov 1977
TL;DR: The effects of modifications made to the short-time transform on the resulting signal are shown explicitly, and it is shown that a formal duality exists between the two synthesis methods based on the properties of the window used for obtaining the short-time Fourier transform.
Abstract: Two distinct methods for synthesizing a signal from its short-time Fourier transform have previously been proposed. We call these methods the filter-bank summation (FBS) method and the overlap add (OLA) method. Each of these synthesis techniques has unique advantages and disadvantages in various applications due to the way in which the signal is reconstructed. In this paper we unify the ideas behind the two synthesis techniques and discuss the similarities and differences between these methods. In particular, we explicitly show the effects of modifications made to the short-time transform (both fixed and time-varying modifications are considered) on the resulting signal and discuss applications where each of the techniques would be most useful. The interesting case of nonlinear modifications (possibly signal dependent) to the short-time Fourier transform is also discussed. Finally it is shown that a formal duality exists between the two synthesis methods based on the properties of the window used for obtaining the short-time Fourier transform.

954 citations
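
A minimal sketch of overlap-add (OLA) synthesis, under stated assumptions, may help make the abstract concrete. The window (periodic Hann), the 50% overlap, the naive DFT, and the names `dft` and `ola_process` are all choices made for this illustration and are not taken from the paper. With this window the shifted copies sum to one, so an unmodified short-time transform resynthesizes the input, and a fixed gain applied to every spectrum scales the output by the same gain.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def ola_process(x, modify=lambda spectrum: spectrum, n_fft=16):
    """Analyse x frame by frame, apply `modify` to each spectrum, overlap-add the result.

    Assumes `n_fft` is even so that the hop is exactly half a frame.
    """
    hop = n_fft // 2
    # Periodic Hann window: copies shifted by `hop` sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft) for t in range(n_fft)]
    # Zero-pad so that every input sample is covered by two overlapping frames.
    padded = [0.0] * hop + list(x) + [0.0] * n_fft
    y = [0.0] * len(padded)
    for start in range(0, len(padded) - n_fft + 1, hop):
        frame = [padded[start + t] * window[t] for t in range(n_fft)]
        spectrum = modify(dft(frame))              # short-time spectrum of this frame
        for t, v in enumerate(dft(spectrum, inverse=True)):
            y[start + t] += v.real                 # overlap-add synthesis
    return y[hop:hop + len(x)]


x = [math.sin(0.3 * n) + 0.1 * n for n in range(40)]
same = ola_process(x)                                            # no modification
halved = ola_process(x, modify=lambda s: [0.5 * v for v in s])   # fixed gain
```

Time-varying or nonlinear modifications would be passed in through the same `modify` hook, although the output is then no longer a simple scaled copy of the input.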

Journal ArticleDOI
Jont B. Allen
TL;DR: A theory of short term spectral analysis, synthesis, and modification is presented, with an attempt at pointing out certain practical and theoretical questions; the methods are useful in designing filter banks when the filter bank outputs are to be used for synthesis after multiplicative modifications are made to the spectrum.
Abstract: A theory of short term spectral analysis, synthesis, and modification is presented with an attempt at pointing out certain practical and theoretical questions. The methods discussed here are useful in designing filter banks when the filter bank outputs are to be used for synthesis after multiplicative modifications are made to the spectrum.

899 citations

Journal ArticleDOI
Jont B. Allen
TL;DR: Until the performance of automatic speech recognition (ASR) hardware surpasses human performance in accuracy and robustness, the authors stand to gain by understanding the basic principles behind human speech recognition (HSR).
Abstract: Until the performance of automatic speech recognition (ASR) hardware surpasses human performance in accuracy and robustness, we stand to gain by understanding the basic principles behind human speech recognition (HSR). This problem was studied exhaustively at Bell Labs between the years of 1918 and 1950 by Harvey Fletcher and his colleagues. The motivation for these studies was to quantify the quality of speech sounds in the telephone plant to improve both speech intelligibility and preference. To do this he and his group studied the effects of filtering and noise on speech recognition accuracy for nonsense consonant-vowel-consonant (CVC) syllables, words, and sentences. Fletcher used the term "articulation" for the probability of correct recognition for nonsense sounds, and "intelligibility" for the probability of correct recognition for words (sounds having meaning). In 1919, Fletcher found a way to transform articulation data for filtered speech into an additive density function D(f) and found a formula that accurately predicts the average articulation. The area under D(f) is called the "articulation index." Fletcher then went on to find relationships between the recognition errors for the nonsense speech sounds, words, and sentences. This work has recently been reviewed and partially replicated by Boothroyd and by Bronkhorst, et al. (1980). Taken as a whole, these studies tell us a great deal about how humans process and recognize speech sounds.

428 citations

Journal ArticleDOI
TL;DR: Synthetic room impulse responses were generated with a point image method to solve for wall reflections, and a Nyquist plot was used to determine whether a given impulse response was minimum phase; certain responses were found to be minimum phase when the initial delay was removed.
Abstract: When a conversation takes place inside a room, the acoustic speech signal is distorted by wall reflections. The room’s effect on this signal can be characterized by a room impulse response. If the impulse response happens to be minimum phase, it can easily be inverted. Synthetic room impulse responses were generated using a point image method to solve for wall reflections. A Nyquist plot was used to determine whether a given impulse response was minimum phase. Certain synthetic room impulse responses were found to be minimum phase when the initial delay was removed. A minimum phase inverse filter was successfully used to remove the effect of a room impulse response on a speech signal.

377 citations
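
The Nyquist-plot test described in this abstract can be sketched numerically, under assumptions that are not from the paper: the impulse response is a short FIR sequence with no zeros on the unit circle, and the plot is "read" by counting how many times H(e^{jω}) winds around the origin as ω runs from 0 to 2π. For such a response, zero encirclements means all zeros lie inside the unit circle, that is, the response is minimum phase. The function names are hypothetical.

```python
import cmath
import math


def winding_number(h, n_points=4096):
    """Net number of counter-clockwise turns of H(e^{jw}) around the origin."""
    total = 0.0
    prev = None
    for k in range(n_points + 1):
        w = 2 * math.pi * k / n_points
        H = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
        phase = cmath.phase(H)
        if prev is not None:
            # Wrap each phase step into [-pi, pi) before accumulating.
            step = (phase - prev + math.pi) % (2 * math.pi) - math.pi
            total += step
        prev = phase
    return round(total / (2 * math.pi))


def is_minimum_phase(h):
    """True when the Nyquist plot of the FIR response does not encircle the origin."""
    return winding_number(h) == 0


def strip_initial_delay(h):
    """Drop leading zero samples, i.e. remove the pure propagation delay."""
    start = next(i for i, c in enumerate(h) if c != 0.0)
    return h[start:]


delayed = [0.0, 0.0, 1.0, 0.5]     # direct sound after 2 samples, then one echo
```

In this toy case `is_minimum_phase(delayed)` is false because the two-sample delay alone adds two encirclements, while `is_minimum_phase(strip_initial_delay(delayed))` is true, which mirrors the abstract's remark about removing the initial delay.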


Cited by
Journal ArticleDOI
S. Boll
TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Abstract: A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

4,862 citations
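
The core subtraction step described in this abstract can be sketched for a single frame. This is an illustrative simplification rather than the published algorithm: it assumes the noise bias is the average magnitude spectrum of noise-only frames, floors negative magnitudes at zero, reuses the noisy phase, and omits windowing and the secondary residual-noise procedures. The names `noise_bias` and `spectral_subtract` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def noise_bias(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    spectra = [[abs(v) for v in dft(frame)] for frame in noise_frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def spectral_subtract(frame, bias):
    """Subtract the noise bias from the magnitude spectrum and resynthesize."""
    cleaned = []
    for value, mu in zip(dft(frame), bias):
        magnitude = max(abs(value) - mu, 0.0)      # floor at zero (assumption)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))  # keep noisy phase
    return [v.real for v in dft(cleaned, inverse=True)]


n = 16
speech = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]        # toy "speech" tone
noise = [0.2 * math.cos(2 * math.pi * 5 * t / n) for t in range(n)]   # toy stationary noise
noisy = [s + v for s, v in zip(speech, noise)]
bias = noise_bias([noise])                 # estimated during "nonspeech activity"
cleaned = spectral_subtract(noisy, bias)
```

Because the toy noise is deterministic and sits in its own frequency bin, the subtraction here removes it almost exactly; with real random noise the per-frame magnitude fluctuates around the bias, which is why residual noise remains and needs further attenuation.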

Journal ArticleDOI
TL;DR: This work proposes an entirely non-recursive variational mode decomposition model in which the modes are extracted concurrently; the method is a generalization of the classic Wiener filter into multiple, adaptive bands.
Abstract: During the late 1990s, Huang introduced the algorithm called Empirical Mode Decomposition, which is widely used today to recursively decompose a signal into different modes of unknown but separate spectral bands. EMD is known for limitations like sensitivity to noise and sampling. These limitations could only partially be addressed by more mathematical attempts to this decomposition problem, like synchrosqueezing, empirical wavelets or recursive variational decomposition. Here, we propose an entirely non-recursive variational mode decomposition model, where the modes are extracted concurrently. The model looks for an ensemble of modes and their respective center frequencies, such that the modes collectively reproduce the input signal, while each being smooth after demodulation into baseband. In Fourier domain, this corresponds to a narrow-band prior. We show important relations to Wiener filter denoising. Indeed, the proposed method is a generalization of the classic Wiener filter into multiple, adaptive bands. Our model provides a solution to the decomposition problem that is theoretically well founded and still easy to understand. The variational model is efficiently optimized using an alternating direction method of multipliers approach. Preliminary results show attractive performance with respect to existing mode decomposition models. In particular, our proposed model is much more robust to sampling and noise. Finally, we show promising practical decomposition results on a series of artificial and real data.

4,111 citations

Journal ArticleDOI
David J. Thomson
01 Sep 1982
TL;DR: A method based on a local eigenexpansion is proposed to estimate the spectrum of a stationary time series from a finite sample of the process; computationally, it is equivalent to using the weighted average of a series of direct-spectrum estimates based on orthogonal data windows to treat both bias and smoothing problems.
Abstract: In the choice of an estimator for the spectrum of a stationary time series from a finite sample of the process, the problems of bias control and consistency, or "smoothing," are dominant. In this paper we present a new method based on a "local" eigenexpansion to estimate the spectrum in terms of the solution of an integral equation. Computationally this method is equivalent to using the weighted average of a series of direct-spectrum estimates based on orthogonal data windows (discrete prolate spheroidal sequences) to treat both the bias and smoothing problems. Some of the attractive features of this estimate are: there are no arbitrary windows; it is a small sample theory; it is consistent; it provides an analysis-of-variance test for line components; and it has high resolution. We also show relations of this estimate to maximum-likelihood estimates, show that the estimation capacity of the estimate is high, and show applications to coherence and polyspectrum estimates.

3,921 citations

Journal ArticleDOI
Leon Cohen
01 Jul 1989
TL;DR: A review and tutorial of the fundamental ideas and methods of joint time-frequency distributions is presented with emphasis on the diversity of concepts and motivations that have gone into the formation of the field.
Abstract: A review and tutorial of the fundamental ideas and methods of joint time-frequency distributions is presented. The objective of the field is to describe how the spectral content of a signal changes in time and to develop the physical and mathematical ideas needed to understand what a time-varying spectrum is. The basic goal is to devise a distribution that represents the energy or intensity of a signal simultaneously in time and frequency. Although the basic notions have been developing steadily over the last 40 years, there have recently been significant advances. This review is intended to be understandable to the nonspecialist with emphasis on the diversity of concepts and motivations that have gone into the formation of the field.

3,568 citations