Author

AG Armin Kohlrausch

Bio: AG Armin Kohlrausch is an academic researcher from Eindhoven University of Technology. The author has contributed to research on topics including binaural recording and noise. The author has an h-index of 32 and has co-authored 230 publications that have received 5345 citations. Previous affiliations of AG Armin Kohlrausch include Analysis Group and the University of Göttingen.


Papers
Journal Article
TL;DR: A quantitative model for describing data from modulation-detection and modulation-masking experiments is presented, which proposes that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to "sluggishness" in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier.
Abstract: This paper presents a quantitative model for describing data from modulation-detection and modulation-masking experiments, which extends the model of the “effective” signal processing of the auditory system described in Dau et al. [J. Acoust. Soc. Am. 99, 3615–3622 (1996)]. The new element in the present model is a modulation filterbank, which exhibits two domains with different scaling. In the range 0–10 Hz, the modulation filters have a constant bandwidth of 5 Hz. Between 10 Hz and 1000 Hz a logarithmic scaling with a constant Q value of 2 was assumed. To preclude spectral effects in temporal processing, measurements and corresponding simulations were performed with stochastic narrow-band noise carriers at a high center frequency (5 kHz). For conditions in which the modulation rate (f_mod) was smaller than half the bandwidth of the carrier (Δf), the model accounts for the low-pass characteristic in the threshold functions [e.g., Viemeister, J. Acoust. Soc. Am. 66, 1364–1380 (1979)]. In conditions with f_mod > Δf/2, the model can account for the high-pass characteristic in the threshold function. In a further experiment, a classical masking paradigm for investigating frequency selectivity was adopted and translated to the modulation-frequency domain. Masked thresholds for sinusoidal test modulation in the presence of a competing modulation masker were measured and simulated as a function of the test modulation rate. In all cases, the model describes the experimental data to within a few dB. It is proposed that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to “sluggishness” in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier. © 1997 Acoustical Society of America. [S0001-4966(97)05611-7]
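The two scaling domains of the modulation filterbank can be illustrated with a small sketch. The bandwidth rule (5 Hz up to 10 Hz, center frequency divided by Q = 2 above that) comes from the abstract. The placement of the center frequencies is not stated there, so the layout below is an assumption: filters are spaced so that neighbouring passbands touch at their edges, and the function names are hypothetical.

```python
def bandwidth_hz(fc_hz, q=2.0, low_bw_hz=5.0, split_hz=10.0):
    """Bandwidth of a modulation filter centered at fc_hz."""
    if fc_hz <= split_hz:
        return low_bw_hz   # constant-bandwidth region (0-10 Hz)
    return fc_hz / q       # constant-Q region (10-1000 Hz)


def center_frequencies_hz(f_max_hz=1000.0, q=2.0, low_bw_hz=5.0, split_hz=10.0):
    """Assumed layout: adjacent filters touch at their passband edges."""
    centers = []
    fc = 0.0
    while fc < split_hz:          # linear spacing below the split frequency
        centers.append(fc)
        fc += low_bw_hz
    # Edge-touching constant-Q filters have a fixed ratio between centers.
    ratio = (2.0 * q + 1.0) / (2.0 * q - 1.0)
    fc = split_hz
    while fc <= f_max_hz:         # logarithmic spacing above the split
        centers.append(fc)
        fc *= ratio
    return centers


for fc in center_frequencies_hz():
    print(f"fc = {fc:7.2f} Hz, bandwidth = {bandwidth_hz(fc):6.2f} Hz")
```

With Q = 2 the two rules agree at 10 Hz (10 / 2 = 5 Hz), so the bandwidth is continuous across the boundary between the domains.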

580 citations

Journal Article
TL;DR: A quantitative model for signal processing in the auditory system that combines a series of preprocessing stages with an optimal detector as the decision device allows one to estimate thresholds with the same signals and psychophysical procedures as those used in actual experiments.
Abstract: This paper describes a quantitative model for signal processing in the auditory system. The model combines a series of preprocessing stages with an optimal detector as the decision device. The present paper gives a description of the various preprocessing stages and of the implementation of the optimal detector. The output of the preprocessing stages is a time-varying activity pattern to which “internal noise” is added. In the decision process, a stored temporal representation of the signal to be detected (template) is compared with the actual activity pattern. The comparison amounts to calculating the correlation between the two temporal patterns and is comparable to a “matched filtering” process. The detector itself derives the template at the beginning of each simulated threshold measurement from a suprathreshold value of the stimulus. The model allows one to estimate thresholds with the same signals and psychophysical procedures as those used in actual experiments. In the accompanying paper [Dau et al., J. Acoust. Soc. Am. 99, •••–••• (1996)] data obtained for human observers are compared with the optimal-detector model for various masking conditions.
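The decision stage described in the abstract, a template correlated with a noisy activity pattern, can be sketched roughly as follows. This is only an illustration of template correlation in a forced-choice task, not the published model: the preprocessing stages are omitted, and the unit-energy normalization, the Gaussian internal noise and all function names are assumptions.

```python
import math
import random


def normalize(pattern):
    """Scale a pattern to unit energy so it can serve as a template."""
    energy = math.sqrt(sum(v * v for v in pattern))
    return [v / energy for v in pattern]


def add_internal_noise(pattern, sigma, rng):
    """Add independent Gaussian noise to each sample (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in pattern]


def correlation(template, pattern):
    """Unnormalized cross-correlation at zero lag (matched-filter output)."""
    return sum(t * p for t, p in zip(template, pattern))


def choose_interval(template, intervals):
    """Pick the interval whose pattern correlates best with the template."""
    scores = [correlation(template, p) for p in intervals]
    return scores.index(max(scores))


rng = random.Random(0)
# Template derived from a clearly audible (suprathreshold) version of the signal.
template = normalize([0.0, 1.0, 2.0, 1.0, 0.0])
weak_signal = [0.0, 0.1, 0.2, 0.1, 0.0]
silence = [0.0] * 5
# Three-interval trial: the signal is in interval 1, noise is added to all.
trial = [add_internal_noise(p, 0.05, rng) for p in (silence, weak_signal, silence)]
print("chosen interval:", choose_interval(template, trial))
```

Repeating such trials while lowering the signal level until the detector's choices approach chance is one way a threshold could be tracked with the same procedure used for human listeners.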

499 citations

Journal Article
TL;DR: The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.
Abstract: A multi-channel model, describing the effects of spectral and temporal integration in amplitude-modulation detection for a stochastic noise carrier, is proposed and validated. The model is based on the modulation filterbank concept which was established in the accompanying paper [Dau et al., J. Acoust. Soc. Am. 102, 2892–2905 (1997)] for modulation perception in narrow-band conditions (single-channel model). To integrate information across frequency, the detection process of the model linearly combines the channel outputs. To integrate information across time, a kind of “multiple-look” strategy is realized within the detection stage of the model. Both data from the literature and new data are used to validate the model. The model predictions agree with the results of Eddins [J. Acoust. Soc. Am. 93, 470–479 (1993)] that the “time constants” associated with the temporal modulation transfer functions (TMTF) derived for narrow-band stimuli do not vary with carrier frequency region and that they decrease monotonically with increasing stimulus bandwidth. The model is able to predict masking patterns in the modulation-frequency domain, as observed experimentally by Houtgast [J. Acoust. Soc. Am. 85, 1676–1680 (1989)]. The model also accounts for the finding by Sheft and Yost [J. Acoust. Soc. Am. 88, 796–805 (1990)] that the long “effective” integration time constants derived from the data are two orders of magnitude larger than the time constants derived from the cutoff frequency of the TMTF. Finally, the temporal-summation properties of the model allow the prediction of data in a specific temporal paradigm used earlier by Viemeister and Wakefield [J. Acoust. Soc. Am. 90, 858–865 (1991)]. The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.

308 citations

Journal Article
TL;DR: The shape of the TMTF and the beat-detection data reflects a limitation in resolving fast amplitude variations, which must occur central to the inner-ear filtering.
Abstract: This paper is concerned with modulation and beat detection for sinusoidal carriers. In the first experiment, temporal modulation transfer functions (TMTFs) were measured for carrier frequencies between 1 and 10 kHz. Modulation rates covered the range from 10 Hz to about the rate equaling the critical bandwidth at the carrier frequency. In experiment 2, TMTFs for three carrier frequencies were obtained as a function of the carrier level. In the final experiment, thresholds for the detection of either the lower or the upper modulation sideband (beat detection) were measured for “carrier” frequencies of 5 and 10 kHz, using the same range of modulation rates as in experiment 1. The TMTFs for carrier frequencies of 2 kHz and higher remained flat up to a modulation rate of about 100–130 Hz and had similar values across carrier frequencies. For higher rates, modulation thresholds initially increased and then decreased rapidly, reflecting the subjects’ ability to resolve the sidebands spectrally. Detection thresholds generally improved with increasing carrier level, but large variations in the exact level dependence were observed, across subjects as well as across carrier frequencies. For beat rates up to about 70 Hz (at 5 kHz) and 100 Hz (at 10 kHz), beat detection thresholds were the same for the upper and the lower sidebands and were about 6 dB higher than the level per sideband at the modulation-detection threshold. At higher rates the threshold for both sidebands increased, but the increase was larger for the lower sideband. This reflects an asymmetry in masking with more masking towards lower frequencies. Only at rates well beyond the maximum of the TMTF did detection for the lower sideband start to be better than that for the upper sideband. The asymmetry at intermediate frequency separations can be explained by assuming that detection always takes place in filters centered above the stimulus spectrum. 
The shape of the TMTF and the beat-detection data reflects a limitation in resolving fast amplitude variations, which must occur central to the inner-ear filtering. Its characteristic resembles that of a first-order low-pass filter with a cutoff frequency of about 150 Hz.
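To give a feel for the low-pass characteristic mentioned above, the sketch below evaluates the standard magnitude response of a first-order low-pass filter with a 150 Hz cutoff. The abstract only says the data resemble such a filter, so this is the textbook formula, not a fit to the measured thresholds; the function name is hypothetical.

```python
import math


def lowpass_attenuation_db(rate_hz, cutoff_hz=150.0):
    """Attenuation in dB (positive = weaker) of a first-order low-pass filter."""
    # |H(f)|^2 = 1 / (1 + (f / fc)^2) for a first-order low-pass filter.
    return 10.0 * math.log10(1.0 + (rate_hz / cutoff_hz) ** 2)


for rate in (10, 50, 150, 300, 600):
    print(f"{rate:4d} Hz: {lowpass_attenuation_db(rate):5.2f} dB")
```

The attenuation is about 3 dB at the cutoff and approaches 6 dB per octave well above it, which is the slope one would expect if this characteristic alone limited sensitivity to fast amplitude variations.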

242 citations

Journal Article
TL;DR: Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
Abstract: Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
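The idea of a monaural signal plus a small set of stereo parameters can be sketched as follows. The abstract does not list the parameters, so the two used here (an inter-channel level difference and a normalized inter-channel correlation) are assumptions chosen for illustration, computed over the whole signal rather than per time frame and frequency band as a practical coder would. The function name is hypothetical.

```python
import math


def stereo_parameters(left, right):
    """Return (mono downmix, level difference in dB, inter-channel correlation)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    if energy_left == 0.0 or energy_right == 0.0:
        raise ValueError("both channels must contain signal energy")
    cross = sum(l * r for l, r in zip(left, right))
    level_diff_db = 10.0 * math.log10(energy_left / energy_right)
    corr = cross / math.sqrt(energy_left * energy_right)
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]  # simple average downmix
    return mono, level_diff_db, corr


left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
mono, level_diff_db, corr = stereo_parameters(left, right)
print(mono, round(level_diff_db, 2), round(corr, 2))
```

Only the mono samples and the two numbers would need to be transmitted in this toy setting; a decoder would then redistribute the mono signal across two channels according to those parameters.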

228 citations


Cited by
Journal Article
TL;DR: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, based on the well-known autocorrelation method with a number of modifications that combine to prevent errors.
Abstract: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
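For orientation, the plain autocorrelation method that the abstract takes as its starting point can be sketched as below. This is the unmodified baseline only; none of the error-preventing modifications mentioned in the abstract are included, and the search range defaults and the function name are assumptions.

```python
import math


def estimate_f0_autocorr(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate F0 as the lag with the largest autocorrelation value."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    if len(signal) <= lag_max:
        raise ValueError("signal too short for the requested search range")
    best_lag, best_value = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Sum of products between the signal and a copy delayed by `lag`.
        value = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
print(estimate_f0_autocorr(tone, sample_rate))
```

Unlike the algorithm in the abstract, this baseline needs an explicit upper frequency limit and is prone to picking the wrong peak on real speech, which is the kind of error the described modifications are meant to prevent.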

1,975 citations

Journal Article
TL;DR: A short-time objective intelligibility measure (STOI) is presented that shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments, and better correlation with speech intelligibility than five other reference objective intelligibility models.
Abstract: In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech. In this paper, a short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) of three different listening experiments. In general, STOI showed better correlation with speech intelligibility compared to five other reference objective intelligibility models. In contrast to other conventional intelligibility models which tend to rely on global statistics across entire sentences, STOI is based on shorter time segments (386 ms). Experiments indeed show that it is beneficial to take segment lengths of this order into account. In addition, a free Matlab implementation is provided.

1,847 citations

Book
01 Jan 2001
TL;DR: This chapter discusses the Discrete-Time Speech Signal Processing Framework, a model based on the FBS Method, and its applications in Speech Communication Pathway and Homomorphic Signal Processing.
Abstract: (NOTE: Each chapter begins with an introduction and concludes with a Summary, Exercises and Bibliography.) 1. Introduction. Discrete-Time Speech Signal Processing. The Speech Communication Pathway. Analysis/Synthesis Based on Speech Production and Perception. Applications. Outline of Book. 2. A Discrete-Time Signal Processing Framework. Discrete-Time Signals. Discrete-Time Systems. Discrete-Time Fourier Transform. Uncertainty Principle. z-Transform. LTI Systems in the Frequency Domain. Properties of LTI Systems. Time-Varying Systems. Discrete-Fourier Transform. Conversion of Continuous Signals and Systems to Discrete Time. 3. Production and Classification of Speech Sounds. Anatomy and Physiology of Speech Production. Spectrographic Analysis of Speech. Categorization of Speech Sounds. Prosody: The Melody of Speech. Speech Perception. 4. Acoustics of Speech Production. Physics of Sound. Uniform Tube Model. A Discrete-Time Model Based on Tube Concatenation. Vocal Fold/Vocal Tract Interaction. 5. Analysis and Synthesis of Pole-Zero Speech Models. Time-Dependent Processing. All-Pole Modeling of Deterministic Signals. Linear Prediction Analysis of Stochastic Speech Sounds. Criterion of "Goodness". Synthesis Based on All-Pole Modeling. Pole-Zero Estimation. Decomposition of the Glottal Flow Derivative. Appendix 5.A: Properties of Stochastic Processes. Random Processes. Ensemble Averages. Stationary Random Process. Time Averages. Power Density Spectrum. Appendix 5.B: Derivation of the Lattice Filter in Linear Prediction Analysis. 6. Homomorphic Signal Processing. Concept. Homomorphic Systems for Convolution. Complex Cepstrum of Speech-Like Sequences. Spectral Root Homomorphic Filtering. Short-Time Homomorphic Analysis of Periodic Sequences. Short-Time Speech Analysis. Analysis/Synthesis Structures. Contrasting Linear Prediction and Homomorphic Filtering. 7. Short-Time Fourier Transform Analysis and Synthesis. Short-Time Analysis. Short-Time Synthesis. 
Short-Time Fourier Transform Magnitude. Signal Estimation from the Modified STFT or STFTM. Time-Scale Modification and Enhancement of Speech. Appendix 7.A: FBS Method with Multiplicative Modification. 8. Filter-Bank Analysis/Synthesis. Revisiting the FBS Method. Phase Vocoder. Phase Coherence in the Phase Vocoder. Constant-Q Analysis/Synthesis. Auditory Modeling. 9. Sinusoidal Analysis/Synthesis. Sinusoidal Speech Model. Estimation of Sinewave Parameters. Synthesis. Source/Filter Phase Model. Additive Deterministic-Stochastic Model. Appendix 9.A: Derivation of the Sinewave Model. Appendix 9.B: Derivation of Optimal Cubic Phase Parameters. 10. Frequency-Domain Pitch Estimation. A Correlation-Based Pitch Estimator. Pitch Estimation Based on a "Comb Filter". Pitch Estimation Based on a Harmonic Sinewave Model. Glottal Pulse Onset Estimation. Multi-Band Pitch and Voicing Estimation. 11. Nonlinear Measurement and Modeling Techniques. The STFT and Wavelet Transform Revisited. Bilinear Time-Frequency Distributions. Aeroacoustic Flow in the Vocal Tract. Instantaneous Teager Energy Operator. 12. Speech Coding. Statistical Models of Speech. Scalar Quantization. Vector Quantization (VQ). Frequency-Domain Coding. Model-Based Coding. LPC Residual Coding. 13. Speech Enhancement. Introduction. Preliminaries. Wiener Filtering. Model-Based Processing. Enhancement Based on Auditory Masking. Appendix 13.A: Stochastic-Theoretic Parameter Estimation. 14. Speaker Recognition. Introduction. Spectral Features for Speaker Recognition. Speaker Recognition Algorithms. Non-Spectral Features in Speaker Recognition. Signal Enhancement for the Mismatched Condition. Speaker Recognition from Coded Speech. Appendix 14.A: Expectation-Maximization (EM) Estimation. Glossary. Speech Signal Processing. Units. Databases. Index. About the Author.

984 citations

Journal Article
21 Jun 2007-Neuron
TL;DR: It is shown that the phase pattern of theta band responses recorded from human auditory cortex with magnetoencephalography (MEG) reliably tracks and discriminates spoken sentences and that this discrimination ability is correlated with speech intelligibility.

877 citations