Showing papers on "Speech enhancement published in 1989"

PDF

Open Access

Proceedings Article•DOI•

The voice activity detector for the Pan-European digital cellular mobile telephone service

[...]

Daniel Kenneth Freeman¹, G. Cosier¹, C.B. Southcott¹, Ivan Boyd¹•Institutions (1)

23 May 1989

TL;DR: In this article, a description of the voice activity detector (VAD) standardized by CEPT for use in the Pan-European digital cellular mobile telephone service is given, and performance tests carried out to validate the design are described.

...read moreread less

Abstract: A description is given of the voice activity detector (VAD) standardized by CEPT for use in the Pan-European digital cellular mobile telephone service The speech-coding algorithm chosen is a 13-kb/s speech coder, using a technique in which speech is produced at the decoder by passing a substitute for the residual through long-term and short-term predictor filters The difficulties of detecting speech in a noisy environment are discussed, and the performance tests carried out to validate the design are described The tests show that clipping levels are very low but that low levels of speech activity are recorded in conversations The VAD has low complexity (because it uses the results of analysis performed in the speech coder) and is failsafe in difficult conditions >

...read moreread less

179 citations

Proceedings Article•DOI•

Filtering of colored noise for speech enhancement and coding

[...]

B. Koo¹, Jerry D. Gibson¹, S.D. Gray•Institutions (1)

Texas A&M University¹

23 May 1989

TL;DR: The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs, and it is demonstrated that such gains are unavailable with white noise assumption Kalman and Wiener filters.

...read moreread less

Abstract: A report is presented on experiments using a colored-noise assumption Kalman filter to enhance speech additively contaminated by colored noise, such as helicopter noise and jeep noise, with a particular application to linear predictive coding (LPC) of noisy speech. The results indicate that the colored-noise Kalman filter provides a significant gain in SNR, a clear improvement in the sound spectrogram, and an audible improvement in output speech quality. The authors demonstrate that such gains are unavailable with white noise assumption Kalman and Wiener filters. The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs. >

...read moreread less

132 citations

Proceedings Article•DOI•

A comparison of several acoustic representations for speech recognition with degraded and undegraded speech

[...]

M. Hunt¹, C. Lefebvre¹•Institutions (1)

National Research Council¹

23 May 1989

TL;DR: Several acoustic representations have been compared in speaker-dependent and independent connected and isolated-word recognition tests with undegraded speech and with speech degraded by adding white noise and by applying a 6-dB/octave spectral tilt.

...read moreread less

Abstract: Several acoustic representations have been compared in speaker-dependent and independent connected and isolated-word recognition tests with undegraded speech and with speech degraded by adding white noise and by applying a 6-dB/octave spectral tilt. The representations comprised the output of an auditory model, cepstrum coefficients derived from an FFT-based mel-scale filter bank with various weighting schemes applied to the coefficients, cepstrum coefficients augmented with measures of their rates of change with time, and sets of linear discriminant functions derived from the filter-bank output and called IMELDA. The model outperformed the cepstrum representations except in noise-free connected-word tests, where it had a high insertion rate. The best cepstrum weighting scheme was derived from within-class variances. Its behavior may explain the empirical adjustments found necessary with other schemes. IMELDA outperformed all other representations in all conditions and is computationally simple. >

...read moreread less

118 citations

Journal Article•DOI•

Maximum likelihood noise cancellation using the EM algorithm

[...]

Meir Feder¹, Alan V. Oppenheim¹, E. Weinstein²•Institutions (2)

Massachusetts Institute of Technology¹, Woods Hole Oceanographic Institution²

01 Feb 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: An approach to the two-microphone speech enhancement problem is discussed, and a maximum-likelihood problem is formulated for estimating the parameters needed for canceling the noise, and solved by the iterative EM (estimate-maximize) technique.

...read moreread less

Abstract: An approach to the two-microphone speech enhancement problem is discussed. Specifically, a maximum-likelihood (ML) problem is formulated for estimating the parameters needed for canceling the noise, and solved by the iterative EM (estimate-maximize) technique. The EM algorithm has been implemented for both a simplified and a more general scenario. The results improve upon those obtained with the classical least-squares approach. >

...read moreread less

83 citations

Proceedings Article•DOI•

Phonetically sensitive discriminants for improved speech recognition

[...]

George R. Doddington¹•Institutions (1)

Texas Instruments¹

23 May 1989

TL;DR: A phonetically sensitive transformation of speech features has yielded significant improvement in speech-recognition performance and is designed to discriminate against out-of-class confusion data and is a function of phonetic state.

...read moreread less

Abstract: A phonetically sensitive transformation of speech features has yielded significant improvement in speech-recognition performance. This (linear) transformation of the speech feature vector is designed to discriminate against out-of-class confusion data and is a function of phonetic state. Evaluation of the technique on the TI/NBS connected digit database demonstrates word (sentence) error rates of 0.5% (1.5%) for unknown-length strings and 0.2% (0.6%) for known-length strings. These error rates are two to three times lower than the best previously reported results and suggest that significant improvements in speech-recognition system performance can be achieved by better acoustic-phonetic modeling. >

...read moreread less

82 citations

Proceedings Article•DOI•

The Lincoln robust continuous speech recognizer

[...]

D.B. Paul¹•Institutions (1)

Massachusetts Institute of Technology¹

23 May 1989

TL;DR: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.

...read moreread less

Abstract: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models. >

...read moreread less

77 citations

Proceedings Article•DOI•

Phase coherence in speech reconstruction for enhancement and coding applications

[...]

Thomas F. Quatieri¹, R. J. McAulay¹•Institutions (1)

Massachusetts Institute of Technology¹

23 May 1989

TL;DR: A zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase is described, which provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification.

...read moreread less

Abstract: It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement and for speech coding. A description is given of a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification, for a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion. >

...read moreread less

48 citations

Journal Article•DOI•

Spoken-word recognition using dynamic features analysed by two-dimensional cepstrum

[...]

Yasuo Ariki¹, S. Mizuta¹, Masaaki Nagata¹, Toshiyuki Sakai¹•Institutions (1)

Kyoto University¹

01 Apr 1989

TL;DR: Using word and monosyllable recognition experiments based on dynamic programming (DP) matching of a time sequence of the TDC, it is confirmed that the global static features (spectral envelope) and global dynamic features are both effective for speech recognition.

...read moreread less

Abstract: In the paper, two-dimensional cepstrum (TDC) analysis and its application to word and monosyllable recognition are described. The TDC can simultaneously represent several different kinds of information contained in the speech waveform: static and dynamic features, as well as global and fine frequency structure. Noise reduction and speech enhancement can be easily performed using the TDC. Using word and monosyllable recognition experiments based on dynamic programming (DP) matching of a time sequence of the TDC, it is confirmed that the global static features (spectral envelope) and global dynamic features are both effective for speech recognition. A speaker-independent (noisy) word recognition algorithm is also proposed which recognises the words based on the similarity of dynamic features. The algorithm employs linear matching instead of DP nonlinear matching, requires a small amount of memory, and shows high speed and high accuracy in recognition. At present, the recognition rate is 89.0% at ∞ dB and 70.0% at 0 dB signal-to-noise ratio.

...read moreread less

40 citations

Proceedings Article•DOI•

The effects of subtractive-type speech enhancement/noise reduction algorithms on parameter estimation for improved recognition and coding in high noise environments

[...]

W.M. Kushner¹, Vladimir Goncharoff¹, C.S. Wu, V.V. Nguyen, J. Damoulakis - Show less +1 more•Institutions (1)

Martin Marietta Materials, Inc.¹

23 May 1989

TL;DR: The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance.

...read moreread less

Abstract: The authors present the results of a study designed to investigate the effects of subtractive-type noise reduction algorithms on LPC-based spectral parameter estimation as related to the performance of speech processors operating with input SNRs of 15 dB and below. Subtractive noise preprocessing greatly improves the SNR, but system performance improvement is not commensurate. LPC spectral estimation is affected by the character of the residual noise which exhibits greater variance and spectral granularity than the original broadband noise. The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance. Techniques and performance results are presented. >

...read moreread less

34 citations

Proceedings Article•DOI•

A comparative study of cepstral lifters and distance measures for all pole models of speech in noise

[...]

J.-C. Junqua¹, H. Wakita•Institutions (1)

Panasonic¹

23 May 1989

TL;DR: The main conclusions of this research are: when speech is produced in a quiet environment and in speaker-dependent automatic speech recognition (ASR), the cepstral projection measure significantly improves recognition scores for the three all-pole models considered, and a low model order of the analysis is suitable.

...read moreread less

Abstract: After a brief overview of the techniques utilized, the authors evaluate perceptually based linear prediction (PLP) analysis, and then report the results of a comparative study of several front-ends in the case of speech produced in quiet and noisy environments (Lombard effect). Several all-pole models of speech using various lifters and distance measures are compared in various noise conditions. The main conclusions of this research are: (1) when speech is produced in a quiet environment and in speaker-dependent automatic speech recognition (ASR), the cepstral projection measure significantly improves recognition scores for the three all-pole models considered (for clean reference and noisy test templates), with the best results obtained with the LP analysis (for SNR=5 dB); (2) when speech is produced in a quiet environment and in speaker-dependent and cross-speaker ASR, the optimal filter is a function of the SNR of the test and the reference templates; and (3) when speech is produced in noise and in speaker-dependent ASR, the PLPRPS front-end is the best, and a low model order of the analysis is suitable. >

...read moreread less

34 citations

Proceedings Article•DOI•

Spectral estimation using a log-distance error criterion applied to speech recognition

[...]

D. van Compernolle¹•Institutions (1)

Catholic University of Leuven¹

23 May 1989

TL;DR: The present technique performed better than spectral subtraction in noise immunity experiments on the IBM isolated word speech-recognition system, although at the expense of additional computational requirements.

...read moreread less

Abstract: A novel algorithm is presented for the estimation of a signal in noise. The distortion criterion used is based on the distance between log spectra. In many signal-processing applications, such as speech recognition, log spectra are much closer to the parameters used in a discriminator than power spectra. Therefore, it is believed that this spectral estimation technique should lead to better results than previously developed techniques such as spectral subtraction. The present technique performed better than spectral subtraction in noise immunity experiments on the IBM isolated word speech-recognition system, although at the expense of additional computational requirements. >

...read moreread less

Proceedings Article•DOI•

Stress compensation and noise reduction algorithms for robust speech recognition

[...]

John H. L. Hansen¹, Mark A. Clements¹•Institutions (1)

Duke University¹

23 May 1989

TL;DR: The main contribution is the achievement of robust recognition in diverse environmental conditions through the formulation of a series of speech-enhancement and stress-compensation preprocessing algorithms that produce speech or recognition features less sensitive to varying factors caused by stress and noise.

...read moreread less

Abstract: The problem of speech recognition in noisy, stressful environments is addressed. The main contribution is the achievement of robust recognition in diverse environmental conditions through the formulation of a series of speech-enhancement and stress-compensation preprocessing algorithms. These preprocessors produce speech or recognition features less sensitive to varying factors caused by stress and noise. Results from four recognition scenarios based on such preprocessing are reported. Neutral, stressful, noisy neutral, and noisy stressful speech styles are considered. Noise reduction is based on constrained iterative speech enhancement. Stress compensation algorithms are based on formant location, bandwidth, and intensity. >

...read moreread less

Proceedings Article•DOI•

Car noise processing for speech input

[...]

I. Lecomte¹, M. Lever¹, J. Boudy¹, A. Tassy¹•Institutions (1)

Matra¹

23 May 1989

TL;DR: The authors address the problem of speech enhancement in a car environment by describing the method used to characterize the noisy environment and the main effects of the noise on speech processing for speech recognition and transmission.

...read moreread less

Abstract: The authors address the problem of speech enhancement in a car environment. After describing the method used to characterize the noisy environment, the main effects of the noise on speech processing for speech recognition and transmission are presented. Most of the noise reduction methods proposed in past years have been developed and evaluated in the case of additive white noise. The authors have found that the results obtained by this approach, without modification, are poor. With preprocessing of the noisy signal, improvements are shown, especially for speech recognition. >

...read moreread less

Proceedings Article•DOI•

A class-oriented replacement technique for lost speech packets

[...]

Luiz A. DaSilva¹, D.W. Petr¹, Victor S. Frost¹•Institutions (1)

University of Kansas¹

23 Apr 1989

TL;DR: Results of subjective tests indicate that giving preferential delivery treatment to packets based on class can be used to improve subjective quality.

...read moreread less

Abstract: A replacement technique for lost speech packets is presented. This technique is based on the classification of the packets into four distinct classes: background noise, voiced speech, fricatives, and other. Different encoding schemes and lost packet replacement techniques are used for each class. Results of subjective tests indicate that giving preferential delivery treatment to packets based on class can be used to improve subjective quality. The replacement strategy renders the reconstructed signal indistinguishable from the original utterance for packet loss rates up to 47% for background noise packets, 8% for fricative packets, and 4% for other packets. For voiced packets, the replacement is distinguishable from the original signal even for packet loss rates lower than 5%, but significant improvement may be possible by reducing the memory associated with the voiced speech coding process. >

...read moreread less

Proceedings Article•DOI•

Adaptive code excited linear predictive coder (ACELPC)

[...]

J. Menez, Claude Galand, M. Rosso, Francoise Bottau

23 May 1989

TL;DR: The authors show that they can further reduce the computational complexity and the storage requirements of the coder, while improving the perceptual quality of the reconstructed speech.

...read moreread less

Abstract: Code-excited linear predictive coding (CELP) is a recent vector waveform coding technique which permits the encoding of telephone speech with high quality at very low bit rates. The authors show that they can further reduce the computational complexity and the storage requirements of the coder, while improving the perceptual quality of the reconstructed speech. These improvements are achieved by two key factors: the implementation of a noise-shaping effect by alternate estimation of the short-term predictor coefficients, and the use of a fixed/adaptive codebook together with a long-term predictor. >

...read moreread less

Proceedings Article•DOI•

Evaluation of acoustic correlates of speech under stress for robust speech recognition

[...]

John H. L. Hansen¹•Institutions (1)

Duke University¹

27 Mar 1989

TL;DR: In this paper, an investigation of how speech characteristics change under varying levels of stress with specific application to improving automatic isolated-word speech recognition is presented. But the authors focus on five speech analysis domains: pitch, glottal source, intensity, duration, and vocal tract shaping.

...read moreread less

Abstract: Results are presented of an investigation of how speech characteristics change under varying levels of stress with specific application to improving automatic isolated-word speech recognition. The evaluation focused on five speech analysis domains: pitch, glottal source, intensity, duration, and vocal tract shaping. Goodness-of-fit statistical tests were used to ascertain the significance of parameter variation in each domain. Results from analysis of pitch and glottal source spectrum are presented. The findings suggest that such parameter information can be used reliably to aid in automatic isolated-word speech recognition in noisy stressful environments. >

...read moreread less

Proceedings Article•DOI•

High quality speech expansion, compression, and noise filtering using the sola method of time scale modification

[...]

J.L. Wayman, R.E. Reinke, D.L. Wilson

01 Jan 1989

TL;DR: This paper discugs these three applications of SOLA and the proper system prameters required for each, giving significant improvement in signal-1-noise ratio.

...read moreread less

Abstract: Ahtract The synchronized-overla~ndd (SOLA) method of time scale modification Is a computationally simple, time-domain technique that can be used for: 1) speech expansion. providing improved intelligibility for difficult speech sepmenta; 2 ) speech compression/reexpansion, allowing low hit-rate transmission and storape; and 3) white-noise filtering, giving significant improvement in signal-1-noise ratio. In this paper, w e discugs these three applications of SOLA and the proper system prameters required for each. The paper is accompanied by a demonstration tape.

...read moreread less

Journal Article•DOI•

Comparison of noisy speech enhancement algorithms in terms of LPC perturbation

[...]

M.S. Ahmed¹•Institutions (1)

King Fahd University of Petroleum and Minerals¹

01 Jan 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A number of noisy speech enhancement algorithms are experimentally compared in terms of linear predictive coding (LPC) perturbations to find the most efficient methods for spectral subtraction and oversubtraction.

...read moreread less

Abstract: A number of noisy speech enhancement algorithms are experimentally compared in terms of linear predictive coding (LPC) perturbations. The enhancement algorithms considered are simple spectral subtraction, spectral oversubtraction with use of a spectral floor, spectral subtraction with residual noise removal, and time-domain and frequency-domain adaptive minimum mean-square-error filtering. LPC perturbations considered are LPC cepstral distance, log likelihood ratios, and weighted likelihood ratio. >

...read moreread less

Proceedings Article•DOI•

Speech enhancement based upon hidden Markov modeling

[...]

Yariv Ephraim¹, David Malah¹, Biing-Hwang Juang¹•Institutions (1)

Bell Labs¹

23 May 1989

TL;DR: A maximum a posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed, based upon statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes.

...read moreread less

Abstract: A maximum a posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed. The approach is based upon statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions are used to model the clean speech signal. A low-order Gaussian AR model is used for the wideband Gaussian noise considered here. The parameter set of the HMM is estimated using the Baum or the EM (estimation-maximization) algorithm. The enhancement of the noisy speech is done by means of reestimation of the clean speech waveform using the EM algorithm. An approximate improvement of 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10 dB input SNR. >

...read moreread less

Proceedings Article•DOI•

Analysis-by-synthesis linear predictive speech coding at 2.4 kbit/s

[...]

F.F. Tzeng

27 Nov 1989

TL;DR: Novel 2.4-kb/s linear predictive speech coders based on the analysis-by-syntheses method are proposed and found that the model which selects the excitation signal from either a random sequence codebook or a pitch synthesizer produces the best perceived quality speech.

...read moreread less

Abstract: Novel 2.4-kb/s linear predictive speech coders based on the analysis-by-syntheses method are proposed. The introduction of a perceptually weighted distortion measure between the original speech and the reconstructed speech implicitly optimizes both the voiced/unvoiced decision and the pitch estimation/tracking. The coders are also shown to be more robust to background acoustic noises. The resultant speech quality is significantly enhanced by judicious parameter coding. Three excitation models are proposed and investigated. It is found that the model which selects the excitation signal from either a random sequence codebook or a pitch synthesizer produces the best perceived quality speech. >

...read moreread less

Proceedings Article•DOI•

Noise reduction for hearing aids

[...]

Harry Levitt¹, M. Weiss¹, Arlene C. Neuman¹, T. Schwander¹, M. Bakke¹, H.-B. Lin¹ - Show less +2 more•Institutions (1)

City University of New York¹

27 Mar 1989

TL;DR: An overview is given of the various forms of signal processing for noise reduction that have been developed for hearing aid applications, including one- and two-microphone systems.

...read moreread less

Abstract: Summary form only given. An overview is given of the various forms of signal processing for noise reduction that have been developed for hearing aid applications. One- and two-microphone systems are discussed. A more advanced form of a two-microphone system is also examined. >

...read moreread less

Journal Article•DOI•

Real-time speech enhancement system using envelope expansion technique

[...]

P.M. Clarkson¹, S. Bahgat¹•Institutions (1)

Illinois Institute of Technology¹

17 Aug 1989-Electronics Letters

TL;DR: In this paper, a real-time system for the enhancement of speech signals degraded by additive broadband noise is described, where the enhancement process employed is a multiband envelope expansion technique derived from a recent speech deverberation method.

...read moreread less

Abstract: A real-time system for the enhancement of speech signals degraded by additive broadband noise is described. The enhancement process employed is a multiband envelope expansion technique derived from a recent speech dereverberation method and the system is implemented on a Texas Instruments TMS32O-C25 processor.

...read moreread less

Journal Article•DOI•

A fully adaptive comb filter for enhancing block-coded speech

[...]

D.E. Veeneman, B. Mazor

01 Jun 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A comb filter that is adapted nonpitch-synchronously has been developed that it is very effective in reducing framing noise and has also proven effective in enhancing speech corrupted by other noise sources.

...read moreread less

Abstract: Framing noise is a distortion present in block-coded speech that is caused by discontinuities at frame boundaries. A comb filter that is adapted nonpitch-synchronously has been developed that it is very effective in reducing this distortion. An optimal, minimum mean-squared-error approach is applied to determine the filter coefficients and adapt them to the short-time statistics of the speech. This filter has also proven effective in enhancing speech corrupted by other noise sources. >

...read moreread less

Proceedings Article•DOI•

Speech enhancement for hearing aids using adaptive beamformers

[...]

A. Farassopulos¹•Institutions (1)

École Normale Supérieure¹

23 May 1989

TL;DR: Two techniques based on adaptive noise canceling to suppress the ambient noise in hearing aids are presented and notches are created in the directivity pattern of the filter when two uncorrelated noise sources are situated at opposite sides of the head.

...read moreread less

Abstract: The author presents two techniques based on adaptive noise canceling to suppress the ambient noise in hearing aids. The first technique uses two microphones and gives very good results in the case of one noise source. If two noise sources are situated at the same side of the head, there is also noise suppression, although less effective than with only one noise source. In the second technique the author creates notches in the directivity pattern of the filter when two uncorrelated noise sources are situated at opposite sides of the head by introducing an adaptive noise canceler using three microphones. The techniques were evaluated in real time using a specially developed board plugged into a personal computer. Preliminary results are presented. >

...read moreread less

Proceedings Article•DOI•

Robust LPC analysis and synthesis using the KL transformation of acoustic subwords spectra

[...]

V.R. Algazi¹, Sanguoon Chung, M.J. Ready, K.L. Brown•Institutions (1)

University of California, Davis¹

23 May 1989

TL;DR: A novel approach to the modeling and estimation of the speech spectral envelope over acoustic subwords that exhibits robust performance in noise is proposed and provides a considerable speech quality improvement over other methods.

...read moreread less

Abstract: The authors propose a novel approach to the modeling and estimation of the speech spectral envelope over acoustic subwords that exhibits robust performance in noise. The technique exploits the underlying signal structure of speech to improve parameter estimates, and it uses the perceptual properties of hearing to decrease the computational requirements in a perceptually meaningful way. The approach provides a considerable speech quality improvement over other methods. >

...read moreread less

Proceedings Article•DOI•

Use of objective speech quality measures in selecting effective spectral estimation techniques for speech enhancement

[...]

John H. L. Hansen¹, Mark A. Clements¹•Institutions (1)

Duke University¹

14 Aug 1989

TL;DR: Constrained iterative enhancement resulted in the highest quality improvement for a white Gaussian distortion in seven iterations and over a large number of colored noise conditions the best objective qualities resulted when Bartlett and maximum-entropy spectra were used in the enhancement algorithms.

...read moreread less

Abstract: Objective quality measures are used to determine optimal enhancement performance versus variation parameter settings for the enhancement algorithms. Issues addressed include: (1) the amount of magnitude averaging needed for highest speech quality using spectral subtracting; (2) choice of spectral estimation technique for colored noise characterization; and (3) terminating criterion for best speech quality for short-time Wiener filtering and constrained iterative enhancement algorithms. Additive white Gaussian noise and slowly varying colored aircraft cockpit noise are considered in the evaluation. Results show that constrained iterative enhancement resulted in the highest quality improvement for a white Gaussian distortion in seven iterations. Over a large number of colored noise conditions the best objective qualities resulted when Bartlett and maximum-entropy spectra were used in the enhancement algorithms. For aircraft cockpit noise, further improvement was observed when noise characterization was updated more frequently. >

...read moreread less

Proceedings Article•DOI•

An LPC-based spectral similarity measure for speech recognition in the presence of co-channel speech interference

[...]

Gary E. Kopec¹, M.A. Bush¹•Institutions (1)

PARC¹

23 May 1989

TL;DR: The authors present an alternative to the enhancement paradigm for cochannel speech recognition, in which target-interference separation and target recognition occur simultaneously, driven by a model of the recognition vocabulary, based on an LPC spectral similarity measure.

...read moreread less

Abstract: The authors present an alternative to the enhancement paradigm for cochannel speech recognition, in which target-interference separation and target recognition occur simultaneously, driven by a model of the recognition vocabulary. The method is based on an LPC (linear predictive coding) spectral similarity measure which allows a reference spectrum to match only a subset of the poles of a noisy input spectrum, rather than requiring a whole-spectrum comparison. A preliminary evaluation of the proposed method in a speaker-trained isolated-digit recognition task suggests a reduction in error rate of 50-70% at low target-interference ratios, as compared to a conventional whole-spectrum similarity measure. >

...read moreread less

Proceedings Article•DOI•

Robust cepstral based pitch determination

[...]

Mark Andrews, R.D. DeGroat, J. Picone

01 Jan 1989

TL;DR: A noisy cepsual signal model for speech processing and two Singular Value Decomposition as SVD based approaches which greatly enhance cepstral based pitch estimation performance in noisy environments are proposed.

...read moreread less

Abstract: Visual Information Technologies 7Afin l n m r T)r Erik Jonsson School of Engineering and Computer Science The University of Texas at Dallas >_"" -."" -I. Plano, Texas 75075 Richardson,-Texas 75083-0688 (214) 985-2267 (214) 690-2894 degroat@utdallas.edu Joseph Picone Speech and Image Understanding Laboratory Texas Insmments Inc. P.O. Box 655474 MS 238 Dallas, Texas 75265 (214) 995-6627 The FIT based cepstral method of human speech pitch (or fundamental frequency) determination is known to be accurate and reliable in studio quality environments, however, it leaves much to be desired at lower signal to noise ratios. Cepstral pitch determination techniques, which are a special case of the more general t h q of homomorphic signal processing. rely on the log operation to deconvolve the pitch sequence from the vocal uact response sequence. Classical cepstral processing modcls do not account for noise added to the signal. In this paper, we develop a noisy cepsual signal model for speech processing and we propose two Singular Value Decomposition (SVD) based approaches which greatly enhance cepstral based pitch estimation performance in noisy environments. Speech Production and Cepstral Pitch Determination Voiced speech pmduction can be modeled reasonably well as a pseudo pulse uain (pitch sequence) convolved with a linear system (vocal tract impulse response). Speech is considered wide sense stationary over short time segments (20 40 msec) [I] which makes analysis possible over short time windows (M frames). We assume that the r-domain description of the speech signal is modeled by [21, [31 S (2) = H ( 2 ) P ( 2 ) (1) where H ( z ) is the 2-transform of the vocal tract response sequence and P ( z ) is the r-transform of the pitch sequence. Analytical expressions for H ( z ) and P(r ) may be found in [2] or 131. We may use homomorphic filtering techniques to separate the multiplicative rclationship in ( I ) using the complex log operation thereby causing the pitch cepsmm and the vocal tract response cepsmm to occupy approximately disjoint quefrency spaces [2], [4]. Practical implementations of cepstral pitch determination.may be obtained from 141 in which it is shown that the Inverse FFT of the log of the magnitude of the FIT provides us with the real version of the quefrency. The connections between the complex cepsmm and the real cepsmm (usually denoted by just cepsmm) an shown in [21 and [31. The Noise Fmblem It is easy to see that homomorphic filtering (cepstral) techniques will not offer good performance in noise. Returning to ( I ) and taking the complex log operation, we find that log [S ( z ) ] = log [H (2) P (r)] = log [H (211 +log [P (.)I. (2) The separation of S(z) into its constituent parts works out very neatly assuming that no noise is added to the system. On the other band, if noise is added to the system, we obtain log [S (z ) + N ( z ) ] = log[H ( z ) P ( z ) + N (z)] . (3) A Cepstral Model for Speech Signals in Noise Manipulating (3) yields a noisy cepsual signal model . . which clearly exposes the desired signal component in the fist term of the right-hand side. We shall find great utility in going to vector and matrix notation at this point following a discretization of equation (4). ?he appropriate discrete Fourier uansform (DIT') equivalent of (4) is 1% [ H ( k ) P ( k ) + N (k)l = 1% [H (k) P ( k ) l where k = 0, ..., M 1 is the discrete normalized frequency variahle. We shall also stay consistent with the notation found in [21 andJ31 for representing the log of a general function, X ( k ) , as X (k). Thus, we represent (5 ) in vector form as P = Z+ log [l + D-'n] (6) 744 23ACSSC-12/89/0744 $1.00 Q 1989 MAPLE PRESS

...read moreread less

Proceedings Article•DOI•

Adaptive cancellation of adjacent channel interference

[...]

B.A. Hedstrom¹, R.L. Kirlin¹, Peter F. Driessen¹•Institutions (1)

Victoria University, Australia¹

01 Jun 1989

TL;DR: A new method to combat the adjacent channel interference (ACI) encountered with multichannel receivers is presented which uses a priori knowledge of the crosstalk gained by the injection of a known signal into either the input of the receiver or into the output of a transmitter, whichever is appropriate.

...read moreread less

Abstract: A new method to combat the adjacent channel interference (ACI) encountered with multichannel receivers is presented. In the case of a two-channel receiver, components of the desired signal may become present in the reference to the interference rendering traditional adaptive noise cancellation ineffective. This situation is similar to two-microphone systems used for speech enhancement. The crosstalk resistant adaptive noise canceller (CTRANC), designed to work in the two-microphone case, is unable to provide any signal improvement for the ACI model. A new system is proposed which uses a priori knowledge of the crosstalk gained by the injection of a known signal into either the input of the receiver or into the output of a transmitter, whichever is appropriate. This injection system is then compared with the CTRANC and other methods by means of extensive computer simulation. >

...read moreread less

Proceedings Article•DOI•

Noise-immune multisensor speech input: formal subjective testing in operational conditions

[...]

V. Viswanathan, C. Henry

23 May 1989

TL;DR: Results from formal speech intelligibility and quality tests in simulated fighter aircraft noise show that the exploratory development model (EDM) provides consistently both higher intelligibilityand quality, as well as more robust performance under all tested operating conditions, than the microphone alone.

...read moreread less

Abstract: Multisensor speech input was evaluated in operational conditions including an oxygen facemask, different noise conditions, and use with two speech coders and a speech-enhancement system. The authors developed a neckstrap for mounting accelerometers. They selected the best multisensor system and fabricated an exploratory development model (EDM). Results from formal speech intelligibility and quality tests in simulated fighter aircraft noise show that the EDM provides consistently both higher intelligibility and quality, as well as more robust performance under all tested operating conditions, than the microphone alone. >

...read moreread less