
Showing papers by "Richard P. Lippmann" published in 1997


Journal ArticleDOI
TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.

606 citations


Proceedings Article
01 Jan 1997
TL;DR: A new and simple approach to compensate for speech recognizer degradations is presented which uses mel-filter-bank (MFB) magnitudes as input features and missing feature theory to dynamically modify the probability computations performed in Hidden Markov Model recognizers.
Abstract: Speech recognizers trained with quiet wide-band speech degrade dramatically with high-pass, low-pass, and notch filtering, with noise, and with interruptions of the speech input. A new and simple approach to compensate for these degradations is presented which uses mel-filter-bank (MFB) magnitudes as input features and missing feature theory to dynamically modify the probability computations performed in Hidden Markov Model recognizers. When the identity of features missing due to filtering or masking is provided, recognition accuracy on a large talker-independent digit recognition task often rises from below 50% to above 95%. These promising results suggest future work to continuously estimate SNRs within MFB bands for dynamic adaptation of speech recognizers.

98 citations
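
A minimal sketch of the missing-feature idea described above, assuming diagonal-covariance Gaussian observation densities (for which marginalizing out missing channels reduces to dropping them from the likelihood sum); the names and values are illustrative, not the authors' implementation:

    import numpy as np

    def log_likelihood_present(x, mask, mean, var):
        # Diagonal-Gaussian log-likelihood computed over reliable
        # channels only; missing channels are marginalized out, which
        # for a diagonal covariance means simply omitting them.
        x, mean, var = x[mask], mean[mask], var[mask]
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    # Example: a 20-channel MFB frame whose upper half is marked missing,
    # as low-pass filtering of the input might require.
    rng = np.random.default_rng(0)
    frame = rng.normal(size=20)
    mask = np.arange(20) < 10
    print(log_likelihood_present(frame, mask, np.zeros(20), np.ones(20)))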


Journal ArticleDOI
TL;DR: A committee classifier combining the best neural network and logistic regression provided the best model calibration, but the receiver operating characteristic curve area was only 76% irrespective of which predictive model was used.

88 citations
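
The entry above does not specify how the committee combines its members; a common rule, assumed in this sketch, is to average the models' predicted probabilities (names are illustrative, not the paper's method):

    import numpy as np

    def committee_predict(p_nn, p_lr):
        # Average the predicted probabilities of the two member models
        # (assumed combination rule; the paper may weight members differently).
        return 0.5 * (np.asarray(p_nn) + np.asarray(p_lr))

    # Example: per-case risk estimates from the two models.
    print(committee_predict([0.80, 0.30], [0.60, 0.50]))  # -> [0.7 0.4]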


Journal ArticleDOI
TL;DR: The implementation of a hidden Markov model state decoding system, a component for a wordspotting speech recognition system, and the mapping of the discrete-time state decoding algorithm into the continuous domain are described.
Abstract: We describe the implementation of a hidden Markov model state decoding system, a component for a wordspotting speech recognition system. The key specification for this state decoder design is microwatt power dissipation: this requirement led to a continuous-time, analog circuit implementation. We describe the tradeoffs inherent in the choice of an analog design and explain the mapping of the discrete-time state decoding algorithm into the continuous domain. We characterize the operation of a ten-word (81-state) state decoder test chip.

21 citations
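
For reference, the discrete-time algorithm the analog decoder above is derived from is the standard Viterbi state-score recursion; the sketch below shows it in log space and is a generic textbook version, not the chip's circuit-level formulation:

    import numpy as np

    def viterbi_scores(log_b, log_A, log_pi):
        # delta_t(j) = max_i [ delta_{t-1}(i) + log A[i, j] ] + log b_j(x_t)
        # log_b: (T, N) per-frame log observation likelihoods,
        # log_A: (N, N) log transition matrix, log_pi: (N,) log initial probs.
        T, N = log_b.shape
        delta = log_pi + log_b[0]
        for t in range(1, T):
            delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
        return delta  # per-state scores; argmax gives the best final state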


Proceedings ArticleDOI
04 Apr 1997
TL;DR: An approach to compensate for variable unknown sharp filtering and noise is presented which uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and uses missing feature theory to dynamically modify the probability computations performed using Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model recognizers.
Abstract: Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz. Machines trained with wide-band speech, however, degrade dramatically under these conditions. An approach to compensate for variable unknown sharp filtering and noise is presented which uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and uses missing feature theory to dynamically modify the probability computations performed using Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model (HMM) recognizers. The approach was successfully demonstrated using a talker-independent digit recognition task. It was found that recognition accuracy across many conditions rises from below 50% to above 95% with this approach. These promising results suggest future work to dynamically estimate SNRs and to explore the dynamics of human adaptation to channel and noise variability.
Keywords: speech recognition, speech perception, missing features, filtering, noise, robust, neural network

14 citations
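
One simple way to form the per-band reliability decisions described above, assuming the noise floor is estimated from speech-free frames (the paper's actual SNR estimator may differ, and all names are illustrative); such a mask could drive marginalized likelihoods like the one sketched earlier in this listing:

    import numpy as np

    def reliability_mask(mfb_mag, noise_mag, snr_threshold_db=0.0):
        # Per-band noise power from speech-free frames; signal power by
        # spectral subtraction, floored to stay positive.
        noise_power = np.mean(noise_mag ** 2, axis=0)
        signal_power = np.maximum(mfb_mag ** 2 - noise_power, 1e-12)
        snr_db = 10.0 * np.log10(signal_power / noise_power)
        return snr_db > snr_threshold_db  # True where a band is deemed reliable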


Journal ArticleDOI
TL;DR: A high-performance low-complexity neural network wordspotter was developed using radial basis function (RBF) neural networks in a hidden Markov model (HMM) framework, and two new complementary approaches substantially improve performance on the talker-independent Switchboard corpus.
Abstract: A high-performance low-complexity neural network wordspotter was developed using radial basis function (RBF) neural networks in a hidden Markov model (HMM) framework. Two new complementary approaches substantially improve performance on the talker-independent Switchboard corpus. Figure of merit (FOM) training adapts wordspotter parameters to directly improve the FOM performance metric, and voice transformations generate additional training examples by warping the spectra of training data to mimic across-talker vocal tract length variability.

2 citations
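
The voice-transformation idea above warps training spectra to mimic vocal tract length differences across talkers. Below is a minimal sketch using a linear frequency-axis warp (the paper's actual warping function is not given in this entry, so the warp form and factors are assumptions for illustration):

    import numpy as np

    def warp_spectrum(spectrum, alpha):
        # Resample the magnitude spectrum at warped frequencies alpha * k:
        # alpha < 1 stretches the spectrum upward in frequency,
        # alpha > 1 compresses it downward.
        k = np.arange(len(spectrum), dtype=float)
        return np.interp(alpha * k, k, spectrum)

    # Example: synthesize extra training spectra at several warp factors.
    rng = np.random.default_rng(1)
    spectrum = np.abs(np.fft.rfft(rng.normal(size=512)))
    warped = [warp_spectrum(spectrum, a) for a in (0.9, 0.95, 1.05, 1.1)]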