
Showing papers by "Richard P. Lippmann" published in 1995


Journal ArticleDOI
TL;DR: MFB cepstra significantly outperform LPC cepstra under noisy conditions; techniques using an optimal linear combination of features for data reduction were also evaluated.
Abstract: This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.

133 citations


Dissertation
01 Jan 1995
TL;DR: This thesis addresses the problem of limited training data in pattern detection problems where a small number of target classes must be detected in a varied background; voice transformation techniques are used to generate more training examples that improve the robustness of the spotting system.
Abstract: This thesis addresses the problem of limited training data in pattern detection problems where a small number of target classes must be detected in a varied background. There is typically limited training data and limited knowledge about class distributions in this type of spotting problem, and in this case a statistical pattern classifier cannot accurately model class distributions. The domain of wordspotting is used to explore new approaches that improve spotting system performance with limited training data. First, a high performance, state-of-the-art whole-word based wordspotter is developed. Two complementary approaches are then introduced to help compensate for the lack of data. Figure of Merit training, a new type of discriminative training algorithm, modifies the spotting system parameters according to the metric used to evaluate wordspotting systems. The effectiveness of discriminative training approaches may be limited due to overtraining a classifier on insufficient training data. While the classifier's performance on the training data improves, the classifier's performance on unseen test data degrades. To alleviate this problem, voice transformation techniques are used to generate more training examples that improve the robustness of the spotting system. The wordspotter is trained and tested on the Switchboard credit-card database, a database of spontaneous conversations recorded over the telephone. The baseline wordspotter achieves a Figure of Merit of 62.5% on a testing set. With Figure of Merit training, the Figure of Merit improves to 65.8%. When Figure of Merit training and voice transformations are used together, the Figure of Merit improves to 71.9%. The final wordspotter system achieves a Figure of Merit of 64.2% on the National Institute of Standards and Technology (NIST) September 1992 official benchmark, surpassing the 1992 results from other whole-word based wordspotting systems. Thesis Co-Supervisor: Richard P. Lippmann, Title: Senior Technical Staff. Thesis Co-Supervisor: David H. Staelin, Title: Professor of Electrical Engineering.

9 citations
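
The dissertation abstract above reports results as a Figure of Merit but does not spell out the formula. As a rough illustration only, the sketch below assumes one commonly used wordspotting definition: the detection rate averaged over operating points at the first 10 false alarms, for a single keyword over exactly one hour of speech. The function name `figure_of_merit`, its parameters, and the one-hour simplification are hypothetical and are not taken from the thesis, whose exact scoring procedure may differ.

```python
def figure_of_merit(hits, num_true, max_false_alarms=10):
    """Simplified Figure of Merit for one keyword over one hour of speech.

    hits: list of (score, is_true_hit) putative detections (assumed format).
    num_true: number of true keyword occurrences in the speech.
    Returns the detection rate averaged over the operating points reached
    just before each of the first `max_false_alarms` false alarms.
    """
    if num_true <= 0:
        raise ValueError("num_true must be positive")

    # Walk down the putative hits from the highest score to the lowest.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    detected = 0
    rates = []
    for _score, is_true in ranked:
        if is_true:
            detected += 1
        else:
            # Record the detection rate reached before this false alarm.
            rates.append(detected / num_true)
            if len(rates) == max_false_alarms:
                break

    # If fewer false alarms occurred, the remaining operating points
    # keep the final detection rate.
    while len(rates) < max_false_alarms:
        rates.append(detected / num_true)

    return sum(rates) / max_false_alarms


if __name__ == "__main__":
    example = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
    print(figure_of_merit(example, num_true=2, max_false_alarms=2))
```

Because the metric depends on the ranking of true hits relative to false alarms, a training procedure that targets it directly has to reorder putative hits, which is consistent with the abstract's description of Figure of Merit training as discriminative.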


Proceedings ArticleDOI
06 Apr 1995
TL;DR: Experiments demonstrate that sigmoid multilayer perceptron (MLP) networks provide slightly better risk prediction than conventional logistic regression and Bayesian models when used to predict the risk of death using a data base of 41,385 patients who underwent coronary artery bypass operations in 1993.
Abstract: Experiments demonstrate that sigmoid multilayer perceptron (MLP) networks provide slightly better risk prediction than conventional logistic regression and Bayesian models when used to predict the risk of death using a data base of 41,385 patients who underwent coronary artery bypass operations in 1993. MLP networks with no hidden layers (single-layer MLPs), networks with one hidden layer (two-layer MLPs), and networks with two hidden layers (three-layer MLPs) were trained using stochastic gradient descent with early stopping. All prediction techniques used the same input features and were evaluated by training on 20,698 patients and testing on a separate 20,687 patients. Receiver operating characteristic (ROC) curve areas for predicting mortality were roughly 75% for all classifiers. Risk stratification or accuracy of posterior probability prediction was slightly better with three-layer MLP networks which did not inflate risk for high-risk patients. Simple approaches were developed to calculate effective odds ratios for MLP networks and to generate confidence intervals for MLP risk predictions using an auxiliary "confidence MLP." The confidence MLP is trained to reproduce confidence intervals that were generated during training using the outputs of 50 MLP networks trained with different bootstrap samples. © (1995) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

4 citations
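
The proceedings article above compares classifiers by ROC curve area. As a minimal sketch of how such an area can be computed, the code below uses the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case receives the higher predicted risk, with ties counted as one half. The function name `roc_area` and the toy data are illustrative assumptions and do not come from the paper.

```python
def roc_area(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death).
    scores: iterable of predicted risks, aligned with labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ordered correctly.
    print(roc_area([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, so for a test set of the size described in the abstract a sort-based rank computation would be the more practical choice; the result is the same quantity.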