
Showing papers on "Word error rate published in 1989"


Proceedings Article
01 Jan 1989
TL;DR: Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task; the method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.
Abstract: We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.

3,324 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant.
Abstract: The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar's test) requires the errors made by an algorithm to be independent events and is found to be most appropriate for isolated-word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.

715 citations
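The first of the two tests above can be sketched in a few lines. This is an illustrative implementation, assuming the errors of each algorithm are available as paired per-token indicators (1 = error); the continuity-corrected statistic is compared against a chi-square distribution with one degree of freedom.

```python
import math

def mcnemar(errors_a, errors_b):
    """McNemar's test on paired error indicators from two algorithms.

    Only the discordant pairs matter: tokens where exactly one of the
    two algorithms made an error.
    """
    b = sum(1 for ea, eb in zip(errors_a, errors_b) if ea and not eb)
    c = sum(1 for ea, eb in zip(errors_a, errors_b) if not ea and eb)
    if b + c == 0:
        return 0.0, 1.0  # the algorithms never disagreed
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected statistic
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value indicates the two error rates differ significantly; the independence requirement noted in the abstract is why the paper recommends this test for isolated-word rather than connected-speech results.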


Journal ArticleDOI
TL;DR: In this paper, a cross-modal priming technique was used to investigate the extent to which a rhyme prime (a prime that differs only in its first segment from the word that is semantically associated with the visual probe) is as effective a prime as the original word itself.
Abstract: Approaches to spoken word recognition differ in the importance they assign to word onsets during lexical access. This research contrasted the hypothesis that lexical access is strongly directional with the hypothesis that word onsets are less important than the overall goodness of fit between input and lexical form. A cross-modal priming technique was used to investigate the extent to which a rhyme prime (a prime that differs only in its first segment from the word that is semantically associated with the visual probe) is as effective a prime as the original word itself. Earlier research had shown that partial primes that matched from word onset were very effective cross-modal primes. The present results show that, irrespective of whether the rhyme prime was a real word or not, and irrespective of the amount of overlap between the rhyme prime and the original word, the rhymes are much less effective primes than the full word. In fact, no overall priming effect could be detected at all except under conditions in which the competitor environment was very sparse. This suggests that word onsets do have a special status in the lexical access of spoken words. A fundamental property of the speech signal is its intrinsic directionality in time. Spoken utterances are spread out along the time line, moving necessarily from beginning to end, in a way that is not true of written language. This directionality of the speech input is strongly reflected in the claims made by the cohort model of spoken word recognition for the manner in which speech inputs are mapped onto the representations of word forms in the mental lexicon (Marslen-Wilson, 1984, 1987; Marslen-Wilson & Tyler, 1980; Marslen-Wilson & Welsh, 1978; Tyler, 1984; Tyler & Wessels, 1983; Warren & Marslen-Wilson, 1987). The cohort model of word recognition stresses the sequential and continuous nature of the mapping between the speech input and mental representations of word forms. 
This emphasis is closely tied up with the concept of a cohort and its implications for the properties of the on-line lexical decision space. In particular, according to the cohort model, the decision space is determined by the beginnings of words. The speech input at the beginning of the word maps onto all lexical items that share the same initial sequence. This initial set of candidates is termed the word-initial cohort, and the subsequent process of word recognition is determined by the

372 citations


Journal ArticleDOI
TL;DR: The data indicate that the presence in the neighborhood of at least one unit of higher frequency than the stimulus word itself results in interference in stimulus word processing.
Abstract: Most current models of visual word recognition assume that the recognition process may be subdivided into at least three phases, which are referred to here as candidate generation, candidate selection, and conscious identification. In the candidate generation phase, the word input contacts a number of orthographically similar lexical representations in memory. Using information from the continued sensory analysis and any contextual information available, one of these candidates is selected for conscious identification. This general conception of the word recognition process adopts various precise forms in models such as Becker's (1976) verification model; Forster's (1976) search model; McClelland and Rumelhart's (1981) interactive-activation model; Morton's (1970) logogen model; Norris's (1986) checking model; and Paap, Newsome, McDonald, and Schvaneveldt's (1982) activation-verification model. These different models place different constraints on the individual operation of these subprocesses and the ways they interact, but the models all assume that both sublexical and whole-word units (other than the stimulus word itself) are involved in visual word recognition. Research concerned with the role of whole-word units in word recognition has concentrated largely on establishing effects due to various parameters of the stimulus word itself (e.g., printed frequency, repetition, orthographic and

315 citations


PatentDOI
TL;DR: In this paper, a method for deriving acoustic word representations for use in speech recognition is presented, which involves using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word.
Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by a cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time-align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.

257 citations
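The clustering step in the patent groups acoustically similar sub-models using the Kullback-Leibler information as its similarity metric. For sub-models represented as discrete probability distributions, a symmetrized KL distance might look like this (illustrative sketch; the epsilon guard against zero probabilities is an added assumption):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized KL distance, usable as a clustering metric since it
    treats both sub-models equally."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```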


Proceedings ArticleDOI
23 May 1989
TL;DR: A description is presented of the authors' current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence, which combines features from their current isolated-word recognition system and from their previously developed continuous-speech recognition system.
Abstract: A description is presented of the authors' current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence. The recognition system combines features from their current isolated-word recognition system and from their previously developed continuous-speech recognition system. It consists of an acoustic processor, an acoustic channel model, a language model, and a linguistic decoder. Some new features in the recognizer relative to the isolated-word speech recognition system include the use of a fast match to prune rapidly to a manageable number the candidates considered by the detailed match, multiple pronunciations of all function words, and modeling of interphone coarticulatory behavior. The authors recorded training and test data from a set of ten male talkers. The perplexity of the test sentences was found to be 93; none of the sentences was part of the data used to generate the language model. Preliminary (speaker-dependent) recognition results on these talkers yielded an average word error rate of 11.0%.

251 citations
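Word error rates like the 11.0% above are conventionally computed from a Levenshtein alignment of the hypothesis against the reference transcript, counting substitutions, insertions, and deletions. A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / len(reference),
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

Note the rate can exceed 100% when the hypothesis contains many insertions.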


Journal ArticleDOI
TL;DR: A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar that is inherently based on hidden Markov model (HMM) representations.
Abstract: A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar. The algorithm, which has all the features of earlier methods, is inherently based on hidden Markov model (HMM) representations and is described in an easily understood, easily programmable manner. The new features of the algorithm include the capability of recording and determining (unique) word sequences corresponding to the several best paths to each grammar node, and the capability of efficiently incorporating a range of word and state duration scoring techniques directly into the forward search of the algorithm, thereby eliminating the need for a postprocessor as in previous implementations. It is also simple and straightforward to incorporate deterministic word transition rules and statistical constraints (probabilities) from a language model into the forward search of the algorithm.

131 citations
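At the heart of any frame-synchronous search is a Viterbi recursion: at each frame, every HMM state retains only the score of the best path reaching it, plus a backpointer. A toy single-best version (without the paper's n-best word sequences or duration scoring; the model format is an assumption):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Frame-synchronous Viterbi decoding of an observation sequence."""
    # V[t][s] = (score of the best path reaching state s at frame t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            # Keep only the best incoming transition (the frame-synchronous step).
            prev, score = max(((r, V[t - 1][r][0] * trans_p[r][s]) for r in states),
                              key=lambda x: x[1])
            V[t][s] = (score * emit_p[s][obs[t]], prev)
    # Trace the backpointers from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))
```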


Proceedings ArticleDOI
23 May 1989
TL;DR: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.
Abstract: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models.

77 citations


Journal ArticleDOI
01 Sep 1989
TL;DR: The error rates of linear classifiers that utilize various criterion functions are investigated for the case of two normal distributions with different variances and a priori probabilities, finding that the classifier based on the least mean squares criterion often performs considerably worse than the Bayes rate.
Abstract: The error rates of linear classifiers that utilize various criterion functions are investigated for the case of two normal distributions with different variances and a priori probabilities. It is found that the classifier based on the least mean squares (LMS) criterion often performs considerably worse than the Bayes rate. The perceptron criterion (with suitable safety margin) and the linearized sigmoid generally lead to lower error rates than the LMS criterion, with the sigmoid usually the better of the two. Also investigated are the exceptions to the general trends: only if one class is known to have much larger a priori probability or variance than the other should one expect the LMS or perceptron criteria to be slightly preferable as far as error rate is concerned. The analysis is related to the performance of the back-propagation (BP) classifier, giving some understanding of the success of BP. A neural-net classifier, the adaptive-clustering classifier, suggested by this analysis is compared with BP (modified by using a conjugate-gradient optimization technique) for two problems. It is found that BP usually takes significantly longer to train than the adaptive-clustering technique.

75 citations


Proceedings Article
01 Jan 1989
TL;DR: The results suggest that classifier selection should often depend more heavily on practical considerations concerning memory and computation resources, and restrictions on training and classification times than on error rate.
Abstract: Eight neural net and conventional pattern classifiers (Bayesian-unimodal Gaussian, k-nearest neighbor, standard back-propagation, adaptive-stepsize back-propagation, hypersphere, feature-map, learning vector quantizer, and binary decision tree) were implemented on a serial computer and compared using two speech recognition and two artificial tasks. Error rates were statistically equivalent on almost all tasks, but classifiers differed by orders of magnitude in memory requirements, training time, classification time, and ease of adaptivity. Nearest-neighbor classifiers trained rapidly but required the most memory. Tree classifiers provided rapid classification but were complex to adapt. Back-propagation classifiers typically required long training times and had intermediate memory requirements. These results suggest that classifier selection should often depend more heavily on practical considerations concerning memory and computation resources, and restrictions on training and classification times than on error rate.

64 citations


Journal ArticleDOI
TL;DR: This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory, and develops a method to deal with the time varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information.

Journal ArticleDOI
TL;DR: A fast Fourier transform is used to transform normalized signatures into the frequency domain and results in an error rate of 2.5 per cent with the generally more conservative jackknife procedure yielding the same small error rate.

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a linear-predictive-coding-based formant extraction method was used to improve the accuracy of the automatic language identification algorithm, reducing the error rate by more than 50%.
Abstract: A description is given of enhancements to an automatic language identification algorithm previously reported. The algorithm, based on linear-predictive-coding-based formant extraction, was greatly improved, reducing the error rate by more than 50%. This performance was achieved on a large (>9 h), very noisy, six-language database, using trials of less than 10 s. Experiments that improved performance are described, including tests of various distance metrics, expanded and modified parameter sets, and a new voicing statistic. Final performance results were obtained as a function of time, signal-to-noise ratio, and no-decision rate. A new rejection capability was developed to address the open-set identification problem.

Proceedings ArticleDOI
23 May 1989
TL;DR: A unified framework for creating effective basic models of speech is discussed, and the relative advantages of each type of speech unit are pointed out based on the results of a series of recognition experiments.
Abstract: The problem of how to select and construct a set of fundamental unit statistical models suitable for speech recognition is addressed. A unified framework is discussed which can be used to accomplish the goal of creating effective basic models of speech. The performances of three types of fundamental units, namely whole word, phoneme-like, and acoustic segment units, in a 1109-word vocabulary speech recognition task are compared. The authors point out the relative advantages of each type of speech unit based on the results of a series of recognition experiments.

Journal ArticleDOI
TL;DR: An automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used, reduces the mean word recognition error rate from 4.9% to 2.9%.
Abstract: The author proposes an automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used. The algorithm is easily applied to vector-quantization- (VQ) speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is universally used for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the vectors representing deviation between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. The spectral resolution of the adaptation process is improved accordingly. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, the adaptation method is highly effective.
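A flat, single-level caricature of the adaptation idea — shift each codebook entry by the deviation between it and the mean of the training frames assigned to it — can be sketched as follows (1-D features and a single clustering level, unlike the paper's hierarchical scheme; illustrative only):

```python
def adapt_codebook(codebook, frames):
    """One adaptation pass over a 1-D VQ codebook.

    Assign each training frame to its nearest codeword, then shift every
    codeword by the mean deviation of its assigned frames. Codewords with
    no assigned frames are left unchanged.
    """
    buckets = {i: [] for i in range(len(codebook))}
    for f in frames:
        i = min(range(len(codebook)), key=lambda k: abs(f - codebook[k]))
        buckets[i].append(f)
    adapted = list(codebook)
    for i, assigned in buckets.items():
        if assigned:
            deviation = sum(assigned) / len(assigned) - codebook[i]
            adapted[i] = codebook[i] + deviation
    return adapted
```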

Proceedings ArticleDOI
15 Oct 1989
TL;DR: A preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary; a general model for the acoustics of any word is used to recognize the existence of new words.
Abstract: In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary. The probability of the speaker using words outside the vocabulary can be quite high. For the case when a speaker uses a new word, current systems will always recognize other words within the vocabulary in place of the new word, and the speaker wouldn't know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

Proceedings ArticleDOI
23 May 1989
TL;DR: It is concluded that there are pattern classification tasks in which an ANN is able to make better use of training data to achieve a lower error rate with a particular size training set.
Abstract: Experiments comparing artificial neural network (ANN), k-nearest-neighbor (KNN), and Bayes' rule with Gaussian distributions and maximum-likelihood estimation (BGM) classifiers were performed. Classifier error rate as a function of training set size was tested for synthetic data drawn from several different probability distributions. In cases where the true distributions were poorly modeled, ANN was significantly better than BGM. In some cases, ANN was also better than KNN. Similar experiments were performed on a voiced/unvoiced speech classification task. ANN had a lower error rate than KNN or BGM for all training set sizes, although BGM approached the ANN error rate as the training set became larger. It is concluded that there are pattern classification tasks in which an ANN is able to make better use of training data to achieve a lower error rate with a particular size training set.

Proceedings ArticleDOI
23 May 1989
TL;DR: A phoneme environment clustering algorithm, which automatically selects an optimal set of allophones and estimates the missing context, is presented, which additionally gives the means to analyze coarticulation effects automatically and quantitatively.
Abstract: A general principle is proposed to solve problems in context-dependent phoneme segment (or subword unit) based speech recognition, namely, how to choose the set of units and how to estimate the context effect missing in the training data. A phoneme environment clustering algorithm, which automatically selects an optimal set of allophones and estimates the missing context, is presented. This algorithm additionally gives the means to analyze coarticulation effects automatically and quantitatively. The problem is formulated as a clustering technique in phoneme environment space to approximate the mapping function from phoneme environment space to phoneme pattern space by a limited number of centroid patterns based on a distortion measure defined on the phoneme pattern space. The algorithm is tested for phoneme recognition and word recognition, and results are discussed.

Proceedings ArticleDOI
23 May 1989
TL;DR: Three methods for smoothing discrete probability functions in discrete hidden Markov models for large-vocabulary continuous-speech recognition are presented and results show a 20-30% reduction in error rate.
Abstract: Three methods for smoothing discrete probability functions in discrete hidden Markov models for large-vocabulary continuous-speech recognition are presented. The smoothing is based on deriving a probabilistic co-occurrence matrix between the different vector-quantized spectra. Each estimated probability density is then multiplied by this matrix, ensuring that none of the probabilities are severely underestimated due to lack of training data. Experimental results show a 20-30% reduction in error rate when this smoothing is used. A word error rate of 3.0% is achieved with the DARPA 1000-word continuous speech recognition database and a word-pair grammar with a perplexity of 60.
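The smoothing step reduces to a matrix-vector product: each estimated probability vector is multiplied by a row-stochastic co-occurrence matrix, so codewords unseen in training inherit mass from acoustically similar ones. A sketch with made-up numbers:

```python
def smooth_density(p, cooc):
    """Smooth a discrete probability vector p with a row-stochastic
    co-occurrence matrix: p_s[i] = sum_j cooc[j][i] * p[j], renormalized.

    Codewords that never occurred in training receive probability mass
    from similar codewords, so none is severely underestimated.
    """
    n = len(p)
    smoothed = [sum(cooc[j][i] * p[j] for j in range(n)) for i in range(n)]
    z = sum(smoothed)
    return [x / z for x in smoothed]
```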

Proceedings ArticleDOI
A. Paeseler1, Hermann Ney1
23 May 1989
TL;DR: The authors describe the design of a stochastic language model and its integration into a continuous-speech recognition system that is part of the SPICOS system for understanding database queries spoken in natural language.
Abstract: The authors describe the design of a stochastic language model and its integration into a continuous-speech recognition system that is part of the SPICOS system for understanding database queries spoken in natural language. The recognition strategy is based on statistical decision theory. The stochastic language model for the recognition of database queries is based on probabilities of trigrams, bigrams, and unigrams of word categories, which are intended to reflect lexical and semantic aspects of the SPICOS task. The implementation of stochastic language models in the search procedure is described, and results of recognition experiments are given. Using the stochastic language model (perplexity = 124) reduced the word error rate from 21.8% without a language model (perplexity = 917) to 9.1%.
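Two pieces of machinery recur in such work and are easy to sketch: interpolating trigram, bigram, and unigram estimates so that unseen trigrams still get probability mass, and perplexity as the evaluation measure (illustrative only; the interpolation weights are made up):

```python
import math

def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Mix trigram, bigram, and unigram estimates of P(w | history),
    so a word with zero trigram probability still gets nonzero mass."""
    l3, l2, l1 = lambdas
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

def perplexity(word_probs):
    """Perplexity of a sequence given P(w_i | history) for each word:
    2 ** (average negative log2 probability per word)."""
    return 2 ** (-sum(math.log2(p) for p in word_probs) / len(word_probs))
```

A uniform model over four equally likely words has perplexity 4; figures like the 917 and 124 quoted above measure, roughly, the average branching factor the recognizer faces without and with the language model.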

Journal ArticleDOI
TL;DR: It is shown that a version of the model discussed in Cutler & Norris 1988 based on the distinction between “strong” and “weak” vowels enables over 40% of word boundaries to be correctly located at the broad class level although many word boundaries are also inserted at inappropriate points.

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra.
Abstract: The major goal of this research is to reduce the discrepancy in recognition performance between normal and abnormal speech, given that reference templates were derived only from normal speech. A method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra. The distances of all reference tokens of like phonemes are combined to form a smallest cumulative distance (SCD) method. When SCD is combined with the method of slope-dependent weighting (SDW), the most significant success is obtained. In terms of error rates for a fixed phoneme vector length of five, SDW+SCD is found to reduce the difference in error rate between normal and abnormal speech by approximately 50%.

Journal ArticleDOI
Fred J. Damerau1, Eric Mays1
TL;DR: An experiment on a large body of text shows that an increase in the word list size decreases the error rate.
Abstract: We examine the effect of increasing word list size on the error rate of spelling correctors. An experiment on a large body of text shows that an increase in the word list size decreases the error rate.

Journal ArticleDOI
TL;DR: In this article, a leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself, and Monte Carlo simulations demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
Abstract: Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
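The leave-one-out estimate itself is simple to state: hold out each sample in turn, fit on the remainder, and count misclassifications of the held-out point. A sketch with a stand-in nearest-mean classifier on 1-D features (purely illustrative; the paper's stepwise variable selection is not shown):

```python
def train_nearest_mean(X, y):
    """Fit per-class means of 1-D features (a toy discriminant)."""
    means = {}
    for label in set(y):
        vals = [x for x, lab in zip(X, y) if lab == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict_nearest_mean(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def loo_error_rate(X, y, train_fn, predict_fn):
    """Leave-one-out estimate of the true error rate."""
    errors = 0
    for i in range(len(X)):
        # Train on all samples except the i-th, then test on the i-th.
        model = train_fn(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict_fn(model, X[i]) != y[i]
    return errors / len(X)
```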

Proceedings ArticleDOI
21 Feb 1989
TL;DR: The algorithms used in the BBN BYBLOS Continuous Speech Recognition system are described and a method for smoothing the discrete densities on the states of the HMM, which is intended to alleviate the problem of insufficient training for detailed phonetic models is presented.
Abstract: In this paper we describe the algorithms used in the BBN BYBLOS Continuous Speech Recognition system. The BYBLOS system uses context-dependent hidden Markov models of phonemes to provide a robust model of phonetic coarticulation. We provide an update of the ongoing research aimed at improving the recognition accuracy. In the first experiment we confirm the large improvement in accuracy that can be derived by using spectral derivative parameters in the recognition. In particular, the word error rate is reduced by a factor of two. Currently the system achieves a word error rate of 2.9% when tested on the speaker-dependent part of the standard 1000-Word DARPA Resource Management Database using the Word-Pair grammar supplied with the database. When no grammar was used, the error rate was 15.3%. Finally, we present a method for smoothing the discrete densities on the states of the HMM, which is intended to alleviate the problem of insufficient training for detailed phonetic models.

Patent
Hiroshi Nakane1
22 Sep 1989
TL;DR: In this paper, an apparatus for reading data stored on a data storage medium is presented, which includes a data readout device and an error detector coupled to it for detecting the rate at which errors occur in the reproduced data.
Abstract: An apparatus for reading data stored on a data storage medium. The apparatus includes a data readout device for reproducing data stored on the data storage medium, an error detector coupled to the data readout device for detecting an error rate at which errors occur in the data reproduced by the data readout device, a readout controller for operating the data readout device at a selected data readout rate, and a system controller coupled to the error detector and the readout controller for comparing the error rate to a predetermined system maximum error rate and for establishing the selected data readout rate such that the error rate is below the predetermined system maximum error rate.

Proceedings ArticleDOI
F.K. Soong1
23 May 1989
TL;DR: A phonetically labeled acoustic segment (PLAS) approach is proposed for speech analysis-synthesis by means of a bidirectional context-constrained mapping between a phonetic space and an acoustic space.
Abstract: A phonetically labeled acoustic segment (PLAS) approach is proposed for speech analysis-synthesis. The goal is to develop a unified framework for general speech processing by means of a bidirectional context-constrained mapping between a phonetic space and an acoustic space. The PLAS analysis module is a continuous phone (phoneme) recognizer, while the PLAS synthesis module is a phonetically organized acoustic database. To regulate the proposed mapping in a phonetically structured manner, phone context-dependency was imposed in phone modeling, recognition, and synthesis. The PLAS approach was tested successfully on a database of continuously spoken Japanese utterances recorded by a single male talker. The automatic segmentation boundaries derived from modeling PLAS units agreed well with corresponding manual segmentation points, i.e. they were within a ±20 ms interval 95% of the time. A 4% phoneme recognition error rate was obtained in a continuous recognition test. Natural-sounding speech was synthesized at an average bit rate of 55 b/s allocated to segmental information.

Proceedings ArticleDOI
21 Feb 1989
TL;DR: The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.
Abstract: The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA Resource Management task (991 word vocabulary, perplexity 60 word-pair grammar) [1] is 3.4% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied mixture triphone models.

PatentDOI
TL;DR: A dynamic programming algorithm is used in which time warping to match a sample to a reference is in effect permitted and matching is performed with unconstrained endpoints, avoiding the error that even the best preliminary decision as to word boundaries can introduce.
Abstract: A cost-effective word recognizer. Each frame of spoken input is compared to a set of reference frames. The comparison is equivalent to embodying the reference frame as an LPC inverse filter, and is preferably done in the autocorrelation domain. To avoid the instability and computational difficulties which can be caused by a high-gain LPC inverse filter, a noise floor is introduced into each reference frame sample. Thus, for each input speech frame, a scalar measures its similarity to each of the vocabulary of reference frames. To achieve connected word recognition based on this similarity measurement, a dynamic programming algorithm is used in which time warping to match a sample to a reference is in effect permitted, and in which matching is performed with unconstrained endpoints. Thus, the word boundary decisions are made on the basis of a local maximum in similarity, and, since no separate word division decision is required, the error which can be introduced by even the best preliminary decision as to word boundaries is avoided.

Journal ArticleDOI
A. Bateman1
TL;DR: It is shown that under both Rician and Rayleigh fading conditions, the use of a reference can eliminate the irreducible error rate phenomenon, with minimal sacrifice in bit error rate performance over an ideal BPSK system.
Abstract: Unified analysis of the performance of binary phase shift keying (BPSK) under static and mobile operating conditions is presented for the case in which a separate reference tone is used for channel sounding and subsequent 'coherent' data detection. It is shown that under both Rician and Rayleigh fading conditions, the use of a reference can eliminate the irreducible error rate phenomenon, with minimal sacrifice in bit error rate performance over an ideal BPSK system.