Showing papers on "Word error rate published in 1988"


Book
31 Oct 1988
TL;DR: This book describes the SPHINX speech recognition system, covering hidden Markov modeling of speech, the addition of knowledge sources, the search for a good unit of speech, and learning and adaptation to new speakers.
Abstract: 1. Introduction.- 2. Hidden Markov Modeling of Speech.- 3. Task and Databases.- 4. The Baseline SPHINX System.- 5. Adding Knowledge.- 6. Finding a Good Unit of Speech.- 7. Learning and Adaptation.- 8. Summary of Results.- 9. Conclusion.- Appendix I. Evaluating Speech Recognizers.- I.1. Perplexity.- I.2. Computing Error Rate.- Appendix II. The Resource Management Task.- II.1. The Vocabulary and the SPHINX Pronunciation Dictionary.- II.2. The Grammar.- II.3. Training and Test Speakers.- Appendix III. Examples of SPHINX Recognition.- References.
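Appendix I of the book covers perplexity and error-rate computation. As a rough illustration of how word error rate is conventionally computed (a minimal sketch, not code from the book), the hypothesis is aligned to the reference transcript by dynamic programming, and substitutions, deletions, and insertions are counted against the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words,
    computed via word-level Levenshtein alignment (a standard sketch)."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit cost aligning ref[:i] with hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("show all alerts", "show a alerts"))  # one substitution: 0.333...
```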

462 citations


PatentDOI
TL;DR: A method for creating word models for a large vocabulary, natural language dictation system that may be used for connected speech as well as for discrete utterances.
Abstract: A method for creating word models for a large vocabulary, natural language dictation system. A user with limited typing skills can create documents with little or no advance training of word models. As the user dictates, the user speaks a word which may or may not already be in the active vocabulary. The system displays a list of the words in the active vocabulary which best match the spoken word. By keyboard or voice command, the user may choose the correct word from the list or may choose to edit a similar word if the correct word is not on the list. Alternatively, the user may type or speak the initial letters of the word; the recognition algorithm is then run again, restricted to words matching those initial letters, and the choices are displayed again. A word list is then also displayed from a large backup vocabulary. The best words to display from the backup vocabulary are chosen using a statistical language model and, optionally, word models derived from a phonemic dictionary. When the correct word is chosen by the user, the speech sample is used to create or update an acoustic model for the word, without further intervention by the user. As the system is used, it also constantly updates its statistical language model, so it accumulates word models and its performance keeps improving with use. The system may be used for connected speech as well as for discrete utterances.

284 citations


Journal ArticleDOI
TL;DR: A method of fault tree analysis of human errors based on the concept of 'error possibility' instead of the error rate is proposed, and it is shown that the proposed method yields useful information.

245 citations


Journal ArticleDOI
TL;DR: A study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed are presented and the functional form of the compensation is shown to correspond to the equalization of spectral tilts.
Abstract: A study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed are presented. The study is based on hidden Markov models trained by speech tokens spoken in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and talking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the hidden Markov models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Substantial reduction of error rates has been achieved when the cepstral domain compensation techniques were tested on the simulated-stress speech database. The hypothesis-driven compensation technique reduced the average error rate from 13.9% to 6.2%. When a more sophisticated recognizer was used, it reduced the error rate from 2.5% to 1.9%.
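Because spectral tilt is carried almost entirely by the low-order cepstrum, tilt equalization can be pictured as a shift of the first cepstral coefficient toward a reference value. A minimal sketch under that assumption (the paper's actual compensation is hypothesis-driven and trained; the names here are illustrative):

```python
import numpy as np

def equalize_spectral_tilt(cepstra: np.ndarray, reference_tilt: float) -> np.ndarray:
    """Crude first-order tilt compensation: shift c1 (which dominates
    spectral tilt) toward a reference value. `cepstra` has shape
    (n_frames, n_coeffs). An assumed simplification, not the paper's method."""
    out = cepstra.copy()
    out[:, 1] += reference_tilt - cepstra[:, 1].mean()
    return out
```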

114 citations


Journal ArticleDOI
Chin-Hui Lee
TL;DR: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals and takes into account the non-Gaussian nature of the excitations for voiced speech, giving a more efficient and less biased estimate of the prediction coefficients than conventional methods.
Abstract: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of small residuals while deemphasizing the small portion of large residuals. In contrast, the conventional LP procedure weights all prediction residuals equally. The robust algorithm takes into account the non-Gaussian nature of the excitations for voiced speech and gives a more efficient (lower variance) and less biased estimate of the prediction coefficients than conventional methods. The algorithm can be used in the front-end feature extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures and is relatively insensitive to the placement of the LPC (LP coding) analysis window and to the value of the pitch period, for a given section of speech signal.
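Minimizing a sum of residual-dependent weighted residuals is commonly done by iteratively reweighted least squares; the sketch below uses Huber-style weights as an assumed choice (the paper selects its own cost function, and these names are illustrative):

```python
import numpy as np

def robust_lp(x: np.ndarray, order: int = 10, n_iter: int = 5, c: float = 1.5) -> np.ndarray:
    """Robust LP coefficients via iteratively reweighted least squares.
    Large residuals are downweighted so the bulk of small residuals
    dominates the fit, unlike ordinary LP, which weights all equally."""
    n = len(x)
    # Regression form of LP: x[t] ~ sum_k a[k] * x[t-1-k]
    X = np.column_stack([x[order - 1 - k : n - 1 - k] for k in range(order)])
    y = x[order:]
    w = np.ones_like(y)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        a, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ a
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12      # robust residual scale
        u = np.maximum(np.abs(r) / scale, 1e-12)
        w = np.minimum(1.0, c / u)                          # Huber-style weights
    return a
```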

112 citations


Journal ArticleDOI
A. Nadas, David Nahamoo, Michael Picheny
TL;DR: For minimizing the decoding error rate of the (optimal) maximum a posteriori probability (MAP) decoder, it is shown that the CMLE (or maximum mutual information estimate, MMIE) may be preferable when the model is incorrect.
Abstract: Training methods for designing better decoders are compared. The training problem is considered as a statistical parameter estimation problem. In particular, the conditional maximum likelihood estimate (CMLE), which estimates the parameter values that maximize the conditional probability of words given acoustics during training, is compared to the maximum-likelihood estimate, which is obtained by maximizing the joint probability of the words and acoustics. For minimizing the decoding error rate of the (optimal) maximum a posteriori probability (MAP) decoder, it is shown that the CMLE (or maximum mutual information estimate, MMIE) may be preferable when the model is incorrect. In this sense, the CMLE/MMIE appears more robust than the MLE.
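In symbols, with word sequences $W_i$, acoustics $A_i$, and model parameters $\theta$, the two training criteria compared are

$$
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_i P_\theta(W_i, A_i),
\qquad
\hat{\theta}_{\mathrm{CMLE}} = \arg\max_{\theta} \prod_i P_\theta(W_i \mid A_i),
$$

and the CMLE coincides with the maximum mutual information estimate when the language model $P(W)$ is held fixed.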

94 citations


Journal ArticleDOI
TL;DR: It is found that for a given error rate, error patterns having zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still.
Abstract: A formula for the go-back-N ARQ (automatic repeat request) scheme applicable to Markov error patterns is derived. It is a generalization of the well-known efficiency formula p/(p+m(1-p)) (where m is the round-trip delay in number of block durations and p is the block transmission success probability), and it has been successfully validated against simulation measurements. It is found that for a given error rate, error patterns having zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still. It is shown that the present analysis can be extended in a straightforward manner to cope with error patterns of a more complex nature. Simple procedures for numerical evaluation of efficiency under quite general error structures are presented.
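The quoted baseline formula is straightforward to evaluate; for instance:

```python
def go_back_n_efficiency(p: float, m: int) -> float:
    """Throughput efficiency of go-back-N ARQ with independent block
    errors: p is the block success probability, m the round-trip delay
    in block durations (the paper's p/(p + m(1-p)) special case)."""
    return p / (p + m * (1 - p))

# 1% block error rate with a 10-block round-trip delay:
print(go_back_n_efficiency(0.99, 10))  # ~0.908
```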

84 citations


Proceedings ArticleDOI
Eva Ejerhed
09 Feb 1988
TL;DR: A comparison of the error rates of the two parsing methods in the recognition of basic clauses showed a 13% error rate for the regular expression method and a 6.5% error rate for the stochastic method.
Abstract: The paper presents and compares two different methods of parsing, a regular expression method and a stochastic method, with respect to their success in identifying basic clauses in unrestricted English text. These methods of parsing were developed in order to be applied to the task of improving the detection of large prosodic units in the Bell Labs text-to-speech system, and were so applied experimentally. The paper also discusses the notion of basic clause that was defined as the parsing target. The result of a comparison of the error rates of the two parsing methods in the recognition of basic clauses showed that there was a 13% error rate for the regular expression method and a 6.5% error rate for the stochastic method.

54 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: This system extends an earlier robust continuous observation HMM IWR system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database.
Abstract: Most speech recognizers are sensitive to the speech style and the speaker's environment. This system extends an earlier robust continuous-observation HMM isolated-word recognition (IWR) system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database. Performance on a 207-word, perplexity-14 task is a 0.9% word error rate under office conditions, and 2.5% (best speaker) and 5% (4-speaker average) for the normal test condition of the database.

54 citations


Book
01 Dec 1988
TL;DR: In this article, a selective-repeat ARQ scheme with a finite receiver buffer and a finite range of sequence numbers is proposed, and the throughput performance is analyzed and simulated based on the assumption that the channel errors are randomly distributed and the return channel is noiseless.
Abstract: In this paper, we investigate a selective-repeat ARQ scheme which operates with a finite receiver buffer and a finite range of sequence numbers. The throughput performance of the proposed scheme is analyzed and simulated based on the assumption that the channel errors are randomly distributed and the return channel is noiseless. Both analytical and simulation results show that it significantly outperforms the go-back-N ARQ scheme, particularly for channels with large round-trip delay and high data rate. It provides high throughput efficiency over a wide range of bit error rates. The throughput remains in a usable range even for very high error rate conditions. The proposed scheme is capable of handling data and/or acknowledgment loss. Furthermore, when buffer overflow occurs at the receiver, the transmitter is capable of detecting it and backs up to the proper location of the input queue to retransmit the correct data blocks.
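The advantage over go-back-N comes from resending only the errored blocks. A hedged comparison using the idealized infinite-buffer formulas (not the paper's finite-buffer analysis):

```python
def gbn_throughput(p: float, m: int) -> float:
    # Go-back-N with independent errors: every loss costs m block times.
    return p / (p + m * (1 - p))

def sr_throughput(p: float) -> float:
    # Ideal selective repeat: only the errored block is resent.
    return p

for p in (0.999, 0.99, 0.9):  # block success probabilities
    print(f"p={p}: GBN={gbn_throughput(p, 100):.3f}  SR={sr_throughput(p):.3f}")
```

At a 10% block error rate and a 100-block round-trip delay, go-back-N drops to roughly 8% efficiency while ideal selective repeat stays near 90%, which mirrors the paper's point about channels with large round-trip delay and high data rate.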

50 citations


PatentDOI
Annedore Paeseler
TL;DR: In this article, the recognition process is achieved by comparing the input sequence of speech signals to reference values and summing those which are syntactically permissible until they form a valid word.
Abstract: Continuous speech recognition assigns predetermined words to syntactic categories and defines the syntactic categories which can follow and precede each predetermined word. The recognition process is achieved by comparing the input sequence of speech signals to reference values and summing those which are syntactically permissible until they form a valid word. Subsequent speech values are compared to reference values listed in the syntactic categories which can follow previously calculated valid words. For each word, values are updated indicating the current word's sequence number, syntax category, cumulative comparison sum, and the current list of compared words. Values are also stored for each word which identify the previous word, the following word, and their syntax categories. This process is repeated until all input values have been processed. The results are then checked to verify valid syntax, and the words with the closest match are read out.

Journal ArticleDOI
TL;DR: It is suggested that the fuzzy expression of human reliability is defined by all the factors required for the task: the error rate, the time required, and so on.

Proceedings Article
01 Jan 1988
TL;DR: A new probabilistic spectral mapping procedure is investigated to estimate the speaker-adaptation transformation, and it is found that significant improvement in recognition has been achieved compared to the previous adaptation algorithm.
Abstract: The goal of speaker adaptation is to minimize the amount of speech needed to model a new speaker while retaining high recognition accuracy. We investigate a new probabilistic spectral mapping procedure to estimate the transformation. To evaluate the algorithm, recognition experiments are carried out on the 1000-word resource management continuous speech database using a grammar with perplexity 60. The results show that significant improvement in recognition has been achieved compared to our previous adaptation algorithm. The average word error rate of speaker-adapted models using 2 minutes of training speech is 11.3%, compared to 7.1% for speaker-dependent models using 20-28 minutes of training speech.

PatentDOI
TL;DR: A low-cost speech recognition system generates frames of received speech having binary feature components; the received speech frames are compared with reference templates, and error values representing the difference between the received speech and the reference templates are generated.
Abstract: A low cost speech recognition system generates frames of received speech having binary feature components. The received speech frames are compared (18) with reference templates (22), and error values representing the difference between the received speech and the reference templates (22) are generated. At the end of an utterance, if one template resulted in a sufficiently small error value, the word represented by that template is selected (26) as the recognized word.
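With binary feature components, the per-frame error value reduces to a Hamming distance between bit vectors, which is what keeps such a system cheap. A minimal sketch under that assumption (the frame packing and names are illustrative, not from the patent):

```python
def frame_error(received: int, template: int, width: int = 16) -> int:
    """Hamming distance between two binary feature frames packed into
    integers (an assumed encoding)."""
    return bin((received ^ template) & ((1 << width) - 1)).count("1")

def utterance_error(received_frames, template_frames) -> int:
    # Sum of frame errors; a real recognizer would time-align first.
    return sum(frame_error(r, t) for r, t in zip(received_frames, template_frames))
```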

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV) and performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.
Abstract: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV). Two parameters are systematically varied: the length of the signal analysis window, and the order of the linear predictive coding/cepstrum analysis. Computational costs associated with the choice of parameters are also considered. The distance measures tested are the Euclidean, inverse variance weighting, differential mean weighting, Kahn's simplified weighting, the Mahalanobis distance, and the Fisher linear discriminant. Using the equal error rate (EER) of pairwise utterance dissimilarity distributions, performance is estimated for prespecified and (a simulation of) user-determined input vocabulary. Performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.
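The equal error rate is the threshold operating point where the false-reject and false-accept rates coincide. A hedged sketch of estimating it from pairwise dissimilarity scores (the names are illustrative):

```python
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Sweep a distance threshold over the pooled scores and return the
    error rate where false rejects (genuine pairs above the threshold)
    and false accepts (impostor pairs at or below it) are closest."""
    best_gap, eer = 1.0, 0.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)
        far = np.mean(impostor <= t)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```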

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A novel text-dependent probabilistic spectral mapping method is presented for rapid speaker adaptation that yields significantly better performance than previous algorithms, with a word error rate less than twice that of speaker-dependent training.
Abstract: A novel text-dependent probabilistic spectral mapping method is presented for rapid speaker adaptation. The algorithm has been tested on the DARPA 1000-word resource management database with a grammar perplexity of 60. It results in significantly better performance than the previous algorithms and, using two minutes of adaptation speech, provides a word error rate less than twice that of speaker-dependent training.

Journal ArticleDOI
Bernard Merialdo
TL;DR: This paper describes a new organization of the recognition process, Multilevel Decoding (MLD), that allows the system to support a Very-Large-Size Dictionary (VLSD)—one comprising over 100,000 words, which significantly surpasses the capacity of previous speech-recognition systems.
Abstract: An important concern in the field of speech recognition is the size of the vocabulary that a recognition system is able to support. Large vocabularies introduce difficulties involving the amount of computation the system must perform and the number of ambiguities it must resolve. But, for practical applications in general and for dictation tasks in particular, large vocabularies are required, because of the difficulties and inconveniences involved in restricting the speaker to the use of a limited vocabulary. This paper describes a new organization of the recognition process, Multilevel Decoding (MLD), that allows the system to support a Very-Large-Size Dictionary (VLSD)—one comprising over 100,000 words. This significantly surpasses the capacity of previous speech-recognition systems. With MLD, the effect of dictionary size on the accuracy of recognition can be studied. In this paper, recognition experiments using 10,000- and 200,000-word dictionaries are compared. They indicate that recognition using a 200,000-word dictionary is more accurate than recognition using a 10,000-word dictionary (when unrecognized words are included in the error rate).

Proceedings ArticleDOI
Masafumi Nishimura, K. Sugawara
11 Apr 1988
TL;DR: The authors describe a speaker adaptation method consisting of two stages: in the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker.
Abstract: The authors describe a speaker adaptation method consisting of two stages. In the first stage, label prototypes, which represent spectral features, are modified to reduce the total distortion error of vector quantization for a new speaker. In the second stage, well-trained hidden Markov model (HMM) parameters are transformed by using a linear mapping function. This is estimated by counting the correspondences along the alignment between a state sequence of an HMM and a label sequence of a new speaker utterance. This adaptation procedure was tested in an isolated word recognition task using 150 confusable Japanese words. The original label prototypes and HMM parameters were estimated for a male speaker, who spoke each word 10 times. When the adaptation procedure was applied with 25 words, the average error rate for another seven male speakers was reduced from 25.0% to 5.6%, which was roughly the same as that for the original speaker. This procedure was also effective for adaptation between male and female speakers.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A speech recognition system is reported that comprises an acoustic/phonetic decoder, a lexical access mechanism and a syntax analyzer, based on a continuously variable duration hidden Markov model and a context-free covering grammar of English.
Abstract: Experiments with a speech recognition system are reported. The system comprises an acoustic/phonetic decoder, a lexical access mechanism and a syntax analyzer. The acoustic, phonetic and lexical processing are based on a continuously variable duration hidden Markov model (CVDHMM). The syntactic component is based on the Cocke-Kasami-Young (CKY) parser and a context-free covering grammar of English. Lexical items are represented in terms of the 43 phonetic units. In recognition tests conducted on a separate data set, a 70% correct recognition rate on phonetic units in fluent speech was observed. In two additional tests on isolated words, a 40% word recognition rate was observed with the complete 52,000-word lexicon. When the vocabulary size was reduced to 1040 words, the recognition rate improved to 80%. After syntax analysis the word recognition rate rose to 90%.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared for speaker-dependent and cross-speaker ASR and the perceptually based linear predictive front-end yields the highest accuracies.
Abstract: Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared for speaker-dependent and cross-speaker ASR. The perceptually based linear predictive front-end yields the highest accuracies. By modifying its sensitivity to spectral peaks and to spectral tilt and by utilizing the speech dynamics the authors further improve, by about 10%, its error rate in speaker-independent ASR.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An algorithm based on hidden Markov models is applied to the task of speaker-independent continuous-speech recognition for a vocabulary of 1000 words with no syntactic constraints, and it was found that the use of several different acoustic features and the use of word-specific phonetic modeling, where possible, improved system performance.
Abstract: An algorithm based on hidden Markov models is applied to the task of speaker-independent continuous-speech recognition for a vocabulary of 1000 words with no syntactic constraints. The signal is limited to 4000 Hz. Word models were built from three-state representations of phonetic units, concatenated according to entries in a lexicon. Performance as measured on DARPA's resource management database was 40% correct word recognition. It was found that the use of several different acoustic features and the use of word-specific phonetic modeling, where possible, improved system performance.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Simulations show that the commonly used dynamic programming word-sequence matching algorithm has serious shortcomings as an evaluation method at low performance levels, though it is generally reliable at high performance levels and a method using word end-point information provides precise, detailed performance analyses.
Abstract: Outputs of connected-word recognizers may contain substitution, deletion and insertion errors, and their interpretation is not trivial. Simulations show that the commonly used dynamic programming word-sequence matching algorithm has serious shortcomings as an evaluation method at low performance levels, though it is generally reliable at high performance levels. The strategy of comparing input and output words in strict sequence is found to have little to recommend it. A method using word end-point information, which provides precise, detailed performance analyses, is described. Tests with real data confirm the reliability of the end-point method and the presence of positive bias in performance estimates from the word-sequence matching method.

01 Jan 1988
TL;DR: The Dispersion Frame Technique (DFT) was developed from the observation that electromechanical devices experience a period of deteriorating performance usually in the form of increasing error rate prior to catastrophic failure.
Abstract: Projections indicate that the use of personal computing environments and distributed networks will increase by an order of magnitude by the turn of the century. In addition, personal workstations will continue to increase in complexity through the use of VLSI hardware. Therefore an efficient and flexible means for maintaining high availability in workstations becomes vitally important. Thus fault handling, as well as the collection and analysis of data produced by faults must be automated. A distributed on-line monitoring and predictive diagnostic system has been developed. The diagnostic system integrates error logging, monitoring, and control functions. The hybrid architecture implementation, where diagnosability is integrated in both the centralized diagnostic server and the individual file server, ensures synchronization and robustness of the communication between the diagnostic system and the file server machines. Data collected from the file servers over the last twenty-two months was analyzed. Twenty-nine permanent faults were identified in the operator's log and were shown to follow an exponential failure distribution. The error log was shown to contain events which are caused by a mixture of transient and intermittent faults. The failure distribution of the transient faults can be characterized by the Weibull function with a decreasing error rate, whereas that of the intermittent faults exhibits an increasing error rate. The failure distribution of the entire error log also follows a Weibull distribution with a decreasing error rate. The parameters of the entire error log distribution are a function of the relationship between transient and intermittent faults as summarized by the ratios of the shape parameters and the relative frequency of error occurrences. It is shown that 25 faults are typically required in this study to give an accurate estimate of the Weibull parameters. Studying the average number of faults before repair activities shows that users will not tolerate such a large number of errors, and subsequent system crashes, prior to an attempted repair. Hence the Dispersion Frame Technique (DFT) was developed from the observation that electromechanical devices experience a period of deteriorating performance usually in the form of increasing error rate prior to catastrophic failure. (Abstract shortened with permission of author.)
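The decreasing and increasing error rates reported correspond to the shape parameter of the Weibull hazard function,

$$
h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1},
$$

which decreases in $t$ for shape $\beta < 1$ (the transient faults and the entire error log) and increases for $\beta > 1$ (the intermittent faults); $\eta$ is the scale parameter.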

Proceedings ArticleDOI
11 Apr 1988
TL;DR: The Boltzmann machine algorithm and the error back propagation algorithm were used to learn to recognize the place of articulation of vowels (front, center or back), represented by a static description of spectral lines, which shows a fault tolerant property of the neural nets.
Abstract: The Boltzmann machine algorithm and the error back propagation algorithm were used to learn to recognize the place of articulation of vowels (front, center or back), represented by a static description of spectral lines. The error rate is shown to depend on the coding. Results are comparable or better than those obtained by us on the same data using hidden Markov models. The authors also show a fault tolerant property of the neural nets, i.e. that the error on the test set increases slowly and gradually when an increasing number of nodes fail.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A tight integration between the two steps rather than a hierarchical approach has been investigated and the hypothesization and the verification modules are implemented as processes running in parallel.
Abstract: Recently, a two-step strategy for large vocabulary isolated word recognition has been successfully demonstrated. The first step consists of the hypothesization of a reduced set of word candidates on the basis of broad bottom-up features, while the second is the verification of the hypotheses using more detailed phonetic knowledge. This paper deals with its extension to continuous speech. A tight integration between the two steps, rather than a hierarchical approach, has been investigated. The hypothesization and the verification modules are implemented as processes running in parallel. Both processes represent lexical knowledge by a tree. Each node of the hypothesization tree is labeled by one of 6 broad phonetic classes. The nodes of the verification tree are, instead, the states of sub-word HMMs. The two processes cooperate to detect word hypotheses along the sentence.

DOI
01 Oct 1988
TL;DR: In this paper, the influence of IF-filtering on the bit error rate floor in optical DPSK systems is investigated, and it is shown that IF-filtering leads to a reduction of the error rate floor, so that the linewidth requirements are reduced by a factor of 0.68.
Abstract: We investigate the influence of IF-filtering on the bit error rate floor in optical DPSK systems; this influence is usually neglected. We show that IF-filtering leads to a reduction of the error rate floor, so that the linewidth requirements are reduced by a factor of 0.68.

Journal ArticleDOI
TL;DR: A comprehensive performance analysis method that models, at bit level, the error performance of individual links in an end-to-end connection is presented and the utility and power of the model are illustrated with the help of an example connection.
Abstract: A comprehensive performance analysis method that models, at bit level, the error performance of individual links in an end-to-end connection is presented. The link model accounts for the burst-error behaviour of each individual link. A method to concatenate several individual links and extract a model for the end-to-end connection is given. This resulting end-to-end model can be used to calculate performance measures such as bit error rate and block error rate for any given block size. A procedure to compute the probability distribution of errors within a specific block is also developed. Finally, a method to compute the probability distribution of blocks having a certain error rate over a given period of time is presented. The utility and power of the model are illustrated with the help of an example connection.
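For the special case of independent errors on each link, concatenation has a simple closed form, since a bit is wrong end-to-end only when it is flipped an odd number of times. A hedged sketch of that baseline (the paper's Markov link models capture burst errors, which this ignores):

```python
def end_to_end_ber(link_bers) -> float:
    """BER of a tandem of links modeled as independent binary
    symmetric channels: p = (1 - prod(1 - 2*p_i)) / 2."""
    prod = 1.0
    for p in link_bers:
        prod *= 1.0 - 2.0 * p
    return (1.0 - prod) / 2.0

print(end_to_end_ber([1e-4, 1e-5, 1e-4]))  # ~2.1e-4, close to the sum of the BERs
```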

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An HMM-based isolated-word recognition system that dynamically adapts word model parameters to new speakers and to stress-induced speech variations that produces results comparable to multistyle-trained systems.
Abstract: The authors describe an HMM-based isolated-word recognition system that dynamically adapts word model parameters to new speakers and to stress-induced speech variations. During recognition all input tokens presented to the system can be used to augment the current word model parameters. New tokens can be weighted so that adaptation simply increases the size of the training set, or tracks systematic changes by exponentially weighting all previously seen data. This system was tested on the 35-word, 10,710-token Lincoln stressed speech database. Speaker adaptation experiments produced error rates equivalent to speaker-trained systems after the presentation of only a single new token per vocabulary word. Stress condition adaptation experiments produced results comparable to multistyle-trained systems after the presentation of several new tokens per vocabulary word.

Journal ArticleDOI
TL;DR: The design of a multifont character recognizer is described which uses a binary decision tree to classify a character on the basis of 197 geometric features; error rates were highly sensitive to typeface and varied between 10 percent and 0.1 percent.
Abstract: An optical character reader for processing typeset documents must be able to handle proportional spacing, the presence of touching characters and a wide variety of type fonts. This paper describes the design of a multifont character recognizer which uses a binary decision tree to classify a character on the basis of 197 geometric features. The algorithm for designing the decision tree is based upon an entropy minimization procedure, and makes no assumptions on the distribution or independence of the binary features. The decision tree classifier provides confidence measures which may be used to reduce the substitution error rate at the expense of higher rejection rates. Methods of reducing the overall error rate by combining the decision tree classifier with other classifiers were examined. In particular, the paper evaluates the performance of a classifier using a combination of multiple decision trees, template matching and contextual post-processing. Error rates were highly sensitive to typeface and varied between 10 percent and 0.1 percent. Computer processing times for the various stages of the system are presented.
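Entropy-minimization tree design chooses, at each node, the binary feature whose split yields the lowest expected class entropy, without assuming feature independence. A minimal sketch of that node-splitting criterion (names and data layout are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def entropy(labels) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, labels, n_features: int):
    """Pick the binary feature minimizing the weighted entropy of the
    two child nodes; `samples` are sequences of 0/1 feature values."""
    best_f, best_h = None, float("inf")
    for f in range(n_features):
        left = [y for x, y in zip(samples, labels) if x[f] == 0]
        right = [y for x, y in zip(samples, labels) if x[f] == 1]
        if not left or not right:
            continue  # degenerate split carries no information
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if h < best_h:
            best_f, best_h = f, h
    return best_f
```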

PatentDOI
Ira A. Gerson
TL;DR: The invention is intended to be implemented in a system which has word templates stored in template memory, with the system being capable of accumulating distance measures for states within each word template.
Abstract: Word spotting in a speech recognition system without predetermining the endpoints of the input speech. The invention is intended to be implemented in a system which has word templates stored in template memory, with the system being capable of accumulating distance measures for states within each word template. The following steps are used to generate a measure of similarity between a subset of the input frames and a word template: a) recording a beginning input frame number for each state to identify the potential beginning of the word; b) accumulating distance measures for at least one state for each input frame; c) normalizing the distance measures by subtracting a normalization amount from each distance measure; d) recording normalization information corresponding to the normalization amount for each input frame; and e) determining a similarity measure between the word template and a subset of input frames after a given input frame has been processed. The subset is identified from the beginning input frame number corresponding to an end state of the template, through the given input frame number. The similarity measure is based on the normalized distance measure recorded for the end state and the normalization information.