
Showing papers on "Word error rate published in 1987"


Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions.
Abstract: A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced talking styles. This technique was tested using a speech data base that included stress speech produced during a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (5 normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training, and the average error rate under conditions sampled during training fell by a factor of four.

344 citations


Journal ArticleDOI
TL;DR: This paper has found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer.
Abstract: In a template-based speech recognition system, distortion measures that compute the distance or dissimilarity between two spectral representations have a strong influence on the performance of the recognizer. Accordingly, extensive comparative studies have been conducted to determine good distortion measures for improved recognition accuracy. Previous studies have shown that the log likelihood ratio measure, the likelihood ratio measure, and the truncated cepstral measures all gave good recognition performance (comparable accuracy) for isolated word recognition tasks. In this paper we extend the interpretation of distortion measures, based upon the observation that measurements of speech spectral envelopes (as normally obtained from standard analysis procedures such as LPC or filter banks) are prone to statistical variations due to window position fluctuations, excitation interference, measurement noise, etc., and may not accurately characterize the true speech spectrum because of analysis model constraints. We have found that these undesirable spectral measurement variations can be partially controlled (i.e., reduced in the level of variation) by appropriate signal processing techniques. In particular, we have found that a bandpass "liftering" process reduces the variability of the statistical components of LPC-based spectral measurements and hence it is desirable to use such a liftering process in a speech recognizer. We have applied this liftering process to several speech recognition tasks: in particular, single frame vowel recognition and isolated word recognition. Using the liftering process, we have been able to achieve an average digit error rate of 1 percent in a speaker-independent isolated digit test. This error rate is about one-half that obtained without the liftering process.
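As a rough illustration of the liftering idea (not the exact configuration reported above), the sketch below applies a raised-sine bandpass lifter, a common choice for de-emphasizing the low- and high-order cepstral coefficients, before a Euclidean cepstral distance; the lifter length Q and the raised-sine form are assumptions made for illustration.

```python
import numpy as np

def bandpass_lifter(Q=12):
    """Raised-sine lifter weights w[n] = 1 + (Q/2) * sin(pi * n / Q), n = 1..Q."""
    n = np.arange(1, Q + 1)
    return 1.0 + (Q / 2.0) * np.sin(np.pi * n / Q)

def liftered_cepstral_distance(c_test, c_ref, Q=12):
    """Euclidean distance between liftered cepstral vectors.

    c_test, c_ref: arrays holding cepstral coefficients c[1..Q] (c0 excluded),
    e.g. derived from an LPC analysis of one speech frame.
    """
    w = bandpass_lifter(Q)
    diff = w * (np.asarray(c_test)[:Q] - np.asarray(c_ref)[:Q])
    return float(np.sqrt(np.sum(diff ** 2)))
```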

291 citations


Journal ArticleDOI
TL;DR: It is argued that the written language notion of the word has had too much impact on models of spoken word recognition and a view of continuous word recognition is presented which takes into account the alternating pattern of weak and strong syllables in the speech stream.

221 citations


Journal ArticleDOI
TL;DR: Results on the application of several bootstrap techniques in estimating the error rate of 1-NN and quadratic classifiers show that, in most cases, the confidence interval of a bootstrap estimator of classification error is smaller than that of the leave-one-out estimator.
Abstract: The design of a pattern recognition system requires careful attention to error estimation. The error rate is the most important descriptor of a classifier's performance. The commonly used estimates of error rate are based on the holdout method, the resubstitution method, and the leave-one-out method. All suffer either from large bias or large variance, and their sample distributions are not known. Bootstrapping refers to a class of procedures that resample given data by computer. It permits determining the statistical properties of an estimator when very little is known about the underlying distribution and no additional samples are available. Since its publication in the last decade, the bootstrap technique has been successfully applied to many statistical estimation and inference problems. However, it has not been exploited in the design of pattern recognition systems. We report results on the application of several bootstrap techniques in estimating the error rate of 1-NN and quadratic classifiers. Our experiments show that, in most cases, the confidence interval of a bootstrap estimator of classification error is smaller than that of the leave-one-out estimator. The error rates of 1-NN, quadratic, and Fisher classifiers are estimated for several real data sets.
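A minimal sketch of one bootstrap error estimator of the kind discussed above: train on a bootstrap resample, test on the points left out of that resample, and average over replicates. The 1-NN classifier, the number of replicates B, and the use of scikit-learn are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bootstrap_error_rate(X, y, B=100, seed=0):
    """Average error of a 1-NN classifier trained on bootstrap resamples
    and tested on the samples left out of each resample."""
    rng = np.random.default_rng(seed)
    n, errors = len(y), []
    for _ in range(B):
        idx = rng.integers(0, n, n)                 # sample with replacement
        held_out = np.setdiff1d(np.arange(n), idx)  # points not drawn this round
        if held_out.size == 0:
            continue
        clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
        errors.append(np.mean(clf.predict(X[held_out]) != y[held_out]))
    return float(np.mean(errors))
```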

201 citations


Journal ArticleDOI
TL;DR: The experimental results show that the weighted cepstral distance measure works substantially better than both the Euclidean cepStral distance and the log likelihood ratio distance measures across two different databases.
Abstract: A weighted cepstral distance measure is proposed and is tested in a speaker-independent isolated word recognition system using standard DTW (dynamic time warping) techniques. The measure is a statistically weighted distance measure with weights equal to the inverse variance of the cepstral coefficients. The experimental results show that the weighted cepstral distance measure works substantially better than both the Euclidean cepstral distance and the log likelihood ratio distance measures across two different databases. The recognition error rate obtained using the weighted cepstral distance measure was about 1 percent for digit recognition. This result was less than one-fourth of that obtained using the simple Euclidean cepstral distance measure and about one-third of the results using the log likelihood ratio distance measure. The most significant performance characteristic of the weighted cepstral distance was that it tended to equalize the performance of the recognizer across different talkers.
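A small sketch of the inverse-variance weighting described above; estimating the per-coefficient variances from a pooled set of training frames is an illustrative simplification.

```python
import numpy as np

def inverse_variance_weights(training_frames):
    """training_frames: (N, Q) array of cepstral vectors pooled from training data.
    Returns per-coefficient weights equal to the inverse variance."""
    return 1.0 / np.var(training_frames, axis=0)

def weighted_cepstral_distance(c_a, c_b, weights):
    """Statistically weighted cepstral distance, usable as the local
    distance inside a DTW alignment."""
    d = np.asarray(c_a) - np.asarray(c_b)
    return float(np.sum(weights * d * d))
```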

181 citations


Journal ArticleDOI
01 Jul 1987
TL;DR: This paper confirms that the asymptotic classification error rate of the (unweighted) k-nearest neighbor (k-NN) rule is lower than that of any weighted k-NN rule, and presents experimental results that were obtained using a generalized form of a weighting function proposed by Dudani.
Abstract: It was previously proved by Bailey and Jain that the asymptotic classification error rate of the (unweighted) k-nearest neighbor (k-NN) rule is lower than that of any weighted k-NN rule. Equations are developed for the classification error rate of a test sample when the number of training samples is finite, and it is argued intuitively that a weighted rule may then in some cases achieve a lower error rate than the unweighted rule. This conclusion is confirmed by analytically solving a particular simple problem, and as an illustration, experimental results are presented that were obtained using a generalized form of a weighting function proposed by Dudani.
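For reference, a sketch of the classic Dudani distance-weighted k-NN rule, w_i = (d_k - d_i) / (d_k - d_1); the generalized weighting function studied in the paper is not reproduced here.

```python
import numpy as np

def dudani_weighted_knn(X_train, y_train, x, k=5):
    """Classify x by k-NN with Dudani's distance weights
    w_i = (d_k - d_i) / (d_k - d_1); equal weights are used when d_k == d_1."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    d1, dk = d[nearest[0]], d[nearest[-1]]
    w = np.ones(k) if dk == d1 else (dk - d[nearest]) / (dk - d1)
    votes = {}
    for wi, label in zip(w, y_train[nearest]):
        votes[label] = votes.get(label, 0.0) + wi
    return max(votes, key=votes.get)
```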

94 citations


Proceedings ArticleDOI
06 Apr 1987
TL;DR: An effort to make a Hidden Markov Model Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress was made.
Abstract: Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated word database.

55 citations


Book ChapterDOI
01 Jan 1987
TL;DR: NEXUS, a computer speech recognition system, incorporates learning heuristics based on this method that permit the system to identify a set of primitive acoustic concepts from experience with words, where the recognition error rate is only one-seventh that of traditional architectures.
Abstract: Pattern recognition systems of necessity incorporate an approximate-matching process to determine the degree of similarity between an unknown input and all stored references. The matching process serves as an automatic generalization mechanism that permits each reference pattern to act as a set of specific instances. Learning mechanisms do not need to operate by manipulating hypotheses in an abstraction hierarchy, but instead can “seed” instances into the concept space, leaving generalization to the matching algorithm. This strategy represents an attractive alternative to data-driven generalization and discrimination techniques when the abstraction space is very large. NEXUS, a computer speech recognition system, incorporates learning heuristics based on this method that permit the system to identify a set of primitive acoustic concepts from experience with words. The efficacy of the resulting concepts is demonstrated by comparative recognition tests where the recognition error rate is only one-seventh that of traditional architectures.

38 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A Gaussian probabilistic model was developed to screen and select from the large set of features and the significant harmonics of the signature were sorted according to the chi-square value, which is equivalent to the signal-to-noise ratio.
Abstract: Features such as shape, motion and pressure, minutiae details and timing, and transformation methods such as Hadamard and Walsh have been used in signature recognition with various degrees of success. One of the better studies was done by Sato and Kogure using a nonlinear warping function. However, it is time-consuming in terms of computer time and programming time. In this research, the signatures were normalized for size, orientation, etc. After normalization, the X and Y coordinates of each sampled point of a signature over time (to capture the dynamics of signature writing) were represented as a complex number and the set of complex numbers transformed into the frequency domain via the fast Fourier transform. A Gaussian probabilistic model was developed to screen and select from the large set of features (e.g., the amplitude of each harmonic). The significant harmonics of the signature were sorted according to the chi-square value, which is equivalent to the signal-to-noise ratio. Fifteen harmonics with the largest signal-to-noise ratios from the true signatures were used in a discriminant analysis. A total of eight true signatures from a single person and eight each from nineteen forgers were used. This resulted in an error rate of 2.5%, with the normally more conservative jackknife procedure yielding the same small error rate.
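A sketch of the front end described above: resample the pen trajectory, form x + jy, take the FFT, and rank harmonic amplitudes by a signal-to-noise-style score over the genuine signatures. The resampling length, the squared-mean-over-variance SNR proxy, and the omission of the Gaussian screening model are simplifying assumptions.

```python
import numpy as np

def signature_harmonics(xy, n_points=256):
    """Resample an (L, 2) array of pen coordinates to n_points, form the
    complex signal x + jy, and return the FFT amplitude of each harmonic."""
    t = np.linspace(0.0, 1.0, len(xy))
    ti = np.linspace(0.0, 1.0, n_points)
    x = np.interp(ti, t, xy[:, 0])
    y = np.interp(ti, t, xy[:, 1])
    return np.abs(np.fft.fft(x + 1j * y))

def select_harmonics(genuine_signatures, n_keep=15):
    """Rank harmonics by an SNR-like score (squared mean over variance across
    the genuine signatures) and keep the indices of the n_keep best."""
    A = np.array([signature_harmonics(s) for s in genuine_signatures])
    snr = A.mean(axis=0) ** 2 / (A.var(axis=0) + 1e-12)
    return np.argsort(snr)[::-1][:n_keep]
```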

37 citations


Patent
09 Jun 1987
TL;DR: A mobile that detects a threshold exit bit or symbol error rate in a message from the base station with which it is communicating samples sequentially transmitted messages from adjacent base stations (on the same or different frequencies) and transfers to a base station whose error rate is below a stricter entry threshold.
Abstract: An automatic cell transfer is made when a mobile detects a certain threshold exit bit or symbol error rate in a message transmitted from the base station with which it is communicating. The mobile then samples sequentially transmitted messages from adjacent base stations (on the same or different frequencies) and monitors their bit or symbol error rates until a base station is found having a bit or symbol error rate less than a preselected minimum entry threshold, which in turn is less than the exit bit or symbol error rate; communication is then transferred from the mobile to the base station so selected. In one arrangement, each base station repeatedly and intermittently transmits a known quality assessment message, and the mobile continually reviews the received message for its error rate.
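A toy sketch of the exit/entry hysteresis in the claim: hand off only when the serving cell's error rate exceeds the exit threshold, and only to a neighbor below the stricter entry threshold. The threshold values and the per-cell error-rate inputs are hypothetical.

```python
def select_handoff_target(serving_error_rate, neighbor_error_rates,
                          exit_threshold=1e-2, entry_threshold=1e-3):
    """Return the index of a neighboring base station to transfer to, or None.

    A transfer is considered only when the serving cell's bit/symbol error
    rate exceeds the exit threshold; the target must have an error rate below
    the (stricter) entry threshold, which is itself below the exit threshold.
    """
    if serving_error_rate < exit_threshold:
        return None
    candidates = [(rate, i) for i, rate in enumerate(neighbor_error_rates)
                  if rate < entry_threshold]
    return min(candidates)[1] if candidates else None
```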

35 citations


Proceedings ArticleDOI
A.-M. Derouault
06 Apr 1987
TL;DR: This paper shows that both the analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models that limits the growth in the number of phonemes while still accounting for the most important coarticulation effects.
Abstract: One approach to large vocabulary speech recognition is to build phonetic Markov models and to concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000-word vocabulary ([3]) and, more recently, a 200,000-word vocabulary ([5]). Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both the analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models. This method makes it possible to limit the growth in the number of phonemes while still accounting for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system, including context-dependent machines, the phoneme recognition rate rises from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary decreases from 11.2% to 9.8%.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage, reducing the overall error rate by more than a factor of two.
Abstract: This paper describes a two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no" the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35 word, 10,710 token stress speech isolated word data base created at Lincoln Laboratory. Adding the second-stage discriminating system produced the best results to date on this data base, reducing the overall error rate by more than a factor of two.

Journal ArticleDOI
TL;DR: In this article, the authors compared the frequency and type of articulation errors, including error migration, between single-word productions and connected speech samples when vocabulary was held constant, and found significantly more errors in connected speech samples than in single-word utterances.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents a study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed, based on Hidden Markov Models trained by speech tokens in various talking styles.
Abstract: Automatic speech recognition algorithms generally rely on the assumption that, for the distance measure used, intraword variabilities are smaller than interword variabilities, so that appropriate separation in the measurement space is possible. As evidenced by degradation of recognition performance, the validity of such an assumption decreases from simple tasks to complex tasks, from cooperative talkers to casual talkers, and from laboratory talking environments to practical talking environments. This paper presents a study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed. The study is based on Hidden Markov Models trained by speech tokens in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and talking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the Hidden Markov Models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Preliminary experiments indicate that a substantial reduction in recognition error rate can be achieved with relatively little increase in computation and storage requirements.
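The abstract notes that the compensation amounts to equalizing spectral tilts. Since spectral tilt is carried largely by the low-order cepstral coefficients, a very rough stand-in (an assumption, not the paper's hypothesis-driven algorithm) is to shift an utterance's first cepstral coefficient toward a reference mean:

```python
import numpy as np

def equalize_spectral_tilt(cepstra, reference_mean_c1):
    """cepstra: (T, D) array of per-frame cepstral vectors with c0 in column 0.
    Shift c1 (which largely encodes spectral tilt) so that the utterance mean
    matches a reference mean; a crude stand-in for stress compensation."""
    compensated = np.array(cepstra, dtype=float, copy=True)
    compensated[:, 1] += reference_mean_c1 - compensated[:, 1].mean()
    return compensated
```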

Proceedings ArticleDOI
06 Apr 1987
TL;DR: The stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech, including speaker-dependent continuous speech recognition, are described.
Abstract: Developing accurate and robust phonetic models for the different speech sounds is a major challenge for high performance continuous speech recognition. In this paper, we introduce a new approach, called the stochastic segment model, for modelling a variable-length phonetic segment X, an L-long sequence of feature vectors. The stochastic segment model consists of 1) time-warping the variable-length segment X into a fixed-length segment Y called a resampled segment, and 2) a joint density function of the parameters of the resampled segment Y, which in this work is assumed Gaussian. In this paper, we describe the stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech. For speaker-dependent continuous speech recognition, the segment model reduces the word error rate by one third over a hidden Markov phonetic model.
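A minimal sketch of the two ingredients named above: linear time-warping of a variable-length segment to a fixed-length resampled segment, and a Gaussian score over the resampled parameters. The fixed length M and the diagonal covariance are simplifying assumptions.

```python
import numpy as np

def resample_segment(X, M=10):
    """Linearly time-warp an (L, d) feature-vector segment to a fixed (M, d) segment."""
    L, d = X.shape
    src = np.linspace(0, L - 1, M)
    return np.column_stack([np.interp(src, np.arange(L), X[:, j]) for j in range(d)])

def segment_log_likelihood(X, mean, var, M=10):
    """Diagonal-covariance Gaussian log-likelihood of the resampled segment,
    flattened to a single M*d-dimensional vector (mean and var have length M*d)."""
    y = resample_segment(X, M).ravel()
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var))
```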

Proceedings ArticleDOI
F.K. Soong, Man Mohan Sondhi
06 Apr 1987
TL;DR: A more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions, and weights spectral distortion more at the peaks than at the valleys of the spectrum.
Abstract: The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and to test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions. For the weighting function we choose a "bandwidth broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of SNR, and becomes essentially constant in the noise-free case. The new measure has the dot product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10 speaker, isolated digit data base in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49% while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 - 7 dB.
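Below is a simplified frequency-domain analogue of the idea (emphasize high-SNR spectral regions when accumulating distortion); it does not reproduce the paper's bandwidth-broadened weighting or its autocorrelation-domain dot-product form, and the weight shape is an assumption.

```python
import numpy as np

def snr_weighted_log_spectral_distortion(log_spec_test, log_spec_ref, snr_db):
    """Accumulate squared log-spectral differences with per-bin weights that
    approach 1 at high SNR and 0 at low SNR.

    snr_db: per-frequency-bin SNR estimate in dB (same length as the spectra).
    """
    w = 1.0 / (1.0 + 10.0 ** (-np.asarray(snr_db) / 10.0))
    w = w / w.sum()
    diff = np.asarray(log_spec_test) - np.asarray(log_spec_ref)
    return float(np.sum(w * diff ** 2))
```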

Proceedings ArticleDOI
Bernard Merialdo
06 Apr 1987
TL;DR: A new strategy is proposed, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition; it is tested on a dictation task of French texts.
Abstract: This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition. MLD proceeds in three steps: (1) a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables that match the acoustic signal from a given time frame; (2) from this list, a Word Match procedure uses the dictionary to build partial word hypotheses; (3) a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until complete sentences are found. An original matching algorithm is proposed for the Syllable Match procedure. This strategy is tested on a dictation task of French texts. Two different dictionaries are tested: one composed of the 10,000 most frequent words, the other composed of 200,000 words. The recognition results are given and compared. The word error rate with 10,000 words is 17.3%. If the errors due to the lack of coverage are not counted, the error rate with 10,000 words is reduced to 10.6%. The error rate with 200,000 words is 12.7%.

PatentDOI
Hirohiko Katayama
TL;DR: A system with a registration mode that stores a typical spoken word pattern for use in speech recognition by having the same spoken word input a plurality of times.
Abstract: A system having a mode for registering (storing) a typical spoken word pattern for use in speech recognition, in which the same spoken word is input a plurality of times. In the word-pattern registration mode, a previous input of the spoken word, made prior to the Nth time, is reproduced. After listening to the reproduced spoken prompt word, the Nth input of the same spoken word is made. The system thus registers one typical spoken word with a well-averaged voice pattern and high accuracy, increasing the system's speech recognition rate.

Patent
12 Mar 1987
TL;DR: In this article, a comparison is carried out between the true meaning provided from a transmission memory and the meaning of a speech sample or a test signal recognized by the speech recogniser or speaker recogniser.
Abstract: In the measuring method according to the invention, speech samples and/or test signals, from which a speech recogniser or speaker recogniser has previously formed a reference pattern during a learning phase, are presented to this speech recogniser or speaker recogniser via a speech coder to be assessed or a transmission route to be tested. Using an evaluation computer, a comparison is carried out between the true meaning provided from a transmission memory and the meaning of a speech sample or a test signal recognised by the speech recogniser or speaker recogniser. In this process, an error rate or a measure of the reliability of recognition is simultaneously calculated over a measurement cycle.

Proceedings ArticleDOI
Masafumi Nishimura, K. Toshioka
01 Apr 1987
TL;DR: A new vector quantization (VQ, or labeling) method for a speech recognition system based on hidden Markov models (HMM) that generates multiple labels at each frame while keeping the conventional HMM formulation.
Abstract: This paper describes a new vector quantization (VQ, or labeling) method for a speech recognition system based on hidden Markov models (HMM). To improve the VQ accuracy in a simple manner, "multi-labeling", which generates multiple labels at each frame, was introduced while keeping the conventional HMM formulation. Furthermore, in order to represent characteristics of speech accurately and effectively, "multi-dimensional labeling" was also introduced, which quantizes multiple features such as spectral dynamics and the spectrum independently. This labeling method was tested in an isolated word recognition task using 150 confusable Japanese words. The recognition error rate was reduced to roughly half or less of that of the conventional method.
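A sketch of the multi-labeling step: emit the k nearest codewords per frame with distance-derived weights instead of the single best label. The value of k and the inverse-distance weights are illustrative assumptions.

```python
import numpy as np

def multi_label(frame, codebook, k=3):
    """Return the indices of the k nearest codewords to one feature frame,
    together with normalized weights, instead of a single VQ label.

    codebook: (C, D) array of codewords; frame: length-D feature vector.
    """
    distances = np.linalg.norm(codebook - frame, axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + 1e-9)
    return nearest, weights / weights.sum()
```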

Journal ArticleDOI
01 Apr 1987
TL;DR: It is shown that coherent detection followed by differential decoding yields better performance than limiter-discriminator detection and differential detection, whereas the two noncoherent detectors yield approximately identical performance.
Abstract: The paper presents a relatively simple method for analysing the effect of IF filtering on the performance of multilevel FM signals. Using this method, the error rate performance of narrowband FM signals is analysed for three different detection techniques, namely limiter-discriminator detection, differential detection and coherent detection followed by differential decoding. The symbol error probabilities are computed for a Gaussian IF filter and a second-order Butterworth IF filter. It is shown that coherent detection and differential decoding yields better performance than limiter-discriminator detection and differential detection, whereas two noncoherent detectors yield approximately identical performance.

Proceedings ArticleDOI
Chin-Hui Lee
01 Apr 1987
TL;DR: Preliminary experiments on natural speech data indicate that the robust LP procedure is relatively insensitive to the placement of the LPC analysis window and to the value of the pitch period, for a given section of speech signal.
Abstract: In this paper, a robust linear prediction algorithm is proposed. Rather than minimizing the sum of squared residuals as in the conventional linear prediction procedures, the robust LP procedure minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of smaller residuals while de-weighting the small portion of large residuals. Based on Robustness Theory, the proposed algorithm will always give a more efficient (lower variance) estimate for the prediction coefficients if the excitation source is of Gaussian mixture such that a large portion of the excitations are from a normal distribution with a very small variance while a small portion of the excitations at the glottal openings and closures are from some unknown distribution with a much larger variance. The robust LP algorithm can be used in the front-end feature extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures. Preliminary experiments on natural speech data indicate that the robust LP procedure is relatively insensitive to the placement of the LPC analysis window and to the value of the pitch period, for a given section of speech signal.
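A sketch of one way to realize weighted-residual linear prediction, using iteratively reweighted least squares with Huber-style weights that down-weight large residuals (e.g. at glottal closures); the weight function, scale estimate, and iteration count are assumptions rather than the procedure in the paper.

```python
import numpy as np

def robust_lp(x, order=10, n_iter=5, k=1.5):
    """Estimate LP coefficients by iteratively reweighted least squares.

    Each pass solves a weighted least-squares problem; large residuals are
    down-weighted by a Huber-style weight so that outlier excitations have
    less influence than in conventional (unweighted) LP analysis.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    A = np.column_stack([x[order - i - 1:N - i - 1] for i in range(order)])
    b = x[order:]
    w = np.ones_like(b)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        a, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = b - A @ a
        scale = 1.4826 * np.median(np.abs(r)) + 1e-12        # robust scale (MAD)
        w = np.minimum(1.0, k * scale / (np.abs(r) + 1e-12))  # Huber-style weights
    return a
```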

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A template-based connected speech recognition system, which represents words as sequences of diphone-like segments, has been implemented and evaluated and an evaluation of the recognizer has been carried out on a database of connected digit utterances spoken by a single male talker.
Abstract: A template-based connected speech recognition system, which represents words as sequences of diphone-like segments, has been implemented and evaluated. The inventory of segments is divided into two principal classes: "steady-state" speech sounds such as vowels, fricatives, and nasals, and "composite" speech sounds consisting of sequences of two or more speech sounds in which the transitions from one sound to another are intrinsic to the representation of the composite sound. Templates representing these segments are extracted from labelled training utterances. Words are represented by network models whose branches are diphone segments. Word juncture phenomena are accommodated by including segment branches that characterize transition pronunciations between specified classes of words. The recognition of a word in a specified utterance takes place by "spotting" all the segments contained in the model of the word. Putative words and word combinations are found by searching for best scoring sequences of segments specified by the models subject to segment separation constraints. A pruning procedure finds the best scoring string of words subject to constraints on word lengths, separations, and overlaps. An evaluation of the recognizer has been carried out on a database of connected digit utterances spoken by a single male talker. Templates are extracted from half the database consisting of 2100 digit utterances and system performance tested on the remaining 2100 utterances. The performance obtained to date is approximately 2% digit error rate and 7 to 8% digit string error rate.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new continuous speech recognition method by phoneme-based word spotting and time-synchronous context-free parsing that is task-independent in terms of reference patterns and task language.
Abstract: This paper proposes a new continuous speech recognition method by phoneme-based word spotting and time-synchronous context-free parsing. The word pattern is composed of the concatenation of phoneme patterns. The knowledge of syntax is given in Backus Normal Form. Therefore, our method is task-independent in terms of reference patterns and task language. The system first spots word candidates in an input sentence, and then generates a word lattice. The word spotting is performed by a dynamic time warping method. Secondly, it selects the best word sequences found in the word lattice from all possible sentences which are defined by a context-free grammar.

01 Jan 1987
TL;DR: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon and suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded.
Abstract: This study examines the set of CV, VC, CVC and some CCVC sequences which are non-occurring in monomorphemic words in a 20,000 word lexicon. A preliminary analysis suggests that many sequences in which the prevocalic and postvocalic consonants are similar, or identical, are excluded. The sequences are discussed in relation to 'reduced forms' characteristic of fast speech, word boundary assimilation, and lexical access.


Journal ArticleDOI
TL;DR: It is shown how the choice of a good metric in nearest neighbour estimates of posterior probability can lead to improved average conditional error rate estimates.

Patent
09 Mar 1987
TL;DR: In this patent, the required error correction encoder/decoder is placed inside the differential logic to improve the error rate characteristic of a multilevel QAM communication system with comparatively simple constitution.
Abstract: PURPOSE: To improve the error rate characteristic with comparatively simple constitution by providing the required error correction encoder/decoder inside the differential logic. CONSTITUTION: In a multilevel Quadrature Amplitude Modulation (QAM) communication system to which differential logic is applied with a multilevel value of Z_n, an error correction encoder 2, which performs error correction encoding independently for each of the n series, is provided inside the differential logic part 1 on the transmission side, and the multilevel QAM signal is transmitted via the encoder 2 and a transmission modulation part 3. On the reception side, an error correction decoder 5 is similarly provided inside a differential logic part 6. In this way, the expansion of a one-bit error into a two-bit error, which occurs when the error correction encoder/decoder is placed outside the differential logic, is prevented; no two-bit error correction circuit, etc. is required, and the error rate characteristic can be improved with comparatively simple constitution. COPYRIGHT: (C)1988,JPO&Japio

Patent
Thomas Cook
14 Apr 1987
TL;DR: Error rates above a given threshold are detected by initiating a counter to count a group of n bits on each occurrence of an error bit, and then inspecting the counters on each occurrence of an error to see whether the counter initiated x error bits earlier is still counting.
Abstract: Error rates above a given threshold are detected by initiating a counter to count a group of n bits on each occurrence of an error bit. The counters are inspected on each occurrence of an error to see whether the counter initiated x error bits earlier is still counting. If the counter is still counting the error rate is above a threshold of x error bits in a group of n bits in a serial stream.
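A compact sketch of the counting logic in the claim, reformulated as "x error bits within any window of n bits"; the particular values of n and x are parameters of the detector, not values from the patent.

```python
from collections import deque

def error_rate_exceeded(bit_error_flags, n=1000, x=10):
    """Return True if any n-bit window of the stream contains at least x error bits.

    bit_error_flags: iterable of 0/1 flags, one per received bit. Keeping the
    positions of the last x error bits plays the role of the per-error counters:
    the counter started x errors ago is "still counting" if fewer than n bits
    have elapsed since that error.
    """
    recent_error_positions = deque(maxlen=x)
    for position, is_error in enumerate(bit_error_flags):
        if not is_error:
            continue
        recent_error_positions.append(position)
        if len(recent_error_positions) == x and position - recent_error_positions[0] < n:
            return True
    return False
```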

Proceedings ArticleDOI
01 Apr 1987
TL;DR: An unsupervised procedure for the construction of template sets for connected speech recognition based on a "segment spotting" approach, where the segments are diphone-like units.
Abstract: This paper describes an unsupervised procedure for the construction of template sets for connected speech recognition. The procedure has been developed for use in a speech recognition system based on a "segment spotting" approach, where the segments are diphone-like units. The procedure makes use of both phonetic and acoustic knowledge: the former consists of a model of all the words in the task language in terms of the chosen units; the latter is implicitly represented by an initial set of "training" templates. The performance obtained by using the bootstrapped templates in a connected digit recognition task is good (average word error rate of less than 4%).