
Showing papers on "Word error rate published in 1989"


Proceedings Article
01 Jan 1989
TL;DR: Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task; the method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.
Abstract: We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.

3,324 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant.
Abstract: The authors present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar's test) requires the errors made by an algorithm to be independent events and is found to be most appropriate for isolated-word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.

715 citations
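The first of the two tests above can be sketched in a few lines. This is an illustrative implementation, assuming the errors of each algorithm are available as paired per-token indicators (1 = error); the continuity-corrected statistic is compared against a chi-square distribution with one degree of freedom.

```python
import math

def mcnemar(errors_a, errors_b):
    """McNemar's test on paired error indicators from two algorithms.

    Only the discordant pairs matter: tokens where exactly one of the
    two algorithms made an error.
    """
    b = sum(1 for ea, eb in zip(errors_a, errors_b) if ea and not eb)
    c = sum(1 for ea, eb in zip(errors_a, errors_b) if not ea and eb)
    if b + c == 0:
        return 0.0, 1.0  # the algorithms never disagreed
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected statistic
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value indicates the two error rates differ significantly; the independence requirement noted in the abstract is why the paper recommends this test for isolated-word rather than connected-speech results.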


Journal ArticleDOI
TL;DR: In this paper, a cross-modal priming technique was used to investigate the extent to which a rhyme prime (a prime that differs only in its first segment from the word that is semantically associated with the visual probe) is as effective a prime as the original word itself.
Abstract: Approaches to spoken word recognition differ in the importance they assign to word onsets during lexical access. This research contrasted the hypothesis that lexical access is strongly directional with the hypothesis that word onsets are less important than the overall goodness of fit between input and lexical form. A cross-modal priming technique was used to investigate the extent to which a rhyme prime (a prime that differs only in its first segment from the word that is semantically associated with the visual probe) is as effective a prime as the original word itself. Earlier research had shown that partial primes that matched from word onset were very effective cross-modal primes. The present results show that, irrespective of whether the rhyme prime was a real word or not, and irrespective of the amount of overlap between the rhyme prime and the original word, the rhymes are much less effective primes than the full word. In fact, no overall priming effect could be detected at all except under conditions in which the competitor environment was very sparse. This suggests that word onsets do have a special status in the lexical access of spoken words. A fundamental property of the speech signal is its intrinsic directionality in time. Spoken utterances are spread out along the time line, moving necessarily from beginning to end, in a way that is not true of written language. This directionality of the speech input is strongly reflected in the claims made by the cohort model of spoken word recognition for the manner in which speech inputs are mapped onto the representations of word forms in the mental lexicon (Marslen-Wilson, 1984, 1987; Marslen-Wilson & Tyler, 1980; Marslen-Wilson & Welsh, 1978; Tyler, 1984; Tyler & Wessels, 1983; Warren & Marslen-Wilson, 1987). The cohort model of word recognition stresses the sequential and continuous nature of the mapping between the speech input and mental representations of word forms. 
This emphasis is closely tied up with the concept of a cohort and its implications for the properties of the on-line lexical decision space. In particular, according to the cohort model, the decision space is determined by the beginnings of words. The speech input at the beginning of the word maps onto all lexical items that share the same initial sequence. This initial set of candidates is termed the word-initial cohort, and the subsequent process of word recognition is determined by the

372 citations


Journal ArticleDOI
TL;DR: The data indicate that the presence in the neighborhood of at least one unit of higher frequency than the stimulus word itself results in interference in stimulus word processing.
Abstract: Most current models of visual word recognition assume that the recognition process may be subdivided into at least three phases, which are referred to here as candidate generation, candidate selection, and conscious identification. In the candidate generation phase, the word input contacts a number of orthographically similar lexical representations in memory. Using information from the continued sensory analysis and any contextual information available, one of these candidates is selected for conscious identification. This general conception of the word recognition process adopts various precise forms in models such as Becker's (1976) verification model; Forster's (1976) search model; McClelland and Rumelhart's (1981) interactive-activation model; Morton's (1970) logogen model; Norris's (1986) checking model; and Paap, Newsome, McDonald, and Schvaneveldt's (1982) activation-verification model. These different models place different constraints on the individual operation of these subprocesses and the ways they interact, but the models all assume that both sublexical and whole-word units (other than the stimulus word itself) are involved in visual word recognition. Research concerned with the role of whole-word units in word recognition has concentrated largely on establishing effects due to various parameters of the stimulus word itself (e.g., printed frequency, repetition, orthographic and

315 citations


PatentDOI
TL;DR: In this paper, a method for deriving acoustic word representations for use in speech recognition is presented, which involves using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word.
Abstract: A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by a cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time-align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.

257 citations
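The clustering step in the patent groups acoustically similar sub-models using the Kullback-Leibler information as its similarity metric. For sub-models represented as discrete probability distributions, a symmetrized KL distance might look like this (illustrative sketch; the epsilon guard against zero probabilities is an added assumption):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized KL distance, usable as a clustering metric since it
    treats both sub-models equally."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```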


Proceedings ArticleDOI
23 May 1989
TL;DR: A description is presented of the authors' current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence, which combines features from their current isolated-word recognition system and from their previously developed continuous-speech recognition system.
Abstract: A description is presented of the authors' current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence. The recognition system combines features from their current isolated-word recognition system and from their previously developed continuous-speech recognition system. It consists of an acoustic processor, an acoustic channel model, a language model, and a linguistic decoder. Some new features in the recognizer relative to the isolated-word speech recognition system include the use of a fast match to prune rapidly to a manageable number the candidates considered by the detailed match, multiple pronunciations of all function words, and modeling of interphone coarticulatory behavior. The authors recorded training and test data from a set of ten male talkers. The perplexity of the test sentences was found to be 93; none of the sentences was part of the data used to generate the language model. Preliminary (speaker-dependent) recognition results on these talkers yielded an average word error rate of 11.0%.

251 citations
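Word error rates like the 11.0% above are conventionally computed from a Levenshtein alignment of the hypothesis against the reference transcript, counting substitutions, insertions, and deletions. A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / len(reference),
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

Note the rate can exceed 100% when the hypothesis contains many insertions.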


Journal ArticleDOI
TL;DR: A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar that is inherently based on hidden Markov model (HMM) representations.
Abstract: A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar. The algorithm, which has all the features of earlier methods, is inherently based on hidden Markov model (HMM) representations and is described in an easily understood, easily programmable manner. The new features of the algorithm include the capability of recording and determining (unique) word sequences corresponding to the several best paths to each grammar node, and the capability of efficiently incorporating a range of word and state duration scoring techniques directly into the forward search of the algorithm, thereby eliminating the need for a postprocessor as in previous implementations. It is also simple and straightforward to incorporate deterministic word transition rules and statistical constraints (probabilities) from a language model into the forward search of the algorithm.

131 citations
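At the heart of any frame-synchronous search is a Viterbi recursion: at each frame, every HMM state retains only the score of the best path reaching it, plus a backpointer. A toy single-best version (without the paper's n-best word sequences or duration scoring; the model format is an assumption):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Frame-synchronous Viterbi decoding of an observation sequence."""
    # V[t][s] = (score of the best path reaching state s at frame t, backpointer)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            # Keep only the best incoming transition (the frame-synchronous step).
            prev, score = max(((r, V[t - 1][r][0] * trans_p[r][s]) for r in states),
                              key=lambda x: x[1])
            V[t][s] = (score * emit_p[s][obs[t]], prev)
    # Trace the backpointers from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))
```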


Proceedings ArticleDOI
23 May 1989
TL;DR: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.
Abstract: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models.

77 citations


Journal ArticleDOI
01 Sep 1989
TL;DR: The error rates of linear classifiers that utilize various criterion functions are investigated for the case of two normal distributions with different variances and a priori probabilities, finding that the classifier based on the least mean squares criterion often performs considerably worse than the Bayes rate.
Abstract: The error rates of linear classifiers that utilize various criterion functions are investigated for the case of two normal distributions with different variances and a priori probabilities. It is found that the classifier based on the least mean squares (LMS) criterion often performs considerably worse than the Bayes rate. The perceptron criterion (with suitable safety margin) and the linearized sigmoid generally lead to lower error rates than the LMS criterion, with the sigmoid usually the better of the two. Also investigated are the exceptions to the general trends: only if one class is known to have much larger a priori probability or variance than the other should one expect the LMS or perceptron criteria to be slightly preferable as far as error rate is concerned. The analysis is related to the performance of the back-propagation (BP) classifier, giving some understanding of the success of BP. A neural-net classifier, the adaptive-clustering classifier, suggested by this analysis is compared with BP (modified by using a conjugate-gradient optimization technique) for two problems. It is found that BP usually takes significantly longer to train than the adaptive-clustering technique.

75 citations


Proceedings Article
01 Jan 1989
TL;DR: The results suggest that classifier selection should often depend more heavily on practical considerations concerning memory and computation resources, and restrictions on training and classification times than on error rate.
Abstract: Eight neural net and conventional pattern classifiers (Bayesian-unimodal Gaussian, k-nearest neighbor, standard back-propagation, adaptive-stepsize back-propagation, hypersphere, feature-map, learning vector quantizer, and binary decision tree) were implemented on a serial computer and compared using two speech recognition and two artificial tasks. Error rates were statistically equivalent on almost all tasks, but classifiers differed by orders of magnitude in memory requirements, training time, classification time, and ease of adaptivity. Nearest-neighbor classifiers trained rapidly but required the most memory. Tree classifiers provided rapid classification but were complex to adapt. Back-propagation classifiers typically required long training times and had intermediate memory requirements. These results suggest that classifier selection should often depend more heavily on practical considerations concerning memory and computation resources, and restrictions on training and classification times than on error rate.

64 citations


Journal ArticleDOI
TL;DR: This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory, and develops a method to deal with the time varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information.

Journal ArticleDOI
TL;DR: A fast Fourier transform is used to transform normalized signatures into the frequency domain and results in an error rate of 2.5 per cent with the generally more conservative jackknife procedure yielding the same small error rate.

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a linear-predictive-coding-based formant extraction method was used to improve the accuracy of the automatic language identification algorithm, reducing the error rate by more than 50%.
Abstract: A description is given of enhancements to an automatic language identification algorithm previously reported. The algorithm, based on linear-predictive-coding-based formant extraction, was greatly improved, reducing the error rate by more than 50%. This performance was achieved on a large (>9 h), very noisy, six-language database, using trials of less than 10 s. Experiments that improved performance are described, including tests of various distance metrics, expanded and modified parameter sets, and a new voicing statistic. Final performance results were obtained as a function of time, signal-to-noise ratio, and no-decision rate. A new rejection capability was developed to address the open-set identification problem.

Proceedings ArticleDOI
23 May 1989
TL;DR: A unified framework for creating effective basic models of speech is discussed, and the relative advantages of each type of speech unit are pointed out based on the results of a series of recognition experiments.
Abstract: The problem of how to select and construct a set of fundamental unit statistical models suitable for speech recognition is addressed. A unified framework is discussed which can be used to accomplish the goal of creating effective basic models of speech. The performances of three types of fundamental units, namely whole word, phoneme-like, and acoustic segment units, in a 1109-word vocabulary speech recognition task are compared. The authors point out the relative advantages of each type of speech unit based on the results of a series of recognition experiments.

Journal ArticleDOI
TL;DR: An automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used, reduces the mean word recognition error rate from 4.9% to 2.9%.
Abstract: The author proposes an automatic speaker adaptation algorithm for speech recognition, in which a small amount of training material of unspecified text can be used. The algorithm is easily applied to vector-quantization- (VQ) speech recognition systems consisting of a VQ codebook and a word dictionary in which each word is represented as a sequence of codebook entries. In the adaptation algorithm, the VQ codebook is modified for each new speaker, whereas the word dictionary is universally used for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the vectors representing deviation between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. The spectral resolution of the adaptation process is improved accordingly. Results of recognition experiments using utterances of 100 Japanese city names show that adaptation reduces the mean word recognition error rate from 4.9% to 2.9%. Since the error rate for speaker-dependent recognition is 2.2%, the adaptation method is highly effective.
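A flat, single-level caricature of the adaptation idea — shift each codebook entry by the deviation between it and the mean of the training frames assigned to it — can be sketched as follows (1-D features and a single clustering level, unlike the paper's hierarchical scheme; illustrative only):

```python
def adapt_codebook(codebook, frames):
    """One adaptation pass over a 1-D VQ codebook.

    Assign each training frame to its nearest codeword, then shift every
    codeword by the mean deviation of its assigned frames. Codewords with
    no assigned frames are left unchanged.
    """
    buckets = {i: [] for i in range(len(codebook))}
    for f in frames:
        i = min(range(len(codebook)), key=lambda k: abs(f - codebook[k]))
        buckets[i].append(f)
    adapted = list(codebook)
    for i, assigned in buckets.items():
        if assigned:
            deviation = sum(assigned) / len(assigned) - codebook[i]
            adapted[i] = codebook[i] + deviation
    return adapted
```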

Proceedings ArticleDOI
15 Oct 1989
TL;DR: A preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary; a general model for the acoustics of any word is used to recognize the existence of new words.
Abstract: In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary. The probability of the speaker using words outside the vocabulary can be quite high. For the case when a speaker uses a new word, current systems will always recognize other words within the vocabulary in place of the new word, and the speaker wouldn't know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

Proceedings ArticleDOI
23 May 1989
TL;DR: It is concluded that there are pattern classification tasks in which an ANN is able to make better use of training data to achieve a lower error rate with a particular size training set.
Abstract: Experiments comparing artificial neural network (ANN), k-nearest-neighbor (KNN), and Bayes' rule with Gaussian distributions and maximum-likelihood estimation (BGM) classifiers were performed. Classifier error rate as a function of training set size was tested for synthetic data drawn from several different probability distributions. In cases where the true distributions were poorly modeled, ANN was significantly better than BGM. In some cases, ANN was also better than KNN. Similar experiments were performed on a voiced/unvoiced speech classification task. ANN had a lower error rate than KNN or BGM for all training set sizes, although BGM approached the ANN error rate as the training set became larger. It is concluded that there are pattern classification tasks in which an ANN is able to make better use of training data to achieve a lower error rate with a particular size training set.

Proceedings ArticleDOI
23 May 1989
TL;DR: A phoneme environment clustering algorithm, which automatically selects an optimal set of allophones and estimates the missing context, is presented, which additionally gives the means to analyze coarticulation effects automatically and quantitatively.
Abstract: A general principle is proposed to solve problems in context-dependent phoneme segment (or subword unit) based speech recognition, namely, how to choose the set of units and how to estimate the context effect missing in the training data. A phoneme environment clustering algorithm, which automatically selects an optimal set of allophones and estimates the missing context, is presented. This algorithm additionally gives the means to analyze coarticulation effects automatically and quantitatively. The problem is formulated as a clustering technique in phoneme environment space to approximate the mapping function from phoneme environment space to phoneme pattern space by a limited number of centroid patterns based on a distortion measure defined on the phoneme pattern space. The algorithm is tested for phoneme recognition and word recognition, and results are discussed.

Proceedings ArticleDOI
23 May 1989
TL;DR: Three methods for smoothing discrete probability functions in discrete hidden Markov models for large-vocabulary continuous-speech recognition are presented and results show a 20-30% reduction in error rate.
Abstract: Three methods for smoothing discrete probability functions in discrete hidden Markov models for large-vocabulary continuous-speech recognition are presented. The smoothing is based on deriving a probabilistic co-occurrence matrix between the different vector-quantized spectra. Each estimated probability density is then multiplied by this matrix, ensuring that none of the probabilities are severely underestimated due to lack of training data. Experimental results show a 20-30% reduction in error rate when this smoothing is used. A word error rate of 3.0% is achieved with the DARPA 1000-word continuous speech recognition database and a word-pair grammar with a perplexity of 60.
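The smoothing step reduces to a matrix-vector product: each estimated probability vector is multiplied by a row-stochastic co-occurrence matrix, so codewords unseen in training inherit mass from acoustically similar ones. A sketch with made-up numbers:

```python
def smooth_density(p, cooc):
    """Smooth a discrete probability vector p with a row-stochastic
    co-occurrence matrix: p_s[i] = sum_j cooc[j][i] * p[j], renormalized.

    Codewords that never occurred in training receive probability mass
    from similar codewords, so none is severely underestimated.
    """
    n = len(p)
    smoothed = [sum(cooc[j][i] * p[j] for j in range(n)) for i in range(n)]
    z = sum(smoothed)
    return [x / z for x in smoothed]
```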

Proceedings ArticleDOI
A. Paeseler1, Hermann Ney1
23 May 1989
TL;DR: The authors describe the design of a stochastic language model and its integration into a continuous-speech recognition system that is part of the SPICOS system for understanding database queries spoken in natural language.
Abstract: The authors describe the design of a stochastic language model and its integration into a continuous-speech recognition system that is part of the SPICOS system for understanding database queries spoken in natural language. The recognition strategy is based on statistical decision theory. The stochastic language model for the recognition of database queries is based on probabilities of trigrams, bigrams, and unigrams of word categories, which are intended to reflect lexical and semantic aspects of the SPICOS task. The implementation of stochastic language models in the search procedure is described, and results of recognition experiments are given. Using the stochastic language model (perplexity = 124) reduced the word error rate from 21.8% without a language model (perplexity = 917) to 9.1%.
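Two pieces of machinery recur in such work and are easy to sketch: interpolating trigram, bigram, and unigram estimates so that unseen trigrams still get probability mass, and perplexity as the evaluation measure (illustrative only; the interpolation weights are made up):

```python
import math

def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Mix trigram, bigram, and unigram estimates of P(w | history),
    so a word with zero trigram probability still gets nonzero mass."""
    l3, l2, l1 = lambdas
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

def perplexity(word_probs):
    """Perplexity of a sequence given P(w_i | history) for each word:
    2 ** (average negative log2 probability per word)."""
    return 2 ** (-sum(math.log2(p) for p in word_probs) / len(word_probs))
```

A uniform model over four equally likely words has perplexity 4; figures like the 917 and 124 quoted above measure, roughly, the average branching factor the recognizer faces without and with the language model.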

Journal ArticleDOI
TL;DR: It is shown that a version of the model discussed in Cutler & Norris 1988 based on the distinction between “strong” and “weak” vowels enables over 40% of word boundaries to be correctly located at the broad class level although many word boundaries are also inserted at inappropriate points.

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra.
Abstract: The major goal of this research is to reduce the discrepancy in recognition performance between normal and abnormal speech, given that reference templates were derived only from normal speech. A method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra. The distances of all reference tokens of like phonemes are combined to form a smallest cumulative distance (SCD) method. When SCD is combined with the method of slope-dependent weighting (SDW), the most significant success is obtained. In terms of error rates for a fixed phoneme vector length of five, SDW+SCD is found to reduce the difference in error rate between normal and abnormal speech by approximately 50%.

Journal ArticleDOI
Fred J. Damerau1, Eric Mays1
TL;DR: An experiment on a large body of text shows that an increase in the word list size decreases the error rate.
Abstract: We examine the effect of increasing word list size on the error rate of spelling correctors. An experiment on a large body of text shows that an increase in the word list size decreases the error rate.

Journal ArticleDOI
TL;DR: In this article, a leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself, and Monte Carlo simulations demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
Abstract: Linear discriminant analysis between two populations is considered in this paper. Error rate is reviewed as a criterion for selection of variables, and a stepwise procedure is outlined that selects variables on the basis of empirical estimates of error. Problems with assessment of the selected variables are highlighted. A leave-one-out method is proposed for estimating the true error rate of the selected variables, or alternatively of the selection procedure itself. Monte Carlo simulations, of multivariate binary as well as multivariate normal data, demonstrate the feasibility of the proposed method and indicate its much greater accuracy relative to that of other available methods.
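The leave-one-out estimate itself is simple to state: hold out each sample in turn, fit on the remainder, and count misclassifications of the held-out point. A sketch with a stand-in nearest-mean classifier on 1-D features (purely illustrative; the paper's stepwise variable selection is not shown):

```python
def train_nearest_mean(X, y):
    """Fit per-class means of 1-D features (a toy discriminant)."""
    means = {}
    for label in set(y):
        vals = [x for x, lab in zip(X, y) if lab == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict_nearest_mean(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def loo_error_rate(X, y, train_fn, predict_fn):
    """Leave-one-out estimate of the true error rate."""
    errors = 0
    for i in range(len(X)):
        # Train on all samples except the i-th, then test on the i-th.
        model = train_fn(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict_fn(model, X[i]) != y[i]
    return errors / len(X)
```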

Proceedings ArticleDOI
21 Feb 1989
TL;DR: The algorithms used in the BBN BYBLOS Continuous Speech Recognition system are described and a method for smoothing the discrete densities on the states of the HMM, which is intended to alleviate the problem of insufficient training for detailed phonetic models is presented.
Abstract: In this paper we describe the algorithms used in the BBN BYBLOS Continuous Speech Recognition system. The BYBLOS system uses context-dependent hidden Markov models of phonemes to provide a robust model of phonetic coarticulation. We provide an update of the ongoing research aimed at improving the recognition accuracy. In the first experiment we confirm the large improvement in accuracy that can be derived by using spectral derivative parameters in the recognition. In particular, the word error rate is reduced by a factor of two. Currently the system achieves a word error rate of 2.9% when tested on the speaker-dependent part of the standard 1000-Word DARPA Resource Management Database using the Word-Pair grammar supplied with the database. When no grammar was used, the error rate was 15.3%. Finally, we present a method for smoothing the discrete densities on the states of the HMM, which is intended to alleviate the problem of insufficient training for detailed phonetic models.

Patent
Hiroshi Nakane1
22 Sep 1989
TL;DR: In this paper, an apparatus for reading data stored on a data storage medium is presented, which includes a data readout device and an error detector coupled to it for detecting the rate at which errors occur in the reproduced data.
Abstract: An apparatus for reading data stored on a data storage medium. The apparatus includes a data readout device for reproducing data stored on the data storage medium, an error detector coupled to the data readout device for detecting an error rate at which errors occur in the data reproduced by the data readout device, a readout controller for operating the data readout device at a selected data readout rate, and a system controller coupled to the error detector and the readout controller for comparing the error rate to a predetermined system maximum error rate and for establishing the selected data readout rate such that the error rate is below the predetermined system maximum error rate.

Proceedings ArticleDOI
F.K. Soong1
23 May 1989
TL;DR: A phonetically labeled acoustic segment (PLAS) approach is proposed for speech analysis-synthesis by means of a bidirectional context-constrained mapping between a phonetic space and an acoustic space.
Abstract: A phonetically labeled acoustic segment (PLAS) approach is proposed for speech analysis-synthesis. The goal is to develop a unified framework for general speech processing by means of a bidirectional context-constrained mapping between a phonetic space and an acoustic space. The PLAS analysis module is a continuous phone (phoneme) recognizer, while the PLAS synthesis module is a phonetically organized acoustic database. To regulate the proposed mapping in a phonetically structured manner, phone context-dependency was imposed in phone modeling, recognition, and synthesis. The PLAS approach was tested successfully on a database of continuously spoken Japanese utterances recorded by a single male talker. The automatic segmentation boundaries derived from modeling PLAS units agreed well with corresponding manual segmentation points, i.e. they were within a ±20 ms interval 95% of the time. A 4% phoneme recognition error rate was obtained in a continuous recognition test. Natural-sounding speech was synthesized at an average bit rate of 55 b/s allocated to segmental information.

Proceedings ArticleDOI
21 Feb 1989
TL;DR: The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.
Abstract: The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA Resource Management task (991 word vocabulary, perplexity 60 word-pair grammar) [1] is 3.4% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied mixture triphone models.

PatentDOI
TL;DR: A dynamic programming algorithm is used in which time warping to match a sample to a reference is in effect permitted and matching is performed with unconstrained endpoints, avoiding the error that even the best preliminary decision as to word boundaries can introduce.
Abstract: A cost-effective word recognizer. Each frame of spoken input is compared to a set of reference frames. The comparison is equivalent to embodying the reference frame as an LPC inverse filter, and is preferably done in the autocorrelation domain. To avoid the instability and computational difficulties which can be caused by a high-gain LPC inverse filter, a noise floor is introduced into each reference frame sample. Thus, for each input speech frame, a scalar measures its similarity to each of the vocabulary of reference frames. To achieve connected word recognition based on this similarity measurement, a dynamic programming algorithm is used in which time warping to match a sample to a reference is in effect permitted, and in which matching is performed with unconstrained endpoints. Thus, the word boundary decisions are made on the basis of a local maximum in similarity, and, since no separate word division decision is required, the error which can be introduced by even the best preliminary decision as to word boundaries is avoided.

Journal ArticleDOI
A. Bateman1
TL;DR: It is shown that under both Rician and Rayleigh fading conditions, the use of a reference can eliminate the irreducible error rate phenomenon, with minimal sacrifice in bit error rate performance over an ideal BPSK system.
Abstract: Unified analysis of the performance of binary phase shift keying (BPSK) under static and mobile operating conditions is presented for the case in which a separate reference tone is used for channel sounding and subsequent 'coherent' data detection. It is shown that under both Rician and Rayleigh fading conditions, the use of a reference can eliminate the irreducible error rate phenomenon, with minimal sacrifice in bit error rate performance over an ideal BPSK system.