
Showing papers on "Linear predictive coding published in 1989"


Book
01 Jan 1989
TL;DR: This book presents the principal characteristics of speech, speech production models, speech analysis and analysis-synthesis systems, linear predictive coding (LPC) analysis, speech coding, speech synthesis, speech recognition, and future directions of speech processing.
Abstract: Principal characteristics of speech; speech production models; speech analysis and analysis-synthesis systems; linear predictive coding (LPC) analysis; speech coding; speech synthesis; speech recognition; future directions of speech processing. Appendices: convolution and the z-transform; vector quantization algorithm; neural nets.

307 citations


Journal ArticleDOI
D. Mansour, Biing-Hwang Juang
TL;DR: It is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm, and a family of distortion measures based on the projection between two cepstral vectors is proposed; these measures have the same computational efficiency as the band-pass cepstral distortion measure.
Abstract: Consideration is given to the formulation of speech similarity measures, a fundamental component in recognizer designs, that are robust to the change of ambient conditions. The authors focus on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using some common models for noisy speech, they show analytically that additive white noise reduces the norm (length) of the LPC cepstral vectors. Empirical observations on the parameter histograms not only confirm the analytical results through the use of noise models but further reveal that at a given (global) signal-to-noise ratio (SNR), the norm reduction on cepstral vectors with larger norms is generally less than on vectors with smaller norms, and that lower order coefficients are more affected than higher order terms. In addition, it is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm. As a consequence of the above results, a family of distortion measures based on the projection between two cepstral vectors is proposed. The new measures have the same computational efficiency as the band-pass cepstral distortion measure. >
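
The projection idea above is easy to sketch. The Python fragment below (illustrative, not the authors' exact formulation) converts LPC coefficients to cepstra with the standard recursion and computes one plausible member of the proposed family of measures: one minus the normalized projection (the cosine) between the two cepstral vectors. The function names and the input convention (LPC coefficients without the leading 1) are assumptions.

```python
import numpy as np

def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of 1/A(z) for A(z) = 1 + a[0]z^-1 + ... + a[p-1]z^-p,
    using the standard LPC-to-cepstrum recursion."""
    p = len(a)
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def projection_distortion(c_ref, c_test):
    """1 - cos(angle between the cepstral vectors): emphasises orientation,
    which the paper finds more noise-robust than the vector norm."""
    denom = np.linalg.norm(c_ref) * np.linalg.norm(c_test) + 1e-12
    return 1.0 - np.dot(c_ref, c_test) / denom
```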

166 citations


Journal ArticleDOI
TL;DR: The authors use the diagnostic acceptability measure (DAM) to evaluate the speech quality of the latest 2400-b/s linear-predictive coder (LPC) with a noise suppressor at the front end, using a spectral subtraction technique for noise suppression.
Abstract: Numerous noise-suppression techniques have been developed for operating at the front end of low-bit-rate digital voice terminals. Some of these techniques have been evaluated by standardized intelligibility tests such as the diagnostic rhyme test (DRT). It is well known that the use of a noise suppressor seldom improves the DRT score even though listeners have had the impression that speech quality was enhanced. Unfortunately, noise suppressors have only occasionally been evaluated by standardized quality tests. The authors supplement quality test data for reference purposes. They use the diagnostic acceptability measure (DAM) to evaluate speech quality of the latest 2400-b/s linear-predictive coder (LPC) with a noise suppressor at the front end. They used a spectral subtraction technique for noise suppression. Ten different sets of noisy speech recorded at actual military platforms (such as a helicopter, tank, turboprop, helicopter carrier, or jeep) were input sources. The magnitude of the DAM improvement is substantial: as much as six points on the average, which is large enough to upgrade speech quality somewhat. >
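
As background for the kind of front end evaluated here, a minimal overlap-add spectral subtractor can be written with numpy alone. This is a generic magnitude-domain subtractor with a spectral floor, not the specific preprocessor used in the study; the frame length, hop, oversubtraction factor, and floor values are illustrative.

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, frame_len=256, hop=128, alpha=1.0, floor=0.02):
    """Overlap-add spectral subtraction. noise_mag is an estimate of the noise
    magnitude spectrum (frame_len // 2 + 1 bins)."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)  # subtract with floor
        out[start:start + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase)) * window
        norm[start:start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```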

163 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs, and it is demonstrated that such gains are unavailable with white noise assumption Kalman and Wiener filters.
Abstract: A report is presented on experiments using a colored-noise assumption Kalman filter to enhance speech additively contaminated by colored noise, such as helicopter noise and jeep noise, with a particular application to linear predictive coding (LPC) of noisy speech. The results indicate that the colored-noise Kalman filter provides a significant gain in SNR, a clear improvement in the sound spectrogram, and an audible improvement in output speech quality. The authors demonstrate that such gains are unavailable with white noise assumption Kalman and Wiener filters. The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs. >

132 citations


PatentDOI
Kazunori Ozawa
TL;DR: A speech analysis and synthesis system operates to determine a sound source signal for the entire interval of each speech unit which is to be used for speech synthesis, according to a spectrum parameter obtained from each speech unit based on cepstrum, as discussed by the authors.
Abstract: A speech analysis and synthesis system operates to determine a sound source signal for the entire interval of each speech unit which is to be used for speech synthesis, according to a spectrum parameter obtained from each speech unit based on cepstrum. The sound source signal and the spectrum parameter are stored for each speech unit. Speech is synthesized according to the spectrum parameter while controlling prosody of the sound source signal. The spectrum of the synthesized speech is compensated through filtering based on cepstrum.

130 citations


Journal ArticleDOI
TL;DR: The proposed pitch estimation only needs a very short frame length and gives accurate results at voice onset, and the results show that the algorithm works in noise-free, noisy, and very noisy signals for vowels as well as for voiced consonants.
Abstract: The authors present an automatic and reliable algorithm for determining glottal closure instant (GCI). As a byproduct, nonstationary fundamental period estimation is achieved. The computation includes twelve-pole speech linear-prediction analysis, cross correlation, and both direct and inverse fast Fourier transforms (or a convolution). Maximum-likelihood epoch determination is used as the basis for locating GCIs, and the Hilbert transformation is applied to improve performance and reliability. A description of the system and the voiced/unvoiced/mixed (V/UV/M) decision procedure is given. The results show that the algorithm works in noise-free, noisy, and very noisy signals for vowels as well as for voiced consonants. The proposed pitch estimation only needs a very short frame length and gives accurate results at voice onset.
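
A heavily simplified version of this processing chain (12-pole LPC inverse filtering followed by peak picking on the Hilbert envelope of the residual) is sketched below; the paper's maximum-likelihood epoch determination is replaced by a plain threshold-and-distance peak picker, and all thresholds are illustrative.

```python
import numpy as np
from scipy.signal import lfilter, hilbert, find_peaks

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i + 1] = a_prev[1:i + 1] + k * a_prev[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a

def glottal_closure_instants(frame, fs, order=12):
    """Crude GCI estimate: LPC inverse filtering, then peak picking on the
    Hilbert envelope of the residual."""
    a = lpc_coeffs(frame * np.hamming(len(frame)), order)
    residual = lfilter(a, [1.0], frame)            # inverse filter A(z)
    envelope = np.abs(hilbert(residual))
    min_period = int(fs / 400)                     # assume F0 below 400 Hz
    peaks, _ = find_peaks(envelope, height=0.4 * envelope.max(), distance=min_period)
    return peaks                                   # sample indices of candidate GCIs
```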

126 citations


Journal ArticleDOI
Sharad Singhal, B. S. Atal
TL;DR: The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity, and it is found that speech quality depends on the pulse rate and that female speech requires a higher pulse rate than male speech.
Abstract: Although the multipulse model is conceptually simple, the problem of locating the pulses is computationally complex. The authors discuss the basic multipulse model and describe a procedure to compute the excitation with optimally adjusted amplitudes. The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity. The authors find that speech quality depends on the pulse rate. They also find that for the same quality, female speech requires a higher pulse rate than male speech. The pitch dependence can be reduced and speech quality improved for high-pitched speakers by incorporating long delay prediction in the multipulse model. >
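
The flavour of such a pulse search can be illustrated with a greedy analysis-by-synthesis sketch: pulses are placed one at a time where the residual correlates best with the synthesis filter's impulse response, and all amplitudes are re-optimized by least squares after each placement. This is a simplified stand-in for the authors' procedure; `lpc_a` is assumed to be the full LPC polynomial [1, a1, ..., ap].

```python
import numpy as np
from scipy.signal import lfilter

def multipulse_excitation(target, lpc_a, n_pulses):
    """Greedy multipulse search with joint amplitude re-optimization."""
    target = np.asarray(target, dtype=float)
    n = len(target)
    h = lfilter([1.0], lpc_a, np.r_[1.0, np.zeros(n - 1)])  # impulse response of 1/A(z)

    def shifted(m):
        return np.r_[np.zeros(m), h[:n - m]]                # impulse response delayed by m

    H = np.zeros((n, 0))
    positions, amps = [], np.zeros(0)
    residual = target.copy()
    for _ in range(n_pulses):
        corr = np.array([np.dot(residual, shifted(m)) for m in range(n)])
        m_best = int(np.argmax(np.abs(corr)))
        positions.append(m_best)
        H = np.column_stack([H, shifted(m_best)])
        amps, *_ = np.linalg.lstsq(H, target, rcond=None)   # re-optimize all amplitudes
        residual = target - H @ amps
    return positions, amps
```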

119 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present results on the comparative performance of nonuniform scalar quantizers using three different LPC (linear predictive coding) representations: the arcsine of reflection coefficients, the log area ratios, and the line spectral frequencies.
Abstract: The authors present results on the comparative performance of nonuniform scalar quantizers using three different LPC (linear predictive coding) representations: the arcsine of reflection coefficients, the log area ratios, and the line spectral frequencies. On comparing the spectral distortion introduced by quantizers based on these representations, it was found that the average distortion was very similar for all three, with the arcsine showing fewer large spectral errors. In a parallel study, the performance of the above LPC representations and the autocorrelation coefficients for interpolating the spectrum between adjacent time frames was investigated and revealed only small differences between the different representations. Informal listening tests with a complete 8 kb/s code-excited linear predictive (CELP) coder, incorporating both quantization and interpolation, showed no significant differences between the various LPC representations, suggesting that the random codebook for the excitation is able to compensate for small spectral deviations. >
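
For reference, the three representations compared here are simple transformations of the LPC parameters; a sketch of the conversions (reflection coefficients to arcsines and log area ratios, and the LPC polynomial to line spectral frequencies via the symmetric and antisymmetric polynomials) is shown below. The root-finding LSF routine is a textbook construction, not the quantizer design of the paper.

```python
import numpy as np

def arcsine_coefficients(k):
    """Arcsine of the reflection coefficients k (|k| < 1)."""
    return np.arcsin(k)

def log_area_ratios(k):
    """Log area ratios from the reflection coefficients."""
    return np.log((1.0 + k) / (1.0 - k))

def line_spectral_frequencies(a):
    """LSFs (radians) from A(z) with a = [1, a1, ..., ap], using
    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z),
    whose nontrivial roots interlace on the unit circle."""
    a_ext = np.r_[np.asarray(a, dtype=float), 0.0]
    P = a_ext + a_ext[::-1]
    Q = a_ext - a_ext[::-1]
    angles = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
    return np.sort(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])
```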

90 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A phonetically based segmentation of speech is performed to classify segments into five classes: onset, unvoiced, low-pass voiced, steady-state voiced, and transient voiced; for each segment class, a distinctive coding scheme based on vector excitation coding (VXC) is used.
Abstract: A phonetically based segmentation of speech is performed to classify segments into five classes: onset, unvoiced, low-pass voiced, steady-state voiced, and transient voiced. The segment lengths are constrained to an integer multiple of a unit-frame. For each segment class, a distinctive coding scheme based on vector excitation coding (VXC) is used. The maximum bit-rate is 3.6 kb/s, and a moderate coding delay of 45 ms is incurred. Performance is roughly comparable to conventional VXC/CELP (code-excited linear prediction) coding at 4.8 kb/s.

82 citations


PatentDOI
TL;DR: In this paper, unvoiced speech performance in low-rate multi-pulse coders is improved by employing a simple architecture with an output quality comparable to code-excited linear predictive (CELP) coding.
Abstract: Improved unvoiced speech performance in low-rate multi-pulse coders is achieved by employing a multi-pulse architecture that is simple in implementation but with an output quality comparable to code excited linear predictive (CELP) coding. A hybrid architecture is provided in which a stochastic excitation model that is used during unvoiced speech is also capable of modeling voiced speech by use of random codebook excitation. A modified method for calculating the gain during stochastic excitation is also provided.

73 citations


Proceedings ArticleDOI
Karl Hellwig, Peter Vary, D. Massaloux, J. P. Petit, C. Galand, M. Rosso
27 Nov 1989
TL;DR: The speech coding scheme which will be used as the standard for the European mobile radio system has been selected by the CEPT Groupe Special-Mobile (GSM) as a result of formal subjective listening tests based on the regular-pulse excitation linear predictive coding technique (RPE-LPC) combined with long-term prediction (LTP).
Abstract: The speech coding scheme which will be used as the standard for the European mobile radio system has been selected by the CEPT Groupe Special-Mobile (GSM) as a result of formal subjective listening tests. It is based on the regular-pulse excitation linear predictive coding technique (RPE-LPC) combined with long-term prediction (LTP). The solution is called the RPE-LTP codec. The codec algorithm and the error protection scheme are presented. The net bit rate is 13.0 kb/s, and the gross bit rate, including error protection, is 22.8 kb/s. The experimental implementation based on VLSI signal processors is described. The speech quality obtained with the technique considered is far superior to that obtainable with present-day analog mobile radio systems. A duplex speech codec including error protection can be implemented with two VLSI signal processors with external data memories of about 1 K*16 b.
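
To give a flavour of the regular-pulse-excitation step, the toy grid selection below decimates a residual sub-block onto several candidate pulse grids and keeps the most energetic one. The real GSM full-rate codec selects among four 13-pulse grids per 40-sample sub-block after a weighting filter; the plain energy criterion and the parameters here are a simplification.

```python
import numpy as np

def rpe_grid_select(subblock, decimation=3, n_grids=4, n_pulses=13):
    """Pick the regular-pulse grid with the highest energy from a residual
    sub-block (e.g. 40 samples); returns the grid offset and its pulses."""
    subblock = np.asarray(subblock, dtype=float)
    candidates = [subblock[offset:offset + decimation * n_pulses:decimation]
                  for offset in range(n_grids)]
    energies = [float(np.sum(c ** 2)) for c in candidates]
    best = int(np.argmax(energies))
    return best, candidates[best]
```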

Book ChapterDOI
Jinhui Chen
01 Jan 1989
TL;DR: A candidate algorithm for the new CCITT 16-kb/s speech coding standard is presented, based on backward-adaptive CELP (code-excited linear prediction) where the predictor and the excitation gain are updated by analyzing previously quantized signals.
Abstract: A candidate algorithm for the new CCITT 16-kb/s speech coding standard is presented. This algorithm is based on backward-adaptive CELP (code-excited linear prediction), where the predictor and the excitation gain are updated by analyzing previously quantized signals. The only information transmitted is the excitation vector, with a size as small as five samples, so as to achieve a one-way coding delay of less than 2 ms. With a clear channel, this 16-kb/s coder slightly outperformed the CCITT standard 32-kb/s ADPCM (adaptive differential pulse code modulation) (G.721) in speech quality as measured by the mean opinion score (MOS). With noisy channels, the coder scored slightly higher than G.721 for a bit-error rate of 10^-2 and significantly higher (by a margin of 0.5) for a bit-error rate of 10^-3.

Proceedings ArticleDOI
23 May 1989
TL;DR: It is shown that an average spectral distortion of approximately 1 dB^2 can be achieved with 21 and 25 bits/frame using the 2-D DCT and DCT-DPCM schemes, respectively, which is a noticeable improvement over the previously reported bit rates of 32 bits/frame and above.
Abstract: The intraframe and interframe correlation properties are used to develop two efficient encoding algorithms for speech line spectrum pair (LSP) parameters. The first algorithm (2-D DCT), which requires relatively large coding delays, is based on two-dimensional (time and frequency) discrete cosine transform coding techniques; the second algorithm (DCT-DPCM), which does not need any coding delay, uses one-dimensional discrete cosine transform in the frequency domain and DPCM (differential pulse-code modulation) in the time domain. The performances of these systems for different bit rates and delays are studied, and appropriate comparisons are made. It is shown that an average spectral distortion of approximately 1 dB^2 can be achieved with 21 and 25 bits/frame using the 2-D DCT and DCT-DPCM schemes, respectively. This is a noticeable improvement over the previously reported bit rates of 32 bits/frame and above.
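
The energy-compaction idea behind the 2-D DCT scheme can be illustrated with scipy: transform a block of LSP vectors over time and frequency, keep only the largest coefficients, and invert. A real coder quantizes the retained coefficients; the `keep` value below is an illustrative truncation and is unrelated to the paper's bit counts.

```python
import numpy as np
from scipy.fft import dctn, idctn

def lsp_2d_dct_roundtrip(lsp_block, keep=21):
    """Toy 2-D (time x frequency) DCT of a block of LSP vectors
    (shape: frames x LSP order), keeping only the `keep` largest coefficients."""
    block = np.asarray(lsp_block, dtype=float)
    coeffs = dctn(block, norm="ortho")
    flat = np.abs(coeffs).ravel()
    if keep >= flat.size:
        return block
    threshold = np.sort(flat)[-keep]
    mask = np.abs(coeffs) >= threshold            # retain the dominant coefficients
    return idctn(coeffs * mask, norm="ortho")
```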

Journal ArticleDOI
TL;DR: A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation and formant tracking was developed, which can surpass the performance of single-channel (acoustic-signal-based) algorithms.
Abstract: The authors describe analysis and synthesis methods for improving the quality of speech produced by D.H. Klatt's (J. Acoust. Soc. Am., vol.67, p.971-95, 1980) software formant synthesizer. Synthetic speech generated using an excitation waveform resembling the glottal volume-velocity was found to be perceptually preferred over speech synthesized using other types of excitation. In addition, listeners ranked speech tokens synthesized with an excitation waveform that simulated the effects of source-tract interaction higher in naturalness than tokens synthesized without such interaction. A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation and formant tracking was developed. The algorithms can utilize two channels of input data, i.e., speech and electroglottographic signals, and can therefore surpass the performance of single-channel (acoustic-signal-based) algorithms. The formant synthesizer was used to study some aspects of the acoustic correlates of voice quality, e.g., male/female voice conversion and the simulation of breathiness, roughness, and vocal fry.

PatentDOI
TL;DR: A speech decoder for synthesizing a speech signal from a digitized speech bit stream is described; it includes an analyzer that generates an angular frequency and magnitude for each of a plurality of sinusoidal components over a sequence of times, a random signal generator that produces a time sequence of random phase components, a phase synthesizer that generates synthesized phases for at least some of the sinusoidal components from the angular frequencies and random phase components, and a synthesizer that produces speech from the angular frequencies, magnitudes, and synthesized phases.
Abstract: A speech decoder apparatus for synthesizing a speech signal from a digitized speech bit stream of the type produced by processing speech with a speech encoder. The apparatus includes an analyzer for processing the digitized speech bit stream to generate an angular frequency and magnitude for each of a plurality of sinusoidal components representing the speech processed by the speech encoder, the analyzer generating the angular frequencies and magnitudes over a sequence of times; a random signal generator for generating a time sequence of random phase components; a phase synthesizer for generating a time sequence of synthesized phases for at least some of the sinusoidal components, the synthesized phases being generated from the angular frequencies and random phase components; and a synthesizer for synthesizing speech from the time sequences of angular frequencies, magnitudes, and synthesized phases.
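
The core synthesis step described here, a sum of sinusoids with randomly generated phases, can be sketched as follows for a single frame; frame-to-frame interpolation of frequencies, magnitudes, and phases, which a real decoder needs, is omitted, and the function name and arguments are illustrative.

```python
import numpy as np

def synthesize_frame(freqs_hz, mags, fs, n_samples, rng=None):
    """Sum-of-sinusoids synthesis with random initial phases."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n_samples) / fs
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs_hz))
    frame = np.zeros(n_samples)
    for f, m, ph in zip(freqs_hz, mags, phases):
        frame += m * np.cos(2.0 * np.pi * f * t + ph)
    return frame
```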

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors study the problem of coding spectral information in speech at bit rates in the range of 100-400 b/s using speaker-independent, phone-based recognition with hidden Markov model (HMM) phone models, and show this to be a promising approach.
Abstract: The authors study the problem of coding spectral information in speech at bit rates in the range of 100-400 b/s using speaker-independent phone-based recognition. Spectral information is coded as a sequence of phonetic events and a sequence of transitions through the corresponding hidden Markov model (HMM)-based phone models. This simple phonetic speech-coding system has been shown to be a promising approach. A simple inventory of phonemes is sufficient for capturing the bulk of the acoustic information. >

Proceedings ArticleDOI
23 May 1989
TL;DR: A zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase is described, which provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification.
Abstract: It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement and for speech coding. A description is given of a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification, for a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion. >

PatentDOI
TL;DR: In this article, an artificial intelligence system is used to decide upon the adjustment of a filter subsystem by distinguishing between noise and speech in the spectrum of the incoming signal of speech plus noise.
Abstract: A system is provided to reduce noise from a signal of speech that is contaminated by noise. The present system employs an artificial intelligence that is capable of deciding upon the adjustment of a filter subsystem by distinguishing between noise and speech in the spectrum of the incoming signal of speech plus noise. The system does this by testing the pattern of a power or envelope function of the frequency spectrum of the incoming signal. The system determines that the fast changing portions of that envelope denote speech whereas the residual is determined to be the frequency distribution of the noise power. This determination is done while examining either the whole spectrum, or frequency bands thereof, regardless of where the maximum of the spectrum lies. In another embodiment of the invention, a feedback loop is incorporated which provides incremental adjustments to the filter by employing a gradient search procedure to attempt to increase certain speech-like features in the system's output. The present system does not require consideration of minima of functions of the incoming signal or pauses in speech. Instead, the present system employs an artificial intelligence system to which is input the envelope pattern of the incoming signal of speech and noise. The present system then filters out of this envelope signal the rapidly changing variations of the envelope over fixed time windows.

PatentDOI
TL;DR: A noise reduction system used for transmission and/or recognition of speech includes a speech analyzer for analyzing a noisy speech input signal thereby converting the speech signal into feature vectors such as autocorrelation coefficients, and a neural network for receiving the feature vectors of the noisy speech signal as its input.
Abstract: A noise reduction system used for transmission and/or recognition of speech includes a speech analyzer for analyzing a noisy speech input signal thereby converting the speech signal into feature vectors such as autocorrelation coefficients, and a neural network for receiving the feature vectors of the noisy speech signal as its input. The neural network extracts from a codebook an index of prototype vectors corresponding to a noise-free equivalent to the noisy speech input signal. Feature vectors of speech are read out from the codebook on the basis of the index delivered as an output from the neural network, thereby causing the speech input to be reproduced on the basis of the feature vectors of speech read out from the codebook.

Journal ArticleDOI
TL;DR: A sinusoidal model is presented where the nonstationary nature of speech is considered by using a time-varying frequency and amplitude for each sinusoid using a suboptimal linear estimator.
Abstract: A sinusoidal model is presented where the nonstationary nature of speech is considered by using a time-varying frequency and amplitude for each sinusoid. The proposed model generalizes other sinusoidal models while still having an analytically tractable short-time spectrum. The estimation of the parameters of the sinusoids is done in the frequency domain by a suboptimal linear estimator. The experimental results obtained with the proposed model illustrate its ability to represent nonstationary speech frames. >

PatentDOI
TL;DR: In this patent, a point-by-point division of the signal by an amplitude function, obtained by lowpass filtering the magnitude of the signal, is used for pitch detection and speech coding.
Abstract: A method of processing speech signals applicable to a variety of speech processing tasks, including narrowband, mediumband and wideband coding. The speech signal is modified by a normalization process using the envelope of the speech signal such that the modified signal will have more desirable characteristics as seen by the intended processing algorithm. The modification is achieved by a point-by-point division (normalization) of the signal by an amplitude function, which is obtained by lowpass filtering the magnitude of the signal. Several examples of normalized signals are presented. Applications to pitch detection and speech coding are described.
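
The normalization this patent describes is simple in spirit: divide the signal point by point by a low-pass-filtered version of its magnitude. In the sketch below, the filter order, cutoff frequency, and flooring constant are illustrative choices, not values taken from the patent.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_normalize(x, fs, cutoff_hz=60.0, eps_ratio=1e-4):
    """Point-by-point normalization of a signal by its amplitude envelope,
    obtained by low-pass filtering the magnitude of the signal."""
    b, a = butter(2, cutoff_hz / (fs / 2.0))            # low-pass for the envelope
    envelope = filtfilt(b, a, np.abs(x))
    envelope = np.maximum(envelope, eps_ratio * np.max(np.abs(x)))  # avoid dividing by ~0
    return x / envelope, envelope
```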

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors explore the benefits of time-varying bit allocation to excitation and LPC (linear predictive coding) parameters for the case of codebook-excited LPC, finding that gains due to variable bit allocation were most noticeable in the 6.4 kb/s system, especially with female speakers.
Abstract: The authors explore the benefits of time-varying bit allocation to excitation and LPC (linear predictive coding) parameters for the case of codebook-excited LPC. The overall bit rate in the experiment was 4.8, 6.4, or 8.0 kb/s. In each case, permissible bit rates for the LPC component were 0, 24, 36, or 48 bits per frame, one of which was selected for each speech frame using a brute-force search for maximum performance. Average SNR gains over conventional time-invariant methods were modest, on the order of 1 to 2 dB, but gains for certain speech segments were as high as 3 to 5 dB. Perceptually, gains due to variable bit allocation were most noticeable in the 6.4 kb/s system, especially with female speakers. However, even in this case, the benefits of flexible bit allocation were somewhat offset by distortions due to other inadequacies in the coding algorithm.

Proceedings ArticleDOI
23 May 1989
TL;DR: The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance.
Abstract: The authors present the results of a study designed to investigate the effects of subtractive-type noise reduction algorithms on LPC-based spectral parameter estimation as related to the performance of speech processors operating with input SNRs of 15 dB and below. Subtractive noise preprocessing greatly improves the SNR, but system performance improvement is not commensurate. LPC spectral estimation is affected by the character of the residual noise which exhibits greater variance and spectral granularity than the original broadband noise. The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance. Techniques and performance results are presented. >
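
A minimal sketch of the paper's recommendation, subtracting only part of the noise estimate and replacing the remainder with a flat (spectrally white) floor, is shown below for a single frame of magnitude spectra; the fraction and floor values are illustrative.

```python
import numpy as np

def partial_subtract(noisy_mag, noise_mag, fraction=0.8, floor_fraction=0.1):
    """Subtract only a fraction of the estimated noise magnitude and impose a
    flat floor, so the LPC analysis does not fit the granular residual noise."""
    subtracted = np.maximum(noisy_mag - fraction * noise_mag, 0.0)
    white_floor = floor_fraction * float(np.mean(noise_mag))   # spectrally flat floor
    return np.maximum(subtracted, white_floor)
```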

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra.
Abstract: The major goal of this research is to reduce the discrepancy in recognition performance between normal and abnormal speech, given that reference templates were derived only from normal speech. A method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra. The distances of all reference tokens of like phonemes are combined to form a smallest cumulative distance (SCD) method. When SCD is combined with the method of slope-dependent weighting (SDW), the most significant success is obtained. In terms of error rates for a fixed phoneme vector length of five, SDW+SCD is found to reduce the difference in error rate between normal and abnormal speech by approximately 50%. >

PatentDOI
TL;DR: A speaker verification system receives input speech from a speaker of unknown identity and undergoes linear predictive coding analysis and transformation to maximize separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed.
Abstract: A speaker verification system receives input speech from a speaker of unknown identity. The speech undergoes linear predictive coding (LPC) analysis and transformation to maximize separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed. The transformation incorporates an "inter-class" covariance matrix of successful impostors within a database.

Proceedings ArticleDOI
27 Nov 1989
TL;DR: The LD-VXC coder provides very good speech quality at 16 kb/s, moderate complexity, a delay of under 2 ms, and a gentle degradation of quality with transmission errors, and was submitted to the CCITT as a candidate for a future 16-kb/s speech coding standard.
Abstract: To attain a very-low-delay speech coder at 16 kb/s while maintaining a quality acceptable for the public switched telephone network, low delay vector excitation coding (LD-VXC) is introduced. Backward adaptation is used to track the spectral characteristics of the signal without requiring any buffering of the input speech, thereby allowing a very low delay to be achieved in an analysis-by-synthesis structure. The algorithm differs markedly from conventional VXC or CELP (code-excited linear prediction) coders due to the use of backward adaptive linear prediction for modeling the time-varying short- and long-term correlation of speech. The LD-VXC coder provides very good speech quality at 16 kb/s, moderate complexity, a delay of under 2 ms, and a gentle degradation of quality with transmission errors. The algorithm was submitted to the CCITT as a candidate for a future 16-kb/s speech coding standard. >

Proceedings ArticleDOI
23 May 1989
TL;DR: Novel fast optimal algorithms for finding the best sequence in the Barnes-Wall shell innovation codebook are described; this algebraic codebook makes it possible to design a CELP coder at 9.6 kb/s with good quality that is still implementable on a current digital-signal-processing chip.
Abstract: The authors present an algebraic code-excited linear prediction (CELP) speech coder where the innovation codebook comes from the first spherical code of the Barnes-Wall lattice in 16 dimensions. Novel fast optimal algorithms for finding the best sequence in this Barnes-Wall shell innovation codebook are described. This algebraic codebook makes it possible to design a CELP coder at 9.6 kb/s with good quality and still implementable on a current digital-signal-processing chip. >

Proceedings ArticleDOI
23 May 1989
TL;DR: A novel approach to narrow- and medium-band speech coding that can dynamically balance the transmission rate between the excitation and the spectral parameters is introduced, improving the subjective speech quality.
Abstract: The authors introduce a novel approach to narrow- and medium-band speech coding that can dynamically balance the transmission rate between the excitation and the spectral parameters. The coding algorithm, called multimode coding, operates several coding blocks, each of which has a different bit assignment in parallel, and selects the optimum coding block frame by frame based on an evaluation of the reproduced speech quality. This coding algorithm is applied to 4.8 and 8.0 kb/s CELP coders, and 2.0-2.4 dB of SNRseg improvement is achieved over conventional CELP coders. The spectral distortion measure is added as an evaluation function, improving the subjective speech quality. >

Proceedings ArticleDOI
23 May 1989
TL;DR: Results support the hypothesis that the higher orders of PLP contain significant speaker-specific information, with ASI performance improving rapidly up to order 8, and then far more slowly yet consistently up to order 16, and a similar pattern is seen for codebook size, with fast improvements up to size 64, with more gradual gains thereafter.
Abstract: Results of an experimental study and the optimization of features for a conventional vector-quantization codebook-based automatic speaker identification (ASI) system are presented. Standard LPC (linear predictive coding) and a perceptually weighted feature termed PLP (perceptually based linear prediction) are compared using appropriate distance measures, namely, the log-likelihood, and three cepstral variants: constant weighting, the root-power-sum, and the inverse variance. PLP features combined with a weighted cepstral measure are found to be consistently the best in a number of different digit-independent ASI experiments. Results support the hypothesis that the higher orders of PLP (>5) contain significant speaker-specific information, with ASI performance improving rapidly up to order 8, and then far more slowly yet consistently up to order 16. A similar pattern is seen for codebook size, with fast improvements up to size 64, with more gradual gains thereafter.
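
The codebook-based ASI decision rule itself is straightforward; the sketch below accumulates, per enrolled speaker, the minimum distortion of each test vector against that speaker's VQ codebook and picks the lowest average. A plain Euclidean distance stands in for the log-likelihood and weighted cepstral measures compared in the paper, and the data layout is an assumption.

```python
import numpy as np

def identify_speaker(test_features, codebooks):
    """Return the enrolled speaker whose VQ codebook gives the lowest average
    distortion. `test_features`: (n_frames, dim); `codebooks`: dict mapping
    speaker id -> (codebook_size, dim) array of code vectors."""
    test = np.asarray(test_features, dtype=float)
    scores = {}
    for speaker, codebook in codebooks.items():
        cb = np.asarray(codebook, dtype=float)
        dists = np.linalg.norm(test[:, None, :] - cb[None, :, :], axis=2)
        scores[speaker] = float(dists.min(axis=1).mean())     # nearest-code distortion
    return min(scores, key=scores.get)
```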

Proceedings ArticleDOI
B. Atal
23 May 1989
TL;DR: The author presents results on the precision that is necessary in the excitation without producing perceptible distortion in the output speech signal and an estimate of the minimum number of bits necessary for accurate reproduction of the excitation.
Abstract: A description is presented of a framework for developing compact and accurate representation of LPC (linear predictive coding) excitation. In this representation, the excitation waveform is expressed as a linear combination of the eigenvectors of the autocorrelation matrix of the LPC filter's impulse response. This representation allows a systematic study of changes in the filter excitation on the speech output. The author presents results on the precision that is necessary in the excitation without producing perceptible distortion in the output speech signal and an estimate of the minimum number of bits necessary for accurate reproduction of the excitation. >
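
The basis the author describes can be computed directly: take the impulse response of the LPC synthesis filter, form (a Toeplitz approximation to) its autocorrelation matrix, and diagonalize it. The sketch below does this and shows how an excitation frame would be truncated to its leading eigenvector coefficients; it assumes `lpc_a` is the full polynomial [1, a1, ..., ap] and is not the author's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh, toeplitz
from scipy.signal import lfilter

def excitation_eigenbasis(lpc_a, frame_len):
    """Eigenvectors of a Toeplitz approximation to the autocorrelation matrix
    of the LPC synthesis filter's impulse response, by decreasing eigenvalue."""
    h = lfilter([1.0], lpc_a, np.r_[1.0, np.zeros(frame_len - 1)])
    r = np.correlate(h, h, mode="full")[frame_len - 1:]
    eigvals, eigvecs = eigh(toeplitz(r))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def truncate_excitation(excitation, eigvecs, n_keep):
    """Keep only the first n_keep eigenvector coefficients and reconstruct,
    mimicking a reduced-precision representation of the excitation."""
    basis = eigvecs[:, :n_keep]
    coeffs = basis.T @ excitation
    return coeffs, basis @ coeffs
```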