
Showing papers on "Linear predictive coding published in 1985"


Proceedings ArticleDOI
26 Apr 1985
TL;DR: A code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion, indicating that a random code book has a slight speech quality advantage at low bit rates.
Abstract: We describe in this paper a code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion. Each sample of the innovation sequence is filtered sequentially through two time-varying linear recursive filters, one with a long-delay (related to pitch period) predictor in the feedback loop and the other with a short-delay predictor (related to spectral envelope) in the feedback loop. We code speech, sampled at 8 kHz, in blocks of 5-msec duration. Each block consisting of 40 samples is produced from one of 1024 possible innovation sequences. The bit rate for the innovation sequence is thus 1/4 bit per sample. We compare in this paper several different random and deterministic code books for their effectiveness in providing the optimum innovation sequence in each block. Our results indicate that a random code book has a slight speech quality advantage at low bit rates. Examples of speech produced by the above method will be played at the conference.

1,343 citations
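To make the search concrete, here is a minimal Python/NumPy sketch of the kind of exhaustive codebook search the abstract describes: every stored innovation sequence is synthesized through the short-delay all-pole filter and scored by squared error. The long-delay pitch predictor and the perceptual weighting filter of the actual coder are omitted, and all function names and parameter values are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def celp_search(target, codebook, lpc):
    """Exhaustive codebook search: synthesize every stored innovation
    sequence through the short-delay all-pole filter and keep the index
    (and least-squares gain) that minimizes the squared error against the
    target block.  The long-delay pitch predictor and perceptual weighting
    used in the paper are omitted for brevity."""
    a = np.concatenate(([1.0], -np.asarray(lpc, float)))   # A(z) = 1 - sum a_k z^-k
    best_idx, best_gain, best_err = -1, 0.0, np.inf
    for idx, c in enumerate(codebook):
        synth = lfilter([1.0], a, c)                        # 1/A(z) synthesis
        gain = np.dot(synth, target) / (np.dot(synth, synth) + 1e-12)
        err = np.sum((target - gain * synth) ** 2)
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain, best_err

# 1024 stored Gaussian sequences of 40 samples (5 ms at 8 kHz): the chosen
# index costs 10 bits per block, i.e. 1/4 bit per sample for the innovation.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 40))
lpc = np.array([1.2, -0.6, 0.1])     # hypothetical 3rd-order short-delay predictor
target = rng.standard_normal(40)
idx, gain, err = celp_search(target, codebook, lpc)
```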


Proceedings ArticleDOI
S. Roucos1, A. Wilgus1
26 Apr 1985
TL;DR: A new and simple method for speech rate modification that yields high-quality rate-modified speech is presented, together with both objective and informal subjective results for the new and previous TSM methods.
Abstract: We present a new and simple method for speech rate modification that yields high quality rate-modified speech. Earlier algorithms either required a significant amount of computation for good quality output speech or resulted in poor quality rate-modified speech. The algorithm we describe allows arbitrary linear or nonlinear scaling of the time axis. The algorithm operates in the time domain using a modified overlap-and-add (OLA) procedure on the waveform. It requires moderate computation and could be easily implemented in real time on currently available hardware. The algorithm works equally well on single voice speech, multiple-voice speech, and speech in noise. In this paper, we discuss an earlier algorithm for time-scale modification (TSM), and present both objective and informal subjective results for the new and previous TSM methods.

420 citations
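The modified overlap-and-add idea can be sketched as follows; this is a generic synchronized-OLA illustration under assumed frame, hop, and search sizes, not the authors' exact algorithm.

```python
import numpy as np

def sola(x, rate, frame=400, sa=100, search=70):
    """Synchronized overlap-and-add sketch of time-scale modification:
    analysis frames taken every `sa` input samples are overlap-added about
    every sa/rate output samples, after a short cross-correlation search
    aligns the waveforms in the overlap region.  rate > 1 shortens the
    signal, rate < 1 lengthens it.  Parameter values are illustrative."""
    x = np.asarray(x, dtype=float)
    ss = int(round(sa / rate))                      # synthesis hop
    y = x[:frame].copy()
    for m, start in enumerate(range(sa, len(x) - frame, sa), start=1):
        seg = x[start:start + frame]
        nominal = min(m * ss, len(y) - 1)           # nominal paste point
        lo, hi = max(0, nominal - search), min(nominal + search, len(y) - 1)
        # pick the paste point with the highest normalized cross-correlation
        scores = []
        for p in range(lo, hi + 1):
            n = min(frame, len(y) - p)
            a, b = y[p:p + n], seg[:n]
            scores.append(np.dot(a, b) /
                          (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
        pos = lo + int(np.argmax(scores))
        n = min(frame, len(y) - pos)                # overlap length
        fade = np.linspace(1.0, 0.0, n)             # linear cross-fade
        y[pos:pos + n] = fade * y[pos:pos + n] + (1.0 - fade) * seg[:n]
        y = np.concatenate([y, seg[n:]])
    return y
```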


Journal ArticleDOI
TL;DR: Preliminary results indicate that higher quality or lower bit rates may be achieved with enough computational resources, and an extension of the centroid computation used in vector quantization is presented.
Abstract: Rate-distortion theory provides the motivation for using data compression techniques on matrices of N LPC vectors. This leads to a simple extension of speech coding techniques using vector quantization. The effects of using the generalized Lloyd algorithm on such matrices using a summed Itakura-Saito distortion measure are studied, and an extension of the centroid computation used in vector quantization is presented. The matrix quantizers so obtained offer substantial reductions in bit rates relative to full-search vector quantizers. Bit rates as low as 150 bits/s for the LPC matrix information (inclusive of gain, but without pitch and voicing) have been achieved for a single speaker, having average test sequence and codebook distortions comparable to those in the equivalent full-search vector quantizer operating at 350 bits/s. Preliminary results indicate that higher quality or lower bit rates may be achieved with enough computational resources.

188 citations
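A hedged sketch of the encoding step (nearest-matrix search under a summed Itakura-Saito distortion, written here in its spectral form) is shown below; the gain handling and the generalized Lloyd centroid update described in the paper are not reproduced.

```python
import numpy as np

def lpc_power_spectrum(a, nfft=256):
    """Power spectrum 1/|A(e^jw)|^2 of an LPC model with
    A(z) = 1 - a_1 z^-1 - ... - a_p z^-p (unit gain assumed)."""
    A = np.fft.rfft(np.concatenate(([1.0], -np.asarray(a, float))), nfft)
    return 1.0 / (np.abs(A) ** 2 + 1e-12)

def itakura_saito(p, p_hat):
    """Itakura-Saito distortion between two power spectra (spectral form)."""
    r = p / p_hat
    return float(np.mean(r - np.log(r) - 1.0))

def encode_lpc_matrix(lpc_matrix, codebook):
    """Replace an N-frame block (matrix) of LPC vectors by the index of the
    codebook matrix with the smallest distortion summed over the N frames.
    lpc_matrix: (N, p) array; codebook: (K, N, p) array."""
    def summed(cand):
        return sum(itakura_saito(lpc_power_spectrum(a), lpc_power_spectrum(b))
                   for a, b in zip(lpc_matrix, cand))
    return int(np.argmin([summed(c) for c in codebook]))
```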


Patent
11 Jun 1985
TL;DR: In this paper, the LPC residual of the speech signal is coded using minimum-phase spectral reconstruction: the residual is transformed into an approximately minimum-phase signal, and spectral reconstruction techniques are then applied to represent the linear predictive (LPC) residual by either its Fourier transform magnitude or phase.
Abstract: A method of encoding speech at medium to high bit rates while maintaining very high speech quality, specifically directed to coding the linear predictive (LPC) residual signal using either its Fourier transform magnitude or phase. In particular, the LPC residual of the speech signal is coded using minimum-phase spectral reconstruction techniques: the residual is first transformed into an approximately minimum-phase signal, and spectral reconstruction techniques are then applied to represent it by either its Fourier transform magnitude or phase. The non-iterative spectral reconstruction technique is based upon the cepstral coefficients through which the magnitude and phase of a minimum-phase signal are related. The reconstructed and regenerated LPC residual is used as the excitation signal to an LPC synthesis filter in generating analog speech signals from which audible speech may be produced.

137 citations
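The cepstral relation the patent relies on, between the log-magnitude and phase of a minimum-phase signal, can be illustrated with a generic homomorphic reconstruction; the sketch below is a standard minimum-phase reconstruction from a magnitude spectrum, not the patented procedure itself.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Reconstruct a minimum-phase signal whose Fourier magnitude matches
    `mag` (a full-length, length-N magnitude spectrum), using the cepstral
    relation between log-magnitude and phase of a minimum-phase signal."""
    n = len(mag)
    c = np.fft.ifft(np.log(mag + 1e-12)).real          # real cepstrum
    w = np.zeros(n)                                     # causal folding window
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    cmin = c * w                                        # minimum-phase cepstrum
    X = np.exp(np.fft.fft(cmin))                        # minimum-phase spectrum
    return np.fft.ifft(X).real                          # time-domain signal
```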


Proceedings ArticleDOI
26 Apr 1985
TL;DR: It is demonstrated through analysis of both synthetic and natural speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2' concept of Carlson and Fant and the 3.5 Bark auditory integration concept of Chistovich, are well modeled by the PLP method.
Abstract: A novel speech analysis method which uses several established psychoacoustic concepts, perceptually based linear predictive analysis (PLP), models the auditory spectrum by the spectrum of a low-order all-pole model. The auditory spectrum is derived from the speech waveform by critical-band filtering, equal-loudness curve pre-emphasis, and intensity-loudness root compression. We demonstrate through analysis of both synthetic and natural speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2' concept of Carlson and Fant and the 3.5 Bark auditory integration concept of Chistovich, are well modeled by the PLP method. A complete speech analysis-synthesis system based on the PLP method is also described in the paper.

97 citations
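A simplified sketch of such a perceptually based front end is given below: power spectrum, Bark-band integration, an analytic equal-loudness approximation, cube-root compression, and a low-order all-pole fit via the autocorrelation method. All constants and the crude rectangular critical-band integration are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def plp_like_allpole(frame, fs=8000, order=5, nbands=18, nfft=256):
    """Simplified PLP-style analysis: auditory spectrum from critical-band
    integration, equal-loudness weighting, and cube-root compression,
    followed by a low-order all-pole fit."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft)) ** 2
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)

    # critical-band integration: average power in equal-width Bark bands
    bark = 6.0 * np.arcsinh(freqs / 600.0)
    edges = np.linspace(0.0, bark[-1], nbands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (bark >= lo) & (bark < hi)
        bands.append(spec[mask].mean() if mask.any() else 1e-12)
    bands = np.array(bands) + 1e-12
    centers_hz = 600.0 * np.sinh((edges[:-1] + edges[1:]) / 2.0 / 6.0)

    # equal-loudness pre-emphasis (a commonly used analytic approximation)
    w2 = (2 * np.pi * centers_hz) ** 2
    eql = ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))
    loud = (eql * bands) ** (1.0 / 3.0)          # intensity-to-loudness compression

    # all-pole model of the auditory spectrum via its autocorrelation
    sym = np.concatenate([loud, loud[-2:0:-1]])  # even-symmetric spectrum
    r = np.fft.ifft(sym).real[:order + 1]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # predictor coefficients
    return a
```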


Proceedings ArticleDOI
26 Apr 1985
TL;DR: The use of Line-Spectrum Pairs (LSPs) makes it possible to employ bit-saving measures more readily than the better known reflection coefficients, and the intelligibility of an LSP-based, pitch-excited vocoder can be made as high as 87 for three male speakers.
Abstract: A low-bit-rate speech encoder must employ bit-saving measures to achieve intelligible and natural sounding synthesized speech. Some important measures are: (a) quantization of parameters based on their spectral-error sensitivities (i.e., coarser quantization for spectrally less sensitive parameters), and (b) quantization of parameters in accordance with properties of auditory perception (i.e., coarser quantization of the higher frequency components of the speech spectral envelope, and finer representation of spectral peaks than valleys). The use of Line-Spectrum Pairs (LSPs) makes it possible to employ these measures more readily than the better-known reflection coefficients. As a result, the intelligibility of an LSP-based, pitch-excited vocoder operating at 800 bits/second (b/s) can be made as high as 87 for three male speakers (as measured by the Diagnostic Rhyme Test (DRT)), which is only 1.4 points below that of the 2400-b/s LPC. Likewise, the intelligibility of a 4800-b/s nonpitch-excited vocoder is as high as 92.3, which compares favorably with scores from current 9600-b/s vocoders.

85 citations
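For reference, line-spectrum pairs are obtained from the LPC inverse filter by splitting it into a symmetric and an antisymmetric polynomial whose roots lie on the unit circle; a direct root-finding sketch follows (production coders use cheaper Chebyshev-series searches).

```python
import numpy as np

def lpc_to_lsp(a):
    """Convert LPC coefficients a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    into Line-Spectrum Pair frequencies (radians, ascending)."""
    A = np.concatenate(([1.0], np.asarray(a, float), [0.0]))   # degree p+1
    P = A + A[::-1]          # symmetric polynomial (trivial root at z = -1)
    Q = A - A[::-1]          # antisymmetric polynomial (trivial root at z = +1)
    lsp = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one root per conjugate pair, excluding the trivial roots at 0 and pi
        lsp.extend(w for w in ang if 1e-6 < w < np.pi - 1e-6)
    return np.sort(np.array(lsp))
```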


Patent
24 Dec 1985
TL;DR: In this paper, the authors proposed a digital speech coding circuit that makes use of linear predictive coding, vector quantization and difference, Huffman coding and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over such channels as telephone lines and at the same time being capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality.
Abstract: A digital speech coding circuit makes use of linear predictive coding, vector quantization and difference, Huffman coding, and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over such channels as telephone lines and at the same time being capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality. The transmitter portion of the circuit comprises a series connection of a low pass filter, analog to digital converter, linear predictive coding module comprising five resonators for establishing five center frequencies and bandwidths of the analog speech, vector quantization module comprising binary representation of the likely combinations of resonances found in human speech, Huffman coding module, a variable bit rate to fixed bit rate converter, and optionally, an encryption module. Another branch of the transmitter circuit extends from the output of the analog to digital converter to the bit rate converter and comprises a series combination of an inverse filter and an excitation estimation module having parallel outputs respectively representative of a voiced/unvoiced signal, the excitation amplitude, and the excitation pulse position. The receiver portion of the circuit comprises a series connection of a fixed bit rate to variable bit rate converter, a bit unmapping module which produces separate outputs representative of the reflection coefficients and excitation of the speech, a synthesis filter which receives these outputs and produces a digital signal representative of the analog speech, a digital to analog converter, and a low pass filter.

79 citations


Proceedings ArticleDOI
26 Apr 1985
TL;DR: Preliminary results indicate that speech and noisy speech synthesized based on this model do not have the "buzziness" typically associated with vocoder speech and are essentially the same as the original speech or the noisy speech in both intelligibility and quality.
Abstract: A new model-based speech analysis/synthesis system is presented in this paper. In this model, the short-time spectrum of speech is modeled as the product of an excitation spectrum and a spectral envelope. The spectral envelope is some smoothed version of the speech spectrum and the excitation spectrum is represented by the pitch period and a voiced/unvoiced (V/UV) decision for each harmonic. In speech analysis, the model parameters are estimated by explicit comparison between the original speech spectrum and the synthetic speech spectrum. Preliminary results indicate that speech and noisy speech synthesized based on this model do not have the "buzziness" typically associated with vocoder speech and are essentially the same as the original speech or the noisy speech in both intelligibility and quality. Potential applications of this new model and its parameter estimation include high quality speech analysis/synthesis, time scale modification of speech and noisy speech, and pitch detection.

73 citations
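The excitation model can be illustrated with a frequency-domain synthesis sketch: each harmonic receives its envelope amplitude with either a deterministic phase (voiced band) or a per-frame random phase (unvoiced band). This is only an illustration of the model, not the authors' analysis-by-synthesis parameter estimation; all parameter values are assumptions.

```python
import numpy as np

def synthesize_frame(f0, amps, voiced, fs=8000, n=160, rng=None):
    """One frame of mixed excitation: harmonic k of the pitch f0 gets its
    spectral-envelope amplitude amps[k-1], with zero phase if voiced[k-1]
    is True and a freshly drawn random phase otherwise, so unvoiced bands
    behave like narrow-band noise across frames."""
    rng = rng or np.random.default_rng()
    spec = np.zeros(n // 2 + 1, dtype=complex)
    for k, (a, v) in enumerate(zip(amps, voiced), start=1):
        b = int(round(k * f0 * n / fs))              # nearest DFT bin of harmonic k
        if b <= 0 or b >= len(spec):
            break
        phase = 0.0 if v else 2 * np.pi * rng.random()
        spec[b] += (a * n / 2) * np.exp(1j * phase)
    return np.fft.irfft(spec, n)

# usage: a 200 Hz frame whose harmonics above 2 kHz are declared unvoiced
amps = np.ones(19)
voiced = [k * 200 < 2000 for k in range(1, 20)]
frame = synthesize_frame(200.0, amps, voiced)
```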


Proceedings ArticleDOI
26 Apr 1985
TL;DR: A flexible analysis-synthesis system with signal dependent features is described and used to realize some desired voice characteristics in synthesized speech.
Abstract: A flexible analysis-synthesis system with signal dependent features is described and used to realize some desired voice characteristics in synthesized speech. The intelligibility of synthetic speech appears to depend on the ability to reproduce dynamic sounds such as stops, whereas the quality of voice is mainly determined by the true reproduction of voiced segments. We describe our work in converting the speech of one speaker to sound like that of another. A number of factors are important for maintaining the quality of the voice during this conversion process. These factors are derived from both the speech and electroglottograph signals.

72 citations


Proceedings ArticleDOI
26 Apr 1985
TL;DR: In this paper a sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves.
Abstract: In this paper a sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. The resulting synthetic waveform preserves the waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech and the noise are maintained. Based on this system, a coder operating at 8 kbps is developed that codes the amplitudes and phases of each of the sine wave components and uses a harmonic model to code all of the frequencies. Since not all of the phases can be coded, a high frequency regeneration technique is developed that exploits the properties of the sinusoidal representation of the coded baseband signal. Based on a relatively limited data base, computer simulation has demonstrated that coded speech of good quality can be achieved. A real-time simulation is being developed to provide a more thorough evaluation of the algorithm.

70 citations
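A minimal peak-picking sketch of the sinusoidal analysis/synthesis idea is given below; it omits the frame-to-frame track matching and phase interpolation that the full system uses, and the window, FFT size, and peak count are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

def sine_params(frame, fs=8000, nfft=1024, max_peaks=80):
    """Pick local maxima of the windowed short-time spectrum and return the
    (amplitude, frequency, phase) triple of each component sine wave."""
    win = np.hamming(len(frame))
    X = np.fft.rfft(frame * win, nfft)
    mag = np.abs(X)
    peaks, _ = find_peaks(mag)
    peaks = peaks[np.argsort(mag[peaks])[::-1][:max_peaks]]  # largest peaks
    amps = 2.0 * mag[peaks] / win.sum()        # window-corrected amplitudes
    freqs = peaks * fs / nfft
    phases = np.angle(X[peaks])
    return amps, freqs, phases

def resynthesize(amps, freqs, phases, fs=8000, n=160):
    """Sum-of-sinusoids synthesis from the analyzed parameters (no
    frame-to-frame parameter matching/interpolation, for brevity)."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))
```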


Journal ArticleDOI
TL;DR: A new stochastic model for generating speech signals suitable for coding at low bit rates is described, in which the speech waveform is represented as a zero mean Gaussian process with slowly-varying power spectrum.

Proceedings ArticleDOI
01 Dec 1985
TL;DR: Previous work on isolated word recognition based on hidden Markov models is extended by replacing the discrete symbol representation of the speech signal by a continuous Gaussian mixture density, so that the inherent quantization error introduced by the discrete representation is essentially eliminated.
Abstract: In this paper we extend previous work on isolated word recognition based on hidden Markov models by replacing the discrete symbol representation of the speech signal by a continuous Gaussian mixture density. In this manner the inherent quantization error introduced by the discrete representation is essentially eliminated. The resulting recognizer was tested on a vocabulary of the 10 digits across a wide range of talkers and test conditions, and shown to have an error rate at least comparable to that of the best template recognizers and significantly lower than that of the discrete symbol hidden Markov model system. Several issues involved in the training of the continuous density models and in the implementation of the recognizer are discussed.
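The continuous observation density that replaces the discrete symbol probabilities is a Gaussian mixture; a diagonal-covariance log-density sketch is shown below (generic, not the authors' training procedure).

```python
import numpy as np

def log_gmm_density(x, weights, means, covs):
    """Log of a diagonal-covariance Gaussian mixture density.
    x: (d,) feature vector; weights: (M,); means, covs: (M, d)."""
    x = np.asarray(x, float)
    weights = np.asarray(weights, float)
    means, covs = np.asarray(means, float), np.asarray(covs, float)
    diff2 = (x - means) ** 2 / covs                               # (M, d)
    log_norm = -0.5 * (np.log(2 * np.pi * covs).sum(axis=1) + diff2.sum(axis=1))
    log_terms = np.log(weights) + log_norm                        # (M,)
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())                # log-sum-exp
```

In a continuous-density HMM this value plays the role of log b_j(o_t) during Viterbi or forward-backward scoring.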

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A new approach for low bit rate coding of speech in a pitch-excited vocoder context is presented, which has quality equivalent to a 2400 bps fixed-rate LPC vocoder while requiring only slightly more storage and computational resources.
Abstract: A new approach for low bit rate coding of speech in a pitch-excited vocoder context is presented in this paper. The new technique, which operates at about 800 bps, has quality equivalent to a 2400 bps fixed-rate LPC vocoder while requiring only slightly more storage and computational resources. The improved performance of this system is based on two specific developments. The first is a novel use of line spectrum pair (LSP) coefficients in a structure which allows a low average bit rate while constraining the resulting distortion to have a low perceptual impact. The second is a frame-to-frame parameter interpolation algorithm which both reduces the bit rate and simultaneously ensures more speech-like formant trajectories than those derived from vector quantizers at comparable bit rates.
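The interpolation idea can be sketched directly: intermediate subframe LSP vectors are linear combinations of the two transmitted frames, and they stay ordered (and hence yield a stable synthesis filter) whenever both endpoints are ordered. The subframe count below is an assumption, not the paper's value.

```python
import numpy as np

def interpolate_lsp(lsp_a, lsp_b, n_sub=4):
    """Linear frame-to-frame interpolation of ordered LSP vectors: returns
    n_sub intermediate vectors ending at the new frame lsp_b.  Convex
    combinations of ordered vectors remain ordered, so each interpolated
    set still corresponds to a stable synthesis filter."""
    lsp_a, lsp_b = np.asarray(lsp_a, float), np.asarray(lsp_b, float)
    alphas = np.arange(1, n_sub + 1) / n_sub
    return [(1 - al) * lsp_a + al * lsp_b for al in alphas]
```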

Journal ArticleDOI
TL;DR: Modifications which improve the quality of the synthesized speech without requiring the transmission of additional data are presented, and diagnostic acceptability measure tests show an increase of up to five points in overall speech quality with the implementation of these improvements.
Abstract: The major weakness of the current narrow-band LPC synthesizer lies in the use of a "canned" invariant excitation signal. The use of such an excitation signal is based on three primary assumptions, namely, 1) that the amplitude spectrum of the excitation signal is flat and time invariant, 2) that the phase spectrum of the voiced excitation signal is a time-invariant function of frequency, and 3) that the probability density function of the phase spectrum of the unvoiced excitation signal is also time invariant. This paper critically examines these assumptions and presents modifications which improve the quality of the synthesized speech without requiring the transmission of additional data. Diagnostic acceptability measure (DAM) tests show an increase of up to five points in overall speech quality with the implementation of each of these improvements. These modifications can also improve the speech quality of LPC-based speech synthesizers.

PatentDOI
John Ellis, Bruce L. Townsend
TL;DR: In this paper, the authors proposed a wideband speech signal transmission scheme based on linear predictive coding (LPC) for a high band of frequencies between 4 and 8 kHz, which is compatible with existing limited bandwidth voice channel transmission arrangements.
Abstract: Speech signal components in a high band of frequencies between 4 and 8 kHz are transmitted via a digital transmission channel, which carries speech signal samples at frequencies below 4 kHz and sampled at a rate of 8 kHz, by replacing the least significant bit of the samples with bits of information derived from the high band by linear predictive coding. These information bits are transmitted in frames, each frame comprising a synchronizing bit and bits representing the power of and a set of filter coefficients for the high band signal components occurring in a period corresponding to the frame duration. Each such bit is transmitted redundantly three or six times in view of bit stealing techniques already used for signalling on digital transmission links. The resulting wideband speech signal transmission is compatible with existing limited bandwidth voice channel transmission arrangements.
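The bit-substitution step can be pictured with a small sketch: the least significant bit of each 8-bit channel sample is overwritten with one bit of the high-band frame, and the receiver reads the LSBs back. Framing, the 3x/6x redundancy, and the LPC analysis of the high band are omitted; the plain byte-level treatment here is an assumption for illustration.

```python
import numpy as np

def embed_highband_bits(samples, bits):
    """Overwrite the least significant bit of each 8-bit channel sample
    with one bit of the high-band information stream."""
    samples = np.asarray(samples, dtype=np.uint8).copy()
    n = min(len(samples), len(bits))
    samples[:n] = (samples[:n] & 0xFE) | (np.asarray(bits[:n], dtype=np.uint8) & 1)
    return samples

def extract_highband_bits(samples):
    """Receiver side: the embedded bit stream is simply the LSBs."""
    return np.asarray(samples, dtype=np.uint8) & 1
```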

Proceedings ArticleDOI
01 Apr 1985
TL;DR: This paper describes an effective and efficient time domain speech encoding technique that has an appealingly low complexity, and produces (near) toll quality speech at rates below 16 kbit/s.
Abstract: This paper describes an effective and efficient time domain speech encoding technique that has an appealingly low complexity, and produces (near) toll quality speech at rates below 16 kbit/s. The proposed coder uses linear predictive techniques to remove the short-time correlation in the speech signal. The remaining (residual) information is then modeled by a regular (in time) excitation signal that, when inputted to the time-varying model filter, produces a signal that is "close" to the reference speech signal. The procedure for finding the appropriate excitation model parameters incorporates the solution of a few sets of linear equations and is of moderate complexity compared to competing coding systems such as Adaptive Transform Coding and Multi-Pulse Excitation Coding.


Proceedings ArticleDOI
S. Roucos, A. Wilgus
26 Apr 1985
TL;DR: Methods for high-quality modification of the pitch and duration of a segment of a speech waveform are presented, and it is shown how these methods can be applied to improve the quality of the segment vocoder's output speech.
Abstract: We propose a new method of synthesis to be used for the segment vocoder, which transmits intelligible speech at rates below 300 b/s. The earlier segment vocoder applies LPC analysis to input speech, divides it into segments of variable duration, matches each segment with the nearest template from a codebook, concatenates at the receiver the set of nearest templates, and finally synthesizes the resultant sequence of speech frames using LPC synthesis. The quality of such a segment vocoder cannot exceed that of a standard unquantized LPC vocoder, which sounds buzzy due to the pulse/noise excitation used. Alternatively, by beginning with the waveforms (not the spectral representation) corresponding to the set of nearest templates, we can independently modify the pitch, energy, and duration of each template to match those of the input segment. These modified segments are then concatenated to produce the output waveform. We present here methods for high-quality modification of the pitch and duration of a segment of a speech waveform and show how these methods can be applied to improve the quality of the segment vocoder's output speech.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: It is shown that the closed phase method can yield variable results if the EGG information, or an alternative closed phase indicator, is not used correctly for analysis frame positioning.
Abstract: Accurate characterization of the vocal tract filter relies on speech data obtained from the closed glottal interval. Use of the electroglottograph (EGG) as a noninvasive and reliable means of monitoring glottal vibratory characteristics has increased the popularity of the two channel approach to speech analysis. The cycle by cycle auxiliary glottal information provided by the EGG facilitates implementation of pitch synchronous closed phase procedures. This paper shows that the closed phase method can yield variable results if the EGG information, or an alternative closed phase indicator, is not used correctly for analysis frame positioning. Synchronized glottal area waveforms obtained from ultra high speed laryngeal films verify true closure for our analyses. Results presented for other known regions also confirm many previous observations.

Proceedings ArticleDOI
26 Apr 1985
TL;DR: A modification of the usual LPC speaker-dependent speech recognition algorithms yielded significantly improved recognition performance in an F-16 fighter cockpit environment, with about half the number of substitutions.
Abstract: A modification of the usual LPC speaker-dependent speech recognition algorithms yielded significantly improved recognition performance in an F-16 fighter cockpit environment. The LPC model is first transformed into spectral amplitudes using a simulated filter bank. Statistically optimum linear transformation of the filter bank amplitudes to "principal spectral components" (PSC) provides a set of uncorrelated features. These features are rank ordered and the least significant features are discarded. The data base used for experiments consisted of 5 male speakers uttering a 70-word vocabulary ten times for training in an 85 dBA noise level, and 3 times for test in each of 97, 106 and 112 dBA noise levels. The PSC method yielded about half the number of substitutions of the standard LPC method.
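The "statistically optimum linear transformation" is assumed here to be a principal-components (eigenvector) transform of the filter-bank amplitudes; a generic PCA sketch of the feature computation follows.

```python
import numpy as np

def principal_spectral_components(filterbank_db, n_keep):
    """Fit a decorrelating eigenvector transform of the filter-bank
    (log-)amplitude covariance on training data and keep only the n_keep
    highest-variance components as recognition features.
    filterbank_db: (n_frames, n_channels) training matrix."""
    X = np.asarray(filterbank_db, float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)            # eigenvalues ascending
    W = eigvec[:, ::-1][:, :n_keep]                  # top-variance directions
    def transform(frames):
        return (np.asarray(frames, float) - mean) @ W
    return transform
```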

Journal ArticleDOI
Digital Coding of Waveforms: Principles and Applications to Speech and Video.

Proceedings ArticleDOI
M. Copperi, D. Sereno
26 Apr 1985
TL;DR: The main objective is improving the excitation representation in a linear predictive coding scheme and, hence, the subjective quality of synthesized speech signals.
Abstract: Considerable effort has been and is currently being concentrated on improving the speech quality at low and very low bit rates. Recently new models of LPC excitation have been devised, which are able to yield good quality speech by exploiting our knowledge of the human speech production and perception processes. Unfortunately, these models generally require too much computational load to be easily implemented on currently available hardware. This paper describes an efficient speech coder, capable of providing acceptable quality speech, within the limitations of both low bit rate (approximately 2.4 kbit/s) and real-time implementation. The coder is based upon pattern classification and cluster analysis with perceptually-meaningful error minimization criteria. Our main objective is improving the excitation representation in a linear predictive coding scheme and, hence, the subjective quality of synthesized speech signals.

PatentDOI
Tetsu Taguchi
TL;DR: Speech analysis and synthesis involve analysis for sinusoidal components and pitch frequency, and synthesis by first resetting the phase of all sine oscillator components to zero at the pitch period, whether periodically for voiced speech or at random periods in accordance with a random code for unvoiced speech, as discussed by the authors.
Abstract: Speech analysis and synthesis involve analysis for sinusoidal components and pitch frequency, and synthesis by first resetting the phase of all sine oscillator components to zero at the pitch period, whether periodically for voiced speech or at random periods in accordance with a random code for unvoiced speech. As a result, the synthesized speech signal has the initial line spectrum spread due to pitch structure, for better speech quality. Frequency modulation may also be used.

Proceedings ArticleDOI
26 Apr 1985
TL;DR: It is demonstrated that a good-quality, intelligible speech signal can be reconstructed using only a small portion of the information contained in the (t-m-Cm) plots, which provide a complete characterization of the time-domain signal.
Abstract: This paper concerns the Fourier-Bessel (FB) representation of the speech signal and its application in speech analysis and synthesis. The Fourier-Bessel representation of the speech signal is obtained using Bessel functions as a basis set. This is accomplished by determining the Fourier-Bessel series expansion coefficients Cm of the short-time speech signal at successive instants of time. The successive sets of coefficients are then displayed in a time-index-coefficient (t-m-Cm) plot. The patterns on the plot yield discriminating features for spoken words and may be used as speech features for speech recognition. Unlike the spectrogram, which represents the spectral magnitude content of the signal, the (t-m-Cm) plot is a complete characterization of the time-domain signal. It is demonstrated that a good-quality, intelligible speech signal can be reconstructed using only a small portion of the information contained in the (t-m-Cm) plots. Examples showing discriminating (t-m-Cm) plots and reconstructed speech using only a small portion of the information in the plot are given for vowels, stops, fricatives, and affricates.
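For concreteness, the zero-order Fourier-Bessel series and its coefficients can be computed as sketched below; this is the generic FB expansion and is assumed (not verified) to match the paper's definition of Cm.

```python
import numpy as np
from scipy.special import jv, jn_zeros

def fourier_bessel_coeffs(x, fs, M=200):
    """Zero-order Fourier-Bessel expansion of a short-time frame x(t) on (0, T]:
    x(t) ~ sum_m Cm * J0(lam_m * t / T), with
    Cm = 2 / (T^2 * J1(lam_m)^2) * integral_0^T t * x(t) * J0(lam_m * t / T) dt."""
    x = np.asarray(x, float)
    n = len(x)
    t = np.arange(1, n + 1) / fs                  # time samples on (0, T]
    T = n / fs
    lam = jn_zeros(0, M)                          # first M positive roots of J0
    C = np.empty(M)
    for m in range(M):
        integrand = t * x * jv(0, lam[m] * t / T)
        C[m] = 2.0 * integrand.sum() / fs / (T ** 2 * jv(1, lam[m]) ** 2)
    return C

def fourier_bessel_reconstruct(C, n, fs):
    """Resynthesize a frame from a (possibly truncated) set of Cm."""
    t = np.arange(1, n + 1) / fs
    T = n / fs
    lam = jn_zeros(0, len(C))
    return sum(c * jv(0, l * t / T) for c, l in zip(C, lam))
```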

Journal ArticleDOI
TL;DR: An analytical derivation of a simple noniterative technique for extracting a multiple impulse excitation model for synthesized speech directly from the LPC residual sequence, which is well suited to speech enhancement applications where processor capability is limited.
Abstract: This paper provides an analytical derivation of a simple noniterative technique for extracting a multiple impulse excitation model for synthesized speech directly from the LPC residual sequence. While suboptimal with respect to "multipulse" techniques, this method is well suited to speech enhancement applications where processor capability is limited. The results suggest an additional "orthogonality" requirement between the excitation sequence and the resulting prediction error, which aids in the intuitive understanding of the method.

Journal ArticleDOI
TL;DR: The co-occurrence matrix, a two-dimensional histogram of pairs of sample amplitudes, is explored as a representation of the digital speech waveform and is shown to lead to a good estimator of the pitch period of voiced speech.
Abstract: The co-occurrence matrix, a two-dimensional histogram of pairs of sample amplitudes, is explored as a representation of the digital speech waveform. Co-occurrence matrix representations support a hypothesis-testing approach to digital speech analysis. This approach is pursued in the formulation of a quantitative (chi-square) measure of sample amplitude dependence, based on co-occurrence matrices. This measure, which is highly sensitive to quasi-periodicity, is shown to lead to a good estimator of the pitch period of voiced speech. Co-occurrence matrix representations are employed in conjunction with pattern classification methods in experiments involving the voiced-unvoiced-silence analysis of speech, and in an experimental pitch extraction algorithm which is tested on continuous speech.
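A sketch of the measure is given below: build the co-occurrence matrix of sample pairs (x[n], x[n+lag]) and compute a standard contingency-table chi-square statistic; the lag that maximizes it within a plausible pitch range gives a pitch estimate. Bin counts and ranges are illustrative, and this is a generic reading of the paper's measure rather than its exact formulation.

```python
import numpy as np

def cooccurrence_chi2(x, lag, n_bins=16):
    """Chi-square dependence measure computed from the co-occurrence matrix
    (2-D histogram) of amplitude pairs (x[n], x[n+lag])."""
    x = np.asarray(x, float)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    a = np.digitize(x[:-lag], edges) - 1
    b = np.digitize(x[lag:], edges) - 1
    M = np.zeros((n_bins, n_bins))
    np.add.at(M, (a, b), 1.0)                       # co-occurrence matrix
    N = M.sum()
    expected = np.outer(M.sum(axis=1), M.sum(axis=0)) / N
    mask = expected > 0
    return float(((M[mask] - expected[mask]) ** 2 / expected[mask]).sum())

def pitch_estimate(x, fs=8000, fmin=60, fmax=400):
    """Pick the lag (within a plausible pitch range) that maximizes the
    chi-square dependence measure; x is one voiced frame, e.g. 30-40 ms."""
    lags = range(int(fs / fmax), int(fs / fmin) + 1)
    best = max(lags, key=lambda L: cooccurrence_chi2(x, L))
    return fs / best
```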

Book ChapterDOI
01 Apr 1985
TL;DR: In this article, a system for speech synthesis by rule that uses demisyllables (DSs) as phonetic units is described, and the problem of concatenation is discussed in detail; the pertinent stage converts a string of phonetic symbols into a stream of speech parameter frames.
Abstract: A system for speech synthesis by rule is described which uses demisyllables (DSs) as phonetic units. The problem of concatenation is discussed in detail; the pertinent stage converts a string of phonetic symbols into a stream of speech parameter frames. For German, about 1650 DSs are required to permit synthesizing a very large vocabulary. Synthesis is controlled by 18 rules which are used for splitting up the phonetic string into DSs, for selecting the DSs in such a way that the inventory size is minimized, and - last but not least - for concatenation. The quality and intelligibility of the synthetic signal are very good; in a subjective test the median word intelligibility dropped from 96.6% for an LPC vocoder to 92.1% for the DS synthesis, and the quality difference between the DS synthesis and ordinary vocoded speech was judged very small.

Journal ArticleDOI
TL;DR: If the linear predictive coding filter coefficients are imagined as specifying a point in an n-dimensional space, then the set of all stable filters fills up a certain region in this space; the shape of this region is determined for low-order filters, and general properties are deduced for higher-order filters.
Abstract: If the linear predictive coding filter coefficients are imagined as specifying a point in an n-dimensional space, then the set of all stable filters fills up a certain region in this space. The shape of this region is determined for low-order filters, and general properties are deduced for higher-order filters.
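The stability region can be tested point by point with the step-down (backward Levinson) recursion: a coefficient vector lies inside the region exactly when every derived reflection coefficient has magnitude less than one. A small sketch, using the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p:

```python
import numpy as np

def is_stable(a, tol=1e-9):
    """Return True if the LPC synthesis filter 1/A(z) is stable, by the
    step-down recursion: stable iff every reflection coefficient |k_m| < 1."""
    a = list(np.asarray(a, float))
    while a:
        k = a[-1]                                   # k_m = a_m[m]
        if abs(k) >= 1.0 - tol:
            return False
        # step down from order m to order m-1
        a = [(a[i] - k * a[len(a) - 2 - i]) / (1.0 - k * k)
             for i in range(len(a) - 1)]
    return True

# e.g. the first-order region is just |a_1| < 1; in two dimensions the
# stable region is the triangle |a_2| < 1, |a_1| < 1 + a_2.
print(is_stable([0.5]), is_stable([1.5]))              # True False
print(is_stable([0.0, 0.9]), is_stable([0.0, 1.1]))    # True False
```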

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A new high-quality speech information compression method which introduces techniques of eliminating unnecessary samples of prediction residual wave pulses to obtain a thinned-out residual and produces slightly higher quality speech than does the MPE method.
Abstract: A new high-quality speech information compression method is developed. This method introduces techniques of eliminating unnecessary samples of prediction residual wave pulses to obtain a thinned-out residual. First, a thinning-out procedure which minimizes the quality degradation is formulated. Next, a procedure which simplifies this thinning-out procedure under several hypotheses is defined. Subjective evaluation of this procedure using preference tests confirms that almost no quality degradation occurs. Pitch information is utilized. Adding the process of repetitive use of the thinned-out residual to the procedure, preference tests are carried out at a bit-rate of 9.6 kb/s for purposes of comparison with the newest MPE which includes the pitch prediction process. The results are that our proposed method produces slightly higher quality speech than does the MPE method. The number of processing steps is less than one-third that of MPE.

Proceedings ArticleDOI
01 Apr 1985
TL;DR: A new approach to isolated word recognition is examined, based on an extension of vector quantization speech coding, called matrix quantization speech coding, that was developed by Tsao and Gray.
Abstract: A new approach to isolated word recognition is examined. This approach is based on an extension of vector quantization speech coding, called matrix quantization speech coding, that was developed by Tsao and Gray. In this new approach, a codebook containing a set of time-ordered sequences of speech spectra represents each vocabulary word. A word is recognized by encoding it with each codebook and classifying the input word according to the codebook that yields the smallest distortion. On the digits, this approach achieved speaker-independent recognition accuracy greater than 98%. The approach is described, experimental results are presented, and comparisons with vector quantization based approaches are given.