Showing papers on "Linear predictive coding" published in 1992


PatentDOI
TL;DR: A variable rate coding method for frames of digitized speech samples is proposed, comprising the steps of determining a level of speech activity for a frame, selecting an encoding rate from a set of rates based upon the determined level of activity within said frame, and coding said frame according to a predetermined coding format for said selected rate, wherein each rate has a corresponding different coding format.
Abstract: A method of speech signal compression, by variable rate coding of frames of digitized speech samples, comprising the steps of: determining a level of speech activity for a frame of digitized speech samples; selecting an encoding rate from a set of rates based upon said determined level of speech activity within said frame; coding said frame according to a predetermined coding format for said selected rate wherein each rate has a corresponding different coding format; providing for said frame a corresponding output data packet at said selected rate.

552 citations
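
The steps listed in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rate set, the energy thresholds, the 8 kHz sampling rate, and the use of frame energy as the "level of speech activity" are all hypothetical choices, not values taken from the patent.

```python
import math

# Hypothetical rate set (bits per second) and energy thresholds (dB);
# the source does not specify either.
RATES_BPS = (1000, 2000, 4000, 8000)
THRESHOLDS_DB = (-50.0, -40.0, -30.0)  # boundaries between adjacent rates


def frame_energy_db(frame):
    """Mean-square energy of a frame of samples in [-1, 1], in dB."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


def select_rate(frame):
    """Pick an encoding rate from the frame's activity level."""
    level = frame_energy_db(frame)
    index = sum(1 for t in THRESHOLDS_DB if level >= t)
    return RATES_BPS[index]


def encode_frame(frame):
    """Return (rate, payload_bits); the packet size follows the chosen rate."""
    rate = select_rate(frame)
    frame_seconds = len(frame) / 8000.0  # assumes 8 kHz sampling
    payload_bits = int(round(rate * frame_seconds))
    return rate, payload_bits
```

With these assumed settings, a silent 160-sample frame maps to the lowest rate and a 20-bit packet, while a loud frame maps to the highest rate and a 160-bit packet.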


Journal ArticleDOI
Yariv Ephraim1
01 Oct 1992
TL;DR: A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.
Abstract: Since the statistics of the speech signal as well as of the noise are not explicitly available, and the most perceptually meaningful distortion measure is not known, model-based approaches have recently been extensively studied and applied to the three basic problems of speech enhancement: signal estimation from a given sample function of noisy speech, signal coding when only noisy speech is available, and recognition of noisy speech signals in man-machine communication. Research on the model-based approach is integrated and put into perspective with other more traditional approaches for speech enhancement. A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.

383 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors have developed a conceptually simple and computationally efficient speech parameter estimation technique that is more robust to steady-state spectral factors, such as the frequency response of the communication channel.
Abstract: Most speech parameter estimation techniques are easily influenced by the frequency response of the communication channel. The authors have developed a technique that is more robust to such steady-state spectral factors in speech. The approach is conceptually simple and computationally efficient. The new method is described, and experimental results are reported that show significant advantages for the proposed method.

297 citations


Journal ArticleDOI
TL;DR: A time-scale modification system that preserves the shape-invariance property of the speech waveform during voicing is developed using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation; it also allows shape-invariant joint time-scale and pitch modification.
Abstract: The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. A time-scale modification system that preserves this shape-invariance property during voicing is developed. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its ability to perform time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.

245 citations


Journal ArticleDOI
J.-H. Chen1, Richard V. Cox1, Y.-C. Lin, Nuggehally Sampath Jayant2, M.J. Melchner2 
TL;DR: The official CCITT laboratory tests revealed that the speech quality of this 16 kb/s LD-CELP coder is either equivalent to or better than that of the CCITT G.721 standard 32-kb/s ADPCM coder for almost all conditions tested.
Abstract: A low-delay code-excited linear prediction (LD-CELP) speech coder which is expected to be standardized in 1992 as a CCITT G Series Recommendation for universal applications of speech coding at 16 kb/s is presented. The coder achieves a one-way coding delay of less than 2 ms by making both the LPC predictor and the excitation gain backward-adaptive and by using a small excitation vector size of five samples. The official CCITT laboratory tests revealed that the speech quality of this 16 kb/s LD-CELP coder is either equivalent to or better than that of the CCITT G.721 standard 32-kb/s ADPCM coder for almost all conditions tested. A description of the LD-CELP algorithm, its implementation on the DSP32C for CCITT testing, and performance results from these tests are presented.

206 citations
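
The figures quoted in the LD-CELP abstract above imply a small bit budget per excitation vector, which the following arithmetic sketch makes explicit. The 8 kHz sampling rate is an assumption (the usual telephone-band rate); the bit rate and vector size come from the abstract.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling, not stated above
BIT_RATE_BPS = 16000    # from the abstract
VECTOR_SIZE = 5         # samples per excitation vector, from the abstract

# Time needed to buffer one excitation vector before it can be coded.
vector_duration_ms = 1000.0 * VECTOR_SIZE / SAMPLE_RATE_HZ

# Number of vectors coded per second, and the bits available for each.
vectors_per_second = SAMPLE_RATE_HZ // VECTOR_SIZE
bits_per_vector = BIT_RATE_BPS // vectors_per_second

# Average bits spent per speech sample.
bits_per_sample = BIT_RATE_BPS / SAMPLE_RATE_HZ
```

Under that assumption, buffering one vector takes 0.625 ms, which is consistent with the stated one-way delay of less than 2 ms, and each vector must be described with 10 bits. Because the predictor and gain are backward-adaptive, none of those bits need to be spent on transmitting them.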


PatentDOI
TL;DR: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal.
Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.

176 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals.
Abstract: The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals. The procedure is unsupervised in that there is no training set, and sequential in that information obtained in early stages of the process is utilized in later stages.

77 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors discuss the application of generalized analysis-by-synthesis coding to the pitch predictor of a code excited linear predictor (CELP) coder, which makes it possible to transmit the pitch prediction parameters at a much lower rate than conventional approaches, without compromising speech quality.
Abstract: Many modifications can be applied to a speech signal without changing its perceptual quality. For a particular speech coder, the coding efficiency will differ for distinct modifications. To exploit this, the authors introduced a generalized analysis-by-synthesis procedure. In this procedure, a search is performed over a multitude of modified original signals (on a blockwise basis), and the signal which can be encoded with the least distortion is selected for transmission. At the receiver, a quantized version of this modified original signal is constructed. The authors discuss the application of generalized analysis-by-synthesis coding to the pitch predictor of a code excited linear predictor (CELP) coder. The use of this technique makes it possible to transmit the pitch predictor parameters at a much lower rate than conventional approaches, without compromising speech quality.

56 citations


Journal ArticleDOI
TL;DR: Two procedures for the detection of laryngeal pathology were developed: a spectral distortion measure using pitch synchronous and asynchronous methods with linear predictive coding vectors and vector quantization, and analysis of the EGG signal using time interval and amplitude difference measures.
Abstract: The purpose of this research was to develop quantitative measures for the assessment of laryngeal function using speech and electroglottographic (EGG) data. Two procedures for the detection of laryngeal pathology were developed: (1) a spectral distortion measure using pitch synchronous and asynchronous methods with linear predictive coding (LPC) vectors and vector quantization (VQ), and (2) analysis of the EGG signal using time interval and amplitude difference measures. The VQ procedure was conjectured to offer the possibility of circumventing the need to estimate the glottal volume velocity waveform by inverse filtering techniques. The EGG procedure was to evaluate data that was 'nearly' a direct measure of vocal fold vibratory motion and thus was conjectured to offer the potential for providing an excellent assessment of laryngeal function. A threshold based procedure gave 75.9 and 69.0% probability of pathological detection using procedures (1) and (2), respectively, for 29 patients with pathological voices and 52 normal subjects. The false alarm probability was 9.6% for the normal subjects.

50 citations


Journal ArticleDOI
TL;DR: The aim of the present paper is to extend the use of the LSF representation for more general speech recognition systems and to widen the scope of its results.

49 citations


PatentDOI
TL;DR: A speech signal is received into a bank of bandpass filters, the instantaneous amplitude modulation and frequency modulation of each harmonic in the speech waveform are determined, and a logarithm of the instantaneous fundamental frequency is determined, for example, by computing a weighted average of the frequency modulations of the harmonics.
Abstract: A method and apparatus for extracting information from human speech are disclosed. A speech signal is received into a bank of bandpass filters and the instantaneous amplitude modulation and frequency modulation of each harmonic in the speech waveform is determined. A logarithm of the instantaneous frequency of the speech fundamental frequency is determined, for example, by computing a weighted average of the frequency modulations of the harmonics. An output signal is formed having the logarithm of the frequency of the thus determined speech fundamental and the logarithms of the amplitude modulation for the ten lowest frequency speech harmonics and/or the speech envelope.

PatentDOI
TL;DR: In this paper, a method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR).
Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.
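
The reason filtering time trajectories helps can be illustrated with a short sketch. A fixed channel multiplies each spectral band by a constant gain, which becomes a constant offset in the log domain; any filter that removes the slowly varying part of each band's trajectory therefore cancels it. The sketch below uses plain mean subtraction as a stand-in for the patent's trajectory filter, whose actual coefficients are not given here, and all function names are hypothetical.

```python
import math


def log_trajectories(power_frames):
    """Turn frames-by-bands power values into one log trajectory per band."""
    num_bands = len(power_frames[0])
    return [[math.log(frame[b]) for frame in power_frames]
            for b in range(num_bands)]


def remove_trajectory_mean(trajectory):
    """Crude high-pass step: subtract the time average of one band."""
    mean = sum(trajectory) / len(trajectory)
    return [v - mean for v in trajectory]


def normalize(power_frames):
    """Apply the mean-removal step to every band's log trajectory."""
    return [remove_trajectory_mean(t) for t in log_trajectories(power_frames)]
```

Feeding the same frames through two different fixed channel gains yields the same normalized trajectories, which is the invariance the method relies on.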

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors' improved PS-VXC coder has a subjective performance closely matching that of the 4.8 kb/s DoD CELP coder while operating at an average rate of 3.0 kb/s.
Abstract: Several major modifications to the phonetically segmented vector excitation coding (PS-VXC) coder reported previously by the authors (1989, 1990) have resulted in enhanced speech quality while reducing the delay, complexity, and bit rate. Speech is segmented into variable-length phonetic classes and a VXC coding module is tailored to each class. Coding techniques include adaptive linear predictive coding (LPC) analysis and interpolation, two-stage excitation coding of onsets, comb filtering, modified perceptual weighting, and pitch contour smoothing. The improved PS-VXC coder operates at a peak rate of 3.4 kb/s with an average rate of 3.0 kb/s and has a subjective performance closely matching that of the 4.8 kb/s DoD CELP coder.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present a tree searched multi-stage vector quantization scheme which achieves spectral distortion lower than 1 dB with low complexity and good robustness using 24 b/frame and it is shown that TS-MSVQ significantly outperforms the split-codebook approach.
Abstract: The authors present a tree searched multi-stage vector quantization (TS-MSVQ) scheme which achieves spectral distortion lower than 1 dB with low complexity and good robustness using 24 b/frame. The M-L search is used and it is shown that it achieves performance close to that of the optimal search for a relatively small M. The best performance/complexity trade-offs are obtained with relatively small size codebooks cascaded in a three-four stage configuration. Results for log-area ratio (LAR) and line spectral pair (LSP) parameters are presented. A training technique which reduces outliers at the expense of a slight average performance degradation is introduced. The robustness across different languages and input spectral shapings is studied. Finally, it is shown that TS-MSVQ significantly outperforms the split-codebook approach.
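
A minimal sketch of an M-best search over cascaded codebooks may help show why a small M can approach the optimal search. It assumes each stage's codevector is added to the running reconstruction and that squared error is the distortion; the codebooks and the function name `msvq_search` are hypothetical, and this is not the authors' implementation.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def msvq_search(target, codebooks, m):
    """Return (indices, reconstruction) keeping the m best paths per stage."""
    zero = [0.0] * len(target)
    # Each survivor is (error, stage indices so far, partial reconstruction).
    survivors = [(squared_error(target, zero), (), zero)]
    for codebook in codebooks:
        candidates = []
        for _, indices, partial in survivors:
            for i, codevector in enumerate(codebook):
                recon = [p + c for p, c in zip(partial, codevector)]
                candidates.append(
                    (squared_error(target, recon), indices + (i,), recon))
        # Keep only the m lowest-error paths for the next stage.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:m]
    _, best_indices, best_recon = survivors[0]
    return best_indices, best_recon
```

With m = 1 this reduces to a greedy stage-by-stage search. In a toy case with stage codebooks {0.6, 1.5} and {-0.5, 0.1} and target 1.0, the greedy search settles on 0.6 + 0.1, whereas m = 2 keeps the second-best first-stage choice alive and finds the exact match 1.5 - 0.5.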

Journal ArticleDOI
TL;DR: Two new real functions defined from the reciprocal and antireciprocal parts of the predictor polynomials obtained from the split Levinson algorithm are proposed and shown to obey three-term recurrence relations.

Journal ArticleDOI
TL;DR: A new structure called the product code HMM uses two independent HMMs per class, one for spectral shape and one for gain, which outperformed the conventional structure with an accuracy of over 96% for three classes.
Abstract: Linear predictive coding (LPC), vector quantization (VQ), and hidden Markov models (HMMs) are three popular techniques from speech recognition which are applied in modeling and classifying nonspeech natural sounds. A new structure called the product code HMM uses two independent HMMs per class, one for spectral shape and one for gain. Classification decisions are made by scoring shape and gain index sequences from a product code VQ. In a series of classification experiments, the product code structure outperformed the conventional structure, with an accuracy of over 96% for three classes.
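
To make the "shape and gain index sequences" concrete, here is a small sketch of a gain-shape (product code) quantizer that emits one shape index and one gain index per input vector. It assumes unit-norm shape codevectors and a scalar gain codebook; the codebooks and the function name are hypothetical, and the HMM scoring stage is not shown.

```python
def encode_gain_shape(vector, shape_codebook, gain_codebook):
    """Return (shape_index, gain_index) for one input vector.

    Assumes every entry of shape_codebook has unit norm, so the best gain
    for a given shape is the inner product of the input with that shape.
    """
    correlations = [sum(x * s for x, s in zip(vector, shape))
                    for shape in shape_codebook]
    # Pick the shape that correlates best with the input.
    shape_index = max(range(len(correlations)), key=correlations.__getitem__)
    target_gain = correlations[shape_index]
    # Pick the gain level closest to the ideal gain for that shape.
    gain_index = min(range(len(gain_codebook)),
                     key=lambda j: abs(gain_codebook[j] - target_gain))
    return shape_index, gain_index
```

Running this over successive frames produces two parallel index sequences, which is the form of input that separate shape and gain models would score.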

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by a long-term multivariate auto-regressive modelization, gives first-rate results without using more than one sentence.
Abstract: Two models, the temporal decomposition and the multivariate linear prediction, of the spectral evolution of speech signals capable of processing some aspects of the speech variability are presented. A series of acoustic-phonetic decoding experiments, characterized by the use of spectral targets of the temporal decomposition techniques and a speaker-dependent mode, gives good results compared to a reference system (i.e., 70% vs. 60% for the first choice). Using the original method developed by Laforia, a series of text-independent speaker recognition experiments, characterized by a long-term multivariate auto-regressive modelization, gives first-rate results (i.e., 98.4% recognition rate for 420 speakers) without using more than one sentence. Taking into account the interpretation of the models, these results show how interesting the cinematic models are for obtaining a reduced variability of the speech signal representation.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Simulation results reveal that KF based speech coding has significant advantage over the equivalent LP based systems, particularly when used with coarsely quantized measurements.
Abstract: The use of Kalman filtering (KF) techniques in speech coding is investigated. The authors show that the common linear predictor (LP) is a special case of the KF based on an all-pole signal model. They also show that the KF algorithm provides fixed-lag smoothing at no additional complexity. Simulation results reveal that KF based speech coding has significant advantage over the equivalent LP based systems, particularly when used with coarsely quantized measurements.

PatentDOI
Motoaki Koyama1
TL;DR: A speech recognition LSI system comprises a speech segment detector, a reference pattern memory for storing reference patterns, and a speech recognition section that compares the speech segment detected by the detector with the stored reference patterns and selects the reference pattern most similar to the speech segment.
Abstract: A speech recognition LSI system comprises a speech segment detector for detecting a speech segment from a speech signal, a reference pattern memory for storing reference patterns, and a speech recognition section for comparing the speech segment detected by the detector with the reference patterns stored in the reference pattern memory and selecting the reference pattern most similar to that of the speech segment. The system further comprises a recording/reproduction device for recording the speech signal and for reproducing only the speech segment the speech segment detector has detected, so that an operator can hear the speech segment.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A novel spectral coding method, two-dimensional differential line spectra pair coding (2DdLSP), is proposed, taking advantage of the strong inter-frame and intra-frame correlation of LSP parameters to reduce the variance of the parameters to be quantized.
Abstract: A novel spectral coding method, two-dimensional differential line spectra pair coding (2DdLSP), is proposed. Taking advantage of the strong inter-frame and intra-frame correlation of LSP parameters, a two-dimensional linear prediction technique is used to reduce the variance of the parameters to be quantized. One scalar quantization and two vector quantization schemes are designed to quantize the 2-D prediction residuals. Without further buffering delay, a spectral distortion of 1 dB² can be achieved at 19 b/frame when the frame period is 10 ms. Both within- and out-of-training tests show the robustness of the method to speech data variance.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The improved LPC vocoder performs much better in acoustic background noise, and it produces natural sounding speech in both quiet and noisy environments.
Abstract: A number of improvements to the mixed excitation linear predictive coding (LPC) vocoder are presented. First, the authors have added more sophisticated frequency shaping of the pulse and noise in the mixture. They use a bandpass filter bank to attain a staircase approximation to any desired noise shape. Voicing strength in each frequency band is controlled by periodicity analysis of both the bandpass filtered speech and the bandpass speech envelope. Second, the authors have improved their pitch detection algorithm by using separate searches on the LPC residual and the input speech signal. Finally, they have added a fixed pulse shaping filter based on a spectrally flattened synthetic glottal pulse. The improved LPC vocoder performs much better in acoustic background noise, and it produces natural sounding speech in both quiet and noisy environments.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: An approach to text-independent speaker verification that uses a two-stage classifier that consists of a speaker-independent phoneme detector trained to recognize a phoneme that is distinctive from speaker to speaker.
Abstract: Text-independent speaker verification systems typically depend upon averaging over a long utterance to obtain a feature set for classification. However, not all speech is equally suited to the task of speaker verification. An approach to text-independent speaker verification that uses a two-stage classifier is presented. The first stage consists of a speaker-independent phoneme detector trained to recognize a phoneme that is distinctive from speaker to speaker. The second stage is trained to recognize the frames of speech from the target speaker that are admitted by the phoneme detector. A common feature vector based on the linear predictive coding (LPC) cepstrum is projected in different directions for each of these pattern recognition tasks. Results of tests using the described speaker verification system are shown.
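
For readers unfamiliar with the LPC cepstrum used as the feature vector above, the standard recursion from predictor coefficients to cepstral coefficients can be sketched as follows. The sign convention is an assumption: the all-pole model is taken as H(z) = 1 / (1 - sum_k a_k z^-k), and the gain term c_0 is omitted. The projection step described in the abstract is not shown.

```python
def lpc_to_cepstrum(a, num_coeffs):
    """Convert LPC coefficients to cepstral coefficients c_1..c_N.

    a[k-1] holds a_k for the assumed model H(z) = 1 / (1 - sum_k a_k z^-k).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as zero for n greater than the predictor order.
    """
    p = len(a)
    c = []
    for n in range(1, num_coeffs + 1):
        value = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            value += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(value)
    return c
```

A quick sanity check is a single pole at z = r, where the cepstrum is known in closed form as c_n = r^n / n; the recursion reproduces that sequence.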

Proceedings ArticleDOI
23 Mar 1992
TL;DR: An algorithm for 2.4 kb/s speech coding is described, which results in a better compromise between bit allocation for short-term quantization and residual coding and an improved high-frequency regeneration.
Abstract: An algorithm for 2.4 kb/s speech coding is described. The main problem addressed is the coding of voiced speech. A way of coding the pitch structure is introduced. Compared with traditional coding schemes, it results in a better compromise between bit allocation for short-term quantization and residual coding. The coder uses vector quantization of the short-term parameters (line spectrum frequencies). The residual is lowpass filtered to obtain the baseband signal. Unvoiced frames are coded by means of a method based on repetition and interpolation of pitch pulses. The method exploits the high correlation between pitch pulses. Harmonic postfiltering is applied to obtain an improved high-frequency regeneration.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A signal approximation via data-adaptive normalized Gaussian functions is presented, which resembles the traditional Gabor expansion, but it is more precise and efficient.
Abstract: A signal approximation via data-adaptive normalized Gaussian functions is presented. This approach resembles the traditional Gabor expansion, but it is more precise and efficient. Numerical simulations for the speech signal are included to demonstrate the effectiveness of the new scheme.

Proceedings ArticleDOI
06 Dec 1992
TL;DR: The real-time implementation of a wideband ACELP speech coder at 9.6 kb/s is presented and the quality of the encoded wideband speech was judged vastly superior to that of the original narrowband speech.
Abstract: The real-time implementation of a wideband ACELP speech coder at 9.6 kb/s is presented. The coder is implemented on a TMS320C30 floating-point DSP chip. The attempt to implement an ACELP coder for wideband speech in real time results in 3-4 times more complexity than that for narrowband speech. Very efficient algorithms for searching the pitch and codebook parameters have been introduced. The pitch search was brought down to 20% of real time by the combination of an efficient open-loop approach and a decimation procedure. The excitation search complexity was significantly reduced by using two codebooks. The first models the main features in the excitation and is very efficiently searched using focused search. The second has a simple structure and does not need exhaustive search. The quality of the encoded wideband speech at 9.6 kb/s was judged vastly superior to that of the original narrowband speech.


Proceedings ArticleDOI
23 Mar 1992
TL;DR: Four voice packet reconstruction methods used for speech coded by code excited linear prediction (CELP)-type speech coders are described and their performance is discussed.
Abstract: Four voice packet reconstruction methods used for speech coded by code excited linear prediction (CELP)-type speech coders are described. In the first method, the authors generalize the waveform substitution technique originally developed for the PCM coded speech to the CELP speech coding. In the second method, a priority level is assigned to each speech frame to protect against those perceptually important and hard-to-reconstruct speech frames being lost. The third and fourth methods both split the information bits in a frame into two groups of different levels of importance. In method three, the bits for representing the filter parameters are given high priority and bits for representing the excitation signals are given low priority. Method four is an embedded coding technique based on two-stage CELP. The four methods were tested in combination with a simulated voice activity and queuing model and their performance is discussed.

Patent
Willem Bastiaan Kleijn1
14 Dec 1992
TL;DR: In this article, a method and apparatus for processing a reconstructed speech signal from an analysis-by-synthesis decoder are provided to improve the quality of reconstructed speech by using smoothing techniques.
Abstract: A method and apparatus for processing a reconstructed speech signal from an analysis-by-synthesis decoder are provided to improve the quality of reconstructed speech. By operation of the invention, one or more traces in a reconstructed speech signal are identified. Traces are sequences of like-features in the reconstructed speech signal. The like-features are identified by time-distance data received from the long term predictor of the decoder. The identified traces are smoothed by one of the known smoothing techniques. A smoothed version of the reconstructed speech signal is formed by combining one or more of the smoothed traces. The original reconstructed speech signal may be that provided by a long term predictor of the decoder. Values of the reconstructed speech signal and smoothed speech signal may be combined based on a measure of periodicity in speech.

Book
01 Jan 1992
TL;DR: The application of Audio/Speech Recognition for Military Requirements and Quality Evaluation of Speech Processing Systems is studied.
Abstract: 1: Overview of Voice Communications and Speech Processing.- 2: The Speech Signal.- 3: Speech Coding.- 4: Voice Interactive Information Systems.- 5: Speech Recognition Based on Pattern Recognition Approaches.- 6: Quality Evaluation of Speech Processing Systems.- 7: Speech Processing Standards.- 8: Application of Audio/Speech Recognition for Military Requirements.- Selective Bibliography with Abstract.

Patent
Mei Yong1
21 Sep 1992
TL;DR: A priority assignment method and device are described for assigning a priority to a selected speech frame coded by a linear predictive coder based on at least two of: an energy of the speech frame, a log spectral distance between the frame and the immediately previous frame, and a pitch predictor coefficient for the selected frame.
Abstract: A priority assignment method and device are set forth for assigning a priority to a selected speech frame coded by a linear predictive coder based on at least two of: an energy of the speech frame, a log spectral distance between a frame and a frame immediately previous, and a pitch predictor coefficient for the selected speech frame. The invention protects against loss of perceptually important and hard-to-reconstruct speech frames.
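
One way such a rule could be organized is sketched below. Everything numeric here is hypothetical: the thresholds, the two-out-of-three vote, and the reading that a low pitch predictor coefficient marks a frame as hard to reconstruct are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical thresholds; the source gives no numeric values.
ENERGY_DB_MIN = -30.0
SPECTRAL_DISTANCE_DB_MIN = 3.0
PITCH_COEFF_MAX = 0.5


def frame_priority(energy_db, log_spectral_distance_db, pitch_coeff):
    """Return 'high' or 'low' from three per-frame measurements."""
    votes = 0
    if energy_db >= ENERGY_DB_MIN:
        votes += 1  # audible frame: its loss is likely to be noticed
    if log_spectral_distance_db >= SPECTRAL_DISTANCE_DB_MIN:
        votes += 1  # spectrum changed: repeating the previous frame fits poorly
    if abs(pitch_coeff) <= PITCH_COEFF_MAX:
        votes += 1  # weak periodicity (assumed): hard to extrapolate from the past
    return "high" if votes >= 2 else "low"
```

A network element could then drop low-priority frames first under congestion, which matches the stated aim of protecting perceptually important and hard-to-reconstruct frames.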