scispace - formally typeset
Search or ask a question
Topic

Linear predictive coding

About: Linear predictive coding is a research topic. Over the lifetime, 6565 publications have been published within this topic receiving 142991 citations. The topic is also known as: Linear predictive coding, LPC.


Papers
More filters
Journal ArticleDOI
TL;DR: In this article, Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) are used for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences.
Abstract: Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature.

203 citations

Journal ArticleDOI
TL;DR: Using the generalized baseband coder formulation, it is demonstrated that under reasonable assumptions concerning the weighting filter, an attractive low-complexity/high-quality coder can be obtained.
Abstract: This paper describes an effective and efficient time domain speech encoding technique that has an appealing low complexity, and produces toll quality speech at rates below 16 kbits/s. The proposed coder uses linear predictive techniques to remove the short-time correlation in the speech signal. The remaining (residual) information is then modeled by a low bit rate reduced excitation sequence that, when applied to the time-varying model filter, produces a signal that is "close" to the reference speech signal. The procedure for finding the optimal constrained excitation signal incorporates the solution of a few strongly coupled sets of linear equations and is of moderate complexity compared to competing coding systems such as adaptive transform coding and multipulse excitation coding. The paper describes the novel coding idea and the procedure for finding the excitation sequence. We then show that the coding procedure can be considered as an "optimized" baseband coder with spectral folding as high-frequency regeneration technique. The effect of various analysis parameters on the quality of the reconstructed speech is investigated using both objective and subjective tests. Further, modifications of the basic algorithm, and their impact on both the quality of the reconstructed speech signal and the complexity of the encoding algorithm, are discussed. Using the generalized baseband coder formulation, we demonstrate that under reasonable assumptions concerning the weighting filter, an attractive low-complexity/high-quality coder can be obtained.

202 citations

Journal ArticleDOI
TL;DR: It is shown experimentally that as the number of stages is increased above the optimal performance/complexity tradeoff, the quantizer robustness and outlier performance can be improved at the expense of a slight increase in rate.
Abstract: A tree-searched multistage vector quantization (VQ) scheme for linear prediction coding (LPC) parameters which achieves spectral distortion lower than 1 dB with low complexity and good robustness using rates as low as 22 b/frame is presented. The M-L search is used, and it is shown that it achieves performance close to that of the optimal search for a relatively small M. A joint codebook design strategy for multistage VQ which improves convergence speed and the VQ performance measures is presented. The best performance/complexity tradeoffs are obtained with relatively small size codebooks cascaded in a 3-6 stage configuration. It is shown experimentally that as the number of stages is increased above the optimal performance/complexity tradeoff, the quantizer robustness and outlier performance can be improved at the expense of a slight increase in rate. Results for log area ratio (LAR) and line spectral pairs (LSPs) parameters are presented. A training technique that reduces outliers at the expense of a slight average performance degradation is introduced. The method significantly outperforms the split codebook approach. >

201 citations

Journal ArticleDOI
TL;DR: Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.
Abstract: In this paper, we present a new technique for the estimation of short-term linear predictive parameters of speech and noise from noisy data and their subsequent use in waveform enhancement schemes. The method exploits a priori information about speech and noise spectral shapes stored in trained codebooks, parameterized as linear predictive coefficients. The method also uses information about noise statistics estimated from the noisy observation. Maximum-likelihood estimates of the speech and noise short-term predictor parameters are obtained by searching for the combination of codebook entries that optimizes the likelihood. The estimation involves the computation of the excitation variances of the speech and noise auto-regressive models on a frame-by-frame basis, using the a priori information and the noisy observation. The high computational complexity resulting from a full search of the joint speech and noise codebooks is avoided through an iterative optimization procedure. We introduce a classified noise codebook scheme that uses different noise codebooks for different noise types. Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.

200 citations

Journal ArticleDOI
TL;DR: Recent results obtained in waveform coding of speech with vector quantization are reviewed, with Vector quantization appearing to be a suitable coding technique which caters to this dual requirement of effective speech coding.
Abstract: V ECTOR QUANTIZATION (VQ), a new direction in source coding, has recently emerged as a powerful and widely applicable coding technique. I t was first applied to analysis/synthesis of speech, and has allowed Linear Predictive Coding (LPC) rates to be dramatically reduced to 800 b/s with very slight reduction in quality, and further compressed to rates as low as 150 b/s while retaining intelligibility [ 1,2]. More recently, the technique has found its way to waveform coding [3-51, where its applicability and effectiveness is less obvious and not widely known. There is currently a great need for a low-complexity speech coder at the rate of 16 kb/s which attains essentially “toll” quality, roughly equivalent to that of standard 64-kb/s log PCM codecs. Adaptive DPCM schemes can attain this quality with low complexity for the proposed 32 kb/s CCITT standard, but at 16 kb/s the quality of ADPCM or adaptive delta modulation schemes is inadequate. More powerful methods, such as subband coding or transform coding, are capable of producing acceptable speech quality at 16kb/s but have a much higher implementation complexity. The difficulty is further compounded by the need for a scheme that can handle both speech and voiceband data at the 16 kb/s rate. These two types of waveforms occupy the same bandwidth in the subscriber loop part of the telephone network, yet they have a widely different statistical character. Effective speech coding at this rate must be geared to the specific character of speech and must exploit our knowledge of human hearing. On the other hand, a waveform that carries data must be coded and later reconstructed so that a modem can still extract the data with an acceptably low error rate. This is purely a signal processing operation not involving human perception. Vector quantization appears to be a suitable coding technique which caters to this dual requirement. VQ may become the key to 16 kb/s coding; it may also lead to improved quality waveform coding at 8 or 9.6 kb/s. In this paper, we review recent results obtained in waveform coding of speech with vector quantization and

198 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Noise
110.4K papers, 1.3M citations
81% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
80% related
Filter (signal processing)
81.4K papers, 1M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20239
202225
202126
202042
201925
201837