Journal Article

Predictive Coding of Speech at Low Bit Rates

Bishnu S. Atal
01 Apr 1982 - IEEE Transactions on Communications - Vol. 30, Iss. 4, pp. 600-614
TL;DR: A new class of speech coders is described which not only allows one to realize the precise optimum noise spectrum crucial to achieving very low bit rates, but also represents an important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.
Abstract: Predictive coding is a promising approach for speech coding. In this paper, we review the recent work on adaptive predictive coding of speech signals, with particular emphasis on achieving high speech quality at low bit rates (less than 10 kbits/s). Efficient prediction of the redundant structure in speech signals is obviously important for proper functioning of a predictive coder. It is equally important to ensure that the distortion in the coded speech signal be perceptually small. The subjective loudness of quantization noise depends both on the short-time spectrum of the noise and its relation to the short-time spectrum of the speech signal. The noise in the formant regions is partially masked by the speech signal itself. This masking of quantization noise by the speech signal allows one to use low bit rates while maintaining high speech quality. This paper will present generalizations of predictive coding for minimizing subjective distortion in the reconstructed speech signal at the receiver. The quantizer in predictive coders quantizes its input on a sample-by-sample basis. Such sample-by-sample (instantaneous) quantization creates difficulty in realizing an arbitrary noise spectrum, particularly at low bit rates. We will describe a new class of speech coders in this paper which could be considered to be a generalization of the predictive coder. These new coders not only allow one to realize the precise optimum noise spectrum which is crucial to achieving very low bit rates, but also represent the important first step in bridging the gap between waveform coders and vocoders without suffering from their limitations.
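The shaped-noise idea above can be made concrete with a small sketch. The following Python fragment is a minimal illustration, not the paper's coder: it implements a predictive quantizer with a noise-feedback filter F(z), so the reconstruction error equals the (roughly white) quantizer error filtered by 1 - F(z), and choosing the feedback coefficients shapes the noise spectrum relative to the speech spectrum. The predictor coefficients, shaping coefficients, and step size are illustrative inputs.

    import numpy as np

    def noise_feedback_coder(x, a, f, step):
        """Predictive quantizer with noise shaping; returns the reconstructed signal.

        a: short-term predictor coefficients; f: noise-feedback (shaping) coefficients;
        step: uniform quantizer step size. All are illustrative inputs.
        """
        x_hat = np.zeros(len(x))      # reconstructed samples (decoder state)
        q_err = np.zeros(len(x))      # past quantization errors fed back through F(z)
        for n in range(len(x)):
            # predict from past reconstructed samples
            pred = sum(a[k] * x_hat[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            e = x[n] - pred
            # subtract filtered past quantization errors (noise feedback)
            v = e - sum(f[k] * q_err[n - 1 - k] for k in range(len(f)) if n - 1 - k >= 0)
            v_q = step * np.round(v / step)     # uniform scalar quantizer
            q_err[n] = v_q - v
            x_hat[n] = pred + v_q               # decoder: prediction plus quantized residual
        return x_hat

Setting f to zero recovers ordinary predictive (DPCM-style) coding, in which the reconstruction noise is approximately white rather than shaped to follow the masking properties of the speech spectrum.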
Citations
Proceedings Article
26 Apr 1985
TL;DR: A code-excited linear predictive coder is described in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion; results indicate that a random code book has a slight speech quality advantage at low bit rates.
Abstract: We describe in this paper a code-excited linear predictive coder in which the optimum innovation sequence is selected from a code book of stored sequences to optimize a given fidelity criterion. Each sample of the innovation sequence is filtered sequentially through two time-varying linear recursive filters, one with a long-delay (related to pitch period) predictor in the feedback loop and the other with a short-delay predictor (related to spectral envelope) in the feedback loop. We code speech, sampled at 8 kHz, in blocks of 5-msec duration. Each block consisting of 40 samples is produced from one of 1024 possible innovation sequences. The bit rate for the innovation sequence is thus 1/4 bit per sample. We compare in this paper several different random and deterministic code books for their effectiveness in providing the optimum innovation sequence in each block. Our results indicate that a random code book has a slight speech quality advantage at low bit rates. Examples of speech produced by the above method will be played at the conference.
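A minimal Python sketch of the codebook search described above, under simplifying assumptions: a single short-delay synthesis filter with zero memory per block and a plain squared-error criterion instead of a perceptually weighted one. With 1024 candidate sequences of 40 samples, the selected index costs 10 bits per 5-ms block, i.e. the 1/4 bit per sample quoted in the abstract.

    import numpy as np
    from scipy.signal import lfilter

    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((1024, 40))   # random (Gaussian) innovation sequences

    def search_codebook(target, lpc, gain=1.0):
        """Return the index of the codebook entry whose synthesized block best matches target."""
        a = np.concatenate(([1.0], -lpc))        # synthesis filter 1/A(z), A(z) = 1 - sum a_k z^-k
        errors = [np.sum((target - gain * lfilter([1.0], a, c)) ** 2) for c in codebook]
        return int(np.argmin(errors))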

1,343 citations

Journal Article
01 Nov 1985
TL;DR: This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization, and focuses primarily on the coding of speech signals and parameters.
Abstract: Quantization, the process of approximating continuous-amplitude signals by digital (discrete-amplitude) signals, is an important aspect of data compression or coding, the field concerned with the reduction of the number of bits necessary to transmit or store analog data, subject to a distortion or fidelity criterion. The independent quantization of each signal value or parameter is termed scalar quantization, while the joint quantization of a block of parameters is termed block or vector quantization. This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization. Vector quantization is presented as a process of redundancy removal that makes effective use of four interrelated properties of vector parameters: linear dependency (correlation), nonlinear dependency, shape of the probability density function (pdf), and vector dimensionality itself. In contrast, scalar quantization can utilize effectively only linear dependency and pdf shape. The basic concepts are illustrated by means of simple examples and the theoretical limits of vector quantizer performance are reviewed, based on results from rate-distortion theory. Practical issues relating to quantizer design, implementation, and performance in actual applications are explored. While many of the methods presented are quite general and can be used for the coding of arbitrary signals, this paper focuses primarily on the coding of speech signals and parameters.
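As a small illustration of the scalar-versus-vector distinction, the Python sketch below quantizes a block of parameters either component by component or jointly against a codebook; only the joint search can exploit correlation between components. The codebooks are assumed to have been trained elsewhere (e.g. by k-means / LBG), which is not shown.

    import numpy as np

    def scalar_quantize(x, levels):
        """Quantize each component independently to the nearest reproduction level."""
        return np.array([levels[np.argmin(np.abs(levels - xi))] for xi in x])

    def vector_quantize(x, codebook):
        """Quantize the whole vector jointly to its nearest codebook entry."""
        idx = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
        return codebook[idx], idx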

961 citations

Journal Article
Joseph Picone
01 Sep 1993
TL;DR: A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used, and three important trends that have developed in the last five years in speech recognition are examined.
Abstract: A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used. The four basic operations of signal modeling, i.e. spectral shaping, spectral analysis, parametric transformation, and statistical modeling, are discussed. Three important trends that have developed in the last five years in speech recognition are examined. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. The signal processing components of these algorithms are reviewed.
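The "dynamic, or time-derivative, spectral information" in the first trend is commonly obtained as delta features computed by linear regression over neighbouring frames; the Python sketch below shows one such formulation as an illustration, not necessarily the one used in the reviewed systems.

    import numpy as np

    def delta_features(frames, N=2):
        """frames: (num_frames, dim) array of e.g. cepstra; returns delta features of the same shape."""
        padded = np.pad(frames, ((N, N), (0, 0)), mode="edge")
        denom = 2 * sum(n * n for n in range(1, N + 1))
        deltas = np.zeros_like(frames, dtype=float)
        for t in range(frames.shape[0]):
            acc = sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1))
            deltas[t] = acc / denom
        return deltas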

792 citations


Cites methods from "Predictive Coding of Speech at Low Bit Rates"

  • ...There are several ways to do this in an LP model: a stabilized covariance method [59] that reduces the dynamic range in the spectrum, a perceptual-weighting method [60] that broadens the bandwidths of the LP model slightly, or a stabilized autocorrelation method [44] in which a small amount of noise is added to the autocorrelation function....


Journal Article
Kuldip K. Paliwal, B. Atal
TL;DR: It is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB.
Abstract: For low bit rate speech coding applications, it is important to quantize the LPC parameters accurately using as few bits as possible. Though vector quantizers are more efficient than scalar quantizers, their use for accurate quantization of linear predictive coding (LPC) information (using 24-26 bits/frame) is impeded by their prohibitively high complexity. A split vector quantization approach is used here to overcome the complexity problem. An LPC vector consisting of 10 line spectral frequencies (LSFs) is divided into two parts, and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. With this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB. The effect of channel errors on the performance of this quantizer is also investigated and results are reported.
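A minimal Python sketch of the split-VQ idea: the 10 LSFs are divided into two subvectors, each quantized against its own codebook under a weighted squared-error distance. The even 5+5 split, the weights, and the codebooks here are placeholders; the paper derives the weights from the LPC power spectrum at each LSF and uses trained codebooks (two 12-bit codebooks would give the 24 bits/frame quoted above).

    import numpy as np

    def weighted_nn(subvec, codebook, weights):
        """Index of the nearest codebook entry under a weighted squared-error distance."""
        return int(np.argmin(np.sum(weights * (codebook - subvec) ** 2, axis=1)))

    def split_vq(lsf, cb_low, cb_high, w):
        """Quantize the two halves of a 10-dimensional LSF vector with separate codebooks."""
        i_low = weighted_nn(lsf[:5], cb_low, w[:5])
        i_high = weighted_nn(lsf[5:], cb_high, w[5:])
        return i_low, i_high, np.concatenate((cb_low[i_low], cb_high[i_high]))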

665 citations


Cites methods from "Predictive Coding of Speech at Low Bit Rates"

  • ...A tenth-order LPC analysis, based on the stabilized covariance method with high frequency compensation [21] and error weighting [22], is performed every 20 ms using a 20-ms analysis window....


Journal Article
James C. Candy
TL;DR: A modulator that employs double integration and two-level quantization is easy to implement and is tolerant of parameter variation.
Abstract: Sigma delta modulation is viewed as a technique that employs integration and feedback to move quantization noise out of baseband. This technique may be iterated by placing feedback loop around feedback loop, but when three or more loops are used the circuit can latch into undesirable overloading modes. In the desired mode, a simple linear theory gives a good description of the modulation even when the quantization has only two levels. A modulator that employs double integration and two-level quantization is easy to implement and is tolerant of parameter variation. At sampling rates of 1 MHz it provides resolution equivalent to 16 bit PCM for voiceband signals. Digital filters that are suitable for converting the modulation to PCM are also described.
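A minimal Python sketch of a double-integration, two-level sigma-delta loop in the spirit described above (a textbook form, not necessarily the exact circuit of the paper). The unit loop gains are illustrative, and the input must stay well inside the +/-1 feedback range to avoid the overload modes mentioned in the abstract; a decimating lowpass filter would then convert the bitstream to PCM.

    import numpy as np

    def sigma_delta_2nd(x):
        """Return a +/-1 bitstream whose low-frequency content tracks x (|x| well below 1)."""
        i1 = i2 = 0.0
        y = 0.0
        out = np.empty(len(x))
        for n, xn in enumerate(x):
            i1 += xn - y                      # first integrator: input minus fed-back output
            i2 += i1 - y                      # second integrator, also inside the feedback loop
            y = 1.0 if i2 >= 0 else -1.0      # two-level quantizer
            out[n] = y
        return out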

608 citations

References
Journal Article
TL;DR: Application of this method to efficient transmission and storage of speech signals is discussed, as well as procedures for determining other speech characteristics, such as formant frequencies and bandwidths, the spectral envelope, and the autocorrelation function.
Abstract: A method of representing the speech signal by time-varying parameters relating to the shape of the vocal tract and the glottal-excitation function is described. The speech signal is first analyzed and then synthesized by representing it as the output of a discrete linear time-varying filter, which is excited by a suitable combination of a quasiperiodic pulse train and white noise. The output of the linear filter at any sampling instant is a linear combination of the past output samples and the input. The optimum linear combination is obtained by minimizing the mean-squared error between the actual values of the speech samples and their predicted values based on a fixed number of preceding samples. A 10th-order linear predictor was found to represent the speech signal band-limited to 5 kHz with sufficient accuracy. The 10 coefficients of the predictor are shown to determine both the frequencies and bandwidths of the formants. Two parameters relating to the glottal-excitation function and the pitch period are determined from the prediction error signal. Speech samples synthesized by this method will be demonstrated.
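The least-squares fit described above can be sketched in a few lines of Python. The version below uses the autocorrelation formulation (Yule-Walker normal equations) with a Hamming window as one common way to solve the minimization; the paper's exact analysis, and the pitch and excitation parameters it extracts from the prediction error, are not reproduced.

    import numpy as np

    def lpc(frame, order=10):
        """Predictor coefficients a[1..order] such that x[n] is approximated by sum_k a[k] x[n-k]."""
        x = frame * np.hamming(len(frame))
        r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz matrix
        return np.linalg.solve(R, r[1:order + 1])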

1,124 citations

Journal Article
TL;DR: Part II will give the mathematical criterion for the best predictor for use in the predictive coding of particular messages, will give examples of such messages, and will show that the error term which is transmitted in predictive coding may always be coded efficiently.
Abstract: Predictive coding is a procedure for transmitting messages which are sequences of magnitudes. In this coding method, the transmitter and the receiver store past message terms, and from them estimate the value of the next message term. The transmitter transmits, not the message term, but the difference between it and its predicted value. At the receiver this error term is added to the receiver prediction to reproduce the message term. This procedure is defined and messages, prediction, entropy, and ideal coding are discussed to provide a basis for Part II, which will give the mathematical criterion for the best predictor for use in the predictive coding of particular messages, will give examples of such messages, and will show that the error term which is transmitted in predictive coding may always be coded efficiently.
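The procedure is captured by the short Python sketch below, with a fixed first-order predictor (predict each term as the previous one) chosen purely for illustration. Because the error terms are transmitted losslessly here, the receiver reproduces the message exactly.

    def encode(msg):
        pred, errors = 0.0, []
        for m in msg:
            errors.append(m - pred)   # transmit only the prediction error
            pred = m                  # predictor state: the last reproduced term
        return errors

    def decode(errors):
        pred, msg = 0.0, []
        for e in errors:
            m = pred + e              # add the error to the receiver's own prediction
            msg.append(m)
            pred = m
        return msg

    assert decode(encode([3.0, 3.5, 4.0, 3.8])) == [3.0, 3.5, 4.0, 3.8]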

452 citations

Journal Article
TL;DR: New results of masking and loudness reduction of noise are reported and the design principles of speech coding systems exploiting auditory masking are described.
Abstract: In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. "Hiding" the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the "other sound" is, of course, the speech signal itself. In this paper we report new results of masking and loudness reduction of noise and describe the design principles of speech coding systems exploiting auditory masking.

434 citations

Journal Article
TL;DR: Improved speech quality is obtained by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and by effective masking of the quantizer noise by the speech signal.
Abstract: Predictive coding methods attempt to minimize the rms error in the coded signal. However, the human ear does not perceive signal distortion on the basis of rms error, regardless of its spectral shape relative to the signal spectrum. In designing a coder for speech signals, it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from frequency regions where the signal level is low. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained: 1) by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.
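One common way to express such a subjective error criterion in code is to weight the quantization noise with a filter built from the LPC polynomial by bandwidth expansion, so that error in the formant regions counts less. The Python sketch below uses the familiar W(z) = A(z)/A(z/gamma) form purely as an illustration, without claiming it is the exact criterion of the cited paper.

    import numpy as np
    from scipy.signal import lfilter

    def weighted_error_energy(error, lpc, gamma=0.8):
        """Energy of the error signal after filtering by W(z) = A(z) / A(z/gamma)."""
        a = np.concatenate(([1.0], -lpc))            # A(z) = 1 - sum a_k z^-k
        a_bw = a * gamma ** np.arange(len(a))        # A(z/gamma): broadened formant bandwidths
        return float(np.sum(lfilter(a, a_bw, error) ** 2))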

376 citations

Journal Article
TL;DR: Preliminary studies suggest that the binary difference signal and the predictor parameters together can be transmitted at approximately 10 kilobits/second which is several times less than the bit rate required for log-PCM encoding with comparable speech quality.
Abstract: We describe in this paper a method for efficient encoding of speech signals, based on predictive coding. In this coding method, both the transmitter and the receiver estimate the signal's current value by linear prediction on the previously transmitted signal. The difference between this estimate and the true value of the signal is quantized, coded and transmitted to the receiver. At the receiver, the decoded difference signal is added to the predicted signal to reproduce the input speech signal. Because of the nonstationary nature of the speech signals, an adaptive linear predictor is used, which is readjusted periodically to minimize the mean-square error between the predicted and the true value of the signals. The predictive coding system was simulated on a digital computer. The predictor parameters, comprising one delay and nine other coefficients related to the signal spectrum, were readjusted every 5 milliseconds. The speech signal was sampled at a rate of 6.67 kHz, and the difference signal was quantized by a two-level quantizer with variable step size. Subjective comparisons with speech from a logarithmic PCM encoder (log-PCM) indicate that the quality of the synthesized speech signal from the predictive coding system is approximately equal to that of log-PCM speech encoded at 6 bits/sample. Preliminary studies suggest that the binary difference signal and the predictor parameters together can be transmitted at approximately 10 kilobits/second which is several times less than the bit rate required for log-PCM encoding with comparable speech quality.
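A minimal Python sketch of the block-adaptive loop described above: a nine-coefficient short-term predictor is refit every 5-ms block at the quoted 6.67 kHz sampling rate, and the prediction residual is coded with a two-level quantizer whose step size adapts to the block. The long-delay (pitch) predictor, the coding of the side information, and the exact step-size rule are omitted; the step-size choice here is a crude placeholder.

    import numpy as np

    def adaptive_two_level_coder(x, fs=6670, order=9, block_ms=5):
        """Block-adaptive predictive coder with a two-level quantizer; returns the reconstruction."""
        blk = int(fs * block_ms / 1000)
        x_hat = np.zeros(len(x))
        for start in range(0, len(x) - blk + 1, blk):
            seg = x[start:start + blk]
            # refit the short-term predictor on this block (autocorrelation method, lightly regularized)
            r = np.array([np.dot(seg[:blk - k], seg[k:]) for k in range(order + 1)])
            R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
            a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:])
            step = 0.5 * np.mean(np.abs(seg)) + 1e-12      # crude per-block step-size adaptation
            for n in range(start, start + blk):
                pred = sum(a[k] * x_hat[n - 1 - k] for k in range(order) if n - 1 - k >= 0)
                code = 1.0 if x[n] - pred >= 0 else -1.0   # two-level quantizer of the residual
                x_hat[n] = pred + code * step
        return x_hat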

291 citations