
Showing papers on "Linear predictive coding" published in 1981


Journal ArticleDOI
TL;DR: New techniques for automatic speaker verification using telephone speech are described, in which cepstrum coefficients are extracted by means of LPC analysis successively throughout a fixed, sentence-long utterance to form time functions, and frequency response distortions introduced by transmission systems are removed.
Abstract: This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed. The time functions are expanded by orthogonal polynomial representations and, after a feature selection procedure, brought into time registration with stored reference functions to calculate the overall distance. This is accomplished by a new time warping method using a dynamic programming technique. A decision is made to accept or reject an identity claim, based on the overall distance. Reference functions and decision thresholds are updated for each customer. Several sets of experimental utterances were used for the evaluation of the system, which include male and female utterances recorded over a conventional telephone connection. Male utterances processed by ADPCM and LPC coding systems were used together with unprocessed utterances. Results of the experiment indicate that verification error rate of one percent or less can be obtained even if the reference and test utterances are subjected to different transmission conditions.

1,187 citations
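
The cepstrum extraction and channel compensation mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard recursion from all-pole predictor coefficients to cepstral coefficients, and it assumes that a fixed transmission response is removed by subtracting each coefficient's time average over the utterance. The abstract does not spell out either step, and the function names are hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1/A(z), A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    `a` includes the leading 1.0. Returns [c_1, ..., c_n_ceps].
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not used here
    for n in range(1, n_ceps + 1):
        a_n = a[n] if n <= p else 0.0
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += k * c[k] * a[n - k]
        c[n] = -a_n - acc / n
    return c[1:]


def remove_channel_mean(frames):
    """Subtract the per-coefficient time average from a list of cepstral frames.

    Assumption: the channel is fixed over the utterance, so it adds the same
    offset to every frame's cepstrum.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n_frames for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in frames]
```

Under that assumption, a fixed linear channel multiplies the spectrum and therefore adds a constant vector to the cepstrum, so two recordings of the same utterance over different channels would yield the same mean-removed time functions.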


Journal ArticleDOI
TL;DR: An information theory approach to the theory and practice of linear predictive coded speech compression systems is developed and it is shown that a traditional LPC system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is a minimum discrimination information between a speech process model and an observed frame of actual speech.
Abstract: An information theory approach to the theory and practice of linear predictive coded (LPC) speech compression systems is developed. It is shown that a traditional LPC system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is a minimum discrimination information between a speech process model and an observed frame of actual speech. This distortion measure is used in an algorithm for computer-aided design of block source codes subject to a fidelity criterion to obtain a 750-bits/s speech compression system that resembles an LPC system but has a much lower rate, a larger memory requirement, and requires no on-line LPC analysis. Quantitative and informal subjective comparisons are made among our system and LPC systems.

217 citations


Journal ArticleDOI
TL;DR: A low bit-rate vocoder designed for improved speech reproduction quality and robustness is described, designed around a new algorithm, the spectral envelope estimator, which forms the nucleus of the spectral analyzer.
Abstract: This paper describes a low bit-rate vocoder designed for improved speech reproduction quality and robustness. The vocoder is designed around a new algorithm, the spectral envelope estimator, which forms the nucleus of the spectral analyzer. In addition to estimating the speech spectrum, the spectral analyzer also allows determination of a continuous estimate of the background noise spectrum, which is used for noise suppression. A maximum-likelihood pitch estimator, which shares the signal processing of the spectral envelope estimator, has been integrated into the vocoder to yield accurate pitch estimates of noisy speech. This system is capable of good quality speech reproduction at bit rates down to 2.4 kbits/s.

130 citations


01 Jun 1981

108 citations


Proceedings ArticleDOI
01 Apr 1981
TL;DR: An LPC base-band vocoder is developed in which the base-band is modeled as a set of modulated tones; implementation aspects and simulation results are discussed, and experiments have shown the coder to be robust to background noise.
Abstract: An LPC base-band vocoder is developed. The novel feature concerns the coding of the base-band. A model is set up for the base-band as a set of modulated tones. Algorithms are presented for the extraction of amplitude and phase/frequency of the tones. Implementation aspects as well as simulation results are discussed. Total bit rates in the order of 3.2-4.8 kbits/s are possible where approximately one half of the bits represents the base-band coding. Experiments have shown the coder to be robust to background noise.

67 citations


Journal ArticleDOI
TL;DR: Experimental results showed the speech quality to be comparable to and slightly better than that produced by an auto-correlation LPC vocoder using a Hamming window.
Abstract: A method for recursively computing the autocorrelation estimates needed for LPC analysis in a vocoder environment has been developed theoretically and studied experimentally. The method has three specific advantages: 1) it requires very little memory for its implementation; 2) it is realized by a structure consisting of several identical modules; and 3) the effective window length may be changed without varying the structure. Experimental results showed the speech quality to be comparable to and slightly better than that produced by an auto-correlation LPC vocoder using a Hamming window.

66 citations
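
As a rough illustration of recursive autocorrelation estimation, the Python sketch below updates each lag with every new sample instead of storing a windowed block. The abstract does not give the paper's actual window, so a simple one-pole exponential window is assumed here; the function name and the `decay` parameter are hypothetical.

```python
def recursive_autocorr(samples, order, decay):
    """Running autocorrelation estimates R[0..order].

    For each new sample x[n], every lag k is updated identically:
        R[k] <- decay * R[k] + x[n] * x[n - k]
    Assumption: an exponential window with 0 < decay <= 1.
    """
    r = [0.0] * (order + 1)
    history = [0.0] * (order + 1)  # history[k] holds x[n - k]
    for x in samples:
        history = [x] + history[:-1]
        for k in range(order + 1):
            r[k] = decay * r[k] + x * history[k]
    return r
```

In this sketch the only state is one accumulator per lag plus a short delay line, every lag uses the same update, and the effective window length (roughly `1 / (1 - decay)` samples) changes with `decay` alone, which loosely mirrors the three advantages listed in the abstract.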


Journal ArticleDOI
TL;DR: A modified autocorrelation method of linear prediction is proposed for pitch-synchronous analysis of voiced speech that guarantees the stability of the estimated all-pole filter and is shown to perform better than the covariance and autocorrelation methods of linear prediction.

22 citations


Proceedings ArticleDOI
Hermann Ney
01 Apr 1981
TL;DR: An optimization technique for locating the initial and final points of utterances by means of dynamic programming and results are presented for end-point detection in a speaker recognition system using only the speech intensity as acoustic parameter.
Abstract: This paper describes an optimization technique for locating the initial and final points of utterances. Acoustic parameters extracted from each signal segment are converted into a cost function versus time. An overall cost for the presence of a speech signal is introduced and is to be optimized with respect to the unknown initial and final points. The optimization is carried out by means of dynamic programming. The computation grows linearly with the number of segments. In a second stage, the locations of the obtained endpoints are refined by matching transition templates against the input signal. Results are presented for end-point detection in a speaker recognition system using only the speech intensity as acoustic parameter.

21 citations


Proceedings ArticleDOI
01 Apr 1981
TL;DR: A technique for improving speech reception for persons with high frequency hearing loss based on pitch invariant frequency lowering of the short term spectral envelope is described and spectrographic examples illustrating the range of transformations achieved are presented.
Abstract: A technique for improving speech reception for persons with high frequency hearing loss based on pitch invariant frequency lowering of the short term spectral envelope is described. The speech signal is segmented pitch synchronously, processed by a technique described by Oppenheim and Johnson (1) to achieve nonuniform spectral warping, dilated in time to achieve frequency lowering, and resynthesized with the original periodicity. Both the overall bandwidth reduction and the relative compression of low and high frequency components can be specified. Spectrographic examples illustrating the range of transformations achieved are presented.

20 citations


PatentDOI
TL;DR: A multiple rate voice processing system incorporating a complete linear predictive coding algorithm wherein the algorithm is partitioned among a plurality of integrated circuit chips so that all communications between chips occur at low data rates.
Abstract: A multiple rate voice processing system incorporating a complete linear predictive coding algorithm wherein the algorithm is partitioned among a plurality of integrated circuit chips so that all communications between chips occur at low data rates.

20 citations


Journal ArticleDOI
TL;DR: A low-rate waveform coder for speech compression is designed using techniques from universal source coding, fake process tree encoding, and linear predictive coding to yield a fidelity that compares well with the best existing adaptive-waveform coder of the same rate.
Abstract: A low-rate (about one bit per sample) waveform coder for speech compression is designed using techniques from universal source coding, fake process tree encoding, and linear predictive coding (LPC). The system does not require on-line adaptation or LPC analysis, yet it yields a fidelity that compares well with the best existing adaptive-waveform coder of the same rate.

Proceedings ArticleDOI
01 Jan 1981
TL;DR: The panel will explore the many voices of the new IC speech synthesizers, including: 'rules'-generated speech, synthesis-by-analysis and Mozer waveform encoding.
Abstract: The panel will explore the many voices of the new IC speech synthesizers, including: 'rules'-generated speech, synthesis-by-analysis and Mozer waveform encoding. Advantages of analog sampled-data filters versus digital filters, linear prediction (LPC) versus formant encoding and ways to beat the quality/bit rate trade-offs will also be covered.

Patent
Bruce Alan Fette
26 May 1981
TL;DR: A linear predictive coding (LPC) voice synthesizer is formed as an integrated circuit on a single semiconductor chip, which circuit is programmed to provide the all pole lattice filter method of speech synthesis.
Abstract: A linear predictive coding (LPC) voice synthesizer formed as an integrated circuit on a single semiconductor chip, which circuit is programmed to provide the all pole lattice filter method of speech synthesis. The apparatus smoothly interpolates between correlation coefficients during the synthesis operation.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm which allows many popular variations including LPC and frequency domain representations of speech.
Abstract: Dynamic time warping is an established technique for time alignment and comparison of speech segments in speech recognition. This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm. It allows many popular variations including LPC and frequency domain representations of speech. High speed is obtained by extensive pipelining, parallel computation, and simultaneous matching of multiple patterns. A realistic application using 40 nine-component LPC vectors per word permits 10,000 word comparisons per second or, equivalently, real time recognition of a 10,000 word vocabulary.
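
For readers unfamiliar with the algorithm the processor computes, here is a minimal software sketch of dynamic time warping between two sequences of feature vectors. The squared Euclidean local distance and the unconstrained three-way step pattern are assumptions made for illustration; the abstract does not state which local distance or path constraints the chip uses, and the function name is hypothetical.

```python
def dtw_distance(ref, test):
    """Accumulated distance of the best time alignment between two sequences.

    `ref` and `test` are lists of equal-dimension feature vectors.
    Assumptions: squared Euclidean local distance; steps from (i-1, j),
    (i, j-1) and (i-1, j-1); both endpoints are forced to align.
    """
    inf = float("inf")
    n, m = len(ref), len(test)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(ref[i - 1], test[j - 1]))
            cost[i][j] = local + min(cost[i - 1][j],      # ref advances
                                     cost[i][j - 1],      # test advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In this unconstrained form, one comparison of two 40-vector words fills a 40 by 40 grid of local distances, which gives a sense of why the abstract emphasizes pipelining and parallel computation.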

Journal ArticleDOI
TL;DR: In this paper recent innovations in channel vocoders are described that pertain to these requirements and the conclusion is drawn that the channel vocoder has significant potential for fulfilling the above needs.
Abstract: Recent work in the field of speech digitization has led to the identification of a variety of new requirements for the speech terminal. Among these are the need for variable rate, robustness, very low rates, low cost, weight, and power. Coupled with these needs is the ever-present desire for higher quality systems. In this paper recent innovations in channel vocoders are described that pertain to these requirements and the conclusion is drawn that the channel vocoder has significant potential for fulfilling the above needs.

Patent
10 Aug 1981
TL;DR: In this paper, a signal exhibiting redundancy is transmitted in a reduced bandwidth by performing a linear interpolation over a number of frames, where interpolated coefficients are tested against quantized values to see if they differ by no more than a threshold.
Abstract: A signal exhibiting redundancy, such as speech subjected to linear predictive coding, is transmitted in a reduced bandwidth by performing a linear interpolation over a number of frames. Interpolated coefficients are tested against quantized values to see if they differ by no more than a threshold. If they do not, only the last frame is sent and intermediate values are reconstructed by interpolation. If the interpolated values differ by more than the threshold from the quantized values, the number of frames for interpolation is reduced and the interpolation is repeated. This is continued until either interpolation is successful or else the next consecutive frame is sent. The required bandwidth for transmission can be varied by varying the threshold, the maximum number of frames for interpolation, the number of LPC coefficients, or a combination of these.
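
The frame-dropping procedure in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: frames are lists of already-quantized coefficients, the test is a maximum absolute difference against the threshold, the first frame is always sent, and `max_span` is at least 1. All names are hypothetical and none of this is taken from the patent text beyond the abstract.

```python
def interpolate(start, end, fraction):
    """Linear interpolation between two coefficient vectors."""
    return [s + (e - s) * fraction for s, e in zip(start, end)]


def span_fits(frames, first, last, threshold):
    """True if every frame strictly between first and last is within threshold."""
    for m in range(first + 1, last):
        guess = interpolate(frames[first], frames[last], (m - first) / (last - first))
        if max(abs(g - q) for g, q in zip(guess, frames[m])) > threshold:
            return False
    return True


def select_frames(frames, threshold, max_span):
    """Return the indices of the frames that would be transmitted."""
    sent = [0]
    last = 0
    while last < len(frames) - 1:
        span = max(1, min(max_span, len(frames) - 1 - last))
        # Shrink the span until interpolation succeeds; span == 1 means
        # the next consecutive frame is sent.
        while span > 1 and not span_fits(frames, last, last + span, threshold):
            span -= 1
        last += span
        sent.append(last)
    return sent


def reconstruct(frames, sent):
    """Receiver side: rebuild every frame from the transmitted ones."""
    out = [list(frames[sent[0]])]
    for prev, cur in zip(sent, sent[1:]):
        for m in range(prev + 1, cur):
            out.append(interpolate(frames[prev], frames[cur], (m - prev) / (cur - prev)))
        out.append(list(frames[cur]))
    return out
```

Raising `threshold` or `max_span` in this sketch lets more frames be skipped, which corresponds to the bandwidth trade-off described in the last sentence of the abstract.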

Proceedings ArticleDOI
C. Un, K. Choi
01 Mar 1981
TL;DR: A robust linear predictive coding (LPC) method that can be used in noisy as well as quiet environment has been studied and a performance improvement of about 5 dB can be gained by using this method.
Abstract: A robust linear predictive coding (LPC) method that can be used in noisy as well as quiet environment has been studied. In this method, noise autocorrelation coefficients are first obtained and updated during non-speech periods. Then, the effect of additive noise in the input speech is removed by subtracting values of the noise autocorrelation coefficients from those of autocorrelation coefficients of corrupted speech in the course of computation of linear prediction coefficients. When signal-to-noise ratio of the input speech ranges from 0 to 10 dB, a performance improvement of about 5 dB can be gained by using this method. The proposed method is computationally very efficient and requires a small storage area.
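
The subtraction step in the abstract above can be sketched in Python: noise autocorrelation values are subtracted from those of the corrupted speech, and the result is passed to the Levinson-Durbin recursion. The small floor on the zero-lag term is an added safeguard that is not mentioned in the abstract, and the function names are hypothetical.

```python
def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... from autocorrelations r[0..order].

    Returns (a, prediction_error), with a[0] == 1.0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def noise_compensated_lpc(noisy_r, noise_r, order, floor=1e-3):
    """LPC from noisy-speech autocorrelations minus a stored noise estimate.

    Assumption: `noise_r` was measured during non-speech periods.
    The floor keeps the zero-lag term positive (not from the abstract).
    """
    clean = [s - n for s, n in zip(noisy_r, noise_r)]
    clean[0] = max(clean[0], floor * noisy_r[0])
    return levinson_durbin(clean, order)
```

For additive white noise only the zero-lag term changes, so in that case the correction amounts to lowering `r[0]` by the noise power before solving for the predictor. The subtracted sequence is not guaranteed to remain a valid autocorrelation, so a practical system would likely need further checks.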

PatentDOI
TL;DR: A speech synthesis system with a linear predictive filter utilizes coded reflection coefficients to produce digital signals representative of human speech, and a variable interpolation circuit within the linear predictive filter allows a variable number of interpolation steps to be calculated between successive values of reflection coefficients.
Abstract: A speech synthesis system with a linear predictive filter. The linear predictive filter utilizes coded reflection coefficients to produce digital signals representative of human speech. A variable interpolation circuit within the linear predictive filter allows a variable number of interpolation steps to be calculated between successive values of reflection coefficients. Additionally, a user programmable option allows the user to select a linear, nonlinear, or a combination form of interpolation based on stored scale data. The synthesizer output circuit also functions as a digital-to-analog converter.

PatentDOI
Leon W. Cox
TL;DR: The speech synthesizer is capable of electronically synthesizing human speech from coded speech data including parameters as stored either in a solid state memory on a permanent basis or alternatively as temporarily stored in another memory, wherein the coded speech data is made available from an external source, such as a central processing unit of a commercial or home-type computer, as coupled to the speech synthesizer.
Abstract: Speech synthesizer and a computer system having the speech synthesizer operably coupled thereto to provide speech capability for the computer system. The speech synthesizer is capable of electronically synthesizing human speech from coded speech data including parameters as stored either in a solid state memory on a permanent basis or alternatively as temporarily stored in another memory, wherein the coded speech data is made available from an external source, such as a central processing unit of a commercial or home-type computer, as coupled to the speech synthesizer. The speech synthesizer may be in the form of a speech module including a speech synthesizer processor for converting coded speech data into digital speech signals in combination with a mode selector which selectively applies either the coded speech data from a read-only-memory within the speech module or the coded speech data obtained from the external source to the speech synthesizer processor in response to a control signal provided by the external source for determining which of the two alternative operating modes will be employed in a given instance. The computer system is provided with speech capability by including the speech module as a component thereof in combination with a computer input device, the central processing unit of the computer, and an audio amplifier and speaker connected to a digital-to-analog converter of the speech module so as to generate audible human speech from the digital speech signals provided by the speech synthesizer processor of the speech module.

Journal ArticleDOI
TL;DR: A low cost voice response system is presented, which performs text-to-speech conversion of any English text, built around an LPC synthesizer chip and a microprocessor.
Abstract: A low cost voice response system is presented, which performs text-to-speech conversion of any English text. The system is built around an LPC synthesizer chip and a microprocessor. Text-to-allophone rules are used to convert an input string of ASCII characters into allophonic codes. LPC parameters are then drawn from an allophone library, which takes very little storage space, and concatenated using a fast and simple algorithm to produce natural sounding speech.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: Vector Quantization is applied to modify a 2400 bps LPC vocoder to operate at 800 bps, while retaining acceptable intelligibility and naturalness of quality, and several new properties are presented.
Abstract: Vector Quantization is applied to modify a 2400 bps LPC vocoder to operate at 800 bps, while retaining acceptable intelligibility and naturalness of quality. The design of this speech compression system is discussed and compared to other very low bit rate vocoders. Advantages of vector quantization over a scalar technique are examined in detail, and several new properties are presented.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: The Navy has developed a Multirate Processor (MRP), which generates digitized speech at 2.4, 9.6, and 16 kb/s by the linear predictive coding principle, and under various operational conditions, the Diagnostic Rhyme Test (DRT) scores of the MRP compare favorably to the DRT scores of an existing 16 kb/s rate Continuously Variable Slope Delta (CVSD) encoder.
Abstract: The Navy has developed a Multirate Processor (MRP) which generates digitized speech at 2.4, 9.6, and 16 kb/s by the linear predictive coding principle. This multirate capability is achieved by embedding the 2.4 kb/s data in the 9.6 kb/s data stream and the 9.6 kb/s data in the 16 kb/s data stream. Conversion between the rates is accomplished by truncating a certain portion of the bits from the higher-data rate signal or appending extra bits to the lower-data rate signal. The MRP mediumband (9.6 kb/s or 16 kb/s) mode is a baseband residual excited LPC in which the baseband residual is transmitted in terms of Fourier spectral components. Under various operational conditions, the Diagnostic Rhyme Test (DRT) scores for the 9.6 kb/s rate of the MRP compare favorably to the DRT scores of an existing 16 kb/s rate Continuously Variable Slope Delta (CVSD) encoder.

PatentDOI
TL;DR: In this paper, an input signal representative of the spoken utterance is passed through a clipper to generate a clipped input signal, and a sampler generates a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped signal.
Abstract: The present invention relates to a speech recognition system and the method therefor, which analyzes a sampled clipped speech signal for identifying a spoken utterance. An input signal representative of the spoken utterance is passed through a clipper to generate a clipped input signal. A sampler generates a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal. A processor then analyzes the plurality of sample values thereby identifying the spoken utterance. Analysis includes determining linear prediction coefficients of the autocorrelation function of speech utterances.

Journal ArticleDOI
TL;DR: The results showed that system performance was best with an analysis parameter set equivalent to what is currently being used in the computer simulations, and that variations in parameter values that reduced computation also degraded performance, whereas variations in parameters that increased computation did not lead to improved performance.
Abstract: For practical hardware implementations of isolated-word recognition systems, it is important to understand how the feature set chosen for recognition affects the overall performance of the recognizer. In particular, we would like to determine whether hardware implementations could be simplified by reducing computation and memory requirements without significantly degrading overall system performance. The effects of system bandwidth (both in training and testing the recognizer) on the performance must also be considered since the conditions under which the system is used may be different than those under which it was trained. Finally, we must take account of the effects of finite word-length implementations, on both the computation of features and of distances, for the system to properly operate. In this paper we present the results of a study to determine the effects on recognition error rate of varying the basic analysis parameters of a linear predictive coding (LPC) model of speech. The results showed that system performance was best with an analysis parameter set equivalent to what is currently being used in the computer simulations, and that variations in parameter values that reduced computation also degraded performance, whereas variations in parameter values that increased computation did not lead to improved performance.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: The application of a new source coding scheme called Modulo-PCM with side information to speech coding is studied and the performance characteristics of an adaptive and a non-adaptive scheme are evaluated using two speech utterances.
Abstract: The application of a new source coding scheme called Modulo-PCM with side information to speech coding is studied. The performance characteristics of an adaptive and a non-adaptive scheme are evaluated using two speech utterances.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: This paper presents a recursive pole-zero lattice form for speech analysis based on the recently developed square-root normalized lattice forms, and a comparison between the performance of AR and ARMA lattice filters is presented.
Abstract: All-zero filters in tapped-delay-line or lattice implementations are commonly used for speech deconvolution. The analysis techniques are mostly non-recursive, operating on a block of data at a time. In this paper we present a recursive pole-zero lattice form for speech analysis. The algorithm is based on the recently developed square-root normalized lattice forms. A comparison between the performance of AR and ARMA lattice filters is presented, using synthetic data. Preliminary results using speech data are also discussed.

Proceedings ArticleDOI
B. Atal, J. Remde
01 Apr 1981
TL;DR: A split-band adaptive predictive coding system for digital transmission of speech signals is described, in which dividing the prediction residue signal into many frequency bands results in more accurate pitch prediction, particularly at low frequencies.
Abstract: We describe a split-band adaptive predictive coding system for digital transmission of speech signals. In this system, the prediction residue signal obtained after spectral prediction is filtered into 2 or more frequency bands. Each of the filtered signals is reduced further by pitch prediction and is quantized by a 15-level noise feedback quantizer. The input to the quantizer is severely center-clipped to produce a quantized signal with low entropy. The division of the prediction residue signal into many frequency bands results in more accurate pitch prediction - particularly, at low frequencies. The split-band system uses separate quantizers for each frequency band. The step size of the quantizer and the center-clipping threshold can thus be adjusted to optimize speech quality in each band.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: This paper discusses a form of non-linear prediction, namely, the prediction of the phase of speech signals, based upon a new treatment of the classical speech production model within a short-time analysis/synthesis framework.
Abstract: Prediction plays a key role in many signal processing applications. Linear Prediction has, in particular, been extremely useful to the development of digital speech processing techniques and applications. There is however a growing need for improved forms of prediction. We discuss, in this paper, a form of non-linear prediction, namely, the prediction of the phase of speech signals. This study is conducted within a short-time analysis/synthesis framework and is based upon a new treatment of the classical speech production model. Experimental data are presented confirming the theoretical results. Finally the use of phase prediction to low-bit rate, high-quality coding applications is discussed.

Proceedings ArticleDOI
R. Cox, D. Malah
01 Apr 1981
TL;DR: The recently developed time domain harmonic scaling (TDHS) algorithm has been found to be the basis for an effective enhancement technique and a class of windows for its implementation is established.
Abstract: Periodically structured noise is noise which occurs randomly but with a fixed or slowly varying period. The noise periodicity is usually due to some underlying process, such as block processing of the speech where discontinuities between successive blocks result. This type of noise permeates the entire speech spectrum and is not removable by standard filtering techniques. The recently developed time domain harmonic scaling (TDHS) algorithm has been found to be the basis for an effective enhancement technique. In this paper we discuss the underlying theory of this technique and establish a class of windows for its implementation. As an example the frame rate noise of adaptive transform coding was perceptually reduced using this technique. Results from a subjective testing experiment using ATC coded speech with bit rates of 7.2 to 16 Kb/s indicated an improvement in quality equivalent to an increase in code rate of 2.4 to 3 Kb/s for speech originally coded at 7.2 to 12 Kb/s.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method of improving the quality of noise-degraded synthesized speech by using noise reduction methods which are based on the previously proposed comb filtering in the frequency region for the case that the noisy speech signal is processed by the PARCOR analysis-synthesis method.
Abstract: Speech analysis-synthesis is an effective method for low bit-rate speech coding. However, it has been pointed out that system performance degrades as noise is added to the input speech. In this paper, we describe a method of improving the quality of noise-degraded synthesized speech by using noise reduction methods which are based on the previously proposed comb filtering in the frequency region for the case that the noisy speech signal is processed by the PARCOR analysis-synthesis method. Quality degradation of synthesized speech due to additive noise is caused mainly by an increase in spectral distortion. In the speech analysis method proposed in this paper, the basic frequency (pitch) of the speech is stably extracted from noisy speech and spectral distortion is eliminated by obtaining the spectral envelope parameter after reducing the noise from the input speech based on this pitch information. Furthermore, the proposed method prevents degradation of the quality of the synthesized speech by using the extracted pitch as the pitch parameter of the analysis-synthesis system. As a result of auditory experiments, it is shown that the subjective speech quality and intelligibility of the synthesized speech are improved by the proposed method. In addition, we obtain some clues to the configuration of a speech analysis-synthesis system which is resistant to additive noise.