
Showing papers on "Linear predictive coding" published in 1980


Journal ArticleDOI
TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.

7,935 citations
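
The training-sequence variant of the design procedure described in this abstract can be sketched as an alternation of two steps: assign each training vector to its nearest codeword, then replace each codeword by the centroid of its cell. The sketch below is a minimal illustration only. It assumes a squared-error distortion (the abstract allows far more general measures, for which the centroid step would differ), and the names `design_codebook`, `nearest`, and `centroid` as well as the stopping tolerance are hypothetical choices, not taken from the paper.

```python
def squared_error(x, y):
    """Squared-error distortion between two equal-length vectors (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(codebook, x):
    """Index of the codeword with the smallest distortion to x."""
    return min(range(len(codebook)), key=lambda i: squared_error(x, codebook[i]))


def centroid(cell):
    """Component-wise mean; this is the optimal codeword for squared error only."""
    n = len(cell)
    return [sum(v[d] for v in cell) / n for d in range(len(cell[0]))]


def design_codebook(training, initial_codebook, tol=1e-6, max_iter=100):
    """Iterate nearest-neighbour partition and centroid update on a training set.

    Returns the final codebook and the average distortion measured during the
    last partition step.
    """
    codebook = [list(c) for c in initial_codebook]
    prev = float("inf")
    avg = prev
    for _ in range(max_iter):
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            i = nearest(codebook, x)
            cells[i].append(x)
            total += squared_error(x, codebook[i])
        avg = total / len(training)
        # Empty cells keep their old codeword (a simplification).
        codebook = [centroid(cell) if cell else code
                    for cell, code in zip(cells, codebook)]
        # Stop when the relative improvement in average distortion is small.
        if prev - avg <= tol * max(avg, 1e-12):
            break
        prev = avg
    return codebook, avg


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(design_codebook(data, [[0.0, 0.0], [10.0, 10.0]]))
```

The result depends on the initial codebook, so in practice several initializations may be worth comparing.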


Journal ArticleDOI
01 Oct 1980

1,565 citations


Journal ArticleDOI
TL;DR: In this paper, a spectral decomposition of a frame of noisy speech is used to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise.
Abstract: One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present) and using the maximum likelihood estimator of the magnitude of the speech spectrum results in a new class of suppression curves which permits a tradeoff of noise suppression against speech distortion. The algorithm has been implemented in real time in the time domain, exploiting the structure of the channel vocoder. Extensive testing has shown that the noise can be made imperceptible by proper choice of the suppression factor.

854 citations


Journal ArticleDOI
TL;DR: The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies and is introduced in a nonrigorous form.
Abstract: With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.

754 citations


Proceedings ArticleDOI
B. Atal, M. Schroeder
01 Apr 1980
TL;DR: This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.
Abstract: Adaptive predictive coding of speech signals at bit rates lower than 10 kbits/sec often requires the use of 2-level (1 bit) quantization of the samples of the prediction residual. Such a coarse quantization of the prediction residual can produce audible quantizing noise in the reproduced speech signal at the receiver. This paper describes a new method of quantization for improving the speech quality. The improvement is obtained by center clipping the prediction residual and by fine quantization of the high-amplitude portions of the prediction residual. The threshold of center clipping is adjusted to provide encoding of the prediction residual at a specified bit rate. This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.

113 citations
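
The center-clipping step described in this abstract can be illustrated with a short sketch. It zeroes low-amplitude residual samples and passes high-amplitude ones through for finer quantization. The abstract adjusts the threshold to meet a specified bit rate; as a stand-in, this sketch picks the threshold so that a chosen fraction of samples survives, which is an assumption made here for simplicity. The function names are hypothetical.

```python
def center_clip(residual, threshold):
    """Set samples with magnitude <= threshold to zero; keep the rest unchanged."""
    return [r if abs(r) > threshold else 0.0 for r in residual]


def threshold_for_fraction(residual, keep_fraction):
    """Choose a threshold so that roughly keep_fraction of samples survive.

    This is a proxy for the bit-rate-driven adjustment described in the
    abstract, not the paper's actual rule. Ties in magnitude may cause fewer
    samples than requested to survive.
    """
    mags = sorted((abs(r) for r in residual), reverse=True)
    keep = int(round(keep_fraction * len(mags)))
    if keep <= 0:
        return mags[0]      # clip everything
    if keep >= len(mags):
        return 0.0          # clip only exact zeros
    return mags[keep]       # largest magnitude that gets clipped


if __name__ == "__main__":
    res = [0.1, -0.2, 3.0, -0.05, -2.5, 0.3, 0.0, 1.5]
    t = threshold_for_fraction(res, 0.25)
    print(t, center_clip(res, t))
```

Only the surviving samples then need a multi-level quantizer, which is how the average rate can drop below 1 bit/sample when most samples are zero.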


Proceedings ArticleDOI
09 Apr 1980
TL;DR: For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with fifteen to twenty fewer bits per frame than that required for optimized scalar quantizing approaches presently in use.
Abstract: With rare exception, all presently available narrowband speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). In this paper a new approach called Vector Quantization is presented. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with fifteen to twenty fewer bits per frame than that required for optimized scalar quantizing approaches presently in use.

107 citations


DOI
01 Feb 1980
TL;DR: In this article, the authors present a full specification of all the essential features of the JSRU vocoder configuration, with comments on the reasons for the design decisions and reference to supporting research where appropriate.
Abstract: During the period from 1956 to 1966 the UK Government's Joint Speech Research Unit was conducting research into channel vocoders, culminating in a laboratory-built design suitable for evaluation by potential users over digital transmission networks at 2400 bit/s. The success of the basic vocoder design was such that it has since been engineered in various forms for widespread operational use, using different technologies as they have evolved. In view of the JSRU vocoder's continued competitiveness with other narrow-band speech coding techniques, such as linear predictive coding, this paper has been written to give a full specification of all the essential features of the vocoder configuration, with comments on the reasons for the design decisions and reference to supporting research where appropriate. The two most important factors contributing to this vocoder's successful performance are the use of narrow-band single-resonant circuits for the synthesis filters and the use of differential coding between channels in the digitisation process.

77 citations


Journal ArticleDOI
TL;DR: In this paper, the autocorrelation LPC analysis of speech in additive noise is studied and the beneficial effects of proper preemphasis are reaffirmed in terms of decreased numerical error as well as decreased LPC order needed for a good spectral fit.
Abstract: A study of the autocorrelation LPC analysis of speech in additive noise is presented. In the noise-free case it is shown that finite word length implementation of the analysis may produce stable but poor spectral estimates. The beneficial effects of proper preemphasis are reaffirmed in terms of decreased numerical error as well as decreased LPC order needed for a good spectral fit. For the case of noisy input speech the conditions for severe distortion of the spectral estimate are presented. A proper LPC spectral analysis of speech in additive noise is shown to require a higher order fit than currently used, a more precise implementation, and a more accurate parameter quantization for transmission.

73 citations
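
For readers unfamiliar with the autocorrelation method referred to in this abstract, the following minimal sketch shows first-order preemphasis, the autocorrelation estimate, and the Levinson-Durbin recursion. It uses ordinary floating point, so it does not reproduce the finite word length effects the paper studies. The function names and the preemphasis coefficient 0.9375 are assumptions chosen for illustration.

```python
def preemphasize(x, mu=0.9375):
    """First-order preemphasis y[n] = x[n] - mu * x[n-1] (mu is an assumed value)."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]


def autocorrelation(x, max_lag):
    """Autocorrelation estimates r[0..max_lag] of a (windowed) frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, refl, err), where the predictor is
    x_hat[n] = sum_{k=1..order} a[k-1] * x[n-k], refl holds the reflection
    coefficients, and err is the final prediction error energy.
    Assumes r[0] > 0 and that the error stays positive.
    """
    a = [0.0] * (order + 1)
    refl = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], refl, err


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order process with coefficient 0.5.
    print(levinson_durbin([1.0, 0.5, 0.25], 2))
```

In exact arithmetic every reflection coefficient has magnitude below one, which is the stability property the abstract refers to.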


Journal ArticleDOI
TL;DR: A signal model based more directly upon the physics of speech generation is proposed and implemented, and parametric control of the synthesis model is implemented by an adaptive procedure that minimizes the spectral difference between a human speech input and the synthetic output of the model.
Abstract: A traditional model of the speech signal has provided the underpinning of vocoder technology since the inception of analysis/synthesis telephony. The model is a first‐order approximation to human speech generation in which the source of vocal sound and the resonant acoustic system are treated as linear, separable elements. This source‐system model cannot properly account for a number of acoustic factors now known to exist in speech generation. We propose and implement here a signal model based more directly upon the physics of speech generation. We also implement parametric control of the synthesis model by an adaptive procedure that minimizes the spectral difference between a human speech input and the synthetic output of the model. The adapted parameters constitute a low bit‐rate representation of the input human speech. We test a preliminary form of the system by computer simulation and demonstrate that in simple initial trials the signal model is able to adapt in a realistic manner.

62 citations


Journal ArticleDOI
TL;DR: In this article, the performance of the Burg method for speech analysis is compared to the autocorrelation and covariance methods and the results do not find any justification for preferring the computationally more complex Burg method.
Abstract: The performance of the Burg method for speech analysis is compared to the autocorrelation and covariance methods. The criterion of goodness is the accuracy of the spectral approximation, filter stability, windowing requirements, data frame length, and spectral resolution. A mathematical comparison is presented for the simple first-order signal. Spectral comparisons are presented for a second-order speech-like signal. Real speech synthesis using the analysis results of the autocorrelation and Burg methods are subjectively compared. The results do not find any justification for preferring the computationally more complex Burg method.

40 citations
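
As background for the comparison in this abstract, here is a minimal sketch of the Burg recursion, which estimates each reflection coefficient directly from forward and backward prediction errors rather than from autocorrelation estimates. The function name and return convention are assumptions made for illustration, and the code makes no claim about the paper's experimental setup.

```python
def burg(x, order):
    """Estimate an all-pole model with the Burg recursion.

    Returns (a, refl), where a = [1, a1, ..., ap] is the prediction error
    filter A(z) = 1 + a1 z^-1 + ... + ap z^-p and refl holds the reflection
    coefficients. The recursion stops early if the error energy reaches zero.
    """
    f = list(x[1:])     # forward errors at time n
    b = list(x[:-1])    # backward errors at time n - 1
    a = [1.0]
    refl = []
    for _ in range(order):
        den = sum(v * v for v in f) + sum(v * v for v in b)
        if den == 0.0:
            break
        k = -2.0 * sum(p * q for p, q in zip(f, b)) / den
        refl.append(k)
        # Order update of the error filter: a_new[i] = a[i] + k * a[m + 1 - i].
        a_ext = a + [0.0]
        a = [p + k * q for p, q in zip(a_ext, reversed(a_ext))]
        # Lattice update of the error sequences, then realign them in time.
        f_new = [p + k * q for p, q in zip(f, b)]
        b_new = [q + k * p for p, q in zip(f, b)]
        f, b = f_new[1:], b_new[:-1]
    return a, refl


if __name__ == "__main__":
    # A decaying first-order signal x[n] = 0.5 ** n.
    print(burg([1.0, 0.5, 0.25, 0.125], 1))
```

Because each coefficient is a ratio of a cross term to the sum of two energies, its magnitude cannot exceed one, so the resulting filter is stable by construction without any windowing of the data.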


PatentDOI
Carl J. May, Jr.
TL;DR: In this article, a speech detector uses a signal classifier to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise, and a level estimator uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels.
Abstract: A speech detector uses a signal classifier (19) to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise. A controller (33) in the signal classifier follows a four state sequence using appropriate time constants for signal measures in a variety of signal conditions in defining the speech and noise portions of the representation. A level estimator (21) uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels. A speech definer (16) compares the representation to a first decision level and the signal samples to a higher decision level to indicate the occurrence of speech signal activity when either decision level is exceeded. In a two way transmission arrangement, a receive trunk speech detector uses a stretcher (133) to prevent adaptation of the transmit speech detector thresholds when echo signals are present.

Journal ArticleDOI
TL;DR: This paper presents an interpretation of the log likelihood ratio measure within the theoretical framework of a waveform coder distortion model, and discusses the implications of this interpretation and how it can be applied to the formulation of better objective measures of waveform coder performance.
Abstract: The log likelihood measure has been widely used in speech research for comparing speech signals. Recently, it has been proposed as a measure for assessing the quality of coded speech. In this paper we present an interpretation of the log likelihood ratio measure within the theoretical framework of a waveform coder distortion model. We then discuss the implications of this interpretation and show how it can be applied to the formulation of better objective measures of waveform coder performance.

PatentDOI
TL;DR: In this article, a speech/silence discriminator is used on a telephone line to distinguish between periods of speech and periods of silence, where the adaptive threshold has a minimum value of -60 dBm.
Abstract: The speech/silence discriminator is used on a telephone line to distinguish between periods of speech and periods of silence. A signal derived from the speech signal is, in the speech state, compared to an adaptive threshold which is a fraction of the maximum, to be eventually quantized, reached by said signal during the period of speech which is considered; the speech to silence transition being determined when the signal decreases below said threshold level, the threshold level being determined during each speech period as a function of the maximum attained during the period. The adaptive threshold has a minimum value of -60 dBm. It also comprises a noise level evaluation circuit which determines the threshold decision level (≥ -60 dBm) for the transition from silence to speech which, once reached in the period of silence, enables the discriminator to go from the silent state to the speech state. It is usable in speech interpolation systems.

PatentDOI
TL;DR: In this paper, an improved apparatus for the linear predictive coding of human speech is presented, in which the speech is sampled through the use of analog filters, and the LPC computations are performed with respect to such samples using digital techniques.
Abstract: Improved apparatus for the linear predictive coding of human speech in which the speech is sampled through the use of analog filters and the linear predictive coding computations are performed with respect to such samples using digital techniques. The filters are MOS switched capacitor filters which can be implemented on a silicon chip together with the digital circuitry. Specific circuits for implementing two different linear predictive coding speech analysis techniques are disclosed.

Journal ArticleDOI
Chong Un, Hyeong Gi Lee
TL;DR: This paper presents a new method of voiced/unvoiced/silence discrimination of speech based on the results of counting bit alternations of the bit stream from linear delta modulation of the speech signal and zero crossings of a band-pass filtered output of the decoded LDM signal.
Abstract: This paper presents a new method of voiced/unvoiced/silence discrimination of speech. The decision algorithm is based on the results of counting bit alternations of the bit stream from linear delta modulation (LDM) of the speech signal and zero crossings of a band-pass filtered output of the decoded LDM signal. Computer simulation of the system with real speech has yielded accurate results. Economical realization of the discriminator hardware using standard integrated circuits is also considered.
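
The two measurements named in this abstract can be illustrated with a small sketch: a fixed-step (linear) delta modulator, a count of bit alternations in its output, and a zero-crossing count. The band-pass filter and the decision thresholds are not given in the abstract and are deliberately left out; all function names and the step size are assumptions for illustration.

```python
def ldm_encode(x, step):
    """Linear delta modulation: one bit per sample with a fixed step size."""
    bits = []
    estimate = 0.0
    for sample in x:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def ldm_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


def count_bit_alternations(bits):
    """Number of positions where consecutive bits differ."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def count_zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))


if __name__ == "__main__":
    silence = [0.0] * 8
    ramp = [2.0 * n for n in range(8)]
    print(count_bit_alternations(ldm_encode(silence, 1.0)))  # idle pattern
    print(count_bit_alternations(ldm_encode(ramp, 1.0)))     # slope overload
```

A silent input makes the modulator toggle on every sample, while a steep input drives runs of identical bits, which is why the alternation count carries information about the signal class.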

Proceedings ArticleDOI
J. Olive
01 Apr 1980
TL;DR: The scheme for synthesis employing these concatenative units, where segments obtained from natural speech were linearly concatenated, will be discussed in detail.
Abstract: In previous papers we discussed a speech synthesis by rule scheme where segments obtained from natural speech were linearly concatenated. These segments included the consonants and the transitions from consonants to vowels, vowels to vowels, and vowels to consonants. Each synthesis parameter was defined by a few sets of LPC area parameters, and in the concatenative process, straight line interpolation was used to obtain the complete set of area parameters. Informal listening and some formal intelligibility testing revealed that this simplified description of the synthesis segments was not sufficient to produce the speech quality that would satisfy us. Consequently, it was decided to improve the definition of the concatenative units. This paper will discuss in detail the scheme for synthesis employing these concatenative units.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: The results of this study indicate that LPC derived parameters perform better than do those derived from cepstral and spectral data.
Abstract: Four automatic speaker recognition techniques were investigated with a contain speech data base to determine their effectiveness in a text independent mode. These four techniques used the correlation of short and long term spectral averages, cepstral measurements of long term spectral averages, orthogonal linear prediction of the speech waveform, and long term average LPC reflection coefficients combined with pitch and overall power. The results of this study indicate that LPC derived parameters perform better than do those derived from cepstral and spectral data. Recognition accuracies of 95% and 93% were obtained for LPC based techniques with 13 seconds of unknown speech. The corresponding recognition accuracies for the cepstral and spectral based systems were 79% and 54% respectively.


Proceedings ArticleDOI
09 Apr 1980
TL;DR: This paper explores intermediate solutions between fixed prediction and forward adaptive prediction in ADPCM, which consist of using a finite number of preselected linear predictors of order M, and formulates the problem of selecting the optimum set of predictors with respect to the overall prediction gain.
Abstract: This paper explores the intermediate solutions between fixed prediction and forward adaptive prediction in ADPCM, which consist of using a finite number, L, of preselected linear predictors of order M. The design problem of selecting the optimum set of predictors with respect to the overall prediction gain is formulated and an iterative procedure is described to obtain the solutions. The relative prediction-gain improvement is computed for a 3 sec. speech sample and for several values of L, M, and block size, showing that 1/2 of the adaptive over fixed-prediction improvement in dB is reached with only L=4 and 2/3 with L=8. The design problem solved by minimizing Itakura distance is shown to yield essentially identical performances. A linear discriminant property in the autocorrelation space is pointed out. Based on that property a pattern classification approach is proposed as a hardware-efficient coding algorithm.
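
The per-block choice among a finite set of preselected predictors can be sketched as follows: for each block, compute the prediction error energy under every candidate and keep the index of the best one. This illustrates only the selection step, by exhaustive search; it does not reproduce the paper's iterative design of the predictor set or its classification approach. The function names and the example predictors are hypothetical.

```python
def prediction_error_energy(x, start, end, coeffs):
    """Energy of x[n] - sum_k coeffs[k-1] * x[n-k] over n in [start, end).

    Samples before the beginning of x are treated as zero.
    """
    energy = 0.0
    for n in range(start, end):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        energy += (x[n] - pred) ** 2
    return energy


def select_predictor(x, start, end, predictors):
    """Return (index, energy) of the candidate with the lowest error energy."""
    energies = [prediction_error_energy(x, start, end, p) for p in predictors]
    best = min(range(len(predictors)), key=energies.__getitem__)
    return best, energies[best]


if __name__ == "__main__":
    candidates = [[0.0], [1.0], [-1.0]]   # L = 3 predictors of order M = 1
    print(select_predictor([1.0] * 6, 1, 6, candidates))
    print(select_predictor([1.0, -1.0, 1.0, -1.0], 1, 4, candidates))
```

Only the index of the chosen predictor needs to be sent per block, which costs log2(L) bits instead of a full set of quantized coefficients.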

Proceedings ArticleDOI
01 Apr 1980
TL;DR: Some of the frequency domain expressions of statistical distance measures between stationary vector Gaussian processes recently derived by the authors have been empirically verified to be very useful in speech recognition and speech analysis-synthesis.
Abstract: We summarize some new frequency domain expressions of statistical distance measures between stationary vector Gaussian processes recently derived by the authors. Both time-discrete and time-continuous processes are treated. Some of the frequency domain distance measures have been empirically verified to be very useful in speech recognition and speech analysis-synthesis.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: A scheme for low-bit-rate transmission of speech that consists of a cascade of a codec with a post-processor capable of speech enhancement, based on a modified version of the LPC Vocoder Driven Adaptive Transform Coding algorithm.
Abstract: The purpose of this paper is to describe a scheme for low-bit-rate transmission of speech. The scheme consists of a cascade of a codec with a post-processor. The codec is based on a modified version of the LPC Vocoder Driven Adaptive Transform Coding algorithm. The Post-Processor performs a short-time Fourier analysis/synthesis at the receiver output and, by exploring the known structure of the quantization noise introduced by the codec, it is capable of speech enhancement. The performance of this scheme will be demonstrated at 9.6 kb/s and at 7.2 kb/s.

ReportDOI
31 Jul 1980
TL;DR: In this article, a speech analysis-synthesis system was developed which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform, which has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.
Abstract: A new speech analysis-synthesis system has been developed which is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolves the original speech with the spectral-envelope estimate to obtain a model for the excitation. Hence, explicit pitch extraction is not required. As a consequence, the transformed speech is more natural sounding than would be the case if the excitation were modeled as a sequence of pulses. The system has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: Experiments discussed here show that LPC synthesis is generally very close to natural speech in the high frequency region and that most of the degradation is in the low frequency region.
Abstract: Results of past studies on the quality problems of LPC speech are reviewed. The causes of the quality problems are found to lie within the basic model assumptions as well as inaccuracies in LPC analysis and errors introduced in pitch and voicing detection and parameter quantization. Experiments discussed here show that LPC synthesis is generally very close to natural speech in the high frequency region and that most of the degradation is in the low frequency region (approximately less than 1500 Hz).

Proceedings ArticleDOI
09 Apr 1980
TL;DR: Improvements on the classical model of speech are presented which produce speech that is significantly better than that of currently available systems, including an efficient encoding of the prediction residuals of the two components.
Abstract: This paper presents improvements on the classical model which produce speech that is significantly better than that of currently available systems. The first major improvement results from treating speech as a two source phenomenon that can be separated for parallel but independent analysis/synthesis. This two component decomposition is accomplished by making use of the quasi-periodic nature of 'voiced' speech. The second major improvement in bit compression and robustness of operation results from an efficient encoding of the prediction residuals of the two components. The key step is to encode the residual of the periodic component by picking out and transmitting the essential information for only one cycle (pitch period) of the residual.

Proceedings ArticleDOI
Chong Un, Won Sung
01 Apr 1980
TL;DR: An improved 4800 bps LPC vocoder system is presented that virtually eliminates the buzzy effect from synthetic speech, so that the vocoder speech quality is more natural than that of a conventional LPC vocoder.
Abstract: We present an improved 4800 bps LPC vocoder system that virtually eliminates the buzzy effect from synthetic speech. Excitation signal in the new system is formed by adding high-pass filtered pitch pulses or random noise to a baseband residual signal (0 - 600 Hz) that has been coded by pitch predictive PCM. Since the baseband residual is used as a part of excitation, the system is also robust to V/UV and pitch errors. According to our informal listening tests, the synthetic speech of the new system does not have the buzzy effect. As a result the vocoder speech quality is more natural than that of a conventional LPC vocoder.

Patent
Carl J. May, Jr.
30 Oct 1980
TL;DR: In this article, a speech detector uses a signal classifier to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise, and a level estimator uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels.
Abstract: A speech detector uses a signal classifier (19) to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise. A controller (33) in the signal classifier follows a four state sequence using appropriate time constants for signal measures in a variety of signal conditions in defining the speech and noise portions of the representation. A level estimator (21) uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels. A speech definer (16) compares the representation to a first decision level and the signal samples to a higher decision level to indicate the occurrence of speech signal activity when either decision level is exceeded. In a two way transmission arrangement, a receive trunk speech detector uses a stretcher (133) to prevent adaptation of the transmit speech detector thresholds when echo signals are present.

Proceedings ArticleDOI
M. Macchi
09 Apr 1980
TL;DR: Demisyllables and affixes have been found to be very promising units for concatenative speech synthesis.
Abstract: Demisyllables and affixes (1,2,3) have been found to be very promising units for concatenative speech synthesis.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: This paper presents the results of the investigation of the various aspects of baseband LPC coders with the goal of maximizing the speech quality at a transmission bit-rate of 9.6 kb/s and for channel bit-error rates of up to 1%.
Abstract: This paper presents the results of our investigation of the various aspects of baseband LPC coders with the goal of maximizing the speech quality at a transmission bit-rate of 9.6 kb/s and for channel bit-error rates of up to 1%. Important among these aspects are: baseband width, coding of baseband, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder.

Proceedings ArticleDOI
L. Siegel
01 Apr 1980
TL;DR: The use of the SIMD (single instruction stream-multiple data stream) mode of parallelism to perform linear predictive coding analysis is explored and parallel algorithms for the autocorrelation formulation of linear prediction are presented and analyzed.
Abstract: The use of the SIMD (single instruction stream-multiple data stream) mode of parallelism to perform linear predictive coding analysis is explored. Parallel algorithms for the autocorrelation formulation of linear prediction are presented and analyzed. The algorithms are evaluated in terms of the number of arithmetic operations and interprocessor data transfers required.

Journal ArticleDOI
TL;DR: A high quality speech synthesizer system which consists of 3 LSI chips, a speech synthesizer, a 128k bit ROM and a general purpose microprocessor has been developed, based on the recently developed PARCOR voice compression technique.
Abstract: A high quality speech synthesizer system which consists of 3 LSI chips, a speech synthesizer, a 128k bit ROM and a general purpose microprocessor has been developed. This system is based on the recently developed Partial Autocorrelation (PARCOR) voice compression technique. This system can generate high quality speech from a data rate of less than 2400 bits per second. Several new techniques are applied for this system to improve the quality of generated speech especially of the female voice. This system has many advantageous features such as speech speed control and external pitch excitation.