scispace - formally typeset

Showing papers on "Linear predictive coding" published in 1978


Book
05 Sep 1978
TL;DR: This book covers digital speech processing from fundamentals and digital models of the speech signal, through time-domain processing, digital waveform representation, short-time Fourier analysis, homomorphic processing, and linear predictive coding, to digital speech processing for man-machine communication by voice.
Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations


Journal Article
TL;DR: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise and develops a procedure based on maximum a posteriori (MAP) estimation techniques which is related to linear prediction analysis of speech.
Abstract: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise. The procedure, based on maximum a posteriori (MAP) estimation techniques, is first developed in the absence of noise and related to linear prediction analysis of speech. The modification in the presence of background noise is shown to be nonlinear. Two suboptimal procedures are suggested which have linear iterative implementations. A preliminary illustration and discussion based both on a synthetic example and on real speech data are given.

590 citations


Journal Article
TL;DR: Preliminary tests indicate that the least mean-square adaptive filtering approach for removing the deleterious effects of additive noise on the speech signal improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.
Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown to partially remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

207 citations
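The core of the method, predicting the speech at time t from the noisy signal one pitch period earlier, can be sketched as an ordinary LMS transversal filter whose reference input is the pitch-delayed noisy signal. This is a minimal sketch, not the paper's implementation: it assumes a fixed, known pitch period in samples (the paper uses an estimated pitch period), a constant step size `mu`, and taps at lags T through T + `num_taps` - 1; the function name and parameter values are illustrative.

```python
import math


def lms_pitch_enhance(y, pitch_period, num_taps=8, mu=0.01):
    """Estimate the quasi-periodic (speech) component of y with an LMS filter.

    Assumption: pitch_period is fixed and known (hypothetical simplification).
    The reference input is y delayed by one pitch period; the desired response
    is the current noisy sample y[n]; the filter output is the speech estimate.
    """
    weights = [0.0] * num_taps
    estimate = [0.0] * len(y)
    for n in range(len(y)):
        # Reference vector y[n - T], y[n - T - 1], ...; zero before the signal starts.
        ref = [y[n - pitch_period - k] if n - pitch_period - k >= 0 else 0.0
               for k in range(num_taps)]
        s_hat = sum(w * r for w, r in zip(weights, ref))
        error = y[n] - s_hat
        # Standard LMS update: w <- w + 2 * mu * e * x
        weights = [w + 2.0 * mu * error * r for w, r in zip(weights, ref)]
        estimate[n] = s_hat
    return estimate


# Usage: a clean periodic signal (period 20 samples) should be reproduced once the filter converges.
period = 20
test_signal = [math.sin(2 * math.pi * n / period) for n in range(2000)]
estimate = lms_pitch_enhance(test_signal, period)
```

Noise that is uncorrelated at a lag of one pitch period cannot be predicted from the reference, so with noisy input the filter output retains mainly the quasi-periodic component; the step size trades convergence speed against misadjustment.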


Proceedings Article
10 Apr 1978
TL;DR: Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.
Abstract: Predictive coding methods attempt to minimize the r.m.s. error in the coded signal. However, the human ear does not judge signal distortion by r.m.s. error alone, irrespective of the error's spectral shape relative to the signal spectrum. Specifically, for speech signals, the locations of the formant frequencies and their rates of change with time influence the audibility, and thus the subjective distortion, of any quantizing noise. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.

94 citations
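The masking idea in b) is often realized by choosing the coder's error criterion so that the noise spectrum follows the speech spectral envelope. A minimal sketch, assuming a weighting filter W(z) = A(z) / A(z/gamma) built from the short-time predictor by bandwidth expansion; this filter form and the value gamma = 0.8 are assumptions for illustration, not details given in this abstract. Minimizing the W-weighted error pushes the noise spectrum toward |A(z/gamma)|^2 / |A(z)|^2 on the unit circle, which rises under the formants where the speech masks it.

```python
import cmath
import math


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_j -> a_j * gamma**j (poles pulled toward the origin)."""
    return [aj * gamma ** j for j, aj in enumerate(a, start=1)]


def inverse_filter_magnitude(a, omega):
    """|A(e^{j omega})| for the inverse filter A(z) = 1 - sum_j a_j z^{-j}."""
    return abs(1 - sum(aj * cmath.exp(-1j * omega * j) for j, aj in enumerate(a, start=1)))


def noise_weight(a, omega, gamma=0.8):
    """|W(e^{j omega})| for the assumed weighting filter W(z) = A(z) / A(z/gamma)."""
    return (inverse_filter_magnitude(a, omega)
            / inverse_filter_magnitude(bandwidth_expand(a, gamma), omega))


# Example: one formant modeled by a pole pair at radius 0.95 and angle pi/4.
r, theta = 0.95, math.pi / 4
a_formant = [2 * r * math.cos(theta), -r * r]
```

For this example the weight is small at the formant frequency and larger far from it, so coding errors near the formant are penalized less and the quantizing noise is steered to where the speech can mask it.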


Patent
TL;DR: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components and an algorithm for detecting major slope transitions of the analog speech signals.
Abstract: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components. The speech signals are fed into an amplifier which provides negative feedback to normalize the amplitude of the speech signals. The normalized speech signals are delta modulated at a first sampling rate to produce a corresponding first sequence of digital pulses. The negative feedback signal of the amplifier is delta modulated at a second sampling rate to produce a second sequence of digital pulses corresponding to amplitude information of the speech signals. The speech signals are filtered and utilized to produce a digital pulse corresponding to high frequency components of the speech signals having magnitudes in excess of a threshold voltage. A microprocessor contains an algorithm for detecting major slope transitions of the analog speech signals in response to the first sequence of digital pulses by detecting information corresponding to presence and absence of predetermined numbers of successive slope reversals in the delta modulator producing the first sequence of digital pulses. The algorithm computes cues from the high frequency digital pulse and the second sequence of pulses. The algorithm computes a plurality of speech waveform characteristic ratios of time intervals between various slope transitions and compares the speech waveform characteristic ratios with a plurality of stored phoneme ratios representing a set of phonemes to detect matching therebetween. The order of comparing is determined on the basis of the cues and a configuration of a phoneme decision tree contained in the algorithm. When a match occurs, a signal corresponding to the matched phoneme is produced and utilized to cause the phoneme to be printed. In one embodiment of the invention, the speech signals are produced by the earphone of a standard telephone headset.

60 citations
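The delta-modulation front end and the slope-reversal cue can be illustrated in a few lines. This sketch assumes a linear (fixed-step) delta modulator and treats a run of at least `min_run` identical bits ending in a bit change as a major slope transition; the abstract gives neither the step size, the sampling rates, nor the exact reversal counts, so `step`, `min_run`, and the function names are hypothetical.

```python
def delta_modulate(x, step):
    """Linear delta modulator: emit 1 if the sample is at or above the running approximation, else 0."""
    bits, approx = [], 0.0
    for sample in x:
        bit = 1 if sample >= approx else 0
        approx += step if bit else -step  # staircase follows the input
        bits.append(bit)
    return bits


def slope_reversals(bits):
    """Indices n where bit n differs from bit n-1, i.e. the staircase changes direction."""
    return [n for n in range(1, len(bits)) if bits[n] != bits[n - 1]]


def major_transitions(bits, min_run=3):
    """Reversals preceded by a run of at least min_run identical bits (hypothetical criterion)."""
    out, run = [], 1
    for n in range(1, len(bits)):
        if bits[n] == bits[n - 1]:
            run += 1
        else:
            if run >= min_run:
                out.append(n)
            run = 1
    return out
```

A flat input produces the idle pattern 1010..., in which every bit is a reversal but none is a major transition; counting runs rather than single reversals is one way to separate genuine waveform turning points from that idle pattern.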


Journal Article
TL;DR: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.
Abstract: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

34 citations
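Matching successive segments against a stored catalogue of shapes (what would now be called vector quantization of the waveform) can be sketched as a nearest-neighbour search. This assumes fixed-length, non-overlapping segments, a squared-error match, and no gain normalization; the abstract does not specify the segment length, the distance measure, or how the catalogue is built, so these choices and the function names are illustrative. With a catalogue of N shapes, each segment costs log2(N) bits.

```python
def nearest_shape(segment, catalogue):
    """Index of the stored shape closest to the segment in squared error."""
    best_index, best_error = -1, float("inf")
    for index, shape in enumerate(catalogue):
        error = sum((s - c) ** 2 for s, c in zip(segment, shape))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def encode(signal, catalogue, segment_length):
    """Replace each full, non-overlapping segment by the index of its best-matching shape."""
    return [nearest_shape(signal[i:i + segment_length], catalogue)
            for i in range(0, len(signal) - segment_length + 1, segment_length)]


def decode(indices, catalogue):
    """Rebuild a waveform by concatenating the catalogue shapes named by the indices."""
    out = []
    for index in indices:
        out.extend(catalogue[index])
    return out
```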


Proceedings Article
01 Apr 1978
TL;DR: In addition to baseband excitation, this coding scheme combines recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization.
Abstract: This paper describes a common voice coding architecture based on a Voice Excited Predictive Coding (VEPC) scheme that can operate at different bit rates (9600 bps, 7200 bps, or below) simply by modifying the bandwidth allocated to coding the baseband excitation signal. In addition to the baseband excitation concepts, this coding scheme takes advantage of a combination of recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization. Simulations have shown that the proposed architecture achieves 'standard telephone quality', assuming a 300-3400 Hz telephone bandwidth, at transmission rates below 9600 bps.

28 citations


Proceedings Article
10 Apr 1978
TL;DR: It is demonstrated that it is possible to achieve pattern recognition classification with much less computational effort by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.
Abstract: A pattern recognition approach for deciding whether a given segment of speech should be classified as voiced speech, unvoiced speech or silence based on a set of five measurements of the signal is given by Atal and Rabiner [1]. In this paper, we demonstrate that it is possible to achieve this classification with much less computational effort. These computational savings are mainly achieved by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

22 citations
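The computational saving comes from replacing linear prediction analysis with features that need only a few operations per sample. A minimal sketch of that idea, assuming three common cheap features (log energy, zero-crossing rate, and normalized first autocorrelation coefficient) and fixed thresholds; the abstract does not name the paper's three features or describe its variable-decision-space scheme, so the features, thresholds, and names here are hypothetical.

```python
import math


def frame_features(frame):
    """Log energy (dB), zero-crossing rate, and normalized lag-1 autocorrelation of a frame."""
    n = len(frame)
    r0 = sum(s * s for s in frame)
    log_energy_db = 10.0 * math.log10(r0 / n + 1e-12)
    crossings = sum(1 for i in range(1, n) if (frame[i] >= 0) != (frame[i - 1] >= 0))
    zcr = crossings / (n - 1)
    r1 = sum(frame[i] * frame[i - 1] for i in range(1, n))
    rho1 = r1 / r0 if r0 > 0 else 0.0
    return log_energy_db, zcr, rho1


def classify_frame(frame, silence_db=-40.0, zcr_max=0.2, rho_min=0.5):
    """Hypothetical fixed-threshold voiced/unvoiced/silence rule; no LP analysis is needed."""
    log_energy_db, zcr, rho1 = frame_features(frame)
    if log_energy_db < silence_db:
        return "silence"
    if zcr < zcr_max and rho1 > rho_min:
        return "voiced"
    return "unvoiced"
```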


15 Dec 1978
TL;DR: After several parameter quantization methods were considered to achieve the desired low bit rates, simulations of the complete transmission system established the usefulness of the new approach to speech modeling.
Abstract: This constitutes our final report on a research program aimed at the development of a high quality low data rate speech transmission system based on new types of speech modeling algorithms. Several such algorithms were developed and tested on simulated and real speech data. These algorithms have many desirable features including the capability of rapidly tracking time-varying model parameters. The best algorithm was used as the basis of a speech transmission system in order to test the quality of the speech models. The model parameters (reflection coefficients) together with pitch information and speech energy form a speech parameter vector to be transmitted and used to reconstruct the original speech. Several parameter quantization methods were considered to achieve the desired low bit rates. The various algorithms as well as the complete transmission system were coded and tested. Simulation results are very promising and the usefulness of our new approach for speech modeling has been successfully established. (Author)

19 citations
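At the receiver, transmitted reflection coefficients must be turned back into an all-pole synthesis filter. A minimal sketch, assuming the common "step-up" recursion with the sign convention stated in the comments (conventions differ between texts); the function names are illustrative, and pitch and energy are reduced here to an excitation sequence and a single `gain`. If every |k_i| < 1, the resulting synthesis filter is stable.

```python
def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients k_1..k_p -> predictor a_1..a_p.

    Convention (assumed): x[n] is predicted as sum_j a_j x[n-j], and
    a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1),  a_i^(i) = k_i.
    """
    a = []
    for i, ki in enumerate(k, start=1):
        # In 0-based storage, a[i - 2 - j] holds a_{i-1-j} of the previous order.
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a


def synthesize(a, gain, excitation):
    """All-pole synthesis: y[n] = gain * e[n] + sum_j a_j y[n-j]."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc += aj * y[n - j]
        y.append(acc)
    return y
```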


Patent
TL;DR: A method of communicating digital speech data to a speech synthesis circuit is described; the data is stored in a memory coupled to the speech synthesis circuit.
Abstract: A method of communicating Digital Speech Data to a speech synthesis circuit. The data is compressed to on the order of 1000-1200 bits per second for normal human speech. The speech synthesis circuit utilizes linear predictive coding techniques for producing high quality speech or other sounds. The data is preferably stored in a memory which is coupled to the speech synthesis circuit. The data has variable frame lengths; in the disclosed embodiment, four different frame lengths are described, ranging from four bits to forty-nine bits. The memory stores the variable frame length data and communicates it to the speech synthesis circuit in response to certain control signals.

12 citations


Proceedings Article
10 Apr 1978
TL;DR: This paper describes a method of speech coding in a high ambient noise environment and shows that, when the proposed noise reduction method is used, the spectral envelope of the speech signal is a highly reliable source of information.
Abstract: Preserving both the spectral distribution and the periodicity of speech signals is essential in speech processing. This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is a highly reliable source of information when the noise reduction method proposed in this paper is used. This paper also reports comparisons of several pitch extraction methods with extensive experimental data, on the basis of which a pitch extraction method suited to noisy speech signals is proposed.

Proceedings Article
01 Apr 1978
TL;DR: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that was proposed at the 1976 ICASSP conference, and high correlations obtained indicate the usefulness of these methods.
Abstract: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that we proposed at the 1976 ICASSP conference. In each method, the error in short-term spectral behavior between vocoded speech and the original is computed once every 10 ms. These errors are appropriately weighted and averaged over an utterance to produce a single objective score. Several short-term error measures, and time-weighting and averaging techniques are investigated. We evaluate the objective methods by correlating the resulting objective scores with formal subjective speech quality judgments. High correlations obtained indicate the usefulness of these methods.
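One instance of this framework can be sketched as follows, assuming each 10 ms frame of both the original and the vocoded speech is summarized by an all-pole model (predictor coefficients plus gain) and using the RMS difference of the log model spectra as the short-term error measure. The paper compares several error measures and time-weighting schemes; the measure, the uniform default weights, the 64-point frequency grid, and the names here are assumptions for illustration.

```python
import cmath
import math


def model_log_spectrum_db(a, gain, n_freqs=64):
    """Log magnitude (dB) of gain / A(e^{jw}), A(z) = 1 - sum_j a_j z^{-j}, on n_freqs points in [0, pi)."""
    spectrum = []
    for m in range(n_freqs):
        w = math.pi * m / n_freqs
        inverse = 1 - sum(aj * cmath.exp(-1j * w * j) for j, aj in enumerate(a, start=1))
        spectrum.append(20.0 * math.log10(gain / abs(inverse)))
    return spectrum


def frame_distance_db(model_x, model_y):
    """RMS difference of two all-pole log spectra; each model is (coefficients, gain)."""
    sx = model_log_spectrum_db(*model_x)
    sy = model_log_spectrum_db(*model_y)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(sx, sy)) / len(sx))


def objective_score(original_frames, coded_frames, weights=None):
    """Weighted average of per-frame spectral distances over an utterance."""
    distances = [frame_distance_db(x, y) for x, y in zip(original_frames, coded_frames)]
    if weights is None:
        weights = [1.0] * len(distances)
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)
```

Passing per-frame weights to `objective_score` gives a time-weighted average; which error measure and weighting correlate best with subjective judgments is what the paper investigates.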

Proceedings Article
01 Apr 1978
TL;DR: Several techniques for reducing the effect of channel bit errors on the synthesized speech are described, which cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.
Abstract: The U.S. Government has developed a real-time 2400 bps Linear Predictive Coded (LPC) voice algorithm which was designed to provide maximum intelligibility and quality within the time and accuracy limitations imposed by modern high-speed minicomputers. The algorithm which resulted provides excellent intelligibility and quality when transmitted over an ideal channel. However, the speech is significantly degraded in an error environment. This paper describes several techniques for reducing the effect of channel bit errors on the synthesized speech. These techniques cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.

Journal Article
TL;DR: Several linear prediction vocoder modifications and an evaluation of their effects on intelligibility are presented; lower order coefficient representation and faster analysis update during unvoiced speech improve the sustention feature with little degradation to the other features and almost no increase in transmission rate.
Abstract: Several linear prediction vocoder modifications and an evaluation of their effects on intelligibility are presented. Diagnostic rhyme test (DRT) comparisons among 1) the fixed filter order, fixed analysis frame rate vocoder, 2) various modified vocoders, and 3) the input speech, are implemented using two speakers, seven listeners, and a selected set of word pairs reflecting six phonemic attribute contrasts. Reducing the filter order from ten to four for unvoiced speech frames analyzed at a rate of 44.4 frames/s produces no significant decrease in the scores for any of the six features tested by the DRT. Updating the unvoiced analysis frames with shorter windows at twice the frame rate (88.8 frames/s) leads to a significant improvement in the scores for the sustention feature. Lower order coefficient representation and faster analysis update during unvoiced speech improve the sustention feature with little degradation to the other features, and almost no increase in transmission rate. Results of the DRT evaluation and considerations for implementing the test within ordinary laboratory facilities are discussed.

Proceedings Article
01 Apr 1978
TL;DR: The pitch predictor is not useful on balance and should be eliminated, and the residual should be quantized with no clipping and encoded using a variable-length code, which seems to be adequate for all speech and all conditions.
Abstract: We report on the results of research to code speech at 16 kbps under the condition that the quality of the transmitted speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long distance telephone lines. The rigorous requirements of this work led to a new outlook on adaptive predictive coding. We have found that the pitch predictor is not useful on balance and should be eliminated, and that the residual should be quantized with no clipping and encoded using a variable-length code. A single coding scheme seems to be adequate for all speech and all conditions. In addition, the adaptive predictive coding system has been modified to include a noise spectral shaping filter that effectively eliminates the perception of background granular noise.

Journal Article
TL;DR: An all-digital system, labeled PCM.RR, is presented which doubles the traffic capacity of PCM links by properly using "Adaptive Quantization" and "Speech Interpolation", performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.
Abstract: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of traffic capacity of PCM links. This is achieved, while keeping the transmission quality impairment very close to normal PCM standards, by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.
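For reference, the continuous A-law companding characteristic (A = 87.6) on which such compressed samples are based is sketched below. Note that practical G.711 codecs use a 13-segment piecewise-linear approximation of this curve with 8-bit codes, and the abstract does not describe the speech detector itself, so none is shown; the function names are illustrative.

```python
import math

ALAW_A = 87.6                      # standard A-law compression parameter
_NORM = 1.0 + math.log(ALAW_A)     # 1 + ln(A)


def alaw_compress(x):
    """Continuous A-law characteristic for a normalized sample, |x| <= 1."""
    ax = abs(x)
    if ax < 1.0 / ALAW_A:
        y = ALAW_A * ax / _NORM                  # linear segment near zero
    else:
        y = (1.0 + math.log(ALAW_A * ax)) / _NORM  # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y):
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / _NORM:
        x = ay * _NORM / ALAW_A
    else:
        x = math.exp(ay * _NORM - 1.0) / ALAW_A
    return math.copysign(x, y)
```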

Proceedings Article
01 Apr 1978
TL;DR: Preliminary tests indicate that the proposed least mean-square adaptive filtering approach improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.
Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean (true) speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown in preliminary work to remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

Proceedings Article
01 Apr 1978
TL;DR: A statistical correlation study between 18 objective quality measures and a database of subjective quality scores from the Paired Acceptability Rating Method (PARM) found that the most effective measure over all systems was a gain-weighted L2 spectral distance metric.
Abstract: A statistical correlation study between 18 objective quality measures and a database of subjective quality measures from the Paired Acceptability Rating Method (PARM) was carried out for nine communication systems, including waveform coders, channel vocoders, linear predictive coders, and adaptive predictive coders. The results of this study show which of the candidate objective measures are most effective in predicting the subjective results. The measure found to be most effective over all systems was a gain-weighted L2 spectral distance metric, with a correlation coefficient of -0.83. Supported by DCA/DCEC via the RADC Post Doctoral Program.

Journal Article
TL;DR: The Parcor analysis‐synthesis method is being applied to a wide range of speech coding from 1200 bps variable frame‐rate coding to high quality 16 kbps adaptive, predictive coding.
Abstract: Since introducing speech analysis-synthesis based on maximum likelihood spectrum estimation in 1966, we have been conducting research on low bit rate speech coding techniques and their application to audio response and low bit rate digital speech transmission. Parcor analysis-synthesis, demonstrated in 1969, was one of the most fundamental methods, and it has formed the basis of the present development of linear predictive coding. Recently, various techniques have been proposed to improve speech quality, such as interpolation and nonlinear quantization of parameters, spectral smoothing, etc. They have been applied in the hardware realization of a 4-channel multiplexed 2400 bps vocoder. At present, the Parcor method is being applied to a wide range of speech coding, from 1200 bps variable frame-rate coding to high quality 16 kbps adaptive predictive coding.
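PARCOR (partial correlation) coefficients arise as by-products of the Levinson-Durbin recursion that solves the autocorrelation normal equations of linear prediction. A minimal sketch, assuming the autocorrelation method on an already windowed frame and the sign convention in the comments; sign conventions for PARCOR coefficients differ between texts, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a (pre-windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor a_1..a_p, PARCOR k_1..k_p, and the final residual energy.

    Convention (assumed): x[n] is predicted as sum_j a_j x[n-j]. For a valid
    autocorrelation sequence |k_i| < 1, which keeps the synthesis filter stable.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    parcor = []
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        parcor.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k  # residual energy shrinks at each order
    return a[1:], parcor, error
```

For example, the autocorrelation r[m] = 0.9**m of a first-order process gives k_1 = 0.9 and k_2 = 0, so the second-order predictor collapses to a = [0.9, 0].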

Proceedings Article
01 Apr 1978
TL;DR: A variable rate speech encoding scheme is presented which determines linear predictive models over phonetically uniform intervals: instead of using a fixed analysis frame length, the analysis interval is adjusted to span an entire steady sound or is set to a minimum interval for transient sounds.
Abstract: A variable rate speech encoding scheme is presented which determines linear predictive models over phonetically uniform intervals. Instead of using a fixed analysis frame length, the analysis interval is adjusted to span an entire steady sound or is set to a minimum interval for transient sounds. This scheme offers the following advantages over a fixed frame-rate scheme: for the same perceived speech quality, the bit rate can be reduced, or, for the same bit rate, the perceived speech quality can be improved.

Journal Article
D. Friedman
TL;DR: An estimator algorithm for the pitch of voiced speech is presented, which indicates superior immunity to added noise and to bandlimiting with loss of the fundamental component.
Abstract: An estimator algorithm for the pitch of voiced speech is presented, based on the following sequence of operations: 1) linear-prediction inverse filtering; 2) short-time spectral analysis by a bank of bandpass filters; 3) envelope extraction on the filter outputs; 4) period determination on the parallel envelopes considered as a multicomponent vector signal, using an algorithm described in a previous work. Results of a comparative evaluation indicate superior immunity to added noise and to bandlimiting with loss of the fundamental component.

Proceedings Article
01 Apr 1978
TL;DR: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures.
Abstract: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures. This approach provides considerable insight into the interrelationships between a variety of seemingly different signal compression methods. The approach also provides a convenient mechanism for introducing the types of non-Euclidean error measures that are needed to adjust the signal performance optimization criteria to take into account different types of a priori statistical and dynamical information relating to both the desired signal and to various interference processes.

01 Apr 1978
TL;DR: The development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network and a reliable method for measuring subjective speech quality are described.
Abstract: This report describes our work in the past three years on data compression and quality evaluation of digital speech. We developed and implemented linear predictive coding (LPC) techniques with the overall objective of digitally transmitting high quality speech at the lowest possible average data rates over packet-switched communication media. Major techniques reported include: covariance lattice method of linear prediction analysis, adaptive lattice methods, linear predictive spectral warping, improved quantization of LPC parameters, variable frame rate transmission of LPC parameters based on a functional perceptual model of speech, and a mixed-source model for LPC synthesizer to produce more natural-sounding speech. Also, we developed a reliable method for measuring subjective speech quality. This method was employed to formally demonstrate the quality improvements provided by our speech analysis/synthesis techniques as well as for studying speech quality as a function of LPC parameters. As subjective procedures are generally expensive and time-consuming, we developed and tested several objective procedures for speech quality evaluation. The results from these objective procedures were found to be highly correlated to the corresponding subjective quality judgments. Another highlight of our work is the development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network.

Proceedings Article
M. Baumwolspiner
01 Apr 1978
TL;DR: The 'waveform synthesis' technique is particularly well suited for microprocessor implementation and as shown in the paper two D-A converters in conjunction with a standard microprocessor and associated ROM, RAM and I/O can be used to implement this technique.
Abstract: This paper presents a time domain technique for the generation of speech which offers significant advantages over current formant synthesis and linear predictive coder (LPC) techniques. A set of basis functions in conjunction with a time-compression (and expansion) operation is shown to span the parameter space of the vocal tract model. The relationship between these basis functions and the formant synthesis parameters is derived and graphically illustrated. The 'waveform synthesis' technique is particularly well suited for microprocessor implementation and as shown in the paper two D-A converters in conjunction with a standard microprocessor and associated ROM, RAM and I/O can be used to implement this technique.

Proceedings Article
10 Apr 1978
TL;DR: A method for LPC analysis in a transformed domain (LPCTD) has been developed theoretically and studied experimentally in the Walsh-Hadamard domain (LPCWHD) for low-bit-rate coding of speech signals.
Abstract: A method for LPC analysis in a transformed domain (LPCTD) has been developed theoretically and studied experimentally in the Walsh-Hadamard domain (LPCWHD) for low-bit-rate coding of speech signals. Speech signals in the Walsh-Hadamard domain have been modelled by their largest variance coefficients and a few prediction coefficients which represent the remaining coefficients. Determination of the prediction coefficients has been based on the correlation between the spectral coefficients. Intelligible speech at bit-rates of 8 kb/s and 4 kb/s was achieved when 16- and 64-point Walsh-Hadamard transforms were used, respectively. At the latter bit-rate the quality was significantly improved when unvoiced sounds were coded separately by their largest variance coefficients. The main advantage of the LPCWHD system is its simplicity, which can lead to a far less complex implementation than that of vocoder systems.


Proceedings Article
01 Apr 1978
TL;DR: The problem of estimating the pitch period of a speech waveform contaminated by acoustically coupled background noise is formulated to include the properties of the spectral envelope by postulating a state-variable model for the speech generation process using the maximum likelihood estimation technique.
Abstract: The problem of estimating the pitch period of a speech waveform contaminated by acoustically coupled background noise is formulated to include the properties of the spectral envelope by postulating a state-variable model for the speech generation process. Applying the maximum likelihood estimation technique, the optimum processor uses a Kalman filter preprocessor to flatten the spectrum. The resulting signal is then passed through a bank of comb filters and the optimum pitch corresponds to the comb filter for which the output energy is smallest. The Kalman prefilter reduces to an LPC filter only when the speech is generated by an all-pole process and the signal-to-noise ratio is large. For the low signal-to-noise ratio case, a parallel formant speech generation model is more likely to lead to practical numerical algorithms for estimating the spectral coefficients.
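The final stage, choosing the pitch period whose comb filter leaves the least output energy, can be sketched on its own. This assumes the input has already been spectrally flattened (the Kalman prefilter is not shown) and uses the simple comb 1 - z^(-P) for each candidate period P; the comb filter form, the search range, and the function name are assumptions for illustration, not the paper's exact processor.

```python
def comb_pitch(e, min_period, max_period):
    """Return the candidate period P whose comb 1 - z^-P leaves the least output energy.

    e is assumed to be a spectrally flattened (prefiltered) signal.
    """
    best_period, best_energy = None, float("inf")
    for p in range(min_period, max_period + 1):
        # Same summation range for every candidate so the energies are comparable.
        energy = sum((e[n] - e[n - p]) ** 2 for n in range(max_period, len(e)))
        # Strict '<' keeps the shortest period when multiples of it tie.
        if energy < best_energy:
            best_period, best_energy = p, energy
    return best_period
```

Multiples of the true period also cancel a perfectly periodic signal, which is why ties are resolved toward the shortest candidate here.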

Proceedings Article
01 Apr 1978
TL;DR: A procedure is described that reduces the spectral distortion in LPC-encoded speech preprocessed by a CVSD coder: low-pass filtering the CVSD speech on a formant-adaptive basis and narrowing the bandwidths of the primary formants makes the LPC input more closely resemble the original unprocessed signal, producing higher quality CVSD/LPC speech than previously realized.
Abstract: This paper describes a procedure that reduces the spectral distortion in LPC encoded speech preprocessed by a CVSD coder. In this type of tandem configuration (wide-band/narrowband), the CVSD process introduces extraneous wideband noise and a general broadening of the formant bandwidths. When coupled with the formant distortion introduced by the LPC process, the tandem speech appears buzzy, muffled, and of lower quality than either system considered alone. By low-pass filtering the CVSD speech, on a formant adaptive basis, and narrowing the bandwidths of the primary formants, F1 and F2, the input signal to the LPC synthesizer more closely resembles the original unprocessed signal. This spectral enhancement procedure produces a higher quality CVSD/LPC signal than previously realized.

Proceedings Article
01 Apr 1978
TL;DR: Building on earlier work showing that LPC pseudo-area parameters can be linearly interpolated between dyad boundaries without excessive distortion, this study finds that dyadic interpolation of the power spectrum introduces perceptually significant distortion, which can be reduced considerably by using an additional interpolation point within the dyad.
Abstract: Recent work of Olive and Spickenagel has shown that pseudo-area parameters used for LPC synthesis can be linearly interpolated between dyad boundaries without producing excessive distortion in synthetic speech. This study investigates whether such interpolation can be done equally successfully on the power spectrum of the speech waveform. The spectrum is of special interest because speech can be synthesized in real time from spectral parameters on readily available programmable digital filters. Our results show that the distortion introduced by dyadic interpolation of the spectrum is perceptually significant, but it can be reduced considerably by using an additional point within the dyad boundaries for interpolation. The reasons for the good quality of speech synthesized from dyadically interpolated area parameters were also investigated. Formant frequency movements were found to be reproduced fairly accurately after dyadic interpolation. Formant bandwidths, however, are not reproduced accurately, but the bandwidth errors are less important subjectively.
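Interpolation in the area domain can be sketched with the standard lossless acoustic-tube relation between reflection coefficients and section areas. This assumes the sign convention in the comments and a unit area for the first section; it is a generic area-function sketch, not Olive and Spickenagel's exact pseudo-area parameters, and the function names are illustrative. Any convex mix of positive areas is positive, so every interpolated frame maps back to reflection coefficients with |k| < 1, i.e. a stable synthesis filter.

```python
def reflection_to_areas(k, first_area=1.0):
    """Tube section areas from reflection coefficients: A_{i+1} = A_i (1 - k_i) / (1 + k_i)."""
    areas = [first_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas


def areas_to_reflection(areas):
    """Inverse map: k_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]


def interpolate_reflection(k_start, k_end, t):
    """Linearly interpolate two boundary frames in the area domain at fraction t in [0, 1]."""
    a0 = reflection_to_areas(k_start)
    a1 = reflection_to_areas(k_end)
    mixed = [(1.0 - t) * x + t * y for x, y in zip(a0, a1)]
    return areas_to_reflection(mixed)
```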

01 Dec 1978
TL;DR: In this thesis, linear predictive coding is used to produce, for successive 25 ms segments, the coefficients of the characteristic polynomial of the vocal tract in the z-domain, in order to encode speech waveforms at low data rates.
Abstract: Digital encoding of speech to allow more efficient transmission at low data rates involves the decomposition of the speech waveform into various parameters which are related to the physical structure of the speech production process. In this thesis, linear predictive coding is used to produce a set of coefficients for the characteristic polynomial of the vocal tract, in the z-domain, for successive 25 ms segments. The locations of the poles in the z-plane and the excitation pitch period are then shifted and the signal reformulated to change the overall frequency characteristics of the speech waveforms while maintaining the perceived sounds and information content. The resulting audio tapes confirm the theory and conjectures of the thesis. (Author)
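The pole-shifting step can be sketched with NumPy: factor the LPC inverse-filter polynomial, scale the angle of each complex pole by a factor `alpha` while keeping its radius (and hence its bandwidth), and rebuild the polynomial. The angle-scaling rule, leaving real poles unchanged, the clipping near z = -1, and the name `shift_formants` are assumptions for illustration; the thesis also shifts the excitation pitch period, which is not shown.

```python
import numpy as np


def shift_formants(a, alpha):
    """Scale the angles of the complex poles of 1 / A(z), with A(z) = 1 - sum_j a_j z^-j."""
    # Poles are the roots of z^p - a_1 z^(p-1) - ... - a_p.
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    shifted = []
    for p in poles:
        if abs(p.imag) < 1e-9:
            shifted.append(p)  # real poles carry no formant frequency to move
        else:
            # Keep the radius; clip so a conjugate pair cannot wrap past z = -1.
            angle = np.clip(alpha * np.angle(p), -np.pi + 1e-3, np.pi - 1e-3)
            shifted.append(abs(p) * np.exp(1j * angle))
    coeffs = np.real(np.poly(shifted))  # conjugate pairs give real coefficients
    return -coeffs[1:]
```

Because both members of a conjugate pair are scaled by the same factor, the rebuilt polynomial stays real, and keeping the pole radii below 1 keeps the modified synthesis filter stable.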