scispace - formally typeset

Showing papers on "Linear predictive coding" published in 1979


Journal Article
S. Boll
TL;DR: A stand-alone noise suppression algorithm that resynthesizes a speech waveform and can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.
Abstract: A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital wave-form. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a pre-processor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

4,862 citations
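
The core of the spectral subtraction idea in Boll's abstract can be sketched in a few lines of Python. This is a minimal, dependency-free sketch, not Boll's implementation. It processes one short frame at a time and uses a naive DFT instead of an FFT. It applies no analysis window or overlap-add and omits the secondary residual-noise procedures. The function names (`dft`, `idft`, `estimate_noise_magnitude`, `spectral_subtract`) are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; fine for a short illustrative frame (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, keeping the real part because the original frame was real."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum ("noise bias") over frames assumed to contain no speech."""
    mags = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    return [sum(column) / len(mags) for column in zip(*mags)]


def spectral_subtract(frame, noise_mag):
    """Subtract the noise bias from each bin's magnitude, keep the noisy phase, resynthesize."""
    cleaned = []
    for c, bias in zip(dft(frame), noise_mag):
        magnitude = max(abs(c) - bias, 0.0)  # half-wave rectify: negative differences become 0
        cleaned.append(cmath.rect(magnitude, cmath.phase(c)))
    return idft(cleaned)
```

Only the magnitude is corrected and the noisy phase is reused. As a result, the output is an ordinary waveform that a downstream coder or recognizer can consume unchanged.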


Journal Article
TL;DR: Improved speech quality is obtained by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and by effective masking of the quantizer noise by the speech signal.
Abstract: Predictive coding methods attempt to minimize the rms error in the coded signal. However, the human ear does not perceive signal distortion on the basis of rms error, regardless of its spectral shape relative to the signal spectrum. In designing a coder for speech signals, it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from frequency regions where the signal level is low. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained: 1) by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.

376 citations
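
Shaping the quantizer noise so that the formants mask it is often expressed with a weighting filter W(z) = A(z)/A(z/γ). Here A(z) is the LPC inverse filter and 0 < γ < 1. That particular form is assumed for illustration and is not stated in the abstract above. The names `bandwidth_expand` and `weighting_filter` are hypothetical. Near a formant, |A| is small and the bandwidth-expanded A(z/γ) is less so, so W is below 1 there. Minimizing the weighted error therefore tolerates more noise where the speech masks it.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = a[0] + a[1] z^-1 + ... with a[0] == 1."""
    return [c * gamma ** k for k, c in enumerate(a)]


def weighting_filter(a, gamma, x):
    """Filter x through the pole-zero weighting W(z) = A(z) / A(z/gamma)."""
    den = bandwidth_expand(a, gamma)
    y = []
    for n in range(len(x)):
        # FIR part: numerator A(z) applied to the input.
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        # IIR part: denominator A(z/gamma); den[0] == 1, so no normalization is needed.
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y
```

With γ = 1 the filter reduces to W(z) = 1 (no weighting). With γ = 0 it is just the inverse filter A(z).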


Journal Article
TL;DR: Based on a linear model of speech production, it is shown that both the moment of glottal closure and opening can be determined from the normalized total squared error with proper choices of analysis window length and filter order.
Abstract: Covariance analysis as a least squares approach for accurately performing glottal inverse filtering from the acoustic speech waveform is discussed. Best results are obtained by situating the analysis window within a stable closed glottis interval. Based on a linear model of speech production, it is shown that both the moment of glottal closure and opening can be determined from the normalized total squared error with proper choices of analysis window length and filter order. Results from actual speech are presented to illustrate the technique.

347 citations
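
The covariance method mentioned in the abstract can be sketched as follows. It solves the least-squares normal equations over a short window, using samples before the window as history. It then reports the normalized total squared error, i.e. the residual energy divided by the window energy. This is an illustrative sketch with hypothetical names (`covariance_lpc`, `solve`) and plain Gaussian elimination. A practical implementation would typically use a Cholesky decomposition, and the window length and order are left to the caller.

```python
def solve(aug):
    """Gaussian elimination with partial pivoting on an augmented matrix (modified in place)."""
    n = len(aug)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        sol[r] = (aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))) / aug[r][r]
    return sol


def covariance_lpc(x, start, length, order):
    """Covariance-method LPC over x[start:start+length].

    Predicts x[n] from x[n-1..n-order]; samples before `start` serve as history.
    Returns (predictor coefficients a_1..a_order, normalized total squared error).
    """
    if start < order:
        raise ValueError("need `order` samples of history before the window")
    window = range(start, start + length)

    def phi(i, k):
        return sum(x[n - i] * x[n - k] for n in window)

    # Normal equations: sum_k a_k * phi(i, k) = phi(i, 0) for i = 1..order.
    aug = [[phi(i, k) for k in range(1, order + 1)] + [phi(i, 0)]
           for i in range(1, order + 1)]
    a = solve(aug)
    energy = phi(0, 0)
    error = energy - sum(a[k - 1] * phi(0, k) for k in range(1, order + 1))
    return a, error / energy
```

One way to look for the low-error intervals that the abstract associates with the closed glottis is to slide `start` across a voiced segment and track the normalized error. As the abstract notes, the choices of window length and filter order remain critical.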


Proceedings Article
02 Apr 1979
TL;DR: It is shown that the degree of rectification does not affect the output speech, and that the high-frequency noise source may be eliminated with proper processing, and a new type of HFR based on spectral duplication of the baseband is introduced.
Abstract: The traditional method of high-frequency regeneration (HFR) of the excitation signal in baseband coders has been to rectify the transmitted baseband, followed by spectral flattening. In addition, a noise source is added at high frequencies to compensate for lack of energy during certain sounds. In this paper, we reexamine the whole HFR process. We show that the degree of rectification does not affect the output speech, and that, with proper processing, the high-frequency noise source may be eliminated. We introduce a new type of HFR based on spectral duplication of the baseband. Two types of spectral duplication are presented: spectral folding and spectral translation. Finally, in order to eliminate the problem of breaking the harmonic structure due to spectral duplication, we propose a pitch-adaptive spectral duplication scheme in the frequency domain by using adaptive transform coding to code the baseband.

198 citations


Journal Article
TL;DR: Research to code speech at 16 kbit/s with the goal of having the quality of the coded speech be equal to that of the original is reported, finding that the pitch predictor is not cost-effective on balance and may be eliminated.
Abstract: We report on research to code speech at 16 kbit/s with the goal of having the quality of the coded speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long-distance telephone lines. The basic structure chosen for our system was adaptive predictive coding. However, the rigorous requirements of this work led to a new outlook on the different aspects of adaptive predictive coding. We have found that the pitch predictor is not cost-effective on balance and may be eliminated. Solutions are presented to deal with the two types of quantization noise: clipping and granular noise. The clipping problem is completely eliminated by allowing the number of quantizer levels to increase indefinitely. An appropriate self-synchronizing variable-length code is proposed to minimize the average data rate; the coding scheme seems to be adequate for all speech and all conditions tested. The granular noise problem is treated by modifying the predictive coding system in a novel manner to include an adaptive noise spectral shaping filter. A design for such a filter is proposed that effectively eliminates the perception of granular noise.

99 citations


Journal Article
TL;DR: The speech synthesis from concept system converts an input concept into speech by using a transformational grammar to generate a well‐formed English sentence and a word concatenation synthesizer to generate the actual speech output.
Abstract: A synthesis method, called speech synthesis from concept, is described which has been designed specifically for providing speech output from information systems. It differs from conventional techniques in that data is passed from the information system to the speech synthesis system, not in the form of text or phonetic transcription, but in the form of an abstract structure called an input concept. The speech synthesis from concept system converts an input concept into speech by using a transformational grammar to generate a well‐formed English sentence and a word concatenation synthesizer to generate the actual speech output. The “top down” nature of this process reduces the computation required within the information system and enables high‐quality speech to be produced.

69 citations


Patent
TL;DR: In this patent, an adaptive filter is proposed to combine the quantizing error signal, the formant-related prediction parameter signals and the difference signal to concentrate the quantizing error noise in spectral peaks corresponding to the time-varying formant portions of the speech spectrum so that quantizing noise is masked by the speech signal formants.
Abstract: A predictive speech signal processor features an adaptive filter in a feedback network around the quantizer. The adaptive filter essentially combines the quantizing error signal, the formant related prediction parameter signals and the difference signal to concentrate the quantizing error noise in spectral peaks corresponding to the time-varying formant portions of the speech spectrum so that the quantizing noise is masked by the speech signal formants.

49 citations


Patent
Bishnu S. Atal
TL;DR: In this patent, a speech signal is partitioned into intervals, and a set of coded prediction parameter signals, pitch period and voicing signals, and signals corresponding to the spectrum of the prediction error signal are produced.
Abstract: In a speech processing arrangement for synthesizing more natural sounding speech, a speech signal is partitioned into intervals. For each interval, a set of coded prediction parameter signals, pitch period and voicing signals, and a set of signals corresponding to the spectrum of the prediction error signal are produced. A replica of the speech signal is generated responsive to the coded pitch period and voicing signals as modified by the coded prediction parameter signals. The pitch period and voicing signals are shaped responsive to the prediction error spectral signals to compensate for errors in the predictive parameter signals whereby the speech replica is natural sounding.

48 citations


Journal Article
TL;DR: The multipath tree-encoding of speech at 8 kbits/s is investigated, and coding results for a stationary speech-like source are found to agree well with rate-distortion theoretic ideas, and when applied to speech, tree coding at 8000 bits/s yielded frequency-weighted SNRs of 15-20 dB.
Abstract: The multipath tree-encoding of speech at 8 kbits/s is investigated. Tree coding proceeds along the lines of Anderson et al., but at this lower bit rate, frequency weighting of the error process and adaptation of the coding process are found to be beneficial. Coding results for a stationary speech-like source are found to agree well with rate-distortion theoretic ideas, and when applied to speech, tree coding at 8000 bits/s yielded frequency-weighted SNRs of 15-20 dB.

44 citations
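
At 8 kbit/s and an assumed 8 kHz sampling rate, a tree coder has one bit, and so two branches, per sample. The multipath search can be illustrated with the M-algorithm, a common way of doing it, which keeps only the best few paths at each depth. The exact search procedure used in the paper is not given in the abstract. This sketch uses a fixed first-order predictor and a fixed step size. It deliberately omits the frequency weighting and adaptation that the abstract reports as beneficial. All names and parameter values are hypothetical.

```python
def tree_encode(x, paths_kept, step=0.5, pred_coeff=0.9):
    """M-algorithm tree search for a 1-bit/sample predictive coder (illustrative).

    Each branch reconstructs y[n] = pred_coeff * y[n-1] +/- step. After every
    sample only the `paths_kept` paths with the lowest cumulative squared error
    survive. Returns (bits of the best path, its cumulative squared error).
    """
    paths = [(0.0, [], 0.0)]  # (cumulative error, bits so far, last reconstruction)
    for sample in x:
        extended = []
        for cost, bits, last in paths:
            prediction = pred_coeff * last
            for bit, sign in ((1, 1.0), (0, -1.0)):
                y = prediction + sign * step
                extended.append((cost + (sample - y) ** 2, bits + [bit], y))
        extended.sort(key=lambda path: path[0])
        paths = extended[:paths_kept]
    best_cost, best_bits, _ = paths[0]
    return best_bits, best_cost


def tree_decode(bits, step=0.5, pred_coeff=0.9):
    """Decoder: rebuild the reconstruction by following the transmitted branch bits."""
    out, last = [], 0.0
    for bit in bits:
        last = pred_coeff * last + (step if bit else -step)
        out.append(last)
    return out
```

If `paths_kept` is at least 2 raised to the sequence length, the search is exhaustive. Smaller values trade coding error for computation, which is the point of a multipath search.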


Proceedings Article
B. Atal, N. David
01 Apr 1979
TL;DR: A modified analysis-synthesis procedure which, although relying on the basic LPC technique for analysis and synthesis, avoids spectral amplitude and phase distortions introduced by these techniques.
Abstract: In speech analysis and synthesis based on linear prediction, it is a common assumption that predictor coefficients contain all the necessary spectral and phase information for accurate synthesis of the speech signal. However, even under the best circumstances, the synthetic speech sounds unnatural to the critical listener. Subjective tests reveal that spectral errors introduced by the linear prediction analysis techniques are a major source of unnatural sound quality in synthetic speech. This paper describes a modified analysis-synthesis procedure which, although relying on the basic LPC technique for analysis and synthesis, avoids spectral amplitude and phase distortions introduced by these techniques. In the new method, proper reproduction of the speech spectrum at the receiver is ensured by transmitting the short-time spectrum of the prediction residual to the receiver.

Proceedings Article
01 Apr 1979
TL;DR: This paper describes a unique design that attacks two problem areas of LPC: noise suppression and input level control, and real-time simulation/test.
Abstract: This paper describes a unique design that attacks two problem areas of LPC: noise suppression and input level control, and real-time simulation/test. The noise level design uses algorithms to digitally process speech data before input to the LPC algorithm processor. The LPC processor described in the paper is based on a microprocessor design conceived specifically for speech. The noise suppression and level control algorithms are performed in a separate front-end processor that detects noise patterns and deletes them from the normal voice input. The operational hardware system is shown to the block diagram level, as is the particular simulation/test scheme. Test results are also described in this paper.

Proceedings Article
01 Apr 1979
TL;DR: A new distance measure based on the derivative of the linear prediction (LP) phase spectrum is proposed for comparing speech spectra, and its advantages and an efficient method of computing it are discussed.
Abstract: A new distance measure based on the derivative of the linear prediction (LP) phase spectrum is proposed for comparison of speech spectra. Relationships among several distance measures based on the linear prediction coefficients (LPCs) are discussed. The advantages of the new measure and an efficient method of computing it are also discussed.
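
The derivative of the LP phase spectrum is the group delay of the all-pole model 1/A(z), and it can be evaluated directly from the predictor polynomial. The distance used here is an RMS difference over a uniform frequency grid, which is an assumption. The abstract gives neither the paper's exact definition nor its efficient computation method, and the function names are hypothetical.

```python
import cmath
import math


def lp_group_delay(a, omega):
    """Group delay (in samples) of the all-pole LP model 1/A(z) at frequency omega.

    a = [1, a1, ..., ap] are the coefficients of A(z) = sum_k a[k] z^-k.
    The group delay of A is Re(sum k a_k e^{-jwk} / sum a_k e^{-jwk}); 1/A has the negative.
    """
    num = sum(k * c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))
    den = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))
    return -(num / den).real


def group_delay_distance(a1, a2, n_freqs=64):
    """RMS difference of two LP group-delay curves on a uniform grid over [0, pi)."""
    grid = [math.pi * i / n_freqs for i in range(n_freqs)]
    squares = [(lp_group_delay(a1, w) - lp_group_delay(a2, w)) ** 2 for w in grid]
    return math.sqrt(sum(squares) / n_freqs)
```

For a single pole, 1/(1 - 0.5 z^-1), the group delay at ω = 0 is 1 sample, which serves as a quick sanity check.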

Patent
TL;DR: A time-frequency representation for linear time-varying systems is applied to a model for speech production to formulate a quasi-stationary representation for the speech waveform, which has the property that simple time scaling of the parameters of the representation corresponds to changing the rate of the speech.
Abstract: Representation of a speech signal by its short-time Fourier transform and the application of this representation to the problem of time compression and expansion of speech are presented. A time-frequency representation for linear time-varying systems is applied to a model for speech production to formulate a quasi-stationary representation for the speech waveform. This representation has the property that simple time scaling of the parameters of the representation corresponds to changing the rate of the speech. Given a real speech signal, short-time Fourier analysis provides a technique for estimating and modifying these parameters. The results of the theoretical analysis are used to design a high-quality speech rate-change system, which is simulated on a general-purpose digital minicomputer.

Proceedings Article
R. Preuss
02 Apr 1979
TL;DR: A spectral subtraction technique is described, which includes a biased estimate of the noise, that does not present musical tones at the output, and an automatic speech activity detector is described and used to adapt the noise estimate to changing noise environments.
Abstract: Performance of narrowband speech communications systems, such as Linear Predictive Coding (LPC), is often severely degraded by the presence of ambient acoustic noise in the input speech signal. Spectral subtraction techniques show promise in improving the overall performance of LPC in acoustic noise environments, but typically present annoying musical tones at the output. A spectral subtraction technique is described, which includes a biased estimate of the noise, that does not present musical tones at the output. In addition, an automatic speech activity detector is described and used to adapt the noise estimate to changing noise environments.
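
A biased noise estimate and a speech activity detector of the kind described above might be combined per frame as follows. Everything specific here is assumed rather than taken from the abstract. The bias is modelled as an over-subtraction factor, and a spectral floor keeps isolated bins from collapsing to zero, one common source of musical tones. The detector simply compares frame energy with the current noise estimate. The inputs are magnitude spectra, so the usual framing and resynthesis of spectral subtraction would surround this function. `process_frame` and its parameters are hypothetical.

```python
def frame_energy(mag):
    """Energy of a magnitude spectrum (sum of squared magnitudes)."""
    return sum(m * m for m in mag)


def process_frame(mag, noise_mag, over_sub=2.0, floor=0.05, vad_ratio=3.0, smooth=0.9):
    """One frame of spectral subtraction with a biased noise estimate and a simple detector.

    mag, noise_mag: magnitude spectra of equal length (lists of floats).
    Returns (cleaned magnitudes, updated noise estimate, speech_detected).
    """
    speech = frame_energy(mag) > vad_ratio * frame_energy(noise_mag)
    if not speech:
        # Adapt the noise estimate only in frames judged to be non-speech.
        noise_mag = [smooth * n + (1.0 - smooth) * m for n, m in zip(noise_mag, mag)]
    # Over-subtract the (biased) noise estimate, but never go below a fraction of the input.
    cleaned = [max(m - over_sub * n, floor * m) for m, n in zip(mag, noise_mag)]
    return cleaned, noise_mag, speech
```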

Journal Article
TL;DR: This study obtained several objective measures of speech quality which, for the most part, show relatively little correlation with subjective quality and shows that the most successful objective predictor of subjective ratings is a linear combination of linear predictive coding distances.
Abstract: In a recently proposed communication system, there would be tandem connections of 16 kb/s delta modulators and 2.4 kb/s vocoders. Preliminary work has indicated that such tandem links would be of substantially lower quality than either the delta modulator link or the vocoder link alone. The present study, which includes an elaborate subjective speech quality experiment, confirms this preliminary conclusion. It also shows that two other differential waveform coders are no better than the proposed delta modulator in tandem links. On the other hand, a 5-band sub-band coder does offer substantially higher quality than the delta modulator. Still, its performance in tandem with the vocoder is poorer than that of the vocoder or the sub-band coder alone and is probably of only marginal value for practical communication. We have obtained several objective measures of speech quality which, for the most part, show relatively little correlation with subjective quality. The most successful objective predictor of subjective ratings is a linear combination of linear predictive coding distances.

Proceedings Article
01 Apr 1979
TL;DR: A new source coding technique called MODULO-PCM (MPCM) is presented, and it is shown that this new scheme has essentially the same performance as linear predictive coding or transform coding.
Abstract: A new source coding technique called MODULO-PCM (MPCM) is presented, and it is shown that this new scheme has essentially the same performance as linear predictive coding or transform coding. In contrast with the conventional schemes, MPCM employs a simple memoryless encoder and a moderately complex decoder incorporating the Viterbi algorithm. Bounds for distortion in MPCM systems for a first-order Gauss-Markov process are numerically calculated.

Proceedings Article
01 Apr 1979
TL;DR: Development of a pitch predictive ADPCM residual encoder and preliminary results on new harmonic generation techniques are discussed, and results indicate that it is possible to remove the hoarseness currently associated with low data rate RELP speech.
Abstract: A new version of the Residual Excited Linear Predictive (RELP) vocoder has been simulated. The objective has been to reduce the data rate required for good quality speech to 4.8 kbps. Results have indicated that it is possible to remove the hoarseness currently associated with low data rate RELP speech. Development of a pitch predictive ADPCM residual encoder and preliminary results on new harmonic generation techniques are discussed. Taped demonstrations will be played at the conference.

Proceedings Article
01 Apr 1979
TL;DR: LPC vocoder performance in high acoustic noise environments and when the speaker is subjected to stress, vibrations and accelerations is described.
Abstract: Although 2400 BPS vocoders based upon Linear Predictive Coding have produced speech intelligibility scores as high as 90% in a quiet laboratory setting, few actual system measurements have been made in noisy, stressful, military environments. This paper describes LPC vocoder performance in high acoustic noise environments and when the speaker is subjected to stress, vibrations and accelerations. Measurements were made on military platforms which included ships, conventional aircraft, helicopters, tracked vehicles and wheeled vehicles; acoustic noise levels varied from 70 to 125 dB Sound Pressure Level (1).

Proceedings Article
01 Apr 1979
TL;DR: A speaker-dependent system for recognizing carefully articulated continuous speech that accepts English sentences composed from a 127-word vocabulary appropriate to an airline information reservation task and achieves 75% sentence recognition.
Abstract: A speaker-dependent system for recognizing carefully articulated continuous speech is described. The system accepts English sentences composed from a 127-word vocabulary appropriate to an airline information reservation task. The system is controlled by a finite state parser which generates word candidates and establishes their temporal locations in hypothetical sentences. The word candidates are evaluated by an LPC distance measure and a dynamic programming algorithm which nonlinearly time-aligns isolated word reference templates with the input speech stream. The input is recognized as the hypothetical sentence having the lowest distance according to a well-defined criterion. In a preliminary test based on 100 sentences spoken over dialed-up telephone lines by two male talkers, 90% word accuracy, resulting in 75% sentence recognition, was achieved.
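
The dynamic programming time-alignment step can be sketched as classic dynamic time warping (DTW). This sketch substitutes a plain Euclidean frame distance for the LPC distance measure used in the paper, and it matches isolated templates only. The finite state parser and the sentence-level search are not shown. The function names and the simple symmetric step pattern are assumptions.

```python
import math


def dtw_distance(reference, test, dist):
    """Accumulated distance of the best nonlinear time alignment of two frame sequences.

    Uses the simple symmetric step pattern (diagonal, horizontal and vertical moves).
    `dist` compares one reference frame with one test frame.
    """
    n, m = len(reference), len(test)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(reference[i - 1], test[j - 1])
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]


def euclidean(u, v):
    """Stand-in frame distance (the paper used an LPC distance measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def recognize(templates, frames):
    """Label of the isolated-word template with the smallest DTW distance to `frames`."""
    return min(templates, key=lambda label: dtw_distance(templates[label], frames, euclidean))
```

A time-stretched copy of a template still aligns with zero distance, which is exactly the speaking-rate variation the warping is meant to absorb.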

Proceedings Article
01 Apr 1979
TL;DR: This paper describes an alternative approach which involves modifying the time-series model at the outset to account for the presence of noise, and discusses the development of the model, the estimation algorithm, and some representative experimental results.
Abstract: Linear predictive coding (LPC) has been successfully applied to the encoding of speech and other time series. It has been widely observed, however, that the performance of an LPC algorithm deteriorates rapidly in the presence of background noise. In this paper, we describe and discuss one approach to the identification of a time series corrupted by additive white noise. A common approach to this problem is to prefilter the noisy time series, and then to apply an estimation algorithm which treats the time series as if it were noise-free. We describe an alternative approach which involves modifying the time-series model at the outset to account for the presence of noise. An estimation algorithm is then developed for this modified model. We discuss the development of the model, the estimation algorithm, and some representative experimental results.

Journal Article
TL;DR: In this paper, a speech analysis-synthesis system using spectral parameters (samples of power spectra at different frequencies) was simulated to test whether these parameters, like area parameters, can be linearly interpolated between dyad boundaries.
Abstract: A recent study [Olive and Spickenagel, J. Acoust. Soc. Am. 59, 993–996 (1976)] has shown that area parameters derived from linear prediction analysis can be linearly interpolated between dyad boundaries with very little distortion in the resultant synthesized speech. The success of area parameter interpolation raises a question: can other acoustic parameters, such as the power spectrum of the speech waveform, be similarly interpolated? The spectrum is of special interest because speech can be synthesized in real time from spectral parameters on a programmable digital filter. To study this question a speech analysis–synthesis system using spectral parameters (samples of power spectra at different frequencies) was simulated. These parameters were determined from the speech signal at every dyad boundary, and interpolated for intermediate values. Dyad boundaries (representing the limits of transition regions between phonemes) were determined manually. Informal listening tests comparing synthetic speech with a...
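
Interpolating spectral parameters between dyad boundaries amounts to piecewise-linear interpolation, in time, of the parameter vectors measured at those boundaries. A minimal sketch follows. The abstract does not say whether the power-spectrum samples were interpolated on a linear or a logarithmic (dB) scale, so the function simply interpolates whatever values it is given. The name `interpolate_parameters` is hypothetical.

```python
from bisect import bisect_right


def interpolate_parameters(boundary_times, boundary_params, t):
    """Linearly interpolate parameter vectors measured at dyad boundaries.

    boundary_times: increasing boundary times (e.g., in seconds).
    boundary_params: one parameter vector (e.g., power-spectrum samples) per boundary.
    Times outside the covered range are clamped to the nearest boundary.
    """
    if t <= boundary_times[0]:
        return list(boundary_params[0])
    if t >= boundary_times[-1]:
        return list(boundary_params[-1])
    j = bisect_right(boundary_times, t)  # boundary_times[j-1] <= t < boundary_times[j]
    t0, t1 = boundary_times[j - 1], boundary_times[j]
    w = (t - t0) / (t1 - t0)
    return [(1 - w) * p0 + w * p1 for p0, p1 in zip(boundary_params[j - 1], boundary_params[j])]
```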

Proceedings Article
L. Nebbia, P. Lucchini
02 Apr 1979
TL;DR: An automatic vocal response system for the Italian language has been implemented at CSELT, consisting of a hardware speech synthesizer, built around a 10th-order digital lattice filter and two excitation generators for voiced and unvoiced sounds, controlled by a programmed device (mini or micro computer).
Abstract: An automatic vocal response system for the Italian language has been implemented at CSELT, consisting of a hardware speech synthesizer controlled by a programmed device (mini or micro computer). The synthesizer exploits a speech production model composed of a 10th-order digital lattice filter and two excitation generators for voiced and unvoiced sounds. The hardware also includes a module which controls the updating and transfer of the parameters, and an output module which provides the analog speech signal. The synthesizer configuration is modular and expandable up to 8 channels. For each channel, the minicomputer supplies the synthesizer with the start-stop command plus 13 parameters: 10 filter coefficients, a gain factor, the pitch period and voiced-unvoiced information, and the updating interval. For each channel, every 125 µs, 20 multiplications, 9 additions and 10 subtractions are executed. The filter and the source generator are time-shared among the 8 channels. The complete digital equipment is implemented with TTL-LS integrated circuits.
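
A two-multiplier all-pole lattice synthesis filter of the kind named above can be sketched as follows. Sign conventions for the reflection coefficients `k` vary between texts. With the one used here, the filter realizes 1/A(z), where A(z) = 1 + a1 z^-1 + ... follows from the usual step-up recursion. Counting the gain scaling, each output sample costs 2p multiplications, p subtractions and p - 1 additions. For p = 10 this is consistent with the 20 multiplications, 10 subtractions and 9 additions quoted above for every 125 µs, i.e. one output sample at 8 kHz. Whether the CSELT hardware ordered its operations exactly this way is an assumption, and the function name is hypothetical.

```python
def lattice_synthesize(k, excitation, gain=1.0):
    """All-pole lattice synthesis filter (two multipliers per stage).

    k: reflection coefficients k_1..k_p (k[m-1] is k_m).
    Forward:  f_{m-1}[n] = f_m[n] - k_m * b_{m-1}[n-1]
    Backward: b_m[n]     = k_m * f_{m-1}[n] + b_{m-1}[n-1],  with b_0[n] = f_0[n] = y[n]
    """
    p = len(k)
    b_prev = [0.0] * p  # b_prev[m] holds b_m[n-1] for m = 0..p-1
    out = []
    for e in excitation:
        f = [0.0] * (p + 1)
        f[p] = gain * e  # 1 multiplication for the gain
        for m in range(p, 0, -1):  # p multiplications, p subtractions
            f[m - 1] = f[m] - k[m - 1] * b_prev[m - 1]
        b_new = [0.0] * p
        b_new[0] = f[0]
        for m in range(1, p):  # p - 1 multiplications, p - 1 additions (b_p is never needed)
            b_new[m] = k[m - 1] * f[m - 1] + b_prev[m - 1]
        b_prev = b_new
        out.append(f[0])
    return out
```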

Proceedings Article
01 Apr 1979
TL;DR: An effective and computationally inexpensive method of enhancing the linear prediction analysis/synthesis of noisy speech is presented: a preprocessing filter that can perfectly remove the "expected" noise signal when the input speech spectrum is closely approximated by the noisy speech spectrum.
Abstract: The goal of this study was to develop an effective and computationally inexpensive method of enhancing the linear prediction analysis/synthesis of noisy speech. To this end, a preprocessing filter has been proposed that is capable of perfectly removing the "expected" noise signal when the input speech spectrum is closely approximated by the noisy speech spectrum. The proposed filter has been evaluated by the linear prediction distance measure, perceptual listening, and spectrograms. This evaluation has demonstrated the effectiveness of the filter for broadband noise removal. The filter has also been implemented as a preprocessing filter in a real-time LPC system. The total processing time for the filtering is only 2.6 msec per 22.5 msec frame. In this system, the LPC analysis and synthesis take a combined time of 13 msec.

Proceedings Article
01 Apr 1979
TL;DR: A training sequence of speech data is used to design a two-step speech compression system, based upon either single speakers or multiple speakers, leading to an identification step using linear prediction techniques followed by a vector quantizer.
Abstract: A training sequence of speech data is used to design a two-step speech compression system, based upon either single speakers or multiple speakers. The system is designed to minimize an average spectral distortion over the training sequence, leading to an identification step using linear prediction techniques followed by a vector quantizer. The system is then used to compress test sequences of speech data, leading to much lower bit rates than obtained using scalar quantization for equivalent distortions. For the same numerical distortion, 20-bits/frame were required using "optimal" scalar bit allocation and quantization, whereas 8-bits/frame were required using vector quantization. Results are presented in the form of numerical distortion measures and analog tapes of synthesized speech.
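
A codebook of 2^8 = 256 vectors is what makes 8 bits/frame possible, because each frame is sent as the index of its nearest codeword. The sketch below designs such a codebook with the LBG splitting procedure followed by Lloyd iterations. It uses squared Euclidean distance as a stand-in for the spectral distortion the paper minimized. The names and parameters (`eps`, `iters`) are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance (stand-in for a spectral distortion measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode(codebook, v):
    """Index of the nearest codeword; a 256-word codebook needs 8 bits per index."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def lbg_codebook(training, size, eps=0.01, iters=20):
    """Design a VQ codebook by LBG splitting plus Lloyd iterations (size a power of two)."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in training:
                cells[encode(codebook, v)].append(v)
            # Move each codeword to its cell centroid; keep it if the cell is empty.
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
    return codebook
```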

Proceedings Article
01 Apr 1979
TL;DR: A low bit-rate vocoder designed for improved speech reproduction quality and robustness is described, which includes a new algorithm, the Spectral Envelope Estimator, which forms the nucleus of the spectral analyzer.
Abstract: This paper describes a low bit-rate vocoder designed for improved speech reproduction quality and robustness. The vocoder includes a new algorithm, the Spectral Envelope Estimator, which forms the nucleus of the spectral analyzer. In addition to estimating the speech spectrum, the spectral analyzer also allows determination of a continuous estimate of the background noise spectrum which may be used for noise suppression. A maximum-likelihood pitch estimator, which shares the signal processing of the spectral envelope estimator, has been integrated into the vocoder to yield accurate pitch estimates of noisy speech. This system yields high quality speech reproduction at bit rates of 2.4 and 8.8 kbps.

Proceedings Article
04 Sep 1979
TL;DR: This paper describes a linear predictive coder (LPC) and its microprocessor fabrication; the coder has an audio bandwidth of 3200 Hz, uses the autocorrelation formulation of LPC to determine the short-term spectrum, and uses an Average Magnitude Difference Function (AMDF) to extract pitch.
Abstract: This paper describes a linear predictive coder (LPC) and its microprocessor fabrication. The LPC has an audio bandwidth of 3200 Hz, uses the autocorrelation formulation of LPC to determine the short-term spectrum, and an Average Magnitude Difference Function (AMDF) to extract pitch. A two-multiplier/stage lattice filter at the receiver recreates the speech. The 2400 b/s full-duplex LPC is implemented in the firmware of a horizontally coded microprocessor having a 48-bit instruction word and 16-bit data word. The processor architecture uses a 4-bit TTL ALU slice and a hardwired 16 x 16 bit parallel multiplier to rapidly process the data with relatively slow multiplication circuitry. With the chosen architecture, the LPC requires only 60 percent of the processor's capacity, while the processor itself has fewer than 150 integrated components. The low cost of the voice digitizer has resulted in commercial sales in a market that appears to be growing.
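
The analysis side described above (autocorrelation LPC plus AMDF pitch extraction) can be sketched as follows. This is a floating-point illustration, not the paper's 16-bit firmware. The frame windowing, predictor order and pitch lag range are left to the caller, and the function names are hypothetical. `levinson_durbin` solves the autocorrelation normal equations and also yields reflection coefficients, the kind of parameters a lattice synthesis filter uses (sign conventions vary).

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the predictor x[n] ~ sum_k a_k x[n-k].

    Returns (a_1..a_order, reflection coefficients, final prediction-error energy).
    """
    a, refl, err = [], [], r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        refl.append(k)
        err *= 1.0 - k * k
    return a, refl, err


def amdf_pitch(x, min_lag, max_lag):
    """Pitch period in samples: the lag in [min_lag, max_lag] minimizing the AMDF."""
    def amdf(lag):
        diffs = [abs(x[n] - x[n - lag]) for n in range(lag, len(x))]
        return sum(diffs) / len(diffs)
    return min(range(min_lag, max_lag + 1), key=amdf)
```

At an assumed 8 kHz sampling rate, pitch periods of 2.5 to 20 ms correspond to lags of 20 to 160 samples. The analysis buffer must be longer than the largest lag searched.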

Journal Article
TL;DR: A recently developed two-integrated-circuit speech synthesis system represents a significant advance in large-scale integration in both random logic and data storage functions.
Abstract: A recently developed two-integrated-circuit speech synthesis system represents a significant advance in large-scale integration in both random logic and data storage functions.

Proceedings Article
01 Apr 1979
TL;DR: This paper describes continuing efforts which have concentrated on minimizing loss of synchronization between the receiver and the transmitter, and applies constraints which guarantee synchronization at a cost of some freedom in the selection of data for transmission.
Abstract: Recently we described a variable-frame-rate LPC vocoder designed to transmit good quality speech over 2400 bps fixed-rate noisy channels with bit-error probabilities ranging up to 5% [3]. The basic idea was to lower the data rate by transmitting LPC parameters only when speech characteristics have changed sufficiently since the last transmission, and to employ the resulting bit-rate savings for protecting important transmission data against channel noise. This paper describes our continuing efforts which have concentrated on minimizing loss of synchronization between the receiver and the transmitter. In one approach, we emphasize heavy protection of header, and rapid resynchronization. Alternatively, we apply constraints which guarantee synchronization at a cost of some freedom in the selection of data for transmission. Results from the first approach are presented; results from both methods will be compared at the conference.
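
The frame-selection idea behind the variable-frame-rate vocoder can be sketched as follows: LPC parameters are transmitted only when they have changed sufficiently since the last transmission. The Euclidean distance and the threshold are assumptions. The channel protection and synchronization schemes that the paper actually focuses on are not modelled, and `select_frames` is a hypothetical name.

```python
def select_frames(frames, threshold):
    """Indices of parameter frames to transmit in a variable-frame-rate scheme.

    The first frame is always sent. A later frame is sent only if it differs from the
    most recently *transmitted* frame by more than `threshold` (Euclidean distance here).
    """
    sent = [0]
    last = frames[0]
    for i in range(1, len(frames)):
        d = sum((a - b) ** 2 for a, b in zip(frames[i], last)) ** 0.5
        if d > threshold:
            sent.append(i)
            last = frames[i]
    return sent
```

Comparing against the last transmitted frame, not the previous frame, prevents slow drifts from going unsent indefinitely. The bits saved on skipped frames are what the scheme then spends on error protection.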

Proceedings Article
E. Vivalda, S. Sandri, C. Miotti
01 Apr 1979
TL;DR: The paper describes the software architecture of an Italian text-to-speech synthesis system based on the joining of LPC coded diphones, which is designed according to multichannel and real time criteria.
Abstract: The paper describes the software architecture of an Italian text-to-speech synthesis system based on the joining of LPC coded diphones. The automatic voice response system is designed according to multichannel and real time criteria. For each output channel, the following operations are performed: pre-processing of the input string of characters, translation into the proper sequence of diphones, generation of prosodic contours and real-time control of a hardware speech synthesizer.