
Showing papers on "Linear predictive coding" published in 1974


Journal ArticleDOI
TL;DR: An algorithm is presented which finds the frequency and amplitude of the first three formants during all vowel-like segments of continuous speech, using as input the peaks of the linear prediction spectra and a segmentation parameter to indicate energy and voicing.
Abstract: An algorithm is presented which finds the frequency and amplitude of the first three formants during all vowel-like segments of continuous speech. It uses as input the peaks of the linear prediction spectra and a segmentation parameter to indicate energy and voicing. Ideally, the first three peaks are the first three formants. Frequently, however, two peaks merge, or spurious peaks appear, and the difficult part is to recognize such situations and deal with them. The general method is to fill formant slots with the available peaks at each frame, based on frequency position relative to an educated guess. Then, if a peak is left over and/or a slot is unfilled, special routines are called to decide how to deal with them. Included is a formant enhancement technique, analogous to a similar technique which has been implemented via the chirp-z transform [8], which usually succeeds in separating two merged formants. Processing begins at the middle of each high volume voiced segment, where formants are most likely to be correct, and branches outward from there in both directions in time, using the most recently found formant frequencies as the educated guess for the current frame. The algorithm has been implemented at Lincoln Laboratory on the Univac 1219 and the Fast Digital Processor, a programmable processor [9], and has been tested on a large number of unrestricted sentences.

247 citations
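
The slot-filling step described in the entry above can be illustrated with a minimal sketch. It assumes a nearest-guess assignment rule and hypothetical names (`fill_slots`, `guesses`); the special routines for merged or spurious peaks and the formant enhancement technique are not reproduced here.

```python
def fill_slots(peaks, guesses):
    """Assign each spectral peak (Hz) to the nearest guessed formant slot.

    Assumed rule: each slot keeps the peak closest to its guess; any other
    peak that lands on the same slot is reported as left over.
    """
    slots = [None] * len(guesses)
    leftovers = []
    for peak in sorted(peaks):
        nearest = min(range(len(guesses)), key=lambda i: abs(peak - guesses[i]))
        current = slots[nearest]
        if current is None:
            slots[nearest] = peak
        elif abs(peak - guesses[nearest]) < abs(current - guesses[nearest]):
            leftovers.append(current)   # the earlier peak is displaced
            slots[nearest] = peak
        else:
            leftovers.append(peak)      # the new peak is the extra one
    unfilled = [i for i, value in enumerate(slots) if value is None]
    return slots, leftovers, unfilled


# Hypothetical educated guess taken from the previously processed frame.
guesses = [500.0, 1500.0, 2500.0]
spurious_case = fill_slots([480.0, 1450.0, 1700.0, 2600.0], guesses)
merged_case = fill_slots([600.0, 2400.0], guesses)
```

A non-empty `leftovers` list or `unfilled` list marks the frames where, per the abstract, further decision logic would be needed.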


Journal ArticleDOI
TL;DR: In this article, a spectral-flatness measure is introduced to give a quantitative measure of "whiteness" of a spectrum, and it is shown that maximizing the spectral flatness of an inverse filter output or linear predictor error is equivalent to the autocorrelation method of linear prediction.
Abstract: The purpose of this paper is to introduce a spectral-flatness measure into the study of linear prediction analysis of speech. A spectral-flatness measure is introduced to give a quantitative measure of "whiteness" of a spectrum. It is shown that maximizing the spectral flatness of an inverse filter output or linear predictor error is equivalent to the autocorrelation method of linear prediction. Theoretical properties of the flatness measure are derived, and compared with experimental results. It is shown that possible ill-conditioning of the analysis problem is directly related to the spectral-flatness measure and that prewhitening by a simple first-order linear predictor to increase spectral flatness can greatly reduce the amount of ill-conditioning.

160 citations
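
As a rough illustration of the "whiteness" idea in the entry above, spectral flatness is commonly computed as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below assumes that discrete form and a hypothetical function name; it is not taken from the paper itself.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of sampled power values.

    Returns 1.0 for a perfectly flat (white) spectrum and approaches 0
    as the spectrum becomes more peaked.
    """
    n = len(power_spectrum)
    if n == 0 or any(p <= 0.0 for p in power_spectrum):
        raise ValueError("power spectrum samples must be positive")
    mean_log = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(mean_log) / arithmetic_mean


flat = spectral_flatness([2.0] * 8)                 # constant spectrum
peaky = spectral_flatness([100.0, 1.0, 1.0, 1.0])   # one dominant peak
```

A spectrum with a large dynamic range scores well below 1, which is the situation the abstract associates with ill-conditioning.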


Journal ArticleDOI
TL;DR: Experimental results are presented which illustrate both the capabilities and limitations of linear prediction vocoders.
Abstract: A detailed discussion of the computer simulation of a linear prediction vocoder system is presented. The basic technique used for analysis is the autocorrelation method of linear prediction. New results include modifications to the simplified inverse filter tracking (SIFT) algorithm for more efficient pitch extraction, coding algorithms for low-bit rate transmission, a simplified synthesizer gain calculation, and a bias correction for the synthesizer driving function. Experimental results are presented which illustrate both the capabilities and limitations of linear prediction vocoders.

89 citations
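
The autocorrelation method named in the entry above is usually solved with the Levinson-Durbin recursion. The following is a minimal sketch under an assumed sign convention (inverse filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p); the function names are illustrative and the vocoder's pitch extraction and coding stages are not shown.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, reflection, err): inverse-filter coefficients with a[0] == 1,
    the reflection coefficients, and the final prediction error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
        reflection.append(k)
    return a, reflection, err


# Autocorrelation of a first-order process with correlation 0.9 per lag.
r = [0.9 ** lag for lag in range(3)]
a, k, err = levinson_durbin(r, 2)
```

For this input the second-order term comes out essentially zero, since a first-order predictor already whitens the sequence.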


Journal ArticleDOI
TL;DR: For pitch synchronous analysis, nonstationarity is a better assumption than stationarity, but for pitch asynchronous analysis and large analysis segment size the performance of both formulations in representing the speech waveform is practically the same.
Abstract: The purpose of this paper is to present the theoretical differences and results of experimental comparison of the stationary (autocorrelation) and nonstationary (covariance) linear prediction formulations when applied to voiced speech analysis. In this experimental study, the three criteria used for comparison purposes are: 1) total minimum normalized squared error, 2) accuracy in estimating speech spectrum, and 3) accuracy in estimating formant parameters. The results of linear prediction pitch synchronous as well as pitch asynchronous analyses of synthetic and natural speech are given. Influence of analysis segment size and its position on the estimated formant parameters and total minimum normalized squared error have been investigated. For pitch synchronous analysis, nonstationarity is a better assumption than stationarity, but for pitch asynchronous analysis and large analysis segment size (20-25 ms) the performance of both formulations in representing the speech waveform is practically the same.

51 citations


Journal ArticleDOI
Harvey F. Silverman, N. Dixon
TL;DR: The parametrically controlled analyzer (PCA) is a large PL/I program which has been designed to perform spectral analysis of speech signals and features parametric selection of several analysis methods, including discrete Fourier transformation and linear predictive coding.
Abstract: The parametrically controlled analyzer (PCA) is a large PL/I program which has been designed to perform spectral analysis of speech signals. PCA features parametric selection of several analysis methods, including discrete Fourier transformation and linear predictive coding. Also, selection may be made among various smoothing, normalization, and interpolation methods. PCA develops high-quality spectrographic representations of speech for standard line printers and CRT displays. The PCA is described and numerous examples of various parameter settings are presented and discussed.

45 citations


ReportDOI
01 Apr 1974
TL;DR: This report is a collection of Working Papers in Speech Recognition on the following topics: Organization of the HEARSAY II speech understanding system, The DRAGON system, Parameter-independent machine segmentation and labeling, and a new time-domain analysis of fricatives and stop consonants.
Abstract: The report is a collection of Working Papers in Speech Recognition on the following topics: Organization of the HEARSAY II speech understanding system; The DRAGON system -- an overview; Parameter-independent machine segmentation and labeling; A new time-domain analysis of fricatives and stop consonants; Sub-lexical levels in the HEARSAY II speech understanding system; Inference and use of simple predictive grammars; Real-time linear predictive coding of speech on the SPS-41 microprogrammed triple-processor system; A 16-bit A-D-A conversion system for high-fidelity audio research.

10 citations


Journal ArticleDOI
TL;DR: In analysis/synthesis systems for the digital coding of speech, the synthesis control information is normally required in ‘frames’ arriving at a constant rate; at the expense of a small delay, a considerable reduction of frame rate is possible by transmitting appropriately selected frames and deriving intermediate frames from those transmitted.
Abstract: In analysis/synthesis systems for the digital coding of speech, the synthesis control information is normally required in ‘frames’ arriving at a constant rate. At the expense of a small delay, a considerable reduction of frame rate is possible by transmitting appropriately selected frames, and deriving intermediate frames from those transmitted.

7 citations
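
One way to picture the frame-rate reduction in the entry above is sketched below. The selection rule (transmit a frame when any parameter has moved more than a threshold since the last transmitted frame) and the linear interpolation are assumptions made for illustration, not the paper's method; `select_frames` and `reconstruct` are hypothetical names.

```python
def select_frames(frames, threshold):
    """Return indices of frames to transmit; first and last are always kept."""
    kept = [0]
    for i in range(1, len(frames) - 1):
        last = frames[kept[-1]]
        distance = max(abs(x - y) for x, y in zip(frames[i], last))
        if distance > threshold:
            kept.append(i)
    if len(frames) > 1:
        kept.append(len(frames) - 1)
    return kept


def reconstruct(frames, kept):
    """Rebuild a constant-rate frame sequence by linear interpolation
    between consecutive transmitted frames."""
    out = []
    for left, right in zip(kept, kept[1:]):
        span = right - left
        for i in range(left, right):
            t = (i - left) / span
            out.append([(1 - t) * a + t * b
                        for a, b in zip(frames[left], frames[right])])
    out.append(list(frames[kept[-1]]))
    return out


# One parameter per frame, for brevity.
frames = [[0.0], [1.0], [2.0], [3.0], [10.0], [10.0]]
kept = select_frames(frames, threshold=5.0)
rebuilt = reconstruct(frames, kept)
```

Interpolating a skipped frame requires the next transmitted frame to have arrived, which is presumably the source of the small delay mentioned in the abstract.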


Journal ArticleDOI
TL;DR: Two preprocessing methods by which the spectral dynamic range of the speech signal is reduced, thereby improving quantization properties, are described, and experiments indicate that an appropriate set of preemphasis filters can be pre‐selected.
Abstract: In any vocoder, the process of parameter quantization is crucial for reducing the transmission rate while maintaining the quality of the synthesized speech. We have observed that, for linear predictive vocoders, the short‐time spectral dynamic range of the speech signal is the single most important factor that affects the quantization properties of transmission parameters, irrespective of the particular choice of parameters. We describe two preprocessing methods by which the spectral dynamic range is reduced, thereby improving quantization properties. In the first method, adaptive optimal (linear predictive) preemphasis is applied either directly to the speech signal or more easily to the autocorrelation coefficients through convolution. Experiments indicate that an appropriate set of preemphasis filters can be pre‐selected; for any speech segment being analyzed, the filter from this set that is closest to the optimal choice can be used for preemphasis. The advantages of such a scheme as well as the effect of first‐ and second‐order preemphasis on quantization and speech quality are discussed. The second method of preprocessing (the SIGMA method) involves multiplying the impulse response of the inverse linear prediction filter by an exponential window. Some preliminary results obtained using this method are reported.

5 citations
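
The remark in the entry above that preemphasis can be applied to the autocorrelation coefficients through convolution can be sketched for the first-order case. The code assumes a filter 1 - mu z^-1 with mu = r[1]/r[0] (the first-order linear predictive choice); the function names are illustrative, and the pre-selected filter set and the SIGMA method are not covered.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def preemphasize_autocorrelation(r, mu=None):
    """Autocorrelation of the signal after filtering by 1 - mu z^-1.

    Convolves r with the filter's own autocorrelation [-mu, 1 + mu^2, -mu].
    Input lags 0..L yield output lags 0..L-1, because lag k needs r[k + 1].
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    if mu is None:
        mu = r[1] / r[0]   # assumed: optimal first-order predictor coefficient
    out = [(1.0 + mu * mu) * r[k] - mu * (r[abs(k - 1)] + r[k + 1])
           for k in range(len(r) - 1)]
    return out, mu


x = [1.0, 2.0, 0.5, -1.0, 0.25]
r_x = autocorrelation(x, 3)
r_pre, mu = preemphasize_autocorrelation(r_x)
```

Working on the autocorrelation lags avoids a second pass over the speech samples, which may be why the abstract calls this route easier.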


01 Dec 1974
TL;DR: The authors have developed several methods for reducing the redundancy in the speech signal without sacrificing speech quality, including preemphasis of the incoming speech signal, adaptive optimal selection of predictor order, optimal selection and quantization of transmission parameters, variable frame rate transmission, optimal encoding, and improved synthesis methodology.
Abstract: This report describes work in developing a linear predictive speech compression system that transmits high quality speech at low bit rates. The authors have developed several methods for reducing the redundancy in the speech signal without sacrificing speech quality. Included among these methods are preemphasis of the incoming speech signal, adaptive optimal selection of predictor order, optimal selection and quantization of transmission parameters, variable frame rate transmission, optimal encoding, and improved synthesis methodology. When all of these were incorporated in a floating point simulation of a pitch-excited linear predictive vocoder, synthesized speech with high quality at average transmission rates as low as 1500 bps was obtained.

4 citations


Journal ArticleDOI
TL;DR: The RELP vocoder combines all the attractive concepts of linear predictive coding (LPC), voice excited vocoder (VEV), and adaptive delta modulation (ADM), and employs LPC for a better spectral representation of the vocal tract, the concept of VEV for bandwidth compression of excitation signal, and ADM for its simplicity of implementation and ability of accurate coding of a low‐frequency signal.
Abstract: We describe a linear predictive vocoder excited by the speech residual with the transmission bit rate below 8k bits/sec. The features of the residual excited linear predictive (RELP) vocoder system are as follows: (1) no pitch extraction is needed; (2) the bit rate is relatively low; and (3) the system is simple for hardware implementation. The RELP vocoder combines all the attractive concepts of linear predictive coding (LPC), voice excited vocoder (VEV), and adaptive delta modulation (ADM); that is, the system employs LPC for a better spectral representation of the vocal tract, the concept of VEV for bandwidth compression of excitation signal, and ADM for its simplicity of implementation and ability of accurate coding of a low‐frequency signal. The LPC residual is generated by a feed‐forward LPC analyzer, is low‐pass filtered at cutoff frequency 800 Hz, and then is coded by ADM. At the synthesizer the decoded residual is passed through a spectral flattener and its output is finally fed as an excitation signal to the LPC synthesizer to obtain synthesized speech. The quality of synthesized speech is, in our opinion, reasonably good compared with those of other vocoders with the same bit rate. [The study was supported by the Defense Communications Agency.]

4 citations
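
The adaptive delta modulation stage mentioned in the entry above can be illustrated with a generic one-bit coder. The adaptation rule and constants here (grow the step when consecutive bits agree, shrink it otherwise, with clamping) are assumptions for illustration and are not the RELP paper's design.

```python
import math


def _next_step(step, bit, prev_bit, grow=1.5, shrink=0.5,
               min_step=0.01, max_step=0.5):
    """Assumed adaptation rule shared by encoder and decoder."""
    if prev_bit is not None:
        step = step * grow if bit == prev_bit else step * shrink
    return min(max(step, min_step), max_step)


def adm_encode(samples, step0=0.05):
    """Return (bits, local_reconstruction) for a sequence of samples."""
    bits, recon = [], []
    estimate, step, prev_bit = 0.0, step0, None
    for x in samples:
        bit = 1 if x >= estimate else 0
        step = _next_step(step, bit, prev_bit)
        estimate += step if bit else -step
        bits.append(bit)
        recon.append(estimate)
        prev_bit = bit
    return bits, recon


def adm_decode(bits, step0=0.05):
    """Rebuild the signal from the bit stream by mirroring the encoder state."""
    out = []
    estimate, step, prev_bit = 0.0, step0, None
    for bit in bits:
        step = _next_step(step, bit, prev_bit)
        estimate += step if bit else -step
        out.append(estimate)
        prev_bit = bit
    return out


# A slowly varying test signal standing in for a low-pass filtered residual.
signal = [math.sin(2.0 * math.pi * n / 200.0) for n in range(400)]
bits, local = adm_encode(signal)
decoded = adm_decode(bits)
```

Because the decoder repeats the encoder's state updates from the bits alone, no step-size side information has to be transmitted.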


Journal ArticleDOI
TL;DR: Current scientific efforts in the field of digital processing of speech are focused on improving the efficiency of the present state of the art and on developing new digital speech communication systems.
Abstract: Current scientific efforts in the field of digital processing of speech are focused on improving the efficiency of the present state of the art and on developing new digital speech communication systems. Therefore, thorough studies on the statistical characteristics of speech signals, speech coding, speech recognition, and speech synthesis are necessary. Recent results and actual trends are reviewed in this paper.

ReportDOI
01 Apr 1974
TL;DR: It is found that linear prediction offers computational advantages over analysis-by-synthesis, as well as better modeling properties if the variations of the signal spectrum from the desired spectral model are large, and a suboptimal solution to the problem of all-zero modeling using linear prediction is given.
Abstract: Linear prediction is presented as a spectral modeling technique in which the signal spectrum is modeled by an all-pole spectrum. The method allows for arbitrary spectral shaping in the frequency domain, and for modeling of continuous as well as discrete spectra (such as filter bank spectra). In addition, using the method of selective linear prediction, all-pole modeling is applied to selected portions of the spectrum, with applications to speech recognition and speech compression. Linear prediction is compared with traditional analysis-by-synthesis techniques for spectral modeling. It is found that linear prediction offers computational advantages over analysis-by-synthesis, as well as better modeling properties if the variations of the signal spectrum from the desired spectral model are large. For relatively smooth spectra and for filter bank spectra, analysis-by-synthesis is judged to give better results. Finally, a suboptimal solution to the problem of all-zero modeling using linear prediction is given.
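
To make the all-pole spectral model in the entry above concrete, the sketch below evaluates gain^2 / |A(e^{jw})|^2 on a frequency grid. The coefficient convention (a[0] == 1, A(z) = sum of a[i] z^-i) and the function name are assumptions; selective linear prediction and all-zero modeling are not shown.

```python
import cmath
import math


def all_pole_spectrum(a, gain, num_points):
    """Sample the model spectrum gain^2 / |A(e^{jw})|^2 at num_points
    frequencies w = pi * m / num_points, m = 0 .. num_points - 1."""
    spectrum = []
    for m in range(num_points):
        w = math.pi * m / num_points
        response = sum(coef * cmath.exp(-1j * w * i)
                       for i, coef in enumerate(a))
        spectrum.append(gain * gain / abs(response) ** 2)
    return spectrum


# A single real pole at z = 0.9 gives a low-pass spectral envelope.
model = all_pole_spectrum([1.0, -0.9], gain=1.0, num_points=4)
```

The values fall steadily from the peak at zero frequency, as expected for a pole near z = 1.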

ReportDOI
01 Nov 1974
TL;DR: In this paper, the authors developed two generalizations of the standard Linear Predictive Coding (LPC) implementation of a narrow band speech compression system to improve the pitch excited system.
Abstract: This report develops two generalizations of the standard Linear Predictive Coding (LPC) implementation of a narrow band speech compression system. The purpose of each method is to improve the speech quality that is available from a standard LPC system. Attention is focused primarily upon the pitch excited system and, therefore, the improvements considered focus upon the improved estimation of the reflection coefficients and the pitch period. Specifically, a parameter filtering algorithm is developed for dynamically smoothing the reflection coefficients, both to increase naturalness in synthetic speech and to eliminate the possibility of synthesis filter instabilities. Secondly, a new method for calculating the k-parameters of an LPC inverse filtering algorithm is developed, STREAK

Journal ArticleDOI
TL;DR: An economical (< 600-dollar) hardware realization of a 4-kHz digital linear predictive speech synthesizer which requires, at most, a CPU overhead of about 40 percent real time and permits the utilization of formant concatenation techniques and reduces the coefficient storage required to specify vowels/voiced consonants by about 60 percent.
Abstract: Speech analysis/synthesis algorithms utilizing linear prediction coefficients have certain advantages over those employing formant-based techniques. For example, 4-kHz speech samples may be synthesized using a basic sequence of 10 multiply/adds followed by a single addition of the current sample of the excitation function. Real-time software synthesis of 4-kHz speech is possible (using this technique) on certain 16-b minicomputers, but the central processing unit (CPU) overhead may approach 100 percent. We describe an economical (< 600-dollar) hardware realization of a 4-kHz digital linear predictive speech synthesizer which requires, at most, a CPU overhead of about 40 percent real time. The device is constructed of standard TTL/MOS logic and consists (essentially) of a high speed 2's complement multiplier/adder capable of calculating a 26-b product (10-b speech samples, 16-b coefficients) in 0.33 μs, and a dual shift register. In addition, a procedure is discussed which enables the device to be used both as a formant synthesizer for vowels or voiced consonant production, and as a predictive synthesizer for other speech sounds. This procedure, hybrid synthesis, permits the utilization of formant concatenation techniques and reduces the coefficient storage required to specify vowels/voiced consonants by about 60 percent.
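
The "10 multiply/adds followed by a single addition" in the entry above corresponds to a direct-form all-pole recursion. The sketch below is a floating-point software illustration under an assumed sign convention (each output is the weighted sum of past outputs plus the excitation); it does not model the fixed-point hardware or hybrid synthesis.

```python
def synthesize(coeffs, excitation):
    """All-pole synthesis: one multiply/add per coefficient, then one
    addition of the current excitation sample."""
    history = [0.0] * len(coeffs)   # history[0] is the most recent output
    out = []
    for e in excitation:
        acc = 0.0
        for c, past in zip(coeffs, history):
            acc += c * past          # len(coeffs) multiply/adds
        sample = acc + e             # single addition of the excitation
        out.append(sample)
        history = [sample] + history[:-1]
    return out


# Tenth-order filter with only the first coefficient set, driven by an impulse.
coeffs = [0.5] + [0.0] * 9
impulse_response = synthesize(coeffs, [1.0, 0.0, 0.0, 0.0])
```

With ten coefficients, the inner loop performs exactly the ten multiply/adds per output sample that the abstract counts.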

Journal ArticleDOI
TL;DR: In this paper, a linear predictive speech compression system based on stationary autocorrelation formulation is described, which employs pitch synchronous analysis of 4-kHz bandlimited input speech sampled at 10-kHz rate.
Abstract: A linear predictive speech compression system based on stationary autocorrelation formulation is described. This system employs pitch synchronous analysis of 4‐kHz bandlimited input speech sampled at 10‐kHz rate. An all‐pole speech production model has been assumed and a 10th‐order linear predictor is used. Quantization properties of k parameters [Markel and Gray, “On Autocorrelation Equations as Applied to Speech Analysis,” IEEE Trans. Audio Electroacoust. (April 1973)], pitch and energy from a large speech data base, and phonetically balanced (PB) words and sentences are discussed. This paper emphasizes the updating of speech transmission parameters based on relative changes in normalized energy between two consecutive analysis intervals. The transmission rate is assumed to be constant but the frame size, for which the transmission parameters are used, is time variant. Thresholds have been defined for relative changes in normalized energy and changes in analysis interval size between two consecutive ana...