
Showing papers by John Makhoul published in 1978


Journal Article
John Makhoul
TL;DR: In this paper, a class of minimum- or maximum-phase all-zero lattice digital filters, based on the two-multiplier lattice of Itakura and Saito, is developed.
Abstract: A class of minimum- or maximum-phase all-zero lattice digital filters, based on the two-multiplier lattice of Itakura and Saito, is developed. Different lattice forms with different numbers of multipliers are derived, including two one-multiplier forms. Many of the properties of these lattice filters are given, including the important orthogonalization and decoupling properties of successive stages in optimal inverse filtering of signals. These properties lead to important applications in the areas of adaptive linear prediction and adaptive Wiener filtering. As a specific example, the design of a new fast start-up equalizer is presented.

181 citations
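The abstract above describes the two-multiplier lattice only in words. The following is a minimal sketch, assuming the common convention f_m(n) = f_{m-1}(n) + k_m b_{m-1}(n-1) and b_m(n) = k_m f_{m-1}(n) + b_{m-1}(n-1) with f_0(n) = b_0(n) = x(n). The paper's exact sign convention may differ. The Python runs an all-zero lattice and cross-checks it against the equivalent direct-form FIR filter obtained with the standard step-up recursion. With every |k_m| < 1 the forward output f_M(n) is the minimum-phase filter, and taking b_M(n) instead gives its maximum-phase (mirror-image) counterpart. All function names are hypothetical.

```python
from typing import List, Sequence


def lattice_filter(x: Sequence[float], k: Sequence[float]) -> List[float]:
    """All-zero lattice in two-multiplier form; returns the forward output f_M(n).

    Assumed stage equations (stage m = 1..M, f_0(n) = b_0(n) = x(n)):
        f_m(n) = f_{m-1}(n) + k_m * b_{m-1}(n-1)
        b_m(n) = k_m * f_{m-1}(n) + b_{m-1}(n-1)
    """
    order = len(k)
    # b_delayed[m] is the backward error entering stage m+1 at time n-1 (zero initial state).
    b_delayed = [0.0] * order
    out = []
    for sample in x:
        f = b = sample
        b_current = [0.0] * order
        for m in range(order):
            b_current[m] = b  # backward error entering stage m+1 at time n
            # Two multipliers per stage; the right-hand side uses the old f and b.
            f, b = f + k[m] * b_delayed[m], k[m] * f + b_delayed[m]
        b_delayed = b_current
        out.append(f)
    return out


def step_up(k: Sequence[float]) -> List[float]:
    """Reflection coefficients -> direct-form coefficients a[0..M], with a[0] = 1."""
    a = [1.0]
    for km in k:
        padded = a + [0.0]
        m = len(a)  # order of the new polynomial
        a = [padded[i] + km * padded[m - i] for i in range(m + 1)]
    return a


def fir_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct-form FIR: y(n) = sum_i a[i] * x(n - i), zero initial state."""
    return [
        sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        for n in range(len(x))
    ]
```

For example, `step_up([0.5, -0.3])` gives approximately `[1.0, 0.35, -0.3]`, and the impulse response of `lattice_filter` with the same coefficients matches it sample for sample.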


Journal Article
TL;DR: In this article, an excitation source model for speech compression and synthesis is presented that allows the degree of voicing to be varied continuously by mixing voiced and unvoiced excitations in a frequency-selective manner.
Abstract: This paper presents an excitation source model for speech compression and synthesis that allows the degree of voicing to be varied continuously by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency‐selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low‐frequency region and the noise source exciting the high‐frequency region. The degree of voicing is specified by a parameter Fc, which corresponds to the cut‐off frequency between the voiced and unvoiced regions. For speech compression applications, Fc can be extracted automatically from the speech spectrum and transmitted. Experiments performed with the new model indicate its power in synthesizing natural sounding voiced fricatives and in largely eliminating the "buzzy" quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.

83 citations
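The abstract above describes the mixed source in words: pulses excite the region below Fc and noise excites the region above it. The following is a minimal sketch, assuming FIR filters designed by the windowed-sinc method and a high-pass filter formed by spectral inversion of the low-pass one. Spectral inversion makes the two filter responses sum to a pure delay, so the combined source spectrum is flat for any Fc. The filter length, Hamming window, pulse scaling, and all function names are hypothetical choices, not details taken from the paper.

```python
import math
import random
from typing import List, Sequence


def lowpass_fir(fc: float, fs: float, taps: int = 31) -> List[float]:
    """Hamming-windowed-sinc low-pass FIR with cutoff fc Hz (taps must be odd)."""
    if taps % 2 == 0:
        raise ValueError("taps must be odd so the filter has an integer delay")
    if not 0.0 <= fc <= fs / 2.0:
        raise ValueError("fc must lie between 0 and fs/2")
    mid = taps // 2
    wc = 2.0 * fc / fs  # cutoff as a fraction of the Nyquist frequency
    h = []
    for n in range(taps):
        t = n - mid
        ideal = wc if t == 0 else math.sin(math.pi * wc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    return h


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filtering, truncated to len(x) output samples."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def mix_sources(voiced: Sequence[float], unvoiced: Sequence[float],
                fc: float, fs: float, taps: int = 31) -> List[float]:
    """Low-pass the voiced source and high-pass the unvoiced source at the same fc.

    The high-pass filter is delta(n - taps//2) minus the low-pass filter, so the
    two responses sum to a pure delay of taps//2 samples (flat combined spectrum).
    """
    h_lp = lowpass_fir(fc, fs, taps)
    mid = taps // 2
    h_hp = [(1.0 if n == mid else 0.0) - h for n, h in enumerate(h_lp)]
    low = convolve(voiced, h_lp)
    high = convolve(unvoiced, h_hp)
    return [a + b for a, b in zip(low, high)]


def mixed_excitation(n_samples: int, fs: float, f0: float, fc: float,
                     taps: int = 31, seed: int = 0) -> List[float]:
    """Pulse train at pitch f0 below fc plus Gaussian noise above fc."""
    period = max(1, round(fs / f0))
    # Assumption: scale pulses by sqrt(period) so both sources have unit average power.
    pulses = [math.sqrt(period) if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return mix_sources(pulses, noise, fc, fs, taps)
```

For instance, `mixed_excitation(800, 8000.0, 100.0, 1500.0)` produces 100 ms of excitation at 8 kHz with a 100 Hz pitch and Fc = 1500 Hz. Setting Fc to 0 yields an all-noise (unvoiced) source, and setting it to fs/2 yields an essentially all-pulse (voiced) source.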


Proceedings Article
10 Apr 1978
TL;DR: An excitation source model for speech compression and synthesis is presented, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner.
Abstract: This paper presents an excitation source model for speech compression and synthesis, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low-frequency region and the noise source exciting the high-frequency region. A parameter Fc determines the degree of voicing by specifying the cut-off frequency between the voiced and unvoiced regions. For speech compression applications, Fc can be extracted automatically from the speech spectrum and transmitted. Experiments using the new model indicate its power in synthesizing natural sounding voiced fricatives, and in largely eliminating the "buzzy" quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.

58 citations


Proceedings Article
10 Apr 1978
TL;DR: A general method for adaptive updating of lattice coefficients in the linear predictive analysis of nonstationary signals is presented, together with a new fast start-up equalizer structure and a one-multiplier form of the lattice that reduces computation.
Abstract: A general method for adaptive updating of lattice coefficients in the linear predictive analysis of nonstationary signals is presented. The method is given as one of two sequential estimation methods, the other being a block sequential estimation method. The fast convergence of adaptive lattice algorithms is seen to be due to the orthogonalization and decoupling properties of the lattice. These properties are useful in adaptive Wiener filtering. As an application, a new fast start-up equalizer structure is presented. In addition, a one-multiplier form of the lattice is presented, which results in a reduction of computations.

49 citations
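The abstract does not spell out the update equations. One common sequential estimator for lattice coefficients is an exponentially weighted version of Burg's harmonic-mean estimate, shown here as an illustrative assumption rather than as the paper's exact algorithm: k_m(n) = -2 C_m(n) / D_m(n). Here C_m accumulates f_{m-1}(n) b_{m-1}(n-1) and D_m accumulates f_{m-1}(n)^2 + b_{m-1}(n-1)^2, both with forgetting factor `lam`. Because |2 C_m| never exceeds D_m, every coefficient stays inside (-1, 1), so the adapted lattice remains minimum phase. The function names, forgetting factor, and AR(1) test signal are hypothetical.

```python
import random
from typing import List, Sequence


def adaptive_lattice(x: Sequence[float], order: int, lam: float = 0.99,
                     eps: float = 1e-9) -> List[List[float]]:
    """Sequentially re-estimate reflection coefficients of an all-zero lattice.

    Assumed convention: f_m(n) = f_{m-1}(n) + k_m b_{m-1}(n-1),
                        b_m(n) = k_m f_{m-1}(n) + b_{m-1}(n-1).
    Returns the coefficient vector after every input sample.
    """
    C = [0.0] * order          # exponentially weighted cross-correlation per stage
    D = [eps] * order          # exponentially weighted energy per stage (eps avoids 0/0)
    k = [0.0] * order
    b_delayed = [0.0] * order  # backward error entering each stage at time n-1
    history = []
    for sample in x:
        f = b = sample
        b_current = [0.0] * order
        for m in range(order):
            b_current[m] = b
            # Each stage adapts from its own input errors only (stage decoupling).
            C[m] = lam * C[m] + f * b_delayed[m]
            D[m] = lam * D[m] + f * f + b_delayed[m] * b_delayed[m]
            k[m] = -2.0 * C[m] / D[m]
            f, b = f + k[m] * b_delayed[m], k[m] * f + b_delayed[m]
        b_delayed = b_current
        history.append(list(k))
    return history


def ar1_signal(a: float, n: int, seed: int = 0) -> List[float]:
    """Hypothetical test signal: x(n) = a * x(n-1) + white Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x
```

On a first-order autoregressive signal with coefficient 0.8, the first coefficient should settle near -0.8 under this sign convention and the second near 0. A forgetting factor closer to 1 trades tracking speed for lower estimation noise.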



Journal Article
TL;DR: This paper surveys recent developments in adaptive predictive coding (APC) of speech, including a three‐point pitch predictor, a pitch‐adaptive quantizer, and adaptive shaping of the quantization‐noise spectrum, and introduces a new set of high‐frequency regeneration techniques for baseband coders based on duplication of the baseband spectrum.
Abstract: This paper surveys recent developments in adaptive predictive coding (APC) of speech. Prominent among these developments are the use of a three‐point pitch predictor, a pitch‐adaptive quantizer, entropy coding of the residual, and adaptive shaping of the quantization‐noise spectrum. APC systems produce high‐quality speech at around 16 kbit/s; their quality diminishes rapidly at 9.6 kbit/s or less. For those lower data rates, some form of baseband coding system becomes desirable. In such systems, a low‐frequency baseband is transmitted. The high‐frequency regeneration of the excitation spectrum from the baseband is of special importance. Traditional regeneration techniques have used some form of nonlinear distortion (usually rectification) of the baseband, followed by spectral flattening. We introduce a new set of regeneration techniques based on duplication of the baseband spectrum at high frequencies. The audible signal distortions in rectification and spectral folding are compared.

15 citations
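The abstract names duplication of the baseband spectrum as the new regeneration idea but gives no implementation details. The sketch below assumes the received baseband is available as a list of spectral bins (magnitudes or complex DFT values) and fills the missing high-frequency bins with shifted copies of it. The function name is hypothetical, and a real coder would also have to return to the time domain and handle phase continuity, which this sketch omits.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def regenerate_by_duplication(baseband: Sequence[T], n_bins: int) -> List[T]:
    """Extend a baseband spectrum to n_bins by repeating it end to end.

    Bins 0..B-1 are kept as received; bin k >= B is a copy of bin k mod B,
    so each higher band is a shifted (not mirrored) replica of the baseband.
    """
    if not baseband:
        raise ValueError("baseband must contain at least one bin")
    if n_bins < len(baseband):
        raise ValueError("n_bins must be at least the baseband width")
    width = len(baseband)
    return [baseband[k % width] for k in range(n_bins)]
```

For example, a 3-bin baseband `[1, 2, 3]` extended to 8 bins becomes `[1, 2, 3, 1, 2, 3, 1, 2]`. Because each copy is translated rather than mirrored, the harmonic spacing inside it is preserved. However, the copied harmonics fall exactly on multiples of the pitch frequency only when the baseband width is itself a multiple of the pitch spacing.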


Proceedings Article
01 Apr 1978
TL;DR: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework proposed at ICASSP 1976; the high correlations obtained between objective scores and subjective quality judgments indicate the usefulness of these methods.
Abstract: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that we proposed at the 1976 ICASSP conference. In each method, the error in short-term spectral behavior between vocoded speech and the original is computed once every 10 ms. These errors are appropriately weighted and averaged over an utterance to produce a single objective score. Several short-term error measures, as well as time-weighting and averaging techniques, are investigated. We evaluate the objective methods by correlating the resulting objective scores with formal subjective speech quality judgments. The high correlations obtained indicate the usefulness of these methods.

11 citations
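The abstract outlines the framework: a short-term spectral error is computed every 10 ms, and the errors are weighted and averaged over the utterance. The following is a minimal sketch, assuming an RMS log-spectral distance in dB as the per-frame error and caller-supplied frame weights. The paper investigates several error measures and weightings, so these particular choices are illustrative only. Power spectra for each 10 ms frame are assumed to be computed elsewhere, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def log_spectral_distance(ref_power: Sequence[float], test_power: Sequence[float],
                          floor: float = 1e-12) -> float:
    """RMS difference in dB between two short-term power spectra of equal length."""
    if not ref_power or len(ref_power) != len(test_power):
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(ref_power, test_power):
        # Clamp to a small floor so empty bins do not produce log10(0).
        d = 10.0 * math.log10(max(p, floor)) - 10.0 * math.log10(max(q, floor))
        total += d * d
    return math.sqrt(total / len(ref_power))


def objective_score(ref_frames: Sequence[Sequence[float]],
                    test_frames: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-frame spectral distances over an utterance."""
    if not ref_frames or len(ref_frames) != len(test_frames):
        raise ValueError("need the same, non-zero number of frames")
    distances = [log_spectral_distance(r, t) for r, t in zip(ref_frames, test_frames)]
    if weights is None:
        weights = [1.0] * len(distances)  # uniform time weighting
    if len(weights) != len(distances) or sum(weights) <= 0.0:
        raise ValueError("weights must match the frames and sum to a positive value")
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)
```

Uniform weights reduce the score to a plain mean over frames. Passing, for example, per-frame energies as `weights` would emphasize loud frames. The score would then be correlated with subjective ratings across many utterances, as the abstract describes.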


01 Apr 1978
TL;DR: The development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network and a reliable method for measuring subjective speech quality are described.
Abstract: This report describes our work over the past three years on data compression and quality evaluation of digital speech. We developed and implemented linear predictive coding (LPC) techniques with the overall objective of digitally transmitting high-quality speech at the lowest possible average data rates over packet-switched communication media. Major techniques reported include the covariance lattice method of linear prediction analysis, adaptive lattice methods, linear predictive spectral warping, improved quantization of LPC parameters, variable frame rate transmission of LPC parameters based on a functional perceptual model of speech, and a mixed-source model for the LPC synthesizer to produce more natural-sounding speech. We also developed a reliable method for measuring subjective speech quality. This method was employed to formally demonstrate the quality improvements provided by our speech analysis/synthesis techniques, as well as to study speech quality as a function of LPC parameters. As subjective procedures are generally expensive and time-consuming, we developed and tested several objective procedures for speech quality evaluation. The results from these objective procedures were found to be highly correlated with the corresponding subjective quality judgments. Another highlight of our work is the development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network.

4 citations


Journal Article
TL;DR: In this paper, a mixed source model, in which the common cutoff frequency of a low-pass pulse filter and a high-pass noise filter replaces the usual binary voiced/voiceless decision, is evaluated in listening tests; the results show that the new source model greatly reduces perceived buzziness, occasionally at the cost of slightly increased breathiness.
Abstract: Our source model, reported at an earlier meeting, excites the LPC speech spectrum with a low‐frequency band of pulses mixed with a high‐frequency band of noise. Pulses are low‐pass filtered and noise is high‐pass filtered at the same frequency, to yield a flat source spectrum. The cutoff frequency of the filters, a continuous variable, replaces the usual binary voiced/voiceless decision. Thirty‐six phoneme‐specific test sentences were processed through a single high‐quality vocoder (5 kHz bandwidth, 11 poles, no quantization, 100 frames/s), which was excited in turn by both the usual pulse/noise source and by the new source. Subjects rated the resulting speech separately on eight‐point buzziness and breathiness scales. The results show that the new source model greatly reduces perceived buzziness, occasionally at the cost of slightly increased breathiness. Any remaining inadequacies can probably be ascribed to the algorithm that extracts the cutoff frequency during analysis, rather than to the model itself. [Work supported by ARPA‐IPTO.]

3 citations