
Showing papers on "Speech coding" published in 1974


Journal Article
N.S. Jayant
01 May 1974
TL;DR: It is pointed out that error waveforms in speech quantization cannot be regarded as additive white noise, in general, and that for finer assessments of speech coders, either relative or absolute, one needs to supplement SNR-based observations with corrections for subjective and perceptual factors.
Abstract: A study is presented on the digital coding of speech by means of a straightforward approximation of the time waveform. In particular, the closely related discrete-time discrete-amplitude signal representations that are rather well known as pulse-code modulation (PCM), differential pulse-code modulation (DPCM), and delta modulation (DM) are discussed. Speech is recognized as a nonstationary signal, and emphasis is therefore placed on "companding" and "adaptive" strategies for waveform quantization and prediction. With signal-to-quantization-error ratio SNR as a performance measure, techniques are suggested which are most likely to be appropriate for given specifications of information rate. It is pointed out that error waveforms in speech quantization cannot be regarded as additive white noise, in general. This means that for finer assessments of speech coders, either relative or absolute, one needs to supplement SNR-based observations with corrections for subjective and perceptual factors. The latter seem to defy quantification as a rule. Invaluable, therefore, are explicit preference tests for direct comparisons of coders from a perceptual standpoint, and notions such as isopreference and multidimensional scaling are naturally appropriate in interpreting the results of such tests. Final points of concern are communication questions such as multiple encodings of speech by tandem coder-decoder pairs; conversions among different digital code formats; and the effects of additive and multiplicative noise in the communication channel, as manifest in the erroneous reception of speech-carrying bits. Information on these topics tends to be heterogeneous and nontheoretical, and the present digression into the subject is cursory by intent. The gramophone record accompanying this paper demonstrates some of the manipulations of speech that are discussed.

271 citations
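
To illustrate the adaptive quantization strategies and the SNR measure this abstract discusses, here is a minimal Python sketch of adaptive delta modulation (ADM). It is not the paper's coder: the step multipliers P = 1.5 and Q = 0.66, the step limits, and the names `adm_encode`, `adm_decode` and `snr_db` are illustrative assumptions. The step grows when consecutive bits agree (slope overload) and shrinks when they alternate (granular noise). Because the decoder repeats the encoder's step logic, no side information needs to be sent.

```python
import math

def _next_step(step, bit, prev_bit, p, q, step_min, step_max):
    """Grow the step on repeated bits (slope overload), shrink it on alternation."""
    if prev_bit is not None:
        step *= p if bit == prev_bit else q
    return min(max(step, step_min), step_max)

def adm_encode(x, step0=0.01, p=1.5, q=0.66, step_min=1e-4, step_max=1.0):
    """One bit per sample: 1 if the input is at or above the running estimate."""
    bits, est, step, prev = [], 0.0, step0, None
    for sample in x:
        bit = 1 if sample >= est else 0
        step = _next_step(step, bit, prev, p, q, step_min, step_max)
        est += step if bit else -step   # the decoder performs the identical update
        bits.append(bit)
        prev = bit
    return bits

def adm_decode(bits, step0=0.01, p=1.5, q=0.66, step_min=1e-4, step_max=1.0):
    out, est, step, prev = [], 0.0, step0, None
    for bit in bits:
        step = _next_step(step, bit, prev, p, q, step_min, step_max)
        est += step if bit else -step
        out.append(est)
        prev = bit
    return out

def snr_db(x, y):
    """Signal-to-quantization-error ratio in dB."""
    signal = sum(s * s for s in x)
    noise = sum((s - r) ** 2 for s, r in zip(x, y))
    return 10 * math.log10(signal / noise)

# Usage: a 100 Hz tone sampled at 32 kHz (delta modulation relies on heavy oversampling).
x = [math.sin(2 * math.pi * 100 * n / 32000) for n in range(3200)]
y = adm_decode(adm_encode(x))
print(f"ADM SNR: {snr_db(x, y):.1f} dB")
```

The SNR printed here is only the objective measure. As the abstract stresses, it does not capture the perceptual differences between coders.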


Journal Article
TL;DR: The residual encoding system with a Kalman filter or a stochastic approximation algorithm for identifying the predictor coefficients has produced good quality speech at a data rate of 16 kbit/s.
Abstract: A new method of speech digitization called residual encoding is introduced, and its application to the speech digitization problem is studied. The residual encoding system is a form of differential pulse code modulation which utilizes both an adaptive quantizer and an adaptive predictor. The residual encoder differs from previous systems in two ways. First, a sequential estimation method is used to continuously update the predictor coefficients, and second, the predictor coefficients are not transmitted, but are extracted from the estimate of the speech signal at both the transmitter and receiver. No form of pitch extraction is employed. The residual encoding system with a Kalman filter or a stochastic approximation algorithm for identifying the predictor coefficients has produced good quality speech at a data rate of 16 kbit/s.

69 citations
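
The key feature of residual encoding is that predictor coefficients are recomputed from the reconstructed signal at both ends instead of being transmitted. That idea can be sketched as a small ADPCM loop. This is a hedged illustration, not the paper's system. It substitutes a normalized LMS update for the Kalman filter or stochastic-approximation algorithm. The 2-bit quantizer, its step multipliers (1.6 and 0.8), the predictor order, and all function names are assumptions.

```python
import math

def _new_state(order, step0):
    # Predictor coefficients, past reconstructed samples (newest first), quantizer step.
    return {"a": [0.0] * order, "hist": [0.0] * order, "step": step0}

def _predict(st):
    return sum(a * h for a, h in zip(st["a"], st["hist"]))

def _update(st, code, pred, levels, mu):
    """Shared by encoder and decoder: reconstruct, then adapt predictor and quantizer."""
    eq = (code + 0.5) * st["step"]               # dequantized residual (mid-rise levels)
    y = pred + eq                                # reconstructed speech sample
    energy = sum(h * h for h in st["hist"]) + 1e-6
    # Normalized LMS: a stand-in for the paper's Kalman / stochastic-approximation update.
    st["a"] = [a + mu * eq * h / energy for a, h in zip(st["a"], st["hist"])]
    st["hist"] = [y] + st["hist"][:-1]
    outer = code in (-(levels // 2), levels // 2 - 1)
    st["step"] = min(max(st["step"] * (1.6 if outer else 0.8), 1e-4), 10.0)
    return y

def residual_encode(x, order=4, levels=4, mu=0.05, step0=0.05):
    st, half = _new_state(order, step0), levels // 2
    codes, recon = [], []
    for s in x:
        pred = _predict(st)
        code = max(-half, min(half - 1, math.floor((s - pred) / st["step"])))
        codes.append(code)                       # only these codes go on the channel
        recon.append(_update(st, code, pred, levels, mu))
    return codes, recon

def residual_decode(codes, order=4, levels=4, mu=0.05, step0=0.05):
    st, out = _new_state(order, step0), []
    for code in codes:
        out.append(_update(st, code, _predict(st), levels, mu))
    return out

x = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
codes, recon = residual_encode(x)
print(residual_decode(codes) == recon)   # True: the decoder tracks the encoder exactly
```

`residual_decode` repeats the encoder's arithmetic exactly, so its output matches the encoder's reconstruction sample for sample. This is what lets the coefficients stay off the channel. With 2-bit codes at an assumed 8 kHz sampling rate, the channel rate would be 2 × 8000 = 16 kbit/s, the figure quoted in the abstract.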


Journal Article
Hisashi Kobayashi, L. R. Bahl
TL;DR: Predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images and techniques for encoding the prediction error pattern to achieve compression of data are presented.
Abstract: This paper deals with predictive coding techniques for efficient transmission or storage of two-level (black and white) digital images. Part I discusses algorithms for prediction. A predictor transforms the two-dimensional dependence in the original data into a form which can be handled by coding techniques for one-dimensional data. The implementation and performance of a fixed predictor, an adaptive predictor with finite memory, and an adaptive linear predictor are discussed. Results of experiments performed on various types of scanned images are also presented. Part II deals with techniques for encoding the prediction error pattern to achieve compression of data.

61 citations
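
The two parts of this paper can be illustrated together. Part I uses a predictor to turn two-dimensional dependence into a sparse error pattern; Part II codes that pattern. The sketch below uses a fixed three-neighbour predictor followed by run-length coding. The predictor (the mod-2 sum of the west, north and north-west pixels) and the helper names are hypothetical choices for the sketch, not the predictors evaluated in the paper.

```python
def _pixel(img, r, c):
    # Pixels above or left of the image are treated as white (0).
    return img[r][c] if r >= 0 and c >= 0 else 0

def predict(img, r, c):
    """Hypothetical fixed predictor: XOR of the west, north and north-west neighbours."""
    return _pixel(img, r, c - 1) ^ _pixel(img, r - 1, c) ^ _pixel(img, r - 1, c - 1)

def error_pattern(img):
    """1 where the prediction is wrong, 0 where it is right."""
    return [[img[r][c] ^ predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]

def reconstruct(err):
    img = [[0] * len(err[0]) for _ in err]
    for r in range(len(err)):
        for c in range(len(err[0])):
            img[r][c] = err[r][c] ^ predict(img, r, c)   # causal: uses decoded pixels only
    return img

def run_lengths(bits):
    """Lengths of the 0-runs preceding each 1, plus the trailing 0-run."""
    runs, count = [], 0
    for b in bits:
        if b:
            runs.append(count)
            count = 0
        else:
            count += 1
    return runs, count

img = [[1 if 2 <= r < 6 and 1 <= c < 7 else 0 for c in range(8)] for r in range(8)]
err = error_pattern(img)
flat = [b for row in err for b in row]
print(sum(flat), run_lengths(flat))   # 4 ([17, 5, 25, 5], 8)
```

For a solid rectangle, this predictor errs only at the four corners. The 64-pixel image therefore reduces to four 1s, which run-length coding represents compactly.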


Journal Article
Harvey F. Silverman, N. Dixon
TL;DR: The parametrically controlled analyzer (PCA) is a large PL/I program which has been designed to perform spectral analysis of speech signals and features parametric selection of several analysis methods, including discrete Fourier transformation and linear predictive coding.
Abstract: The parametrically controlled analyzer (PCA) is a large PL/I program which has been designed to perform spectral analysis of speech signals. PCA features parametric selection of several analysis methods, including discrete Fourier transformation and linear predictive coding. Also, selection may be made among various smoothing, normalization, and interpolation methods. PCA develops high-quality spectrographic representations of speech for standard line printers and CRT displays. The PCA is described and numerous examples of various parameter settings are presented and discussed.

45 citations
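
PCA itself is a PL/I program, but one of its analysis options, a DFT-based spectrum, is easy to sketch. The minimal Python below applies a Hamming window, computes a direct DFT, and converts magnitudes to dB. It is an illustration under assumptions, not PCA's code: the window choice and the function names are assumptions. A spectrographic display would stack such frames over time.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def log_magnitude_spectrum(frame):
    """Direct O(N^2) DFT of a Hamming-windowed frame; returns dB for bins 0..N/2."""
    n = len(frame)
    xw = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        X = sum(xw[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(20 * math.log10(abs(X) + 1e-12))   # small floor avoids log(0)
    return spectrum

# Usage: 1 kHz tone, 8 kHz sampling, 256-point frame -> expected peak at bin 1000 / (8000 / 256) = 32.
fs, n = 8000, 256
frame = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
spec = log_magnitude_spectrum(frame)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
print(peak_bin, peak_bin * fs / n)   # 32 1000.0
```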


Journal Article
TL;DR: In this article, the propagation of errors in identifying constant coefficient parameters of a discrete time linear system, using stochastic approximation algorithms, is investigated, and error and sensitivity analysis algorithms are derived for the cases when there is structural modeling error as well as when the a priori statistics of identified parameters, and plant and measurement noise, are incorrectly specified.
Abstract: Propagation of errors in identifying constant coefficient parameters of a discrete time linear system, using stochastic approximation algorithms, is investigated. Error and sensitivity analysis algorithms are derived for the cases when there is structural modeling error as well as when the a priori statistics of identified parameters, and plant and measurement noise, are incorrectly specified. The error and sensitivity analysis algorithms are useful as a design tool to better specify appropriate identification algorithms for actual implementation. The error and sensitivity analysis algorithms are applied to several examples including identification of eight predictor coefficients for adaptive digitized speech transmission.

8 citations
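
Here is a minimal sketch of identifying predictor coefficients with a stochastic-approximation (LMS) recursion. It uses synthetic data from a known second-order autoregressive model. The model coefficients (0.5, -0.3), the step size, the sample count and the function names are assumptions for illustration; the paper's error and sensitivity analysis algorithms are not reproduced. Fitting order 1 to the same data mimics a structural modeling error. In that case the estimate settles near the lag-1 correlation a_1/(1 - a_2) = 0.5/1.3 ≈ 0.38 rather than the true a_1 = 0.5.

```python
import random

def ar2_process(a1, a2, n, seed=0):
    """Synthetic data: x[n] = a1*x[n-1] + a2*x[n-2] + unit-variance white noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x

def identify_predictor(x, order, mu):
    """LMS (stochastic-gradient) estimate of predictor coefficients a_1..a_p."""
    a = [0.0] * order
    for n in range(order, len(x)):
        h = [x[n - k] for k in range(1, order + 1)]       # x[n-1], ..., x[n-p]
        e = x[n] - sum(ai * hi for ai, hi in zip(a, h))   # a priori prediction error
        a = [ai + mu * e * hi for ai, hi in zip(a, h)]
    return a

x = ar2_process(0.5, -0.3, 20000)
a2_fit = identify_predictor(x, 2, mu=0.002)
a1_fit = identify_predictor(x, 1, mu=0.002)
print(a2_fit)   # should be close to the true [0.5, -0.3]
print(a1_fit)   # structurally biased: should settle near 0.5 / 1.3 ≈ 0.38
```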


Journal Article
TL;DR: In analysis/synthesis systems for the digital coding of speech, the synthesis control information is normally required in ‘frames’ arriving at a constant rate; at the expense of a small delay, the frame rate can be reduced considerably by transmitting appropriately selected frames and deriving intermediate frames from those transmitted.
Abstract: In analysis/synthesis systems for the digital coding of speech, the synthesis control information is normally required in ‘frames’ arriving at a constant rate. At the expense of a small delay, a considerable reduction of frame rate is possible by transmitting appropriately selected frames, and deriving intermediate frames from those transmitted.

7 citations
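
Here is a minimal sketch of the frame-rate reduction idea. The transmitter sends a frame only when linear interpolation from the last transmitted frame can no longer reproduce the skipped frames within a tolerance. The receiver then interpolates the missing frames. The greedy selection rule, the linear interpolation, the tolerance and the function names are assumptions, since the abstract does not specify them. The receiver must wait for the next transmitted frame before it can fill a gap, which is the small delay the abstract mentions.

```python
def _interp(f0, f1, t):
    """Linear interpolation between two parameter frames (lists of floats)."""
    return [u + (v - u) * t for u, v in zip(f0, f1)]

def _fits(frames, i, k, tol):
    """True if every frame strictly between i and k lies within tol of the interpolation."""
    for m in range(i + 1, k):
        guess = _interp(frames[i], frames[k], (m - i) / (k - i))
        if max(abs(g - f) for g, f in zip(guess, frames[m])) > tol:
            return False
    return True

def select_frames(frames, tol):
    """Transmitter: greedily choose the indices of the frames to send."""
    keep, i = [0], 0
    while i < len(frames) - 1:
        j = i + 1
        while j + 1 < len(frames) and _fits(frames, i, j + 1, tol):
            j += 1
        keep.append(j)
        i = j
    return keep

def rebuild(sent, keep):
    """Receiver: sent[n] is the frame transmitted at index keep[n]; fill the gaps."""
    out = []
    for (i, fi), (k, fk) in zip(zip(keep, sent), zip(keep[1:], sent[1:])):
        for m in range(i, k):
            out.append(_interp(fi, fk, (m - i) / (k - i)))
    out.append(list(sent[-1]))
    return out

traj = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]          # hypothetical parameter track
frames = [[float(v), 0.5 * v] for v in traj]
keep = select_frames(frames, tol=0.01)
print(keep)                                      # [0, 5, 10]: 3 of 11 frames sent
rec = rebuild([frames[n] for n in keep], keep)
```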


Journal Article
TL;DR: The neural networks investigated could produce output excitation patterns in which frequency and intensity information was coded by position; such a system of speech coding, if not a true analogue of the auditory system, has immense potential in the field of speech recognition.

7 citations


01 Dec 1974
TL;DR: The authors have developed several methods for reducing the redundancy in the speech signal without sacrificing speech quality, including preemphasis of the incoming speech signal, adaptive optimal selection of predictor order, optimal selection and quantization of transmission parameters, variable frame rate transmission, optimal encoding, and improved synthesis methodology.
Abstract: This report describes work in developing a linear predictive speech compression system that transmits high-quality speech at low bit rates. The authors have developed several methods for reducing the redundancy in the speech signal without sacrificing speech quality. Included among these methods are preemphasis of the incoming speech signal, adaptive optimal selection of predictor order, optimal selection and quantization of transmission parameters, variable frame rate transmission, optimal encoding, and improved synthesis methodology. When all of these were incorporated into a floating-point simulation of a pitch-excited linear predictive vocoder, high-quality synthesized speech was obtained at average transmission rates as low as 1500 bps.

4 citations
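
Of the listed methods, preemphasis is the simplest to show. Here is a minimal sketch that assumes the common first-order form y[n] = x[n] - αx[n-1] with α = 0.9375 (an assumed value, not taken from the report). It also includes the matching de-emphasis filter that would be applied after synthesis.

```python
def preemphasis(x, alpha=0.9375):
    """y[n] = x[n] - alpha * x[n-1]: first-order high-frequency boost (x[-1] taken as 0)."""
    prev, out = 0.0, []
    for s in x:
        out.append(s - alpha * prev)
        prev = s
    return out

def deemphasis(y, alpha=0.9375):
    """Inverse filter x[n] = y[n] + alpha * x[n-1], applied to the synthesized speech."""
    prev, out = 0.0, []
    for s in y:
        prev = s + alpha * prev
        out.append(prev)
    return out

x = [0.3, -1.2, 0.8, 0.05, 2.0]
y = preemphasis(x)
print(deemphasis(y))   # recovers x up to floating-point rounding
```

For a constant input, the preemphasized output settles at 1 - α = 0.0625. This shows how strongly the filter attenuates low frequencies relative to high ones.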


Journal Article
TL;DR: Current scientific efforts in digital speech processing are focused on improving efficiency relative to the present state of the art and on developing new digital speech communication systems.
Abstract: Current scientific efforts in the field of digital processing of speech are focused on two aims: improving efficiency relative to the present state of the art, and developing new digital speech communication systems. Thorough studies of the statistical characteristics of speech signals, speech coding, speech recognition, and speech synthesis are therefore necessary. Recent results and current trends are reviewed in this paper.

3 citations


Report
01 Apr 1974
TL;DR: It is found that linear prediction offers computational advantages over analysis-by-synthesis, as well as better modeling properties if the variations of the signal spectrum from the desired spectral model are large, and a suboptimal solution to the problem of all-zero modeling using linear prediction is given.
Abstract: Linear prediction is presented as a spectral modeling technique in which the signal spectrum is modeled by an all-pole spectrum. The method allows for arbitrary spectral shaping in the frequency domain, and for modeling of continuous as well as discrete spectra (such as filter bank spectra). In addition, using the method of selective linear prediction, all-pole modeling is applied to selected portions of the spectrum, with applications to speech recognition and speech compression. Linear prediction is compared with traditional analysis-by-synthesis techniques for spectral modeling. It is found that linear prediction offers computational advantages over analysis-by-synthesis, as well as better modeling properties if the variations of the signal spectrum from the desired spectral model are large. For relatively smooth spectra and for filter bank spectra, analysis-by-synthesis is judged to give better results. Finally, a suboptimal solution to the problem of all-zero modeling using linear prediction is given.

3 citations
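
The all-pole modeling described here can be sketched with the autocorrelation method in three steps. First compute autocorrelation values. Then solve for the predictor with the Levinson-Durbin recursion. Finally evaluate the model spectrum P(ω) = G²/|1 - Σ a_k e^{-jωk}|², with G² set to the final prediction-error energy. This is a minimal Python illustration with assumed function names. The report's frequency-domain formulation, such as computing autocorrelations from a selected portion of the spectrum for selective linear prediction, is not reproduced.

```python
import cmath
import math

def autocorrelation(x, p):
    """r[k] = sum_n x[n] x[n-k] for k = 0..p (autocorrelation method, no window)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Predictor a_1..a_p for x_hat[n] = sum_k a_k x[n-k].
    Returns (a, reflection coefficients, final prediction-error energy)."""
    a, ks, err = [], [], r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        ks.append(k)
        err *= 1.0 - k * k
    return a, ks, err

def all_pole_spectrum(a, gain2, omegas):
    """P(w) = G^2 / |1 - sum_k a_k exp(-j w k)|^2 at each angular frequency w."""
    out = []
    for w in omegas:
        A = 1 - sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a, start=1))
        out.append(gain2 / abs(A) ** 2)
    return out

# Usage: r[k] = 0.5**k is the autocorrelation of a first-order process.
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, ks, err)                                  # [0.5, 0.0] [0.5, 0.0] 0.75
print(all_pole_spectrum(a, err, [0.0, math.pi]))   # larger value at w = 0 (lowpass)
```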


Report
01 Nov 1974
TL;DR: In this paper, the authors developed two generalizations of the standard Linear Predictive Coding (LPC) implementation of a narrow band speech compression system to improve the pitch excited system.
Abstract: This report develops two generalizations of the standard Linear Predictive Coding (LPC) implementation of a narrowband speech compression system. The purpose of each method is to improve the speech quality available from a standard LPC system. Attention is focused primarily on the pitch-excited system; therefore, the improvements considered concern better estimation of the reflection coefficients and the pitch period. Specifically, a parameter filtering algorithm is developed for dynamically smoothing the reflection coefficients, both to increase naturalness in synthetic speech and to eliminate the possibility of synthesis filter instabilities. Second, a new method for calculating the k-parameters of an LPC inverse filtering algorithm, STREAK, is developed.
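
The abstract does not describe the report's parameter-filtering algorithm, so the following is only a hypothetical sketch of the general idea. Each reflection coefficient is smoothed across frames with a first-order recursive filter, and the result is converted to direct-form predictor coefficients with the step-up recursion. Smoothing in the reflection-coefficient domain is attractive because a weighted average of values inside (-1, 1) stays inside (-1, 1), and |k_i| < 1 for every i guarantees a stable all-pole synthesis filter. The smoothing constant and function names are assumptions.

```python
def smooth_reflection(frames, alpha=0.5):
    """Hypothetical first-order recursive smoothing of each reflection coefficient
    across frames: s_t = alpha * s_(t-1) + (1 - alpha) * k_t."""
    out, state = [], None
    for ks in frames:
        if state is None:
            state = list(ks)
        else:
            state = [alpha * s + (1 - alpha) * k for s, k in zip(state, ks)]
        out.append(state)
    return out

def is_stable(ks):
    """The all-pole synthesis filter is stable iff every |k_i| < 1."""
    return all(abs(k) < 1.0 for k in ks)

def step_up(ks):
    """Convert reflection coefficients to predictor coefficients a_1..a_p
    (sign convention x_hat[n] = sum_k a_k x[n-k])."""
    a = []
    for i, k in enumerate(ks, start=1):
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
    return a

frames = [[0.9, -0.5], [-0.9, 0.5], [0.8, -0.4]]
smoothed = smooth_reflection(frames)
print(all(is_stable(f) for f in smoothed))   # True
print(step_up(smoothed[-1]))
```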

Journal Article
TL;DR: An economical (< 600-dollar) hardware realization of a 4-kHz digital linear predictive speech synthesizer is described that requires, at most, a CPU overhead of about 40 percent real time; a hybrid synthesis procedure permits the use of formant concatenation techniques and reduces the coefficient storage required to specify vowels/voiced consonants by about 60 percent.
Abstract: Speech analysis/synthesis algorithms utilizing linear prediction coefficients have certain advantages over those employing formant-based techniques. For example, 4-kHz speech samples may be synthesized using a basic sequence of 10 multiply/adds followed by a single addition of the current sample of the excitation function. Real-time software synthesis of 4-kHz speech is possible (using this technique) on certain 16-b minicomputers, but the central processing unit (CPU) overhead may approach 100 percent. We describe an economical (< 600-dollar) hardware realization of a 4-kHz digital linear predictive speech synthesizer which requires, at most, a CPU overhead of about 40 percent real time. The device is constructed of standard TTL/MOS logic and consists (essentially) of a high-speed 2's complement multiplier/adder capable of calculating a 26-b product (10-b speech samples, 16-b coefficients) in 0.33 μs, and a dual shift register. In addition, a procedure is discussed which enables the device to be used both as a formant synthesizer for vowels or voiced consonant production, and as a predictive synthesizer for other speech sounds. This procedure, hybrid synthesis, permits the utilization of formant concatenation techniques and reduces the coefficient storage required to specify vowels/voiced consonants by about 60 percent.
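
The synthesis loop described in the abstract performs 10 multiply/adds and then one addition of the excitation sample. It can be sketched in integer arithmetic that mirrors the 10-bit sample by 16-bit coefficient datapath. The Q14 coefficient scaling, the truncating shift and the 10-bit saturation are assumptions for illustration, since the abstract gives only the word lengths. The hybrid formant/predictive procedure is not shown.

```python
Q = 14                                 # assumed coefficient scaling: a_k = coeff / 2**14
SAMPLE_MIN, SAMPLE_MAX = -512, 511     # 10-bit two's-complement sample range

def synthesize(excitation, coeffs):
    """All-pole synthesis y[n] = e[n] + sum_k a_k y[n-k] in integer arithmetic:
    len(coeffs) multiply/adds per sample, then one addition of the excitation."""
    history = [0] * len(coeffs)        # y[n-1], y[n-2], ..., newest first
    out = []
    for e in excitation:
        # Each 10-bit x 16-bit product fits in 26 bits; the running sum needs a few guard bits.
        acc = sum(c * past for c, past in zip(coeffs, history))
        y = (acc >> Q) + e             # arithmetic shift truncates toward minus infinity
        y = max(SAMPLE_MIN, min(SAMPLE_MAX, y))
        history = [y] + history[:-1]
        out.append(y)
    return out

coeffs = [8192] + [0] * 9              # 10th-order model with a_1 = 0.5, others 0
print(synthesize([256, 0, 0, 0], coeffs))   # [256, 128, 64, 32]
```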