scispace - formally typeset

Showing papers on "Speech coding" published in 1971


Journal Article
TL;DR: Additional coding of the DPCM output using entropy coding techniques (Huffman or Shannon-Fano coding) can result in a further increase in the signal-to-quantizing-noise ratio of 5.6 dB without increasing the transmission rate.
Abstract: Much of the redundancy in a speech or television signal is eliminated when it is encoded into digital form by a differential pulse-code-modulation (DPCM) encoder. Additional coding of the DPCM output using entropy coding techniques (Huffman or Shannon-Fano coding) can result in a further increase in the signal-to-quantizing-noise ratio of 5.6 dB without increasing the transmission rate.

29 citations
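The abstract states the idea but not the mechanics. Below is a minimal Python sketch. It assumes a first-order predictor (the previous reconstructed sample), a uniform quantizer, and a synthetic AR(1) test signal in place of real speech or television. The sketch DPCM-encodes the signal, builds a Huffman code for the quantizer indices, and compares the average Huffman codeword length with a fixed-length code covering the same levels. The function names are hypothetical. The sketch shows only the rate saving that entropy coding offers and does not reproduce the paper's 5.6 dB result.

```python
import heapq
import math
import random
from collections import Counter


def dpcm_encode(samples, step):
    """First-order DPCM: quantize the difference between each sample and the
    previous reconstructed sample, so the encoder tracks what the decoder sees."""
    codes, prediction = [], 0.0
    for x in samples:
        q = round((x - prediction) / step)  # quantizer index of the prediction error
        codes.append(q)
        prediction += q * step              # same update the decoder performs
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized prediction errors."""
    out, value = [], 0.0
    for q in codes:
        value += q * step
        out.append(value)
    return out


def huffman_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code on symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total weight, unique tie-breaker, {symbol: current depth}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Synthetic, strongly correlated test signal (an AR(1) process), not real speech.
random.seed(0)
signal, x = [], 0.0
for _ in range(5000):
    x = 0.95 * x + random.gauss(0.0, 0.1)
    signal.append(x)

step = 0.05
codes = dpcm_encode(signal, step)
lengths = huffman_lengths(codes)
fixed_bits = math.ceil(math.log2(len(lengths)))  # one fixed-length word per level
huffman_bits = sum(lengths[q] for q in codes) / len(codes)
print(f"fixed-length: {fixed_bits} bits/sample, Huffman: {huffman_bits:.2f} bits/sample")
```

Bits saved this way can instead be spent on a finer quantizer at the same transmission rate. By the usual rule of thumb for uniform quantization, each extra bit per sample is worth roughly 6 dB of signal-to-quantizing-noise ratio.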


Journal Article
TL;DR: Application of a type of predictive coding to the channel signals of a homomorphic vocoder has produced sizable bit rate reductions. A technique for obtaining the formant frequencies from the predictive coding parameters is also described; this approach promises further bit rate reductions.
Abstract: Application of a type of predictive coding to the channel signals of a homomorphic vocoder has produced sizable bit rate reductions. With only slight degradation in speech quality, reduction (for the spectral envelope information) from 7800 to 4000 bits/s was achieved. A technique for obtaining the formant frequencies from the predictive coding parameters is described; this approach promises further bit rate reductions. As a by-product of this study of predictive coding, direct and cascade form speech synthesizers are compared on the basis of differing quantization effects.

18 citations
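The abstract compares direct and cascade form synthesizers by their quantization effects but gives no details. The sketch below illustrates the general effect behind such a comparison. It quantizes the same set of resonances once as a single high-order direct-form denominator and once as separate second-order sections, then measures how far the poles move. The sampling rate, the deliberately clustered resonance frequencies and bandwidths, and the 12 fractional bits are all hypothetical choices. Both forms use the same fractional precision here; a real direct-form implementation would also need more integer bits, because its coefficients grow large. The outcome depends on pole placement, since widely spaced poles are far less sensitive. This is not the paper's experiment.

```python
import numpy as np

FS = 10_000.0  # assumed sampling rate (Hz)


def formant_poles(freqs_hz, bandwidths_hz, fs=FS):
    """Upper-half-plane poles of second-order resonators at the given frequencies."""
    f = np.asarray(freqs_hz, dtype=float)
    b = np.asarray(bandwidths_hz, dtype=float)
    r = np.exp(-np.pi * b / fs)        # pole radius from bandwidth
    theta = 2.0 * np.pi * f / fs       # pole angle from centre frequency
    return r * np.exp(1j * theta)


def quantize(coeffs, frac_bits):
    """Round coefficients to a fixed-point grid with `frac_bits` fractional bits."""
    step = 2.0 ** -frac_bits
    return np.round(np.asarray(coeffs, dtype=float) / step) * step


def spread(a, b):
    """Hausdorff distance between two pole sets (largest nearest-pole distance)."""
    d = np.abs(a[:, None] - b[None, :])
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def max_pole_shift(poles, frac_bits):
    """Pole displacement after quantizing (a) the single direct-form
    denominator and (b) each second-order (cascade) section separately."""
    all_poles = np.concatenate([poles, poles.conj()])
    # (a) Direct form: one polynomial of order 2 * len(poles).
    direct = np.roots(quantize(np.real(np.poly(all_poles)), frac_bits))
    # (b) Cascade form: z^2 - 2 r cos(theta) z + r^2 for each resonator.
    cascade = np.concatenate([
        np.roots(quantize([1.0, -2.0 * p.real, abs(p) ** 2], frac_bits))
        for p in poles
    ])
    return spread(all_poles, direct), spread(all_poles, cascade)


# Hypothetical, deliberately clustered low-frequency resonances.
poles = formant_poles([300, 600, 900, 1200, 1500], [60] * 5)
d, c = max_pole_shift(poles, frac_bits=12)
print(f"direct form: {d:.1e}   cascade: {c:.1e}")
```

With clustered poles, the roots of one tenth-order polynomial are far more sensitive to small coefficient errors than the roots of five separate quadratics. That sensitivity is the usual argument for realizing high-order all-pole synthesizers as cascades of second-order sections.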


Journal Article
TL;DR: A large set of vocoded speech signals has been evaluated in terms of preference and it is shown that, in certain respects, reliable system evaluations pose formidable problems.
Abstract: Starting from an IEEE Recommended Practice for Speech Quality Measurements and from previous work of the authors, a large set of vocoded speech signals has been evaluated in terms of preference. The set of speech samples has been taken from the vocoder survey of the 1967 Conference on Speech Communication and Processing, Boston, Mass. The test samples are evaluated by several methods: direct comparisons, the isopreference method, the relative preference method, the category judgment method, and the absolute preference judgment method. Due to the size of the test material, not all the test samples could be evaluated by all these methods. The test results are discussed, and it is shown that, in certain respects, reliable system evaluations pose formidable problems. An effort to rank-order the systems, each represented by a small set of test samples that frequently differ greatly in quality, understandably meets with only limited success. Most of the systems are of about equal preference, with only insignificant differences. Only a few systems fall outside this group, being either significantly better or significantly worse than the rest.

4 citations


Journal Article
TL;DR: The use of a small digital computer to process speech signals is described with illustrations, including increasing the intelligibility of speech by converting it into dichotic signals with an interaural time delay.
Abstract: An increase in the rate and the intelligibility of sound is highly desirable in speech communication. Also, it is useful to have an accurate and efficient method of obtaining desired segments of a speech sample. In this paper, the use of a small digital computer in processing the speech signal to achieve the above purposes is described with illustrations. On‐line simulation of the method of Fairbanks et al. [G. Fairbanks et al., IRE Trans. Audio 2, 7–12, (1954)] of increasing the speech rate has been achieved with flexible speed‐up ratios and sampling intervals. Increase of intelligibility in speech signals by converting them into dichotic signals with an interaural time delay is discussed. These dichotic signals have been obtained from the computer for time delays between 0 and 1 sec. To obtain different segments of a speech sample, the computer is programmed to store the speech sample and display its waveform on an oscilloscope, so that various segments of the speech sample can be extracted and also joi...

3 citations
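A minimal sketch of both operations follows, assuming NumPy arrays sampled at a hypothetical 10 kHz. `time_compress` follows one reading of the Fairbanks sampling method. From each cycle of input it keeps a fixed-length segment and discards the remainder, so the output is shorter by the speed-up ratio, and then abuts the kept segments. The abstract does not say how the paper parameterizes the "sampling interval", so `keep_len` is an assumption. `dichotic` builds a two-channel signal whose right channel lags the left by the interaural delay. Both function names are hypothetical, and no smoothing is applied at the splice points.

```python
import numpy as np


def time_compress(x, ratio, keep_len):
    """Sampling-method time compression (sketch): from every cycle of
    round(keep_len * ratio) input samples, keep the first keep_len samples,
    discard the rest, and abut the kept pieces."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1 for a speed-up")
    cycle = int(round(keep_len * ratio))
    pieces = [x[start:start + keep_len] for start in range(0, len(x), cycle)]
    return np.concatenate(pieces)


def dichotic(x, delay_s, fs):
    """Return a (2, N) array: the left ear gets x, the right ear gets x delayed
    by delay_s seconds (zero-padded), i.e. an interaural time delay."""
    d = int(round(delay_s * fs))
    left = np.concatenate([x, np.zeros(d)])
    right = np.concatenate([np.zeros(d), x])
    return np.stack([left, right])


fs = 10_000                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs                       # 1 s of signal
x = np.sin(2 * np.pi * 200 * t)              # placeholder for a speech sample
fast = time_compress(x, ratio=2.0, keep_len=200)  # keep 20 ms of every 40 ms
stereo = dichotic(x, delay_s=0.01, fs=fs)    # 10 ms interaural delay
print(len(x), len(fast), stereo.shape)
```

Because the retained segments are replayed at the original sampling rate, this kind of compression shortens the signal without shifting its pitch. Simply replaying a recording faster would shift the pitch.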


Journal Article
TL;DR: Some elementary properties of the complex BIFORE transform (CBT) have been reported recently, and additional properties are now developed.
Abstract: Some elementary properties of the complex BIFORE transform (CBT) have been reported recently. Additional properties of the CBT are now developed.

2 citations



Journal Article
TL;DR: In this article, a method of encoding speech for transmission at low bit rates is described, where the current sample of the speech wave digitized at 10 kHz is predicted as a linear combination of the 12 previous samples.
Abstract: A method of encoding speech for transmission at low bit rates is described. At the transmitter, the current sample of the speech wave digitized at 10 kHz is predicted as a linear combination of the 12 previous samples. The optimum linear combination is determined by minimizing the mean‐squared error between the actual and the predicted values of the speech samples. The pitch period and a binary voiced‐unvoiced parameter are determined by performing a short‐time autocorrelation analysis of the speech wave. Fifteen parameters, namely, the 12 predictor coefficients, the pitch period, the rms value of the speech signal, and the binary voiced‐unvoiced parameter are encoded into 72‐bit frames and transmitted to the receiver at uniform intervals. Different transmission rates are obtained by varying the interval between adjacent frames. At the receiver, the decoded transmission parameters are used to control a speech synthesizer consisting of a linear recursive filter excited by a suitable combination of quasiper...

1 citation
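A minimal sketch of the transmitter-side analysis follows. The 10 kHz sampling rate, the order-12 predictor, and the 72-bit frame come from the abstract. Everything else is an assumption. The abstract says only that the mean-squared prediction error is minimized; the autocorrelation method with a Hamming window, solved by the Levinson-Durbin recursion, is one standard way to do that. The pitch search uses normalized autocorrelation over 60 to 400 Hz with a fixed voicing threshold. The names `lpc`, `pitch_period`, and `bit_rate` are hypothetical. Quantizing the fifteen parameters into 72 bits and the receiver's synthesizer are not shown.

```python
import numpy as np

FS = 10_000      # sampling rate from the abstract (10 kHz)
ORDER = 12       # predictor order from the abstract
FRAME_BITS = 72  # bits per transmitted frame, from the abstract


def lpc(frame, order=ORDER):
    """Predictor a[1..p] such that x[n] ~ sum_k a[k] * x[n - k], chosen to
    minimize the mean-squared prediction error (autocorrelation method,
    solved with the Levinson-Durbin recursion). Returns (a, residual energy)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_next = a.copy()
        a_next[i] = k
        a_next[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a, err = a_next, err * (1.0 - k * k)
    return a[1:], err


def pitch_period(frame, fs=FS, fmin=60.0, fmax=400.0, threshold=0.5):
    """Pick the lag with the highest normalized autocorrelation in the range
    [fs/fmax, fs/fmin]; call the frame voiced if that peak exceeds threshold."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    best_lag, best_corr = 0, -1.0
    for lag in range(int(fs / fmax), min(int(fs / fmin), len(x) - 1) + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        corr = np.dot(a, b) / denom if denom > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, bool(best_corr > threshold)


def bit_rate(frame_interval_ms, frame_bits=FRAME_BITS):
    """Transmission rate in bit/s when one frame is sent every frame_interval_ms."""
    return frame_bits * 1000 / frame_interval_ms


# Synthetic check signals (not speech): a known AR(2) process and a 100 Hz tone.
rng = np.random.default_rng(0)
ar = np.zeros(20_000)
for n in range(2, len(ar)):
    ar[n] = 1.3 * ar[n - 1] - 0.6 * ar[n - 2] + rng.normal()
coeffs, _ = lpc(ar, order=2)                          # should be close to [1.3, -0.6]
tone = np.sin(2 * np.pi * 100 * np.arange(300) / FS)  # 30 ms frame, 100 Hz
period, voiced = pitch_period(tone)                   # expect 100 samples, voiced
print(coeffs, period, voiced, bit_rate(10), bit_rate(20))
```

The last line shows the point the abstract makes about varying the frame interval. With 72 bits per frame, sending a frame every 10 ms gives 7200 bit/s, and every 20 ms gives 3600 bit/s.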