Showing papers on "Speech coding" published in 1979


Journal ArticleDOI
TL;DR: Improved speech quality is obtained by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and by effective masking of the quantizer noise by the speech signal.
Abstract: Predictive coding methods attempt to minimize the rms error in the coded signal. However, the human ear does not perceive signal distortion on the basis of rms error, regardless of its spectral shape relative to the signal spectrum. In designing a coder for speech signals, it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from frequency regions where the signal level is low. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained: 1) by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.

376 citations


Journal Article
TL;DR: The performance of a one-bit digital matched filter (DMF) responding to binary signaling through a noisy multipath channel is analyzed and the output signal-to-noise ratio (SNRo) plays a key role in the analysis.
Abstract: We analyze the performance of a one-bit digital matched filter (DMF) responding to binary signaling through a noisy multipath channel. The output signal-to-noise ratio (SNRo) plays a key role in the analysis. The SNRo performance is fully studied for a two-path channel, both fading and non-fading. For the general N-path channel, results are obtained only for the fading case. All results are conditioned on the knowledge of the time delays of all paths, and are valid only for small input signal-to-noise ratio (SNRi). Most of the results have been verified by computer simulations.

208 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a theoretical framework for the design of subband and transform coder for low bit-rate speech decoding, which is based on spectral estimation and models of speech production and perception.
Abstract: Frequency domain techniques for speech coding have recently received considerable attention. The basic concept of these methods is to divide the speech into frequency components by a filter bank (sub-band coding), or by a suitable transform (transform coding), and then encode them using adaptive PCM. Three basic factors are involved in the design of these coders: 1) the type of the filter bank or transform, 2) the choice of bit allocation and noise shaping properties involved in bit allocation, and 3) the control of the step-size of the encoders. This paper reviews the basic aspects of the design of these three factors for sub-band and transform coders. Concepts of short-time analysis/synthesis are first discussed and used to establish a basic theoretical framework. It is then shown how practical realizations of subband and transform coding are interpreted within this framework. Principles of spectral estimation and models of speech production and perception are then discussed and used to illustrate how the "side information" can be most efficiently represented and utilized in the design of the coder (particularly the adaptive transform coder) to control the dynamic bit allocation and quantizer step-sizes. Recent developments and examples of the "vocoder-driven" adaptive transform coder for low bit-rate applications are then presented.

207 citations


Proceedings ArticleDOI
02 Apr 1979
TL;DR: It is shown that the degree of rectification does not affect the output speech, and that the high-frequency noise source may be eliminated with proper processing, and a new type of HFR based on spectral duplication of the baseband is introduced.
Abstract: The traditional method of high-frequency regeneration (HFR) of the excitation signal in baseband coders has been to rectify the transmitted baseband, followed by spectral flattening. In addition, a noise source is added at high frequencies to compensate for lack of energy during certain sounds. In this paper, we reexamine the whole HFR process. We show that the degree of rectification does not affect the output speech, and that, with proper processing, the high-frequency noise source may be eliminated. We introduce a new type of HFR based on spectral duplication of the baseband. Two types of spectral duplication are presented: spectral folding and spectral translation. Finally, in order to eliminate the problem of breaking the harmonic structure due to spectral duplication, we propose a pitch-adaptive spectral duplication scheme in the frequency domain by using adaptive transform coding to code the baseband.

198 citations
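
To illustrate the spectral folding named in the abstract above, here is a minimal sketch that assumes folding is realized by inserting zeros between baseband samples. The function names, the frame length, and the test tone are illustrative assumptions, not taken from the paper. After zero insertion at a factor of 2, each baseband component reappears mirrored about the old Nyquist frequency.

```python
import cmath
import math

def spectral_fold(baseband, factor=2):
    """Insert factor-1 zeros after every sample (hypothetical helper)."""
    out = []
    for sample in baseband:
        out.append(sample)
        out.extend([0.0] * (factor - 1))
    return out

def dft_magnitudes(x):
    """Plain O(n^2) DFT magnitude, adequate for a short demonstration frame."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Baseband frame: a cosine sitting exactly on bin 2 of a 16-sample frame.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
folded = spectral_fold(frame)
mags = dft_magnitudes(folded)

# Bins of the 32-point spectrum that carry energy after folding.
peaks = [k for k, m in enumerate(mags) if m > 1.0]
```

In this sketch the old Nyquist frequency sits at bin 8 of the 32-point spectrum, so the tone at bin 2 gains a mirror image at bin 14 (plus the usual conjugate pair in the upper half of the spectrum).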


Journal ArticleDOI
TL;DR: Research to code speech at 16 kbit/s with the goal of having the quality of the coded speech be equal to that of the original is reported, finding that the pitch predictor is not cost-effective on balance and may be eliminated.
Abstract: We report on research to code speech at 16 kbit/s with the goal of having the quality of the coded speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long-distance telephone lines. The basic structure chosen for our system was adaptive predictive coding. However, the rigorous requirements of this work led to a new outlook on the different aspects of adaptive predictive coding. We have found that the pitch predictor is not cost-effective on balance and may be eliminated. Solutions are presented to deal with the two types of quantization noise: clipping and granular noise. The clipping problem is completely eliminated by allowing the number of quantizer levels to increase indefinitely. An appropriate self-synchronizing variable-length code is proposed to minimize the average data rate; the coding scheme seems to be adequate for all speech and all conditions tested. The granular noise problem is treated by modifying the predictive coding system in a novel manner to include an adaptive noise spectral shaping filter. A design for such a filter is proposed that effectively eliminates the perception of granular noise.

99 citations


Journal ArticleDOI
TL;DR: The speech synthesis from concept system converts an input concept into speech by using a transformational grammar to generate a well‐formed English sentence and a word concatenation synthesizer to generate the actual speech output.
Abstract: A synthesis method, called speech synthesis from concept, is described which has been designed specifically for providing speech output from information systems. It differs from conventional techniques in that data is passed from the information system to the speech synthesis system, not in the form of text or phonetic transcription, but in the form of an abstract structure called an input concept. The speech synthesis from concept system converts an input concept into speech by using a transformational grammar to generate a well‐formed English sentence and a word concatenation synthesizer to generate the actual speech output. The "top down" nature of this process reduces the computation required within the information system and enables high‐quality speech to be produced.

69 citations


PatentDOI
Bishnu S. Atal1
TL;DR: In this paper, a speech signal is partitioned into intervals, and a set of coded prediction parameter signals, pitch period and voicing signals, and signals corresponding to the spectrum of the prediction error signal are produced.
Abstract: In a speech processing arrangement for synthesizing more natural sounding speech, a speech signal is partitioned into intervals. For each interval, a set of coded prediction parameter signals, pitch period and voicing signals, and a set of signals corresponding to the spectrum of the prediction error signal are produced. A replica of the speech signal is generated responsive to the coded pitch period and voicing signals as modified by the coded prediction parameter signals. The pitch period and voicing signals are shaped responsive to the prediction error spectral signals to compensate for errors in the predictive parameter signals whereby the speech replica is natural sounding.

48 citations


Journal ArticleDOI
TL;DR: It is shown that by careful design the algorithm can be made to be as robust to channel errors as that of a fixed rate ADPCM coder.
Abstract: In this paper, we examine a number of concepts and issues concerning variable-rate coding of speech. We formulate the problem as a multistate coder (i.e., a coder that can operate at several bit rates) coupled with a time buffer. We first analyze the theoretical aspects of the problem by examining it in the context of a block processing formulation. We then suggest practical methods for implementing a variable rate coder based on a dynamic buffering approach. We also allude to a multiple user configuration of variable-rate coding for TASI-type applications. A practical example of a variable rate ADPCM coder is presented and applied to speech coding. It is shown that by careful design the algorithm can be made to be as robust to channel errors as that of a fixed rate ADPCM coder.

46 citations


Journal ArticleDOI
TL;DR: The multipath tree-encoding of speech at 8 kbits/s is investigated, and coding results for a stationary speech-like source are found to agree well with rate-distortion theoretic ideas, and when applied to speech, tree coding at 8000 bits/s yielded frequency-weighted SNR's of 15-20 dB.
Abstract: The multipath tree-encoding of speech at 8 kbits/s is investigated. Tree coding proceeds along the lines of Anderson, et al, but at this lower bit rate, frequency weighting of the error process and adaptation of the coding process are found to be beneficial. Coding results for a stationary speech-like source are found to agree well with rate-distortion theoretic ideas, and when applied to speech, tree coding at 8000 bits/s yielded frequency-weighted SNR's of 15-20 dB.

44 citations


Proceedings ArticleDOI
B. Atal1, N. David
01 Apr 1979
TL;DR: A modified analysis-synthesis procedure which, although relying on the basic LPC technique for analysis and synthesis, avoids spectral amplitude and phase distortions introduced by these techniques.
Abstract: In speech analysis and synthesis based on linear prediction, it is a common assumption that predictor coefficients contain all the necessary spectral and phase information for accurate synthesis of the speech signal. However, even under the best circumstances, the synthetic speech sounds unnatural to the critical listener. Subjective tests reveal that spectral errors introduced by the linear prediction analysis techniques are a major source of unnatural sound quality in synthetic speech. This paper describes a modified analysis-synthesis procedure which, although relying on the basic LPC technique for analysis and synthesis, avoids spectral amplitude and phase distortions introduced by these techniques. In the new method, proper reproduction of the speech spectrum at the receiver is ensured by transmitting the short-time spectrum of the prediction residual to the receiver.

39 citations


Journal ArticleDOI
TL;DR: Objective and subjective performance reductions, like low-pass filtering effects as one of the main sources of perceptual distortion, are investigated and proposals are made how to improve the performance of the coder at low and medium bit rates.
Abstract: This paper discusses problems of adaptive transform coding schemes at bit rates of 12 kbit/s and below. Objective and subjective performance reductions, like low-pass filtering effects as one of the main sources of perceptual distortion, are investigated and proposals are made how to improve the performance of the coder at low and medium bit rates. Additionally, the needed transmission of side information reduces the efficiency of the scheme. Various methods to lower the rate of this supplementary data signal are given as well as modifications of the scheme which lead to a more easily implemented coder structure.

Proceedings ArticleDOI
A. Barabell1, R. Crochiere
01 Apr 1979
TL;DR: Quadrature mirror filter techniques are applied to provide cancellation of aliasing between sub-bands, thus permitting a reduction in the order of the filters in sub-band coders for low bit-rate speech coding.
Abstract: In this paper we discuss several new issues in the design of sub-band coders for low bit-rate speech coding. We use unequally spaced filter banks to match perceptual criteria for speech. In particular, we apply quadrature mirror filter techniques to provide cancellation of aliasing between sub-bands, thus permitting a reduction in the order of the filters. Preliminary investigations have been made into the use of pitch prediction within the sub-bands. Designs will be discussed for coder bit rates of 9.6 and 16 kb/s and their performance will be compared with earlier sub-band coder designs.
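
The aliasing cancellation mentioned in the abstract above can be seen in the shortest possible quadrature mirror filter pair. The sketch below assumes the two-tap (Haar) pair written in block form; it is not the filter design used in the paper, and the function names are hypothetical. Each decimated band is aliased on its own, yet the aliasing terms cancel when the two bands are recombined.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into decimated low and high bands."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even number of samples")
    pairs = [(x[2 * n], x[2 * n + 1]) for n in range(len(x) // 2)]
    low = [SCALE * (a + b) for a, b in pairs]    # lowpass taps (1, 1)/sqrt(2)
    high = [SCALE * (a - b) for a, b in pairs]   # mirror-image highpass taps (1, -1)/sqrt(2)
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; aliasing from the decimation cancels here."""
    x = []
    for l, h in zip(low, high):
        x.append(SCALE * (l + h))
        x.append(SCALE * (l - h))
    return x

signal = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0]
low, high = qmf_analysis(signal)
rebuilt = qmf_synthesis(low, high)
```

With unquantized bands the rebuilt signal matches the input to rounding error. In a coder the bands would be quantized between analysis and synthesis, and longer filters would give sharper band separation.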

PatentDOI
TL;DR: A time-frequency representation for linear time-varying systems is applied to a model for speech production to formulate a quasi-stationary representation for the speech waveform, which has the property that simple time scaling of the parameters of the representation corresponds to changing the rate of the speech.
Abstract: Representation of a speech signal by its short-time Fourier transform and the application of this representation to the problem of time compression and expansion of speech are presented. A time-frequency representation for linear time-varying systems is applied to a model for speech production to formulate a quasi-stationary representation for the speech waveform. This representation has the property that simple time scaling of the parameters of the representation corresponds to changing the rate of the speech. Given a real speech signal, short-time Fourier analysis provides a technique for estimating and modifying these parameters. The results of the theoretical analysis are used to design a high-quality speech rate-change system which is simulated on a general-purpose digital mini-computer.

Dissertation
18 Jun 1979
TL;DR: The design of the encoding system and specifications of system parameters are developed from the perceptual requirements and digital signal processing techniques, and the system is designed to exploit the limited detection ability of the auditory system.
Abstract: The development of a digital encoding system for speech and audio signals is described. The system is designed to exploit the limited detection ability of the auditory system. Existing digital encoders are examined. Relevant psychoacoustic experiments are reviewed. Where the literature is lacking, a simple masking experiment is performed and the results reported. The design of the encoding system and specifications of system parameters are then developed from the perceptual requirements and digital signal processing techniques. The encoder is a multi-channel system, each channel approximately of critical bandwidth. The input signal is filtered via the quadrature mirror filter technique. An extensive development of this technique is presented. Channels are quantized with an adaptive PCM scheme. The encoder is evaluated for speech and audio signal inputs. For 4.1-kHz bandwidth speech, the differential threshold of encoding degradation occurs at a bit rate of 34.4 kbps. At 16 kbps, the encoder produces toll-quality speech output. Audio signals of 15-kHz bandwidth can be encoded at 123.8 kbps without audible degradation.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: Development of a pitch predictive ADPCM residual encoder and preliminary results on new harmonic generation techniques are discussed and it is indicated that it is possible to remove the hoarseness currently associated with low data rate RELP speech.
Abstract: A new version of the Residual Excited Linear Predictive (RELP) vocoder has been simulated. The objective has been to reduce the data rate required for good quality speech to 4.8 kbps. Results have indicated that it is possible to remove the hoarseness currently associated with low data rate RELP speech. Development of a pitch predictive ADPCM residual encoder and preliminary results on new harmonic generation techniques are discussed. Taped demonstrations will be played at the conference.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: LPC vocoder performance in high acoustic noise environments and when the speaker is subjected to stress, vibrations and accelerations is described.
Abstract: Although 2400 BPS vocoders based upon Linear Predictive Coding have produced speech intelligibility scores as high as 90% in a quiet laboratory setting, few actual system measurements have been made in noisy, stressful, military environments. This paper describes LPC vocoder performance in high acoustic noise environments and when the speaker is subjected to stress, vibrations and accelerations. Measurements were made on military platforms which included ships, conventional aircraft, helicopters, tracked vehicles and wheeled vehicles; acoustic noise levels varied from 70 to 125 dB Sound Pressure Level (1).

Patent
30 Aug 1979
TL;DR: In this paper, a digital speech interpolation system is combined with an adaptive differential PCM (ADPCM), employing a speech detector for detecting speech signals and for discriminating voiced and unvoiced sounds.
Abstract: A digital speech interpolation system is combined with an adaptive differential PCM (ADPCM), employing a speech detector for detecting speech signals and for discriminating voiced and unvoiced sounds. An adaptive quantization bit assignment to the speech is adopted to cope with any freeze-out condition. Furthermore, PCM speech signals sampled at 8 kHz are applied to the ADPCM after being shifted down by 250 Hz and then converted to a 6 kHz sampling frequency, thereby attaining a total gain of about 7 without degrading speech quality.

Journal ArticleDOI
TL;DR: The author expects such methods to be available to the business world before too long, including rapid speech synthesis from printed text inputs able to accommodate an unlimited vocabulary.
Abstract: Reviews the techniques for producing synthetic speech from a computer. The author expects such methods to be available to the business world before too long, including rapid speech synthesis from printed text inputs able to accommodate an unlimited vocabulary. Topics include: analogue recording; human speech; compressed digital speech; speech synthesis from text; software; conversion to sound; synthetic speech for business.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: A speaker dependent system for recognizing carefully articulated continuous speech that accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task and achieves 75% sentence recognition.
Abstract: A speaker dependent system for recognizing carefully articulated continuous speech is described. The system accepts English sentences composed from a 127 word vocabulary appropriate to an airline information reservation task. The system is controlled by a finite state parser which generates word candidates and establishes their temporal locations in hypothetical sentences. The word candidates are evaluated by an LPC distance measure and a dynamic programming algorithm which nonlinearly time aligns isolated word reference templates with the input speech stream. The input is recognized as the hypothetical sentence having the lowest distance according to a well-defined criterion. In a preliminary test based on 100 sentences spoken over dialed-up telephone lines by two male talkers, 90% word accuracy, resulting in 75% sentence recognition, was achieved.

Proceedings ArticleDOI
B. Atal1, M. Schroeder
01 Apr 1979
TL;DR: Detailed procedures for minimizing the subjective loudness or audibility of quantizing noise in linear predictive coders take account of the frequency analysis by the human ear and its auditory masking properties.
Abstract: Adaptive shaping of the quantizing noise spectrum is essential for optimal subjective performance of speech coders. In this paper, we discuss detailed procedures for minimizing the subjective loudness or audibility of quantizing noise in linear predictive coders. The procedure takes account of the frequency analysis by the human ear and its auditory masking properties.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: This paper describes an alternative approach which involves modifying the time-series model at the outset to account for the presence of noise, and discusses the development of the model, the estimation algorithm, and some representative experimental results.
Abstract: Linear predictive coding (LPC) has been successfully applied to the encoding of speech and other time series. It has been widely observed, however, that the performance of an LPC algorithm deteriorates rapidly in the presence of background noise. In this paper, we describe and discuss one approach to the identification of a time series corrupted by additive white noise. A common approach to this problem is to prefilter the noisy time series, and then to apply an estimation algorithm which treats the time series as if it were noise-free. We describe an alternative approach which involves modifying the time-series model at the outset to account for the presence of noise. An estimation algorithm is then developed for this modified model. We discuss the development of the model, the estimation algorithm, and some representative experimental results.

Proceedings ArticleDOI
02 Apr 1979
TL;DR: An algorithm is described that finds the M-output-level quantizer characteristics minimizing the distortion with respect to both the Mean-Squared-Error (MSE) and Mean-Absolute-Error (MAE) criteria.
Abstract: This paper describes an algorithm that finds the M-output-level quantizer characteristics minimizing the distortion with respect to both the Mean-Squared-Error (MSE) and Mean-Absolute-Error (MAE) criteria. This algorithm can be used with any kind of signal amplitude distribution, ranging from analytical probability density functions (pdf) to experimental density functions. In the particular case of known pdfs such as the uniform, Gaussian, Laplacian and Gamma densities, it gives results which agree with or are better than those previously published /2,4,5,6/. The same type of algorithmic procedure may also be used for block-of-samples quantization; in this case the statistical average must be replaced by the time average. In addition, the simplicity of the proposed algorithm makes it possible to envisage a real-time, microprocessor-based, block adaptive quantizer implementation in which the quantizer parameters are periodically updated and transmitted with each data block. This technique can be used, for instance, to optimally quantize the speech coding parameters derived from the low-bit-rate speech compression algorithm described in [7].
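
The abstract does not spell out its algorithm, so the sketch below substitutes the classical Lloyd iteration for the MSE criterion, applied to an experimental set of samples. It should be read as an illustration of the problem being solved, not as the paper's method. The function names and the sample data are assumptions. For the MAE criterion, the cell mean in the update step would be replaced by the cell median.

```python
def lloyd_quantizer(samples, levels, iterations=50):
    """Classical Lloyd iteration (MSE): alternate nearest-level assignment and cell means."""
    lo, hi = min(samples), max(samples)
    # Start from output levels spread evenly over the observed range.
    outputs = [lo + (i + 0.5) * (hi - lo) / levels for i in range(levels)]
    for _ in range(iterations):
        cells = [[] for _ in range(levels)]
        for x in samples:
            nearest = min(range(levels), key=lambda i: abs(x - outputs[i]))
            cells[nearest].append(x)
        # Move each output level to the mean of its cell; keep it if the cell is empty.
        outputs = [
            sum(cell) / len(cell) if cell else outputs[i]
            for i, cell in enumerate(cells)
        ]
    return outputs

def mean_squared_error(samples, outputs):
    """Average squared distance from each sample to its nearest output level."""
    return sum(min((x - y) ** 2 for y in outputs) for x in samples) / len(samples)

samples = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
outputs = lloyd_quantizer(samples, levels=2)
distortion = mean_squared_error(samples, outputs)
```

For this two-cluster data the two output levels settle at the cluster means. Running the same loop over each data block, with time averages in place of statistical averages, corresponds to the block-of-samples variant the abstract mentions.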

Proceedings ArticleDOI
01 Apr 1979
TL;DR: It is shown that by careful design the algorithm can be made to be as robust to channel errors as that of a fixed rate ADPCM coder.
Abstract: In this paper we examine a number of concepts and issues concerning variable rate coding of speech. We formulate the problem as a multistate coder (i.e. a coder that can operate at several bit rates) coupled with a time buffer. We first analyze the theoretical aspects of the problem by examining it in the context of a block processing formulation. We also allude to a multiple user configuration of variable rate coding for TASI type applications. A practical example of a variable rate ADPCM coder is presented and applied to speech coding. It is shown that by careful design the algorithm can be made to be as robust to channel errors as that of a fixed rate ADPCM coder.

Patent
25 May 1979
TL;DR: In this article, a system for time segment scrambling speech encoding is described, where an internal key code is provided for control of the scrambling and reset units by start and reset clock pulses, and the scrambling unit can also scramble individual segments by an inversion process in accordance with, for example, time inversion and frequency inversion.
Abstract: A system is provided for time segment scrambling speech encoding, wherein an internal key code is provided for control thereof. The scrambling and reset units are synchronized by start and reset clock pulses. The scrambling unit can also scramble individual segments by an inversion process in accordance with a second internal key code, such as, for example, time inversion and frequency inversion. A related system for unscrambling scrambled speech transmission and synchronization therefore is set forth as another embodiment of the invention.
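
As a rough illustration of time segment scrambling under a key, the sketch below permutes fixed-length segments using a permutation derived from a key, and inverts the permutation at the receiver. Seeding a pseudo-random generator with the key is purely an assumption for demonstration; the patent's key mechanism, synchronization pulses, and inversion options are not modeled.

```python
import random

def segment_order(num_segments, key):
    """Derive a segment permutation from the key (hypothetical key schedule)."""
    order = list(range(num_segments))
    random.Random(key).shuffle(order)
    return order

def split_segments(samples, segment_len):
    if len(samples) % segment_len:
        raise ValueError("this sketch assumes a whole number of segments")
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

def scramble(samples, segment_len, key):
    """Emit the segments in the keyed order."""
    segments = split_segments(samples, segment_len)
    order = segment_order(len(segments), key)
    return [s for index in order for s in segments[index]]

def unscramble(samples, segment_len, key):
    """Put each received segment back at the position the key says it came from."""
    segments = split_segments(samples, segment_len)
    order = segment_order(len(segments), key)
    restored = [None] * len(segments)
    for position, index in enumerate(order):
        restored[index] = segments[position]
    return [s for segment in restored for s in segment]

speech = list(range(12))
sent = scramble(speech, segment_len=3, key=1979)
received = unscramble(sent, segment_len=3, key=1979)
```

Both ends must use the same key and agree on where segments begin, which is the role the abstract assigns to the start and reset clock pulses.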

Journal ArticleDOI
TL;DR: This paper describes a speech digitizer that is capable of transmitting and receiving at 2400 bits/s and typical applications of such digitizers are described.
Abstract: This paper describes a speech digitizer that is capable of transmitting and receiving at 2400 bits/s. Comparisons are made between this implementation and past approaches. Typical applications of such digitizers are also described.

01 May 1979
TL;DR: A new technique to reduce the effect of quantization noise in PCM speech coding is proposed using dither noise to ensure that the quantization errors can be modeled as additive signal-independent noise, and then reducing this noise through the use of a noise reduction system.
Abstract: A new technique to reduce the effect of quantization noise in PCM speech coding is proposed. The procedure consists of using dither noise to ensure that the quantization errors can be modeled as additive signal-independent noise, and then reducing this noise through the use of a noise reduction system. The procedure is illustrated with examples.
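
The dithering step described in the abstract above can be sketched as follows. The example assumes subtractive dither with a uniform quantizer, where the same pseudo-random dither is added before quantization and removed afterwards; the abstract does not say which form of dither is used, and the noise reduction stage is not shown. All names and parameter values are illustrative.

```python
import random

def dithered_quantize(samples, step, seed=0):
    """Uniform quantizer with subtractive dither (assumed form of dithering)."""
    rng = random.Random(seed)  # encoder and decoder would share this seed
    decoded = []
    for x in samples:
        dither = rng.uniform(-step / 2, step / 2)
        level = step * round((x + dither) / step)  # quantize the dithered sample
        decoded.append(level - dither)             # remove the dither after decoding
    return decoded

step = 0.25
speech = [0.03 * n for n in range(40)]
decoded = dithered_quantize(speech, step)
errors = [y - x for x, y in zip(speech, decoded)]
```

Each error stays within half a quantizer step, and its value is governed by the dither rather than by the signal, which is the property that lets a later stage treat it as additive noise.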

Journal ArticleDOI
TL;DR: A recently developed two integrated circuit speech synthesis system represents a significant advance in large scale integration in both random logic and data storage functions.
Abstract: A recently developed two integrated circuit speech synthesis system represents a significant advance in large scale integration in both random logic and data storage functions.

Patent
06 Sep 1979
TL;DR: In this article, the disclosed weighting circuit gradually changes the weight or value of signal bits constituting the PCM audio signal in accordance with a command signal so as to vary gradually a level of a reproduced analog audio signal corresponding to the audio signal.
Abstract: Upon reproducing a PCM audio signal, the disclosed weighting circuit gradually changes the weight or value of signal bits constituting the PCM audio signal in accordance with a command signal so as to vary gradually a level of a reproduced analog audio signal corresponding to the PCM audio signal.

Proceedings ArticleDOI
01 Apr 1979
TL;DR: This paper describes continuing efforts which have concentrated on minimizing loss of synchronization between the receiver and the transmitter, and applies constraints which guarantee synchronization at a cost of some freedom in the selection of data for transmission.
Abstract: Recently we described a variable-frame-rate LPC vocoder designed to transmit good quality speech over 2400 bps fixed-rate noisy channels with bit-error probabilities ranging up to 5% [3]. The basic idea was to lower the data rate by transmitting LPC parameters only when speech characteristics have changed sufficiently since the last transmission, and to employ the resulting bit-rate savings for protecting important transmission data against channel noise. This paper describes our continuing efforts which have concentrated on minimizing loss of synchronization between the receiver and the transmitter. In one approach, we emphasize heavy protection of header, and rapid resynchronization. Alternatively, we apply constraints which guarantee synchronization at a cost of some freedom in the selection of data for transmission. Results from the first approach are presented; results from both methods will be compared at the conference.
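
The variable-frame-rate idea in the abstract above, sending LPC parameters only when they have changed enough since the last transmission, can be sketched with a simple threshold rule. The Euclidean distance, the threshold value, and the toy parameter vectors are assumptions; the abstract does not specify the change measure, and error protection and synchronization are not modeled.

```python
def select_frames(frames, threshold):
    """Return indices of frames to transmit under a change-threshold rule."""
    if not frames:
        return []
    sent = [0]            # the first frame is always transmitted
    last = frames[0]
    for index in range(1, len(frames)):
        # Assumed change measure: Euclidean distance to the last transmitted frame.
        distance = sum((a - b) ** 2 for a, b in zip(frames[index], last)) ** 0.5
        if distance > threshold:
            sent.append(index)
            last = frames[index]
    return sent

# Toy parameter vectors standing in for per-frame LPC parameters.
frames = [[0.0, 0.0], [0.05, 0.0], [0.5, 0.0], [0.52, 0.0], [1.0, 0.0]]
sent = select_frames(frames, threshold=0.2)
```

Frames that are skipped would be reconstructed at the receiver from neighboring transmitted frames, and the bits saved can be spent on protecting the transmitted data.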

Proceedings ArticleDOI
01 Apr 1979
TL;DR: Examples of designs and performance of adaptive transform coders will be presented for bit rates in the range of 8 to 16 kbs.
Abstract: Adaptive Transform coding techniques for speech communication have recently received considerable attention. The basic concept of these methods is to divide the speech into frequency components by a suitable transform and then encode them using adaptive PCM. Examples of designs and performance of adaptive transform coders will be presented for bit rates in the range of 8 to 16 kbs.