
Showing papers on "Speech coding" published in 1978


Book
05 Sep 1978
TL;DR: This book covers digital speech processing from the fundamentals and digital models of the speech signal, through time-domain methods, waveform representation, short-time Fourier analysis, homomorphic processing and linear predictive coding, to man-machine communication by voice.
Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations


Journal ArticleDOI
TL;DR: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise and develops a procedure based on maximum a posteriori (MAP) estimation techniques which is related to linear prediction analysis of speech.
Abstract: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise. The procedure, based on maximum a posteriori (MAP) estimation techniques, is first developed in the absence of noise and related to linear prediction analysis of speech. The modification in the presence of background noise is shown to be nonlinear. Two suboptimal procedures are suggested which have linear iterative implementations. A preliminary illustration and discussion based both on a synthetic example and real speech data are given.

590 citations


Journal ArticleDOI
TL;DR: New results of masking and loudness reduction of noise are reported and the design principles of speech coding systems exploiting auditory masking are described.
Abstract: In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. "Hiding" the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the "other sound" is, of course, the speech signal itself. In this paper we report new results of masking and loudness reduction of noise and describe the design principles of speech coding systems exploiting auditory masking.

434 citations


Journal ArticleDOI
TL;DR: Preliminary tests indicate that the least mean-square adaptive filtering approach for removing the deleterious effects of additive noise on the speech signal improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.
Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown to partially remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

207 citations
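
The pitch-delayed LMS idea in the abstract above can be illustrated with a minimal sketch. It assumes a fixed, already known pitch period and a plain LMS update; the names `pitch_lms_enhance` and `demo`, the filter length and the step size are illustrative choices, not taken from the paper, and the synthetic sinusoid only stands in for voiced speech.

```python
import math
import random


def pitch_lms_enhance(noisy, pitch_period, num_taps=5, mu=0.01):
    """Estimate clean speech at time t from noisy samples about one pitch period earlier.

    Hypothetical helper: plain LMS with a fixed, known pitch period.
    """
    half = num_taps // 2
    if pitch_period <= half:
        raise ValueError("pitch_period must exceed num_taps // 2")
    weights = [0.0] * num_taps
    estimate = [0.0] * len(noisy)
    for t in range(pitch_period + half, len(noisy)):
        # Reference taps centred on the sample one pitch period before t.
        ref = [noisy[t - pitch_period + half - k] for k in range(num_taps)]
        y = sum(w * r for w, r in zip(weights, ref))  # clean-speech estimate
        err = noisy[t] - y                            # prediction error
        weights = [w + mu * err * r for w, r in zip(weights, ref)]
        estimate[t] = y
    return estimate


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(seed=0, n=4000, period=40):
    """Return (MSE of noisy input, MSE of LMS estimate) on a synthetic periodic signal."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * t / period) for t in range(n)]
    # Noise variance equals the signal power, i.e. roughly 0 dB SNR.
    noisy = [c + rng.gauss(0.0, math.sqrt(0.5)) for c in clean]
    est = pitch_lms_enhance(noisy, period)
    tail = slice(n // 2, n)  # skip the adaptation transient
    return (mean_squared_error(noisy[tail], clean[tail]),
            mean_squared_error(est[tail], clean[tail]))
```

Because the noise at time t is uncorrelated with the noise one pitch period earlier, the filter can only predict the periodic part of the signal, which is why its output serves as the clean-speech estimate. Real speech would additionally need a pitch tracker, which this sketch leaves out.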


Proceedings ArticleDOI
10 Apr 1978
TL;DR: This paper presents the results of a pilot study comparing four different speech waveform coding techniques of varying complexity, and draws conclusions concerning the quality and complexity of the different coding techniques.
Abstract: This paper presents the results of a pilot study comparing four different speech waveform coding techniques of varying complexity. Coder transmission rates of 24, 16, and 9.6 kb/s were used in the experiment. Subjective ratings and objective measurements of quality are obtained and compared. A number of conclusions are drawn concerning the quality and complexity of different coding techniques. By comparing the objective measurements to the subjective ratings, a number of conclusions are also drawn concerning the strengths and weaknesses of various (objective) quality measures of speech waveform coders.

143 citations


Proceedings ArticleDOI
10 Apr 1978
TL;DR: Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.
Abstract: Predictive coding methods attempt to minimize the r.m.s. error in the coded signal. However, the human ear does not perceive signal distortion on the basis of r.m.s. error regardless of its spectral shape relative to the signal spectrum. Specifically, for speech signals, the locations of the formant frequencies and their rates of change with time influence the audibility, and thus the subjective distortion of any quantizing noise. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.

94 citations


PatentDOI
TL;DR: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components and an algorithm for detecting major slope transitions of the analog speech signals.
Abstract: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components. The speech signals are inputted into an amplifier which provides negative feedback to normalize the amplitude of the speech signals. The normalized speech signals are delta modulated at a first sampling rate to produce a corresponding first sequence of digital pulses. The negative feedback signal of the amplifier is delta modulated at a second sampling rate to produce a second sequence of digital pulses corresponding to amplitude information of the speech signals. The speech signals are filtered and utilized to produce a digital pulse corresponding to high frequency components of the speech signals having magnitudes in excess of a threshold voltage. A microprocessor contains an algorithm for detecting major slope transitions of the analog speech signals in response to the first sequence of digital signals by detecting information corresponding to presence and absence of predetermined numbers of successive slope reversals in the delta modulator producing the first sequence of digital pulses. The algorithm computes cues from the high frequency digital pulse and the second sequence of pulses. The algorithm computes a plurality of speech waveform characteristic ratios of time intervals between various slope transitions and compares the speech waveform characteristic ratios with a plurality of stored phoneme ratios representing a set of phonemes to detect matching therebetween. The order of comparing is determined on the basis of the cues and a configuration of a phoneme decision tree contained in the algorithm. When a matching occurs, a signal corresponding to the matched phoneme is produced and utilized to cause the phoneme to be printed. In one embodiment of the invention, the speech signals are produced by the earphone of a standard telephone headset.

60 citations


Proceedings ArticleDOI
10 Apr 1978
TL;DR: An excitation source model for speech compression and synthesis is presented, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner.
Abstract: This paper presents an excitation source model for speech compression and synthesis, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low-frequency region and the noise source exciting the high-frequency region. A parameter F_c determines the degree of voicing by specifying the cut-off frequency between the voiced and unvoiced regions. For speech compression applications, F_c can be extracted automatically from the speech spectrum and transmitted. Experiments using the new model indicate its power in synthesizing natural sounding voiced fricatives, and in largely eliminating the "buzzy" quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.

58 citations


PatentDOI
TL;DR: Using a parameter interpolator permits the data rate to the speech synthesis circuit to be lowered, inasmuch as the incoming speech data is used to slowly change the previously inputted data toward the values of the incoming data.
Abstract: Disclosed is a parameter interpolator for a speech synthesis circuit. Using a parameter interpolator permits the data rate to the speech synthesis circuit to be lowered, inasmuch as the incoming speech data is used to slowly change the previously inputted data toward the values of the incoming data. The speech synthesis circuit includes an input circuit for receiving the target values of the speech data and a memory for storing interpolated values of the speech data. The interpolator includes a circuit coupled to the input circuit and the memory which calculates the difference between the target values and the stored values. Another circuit is used to add a portion of the difference to the values stored in the memory; the particular portion of the difference is equal to 1/2^N, where N = 0, 1, 2, ... Further, the interpolator is arranged to inhibit the normal interpolation upon certain conditions, such as changes from voiced speech to unvoiced speech, and vice versa.

45 citations
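
The interpolation rule in the abstract above can be sketched in a few lines. The sketch assumes the added fraction is a power of two, 1/2^N, and it assumes that an inhibited step loads the target value directly, which the abstract does not state; `interpolate_step` and `run_frames` are hypothetical names.

```python
def interpolate_step(stored, target, n, inhibit=False):
    """Move a stored parameter toward its target by 1/2**n of the difference.

    Assumption: when interpolation is inhibited (for example at a
    voiced/unvoiced change) the target is loaded directly.
    """
    if inhibit:
        return target
    return stored + (target - stored) / (2 ** n)


def run_frames(stored, target, n, steps):
    """Apply the same interpolation step repeatedly and return every intermediate value."""
    values = []
    for _ in range(steps):
        stored = interpolate_step(stored, target, n)
        values.append(stored)
    return values
```

With n = 1 the stored value covers half of the remaining distance at every step, so `run_frames(0.0, 8.0, 1, 3)` yields 4.0, 6.0 and 7.0. Dividing by a power of two corresponds to a bit shift in hardware, which is presumably why such fractions suit a synthesis circuit.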


Journal ArticleDOI
TL;DR: This paper shows the utility of using adaptive quantizers in the tree-encoding of speech waveforms based on the (M, L) algorithm, which can provide useful speech outputs at bit rates in the order of 24 kbits/s.
Abstract: This paper shows the utility of using adaptive quantizers in the tree-encoding of speech waveforms based on the (M, L) algorithm [1]. Resulting adaptive differential PCM (ADPCM) and adaptive delta modulation (ADM) encoders, with time-invariant prediction networks, can provide useful speech outputs at bit rates in the order of 24 kbits/s; at 16 kbits/s, on the other hand, the encoders exhibit clearly perceptible amounts of quantization noise.

41 citations


Journal ArticleDOI
TL;DR: This paper proposes two dynamic-type speech detectors based on the same operational principle: the presence of the speech signal is detected by analyzing the dynamic variations of the short-time-power of the channel signal.
Abstract: This paper proposes two dynamic-type speech detectors; their performances are described also by means of in-field experimental results. The two detectors are based on the same operational principle: the presence of the speech signal is detected by analyzing the dynamic variations of the short-time-power of the channel signal.
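
One simple reading of detection from short-time power variations is sketched below. The decision rule (frame power compared with a slowly tracked idle-channel floor), the function names and all parameter values are assumptions made for illustration; the detectors in the paper may work quite differently.

```python
def short_time_power(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, frame_len=80, ratio=4.0, smoothing=0.9):
    """Flag a frame as speech when its power exceeds `ratio` times the idle floor."""
    flags = []
    floor = None
    for power in short_time_power(samples, frame_len):
        if floor is None:
            floor = power  # assumption: the channel starts idle
        active = power > ratio * floor
        if not active:
            # Track the idle-channel power only while no speech is detected.
            floor = smoothing * floor + (1.0 - smoothing) * power
        flags.append(active)
    return flags
```

For example, a low-level idle signal followed by a burst of much higher amplitude produces a run of `False` flags, then `True` for the burst, then `False` again once the level drops back.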

Journal ArticleDOI
TL;DR: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.
Abstract: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

Proceedings ArticleDOI
D. Esteban, C. Galand
01 Apr 1978
TL;DR: It has been observed that transparency and apparent channel performances are not affected and that the behavior of the proposed 32 kbps coder is considerably less affected by transmission errors than that of the 64 kbps coder.
Abstract: This paper deals with the application of the SVCS (Split band Voice Coding Scheme) concept to the coding of a PCM channel at half the rate of the presently used 64 kbps CCITT standard. This 32 kbps coder is shown to meet the specifications as recommended by the CCITT for a PCM channel operating at 64 kbps (8 kHz sampling, 8 bits/sample A-Law). In addition, channel performances have been evaluated with and without transmission errors for different types of signals ranging from tones and modem signals to voice signals. It has been observed that transparency and apparent channel performances are not affected and that the behavior of the proposed 32 kbps coder is considerably less affected by transmission errors than that of the 64 kbps coder. Examples of taped results assuming different error rates on a voice signal with both the 32 kbps SVCS and the 64 kbps A-Law will be played at the conference.
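
As background for the 64 kbps reference channel mentioned above (8 kHz sampling, 8 bits per sample, A-Law), here is a minimal sketch of the bit-rate arithmetic and of the continuous A-law companding curve. It uses A = 87.6, the value commonly quoted for A-law, and it is not the bit-exact segmented encoding of the CCITT standard; the function names are illustrative.

```python
import math

A_LAW = 87.6  # commonly quoted A-law parameter


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a PCM channel in bits per second."""
    return sample_rate_hz * bits_per_sample


def alaw_compress(x, a=A_LAW):
    """Continuous A-law compression of a sample x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


def alaw_expand(y, a=A_LAW):
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(a)):
        x = ay * (1.0 + math.log(a)) / a
    else:
        x = math.exp(ay * (1.0 + math.log(a)) - 1.0) / a
    return math.copysign(x, y)
```

`pcm_bit_rate(8000, 8)` gives 64000 bits per second, so a 32 kbps coder has an average budget of 4 bits per 8 kHz sample. The companding curve spends more of the quantizer range on small amplitudes, which is what lets 8 bits cover the dynamic range of telephone speech.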

Proceedings ArticleDOI
01 Apr 1978
TL;DR: Overall subjective quality of speech processed by adaptive differential PCM is well predicted by segmental signal-to-noise ratio and even better by a linear combination of measures of granular distortion and overload distortion.
Abstract: An experiment has been performed to study the perceptual characteristics of speech processed by ADPCM. We created 18 three-bit and four-bit coders spanning a wide range of quantizer adaptation parameters. Subjects judged the difference between each pair of coders and rated the quality of each coder individually. The difference data reveal three important perceptual dimensions (overall clarity, signal vs. background distortion, muffled vs. hoarse) which are related to various objective measures of coder performance. Overall subjective quality is well predicted by segmental SNR and even better by a linear combination of measures of granular distortion and overload distortion.
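
Segmental SNR, named above as a good predictor of subjective quality, is usually computed as the average of per-frame SNR values in dB. The sketch below follows that common definition; the frame length and the rule of skipping silent or error-free frames are assumptions, and the paper's exact measure may differ.

```python
import math


def snr_db(clean, coded):
    """Conventional SNR in dB over the whole signal."""
    signal = sum(c * c for c in clean)
    noise = sum((c - d) ** 2 for c, d in zip(clean, coded))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, coded, frame_len=128):
    """Average of the per-frame SNR values in dB."""
    frame_values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = coded[start:start + frame_len]
        signal = sum(x * x for x in c)
        noise = sum((x - y) ** 2 for x, y in zip(c, d))
        if signal > 0.0 and noise > 0.0:  # skip silent or error-free frames
            frame_values.append(10.0 * math.log10(signal / noise))
    if not frame_values:
        raise ValueError("no frame with nonzero signal and noise")
    return sum(frame_values) / len(frame_values)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, so a coder that is noisy only during low-level passages scores worse on segmental SNR than on conventional SNR.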

Proceedings ArticleDOI
01 Apr 1978
TL;DR: This coding scheme, in addition to the baseband excitation concepts, takes advantage of the association of recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization.
Abstract: This paper describes a common voice coding architecture based on a Voice Excited Predictive Coding (VEPC) scheme allowing operation at different bit rates (9600, 7200 bps or below) by simply modifying the bandwidth allocated to the coding of the baseband excitation signal. This coding scheme, in addition to the baseband excitation concepts, takes advantage of the association of recently published digital speech processing techniques such as transversal predictive coding, split-band coding by signal decimation/interpolation, and adaptive block quantization. Simulations have shown that the proposed architecture makes it possible to obtain 'standard telephone quality', assuming a 300-3400 Hz telephone bandwidth, at transmission rates below 9600 bps.

Proceedings ArticleDOI
10 Apr 1978
TL;DR: It is demonstrated that it is possible to achieve pattern recognition classification with much less computational effort by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.
Abstract: A pattern recognition approach for deciding whether a given segment of speech should be classified as voiced speech, unvoiced speech or silence based on a set of five measurements of the signal is given by Atal and Rabiner [1]. In this paper, we demonstrate that it is possible to achieve this classification with much less computational effort. These computational savings are mainly achieved by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

Journal ArticleDOI
TL;DR: Weighted digital modulation schemes which provide bit error probabilities matched to the PCM bits with respect to their sensitivity to digital errors are analyzed and a channel signal to noise ratio gain in threshold extension of 2 dB is obtained for standard 8 bit PCM.
Abstract: Weighted digital modulation schemes which provide bit error probabilities matched to the PCM bits with respect to their sensitivity to digital errors are analyzed. The channel is additive white Gaussian. The PCM system has arbitrary code, companding law and input signal density function. In particular, optimum weighted PSK/PCM and QAM/PCM are given for speech signals. The average channel signal to noise ratio is kept constant when schemes are compared. We obtain a channel signal to noise ratio gain in threshold extension of 2 dB for standard 8 bit PCM. The performance of suboptimum schemes, where the number of different bit error probability levels is smaller than the number of PCM bits, is also studied. Two levels per 8 bit PCM word yield more than half of the achievable gain (in dB), and 4 levels are almost equal to the optimum.

15 Dec 1978
TL;DR: The usefulness of the new approach for speech modeling has been successfully established after several parameter quantization methods were considered to achieve the desired low bit rates.
Abstract: This constitutes our final report on a research program aimed at the development of a high quality low data rate speech transmission system based on new types of speech modeling algorithms. Several such algorithms were developed and tested on simulated and real speech data. These algorithms have many desirable features including the capability of rapidly tracking time-varying model parameters. The best algorithm was used as the basis of a speech transmission system in order to test the quality of the speech models. The model parameters (reflection coefficients) together with pitch information and speech energy form a speech parameter vector to be transmitted and used to reconstruct the original speech. Several parameter quantization methods were considered to achieve the desired low bit rates. The various algorithms as well as the complete transmission system were coded and tested. Simulation results are very promising and the usefulness of our new approach for speech modeling has been successfully established. (Author)

Journal ArticleDOI
M. Orceyre, R. Heller
TL;DR: The matter of secure voice communication, enabling speakers to converse naturally over telephone media without fear that their conversation can be usefully intercepted, poses special problems and is receiving close attention within both the commercial and the Government sectors.
Abstract: Telephone communications have been understood from their beginnings to be vulnerable to interception (unauthorized reception). In recent years, with increasing public and private sector reliance upon electronic media for communicating sensitive technical, financial, military, political, economic, and personal information, and with the rapidly increasing use of microwave and satellite telephone carrier media, concern about these vulnerabilities has mounted dramatically. Starting in mid-1977 there has been considerable attention given in the news media to the matter of wholesale interception by foreign governments of American private and commercial voice and data communications. Publicly available documents note the ease with which such common carrier transmissions can be “captured” for subsequent analysis and use by unauthorized listeners. Fig. 1 illustrates the many vulnerabilities of a typical public switched telephone network. Within this broad framework, the matter of secure voice communication, enabling speakers to converse naturally over telephone media without fear that their conversation can be usefully intercepted, poses special problems and is receiving close attention within both the commercial and the Government sectors.

Journal ArticleDOI
C.-E. Sundberg
TL;DR: This work has analyzed soft decision demodulation schemes for standard PCM encoded speech signals transmitted over the Gaussian channel with coherent PSK (phase shift keying) and obtained a signal to noise ratio gain in E_{b}/N_{0} of the order of 1-2 dB.
Abstract: The effect of digital errors in PCM encoded speech signals transmitted over a noisy channel is reduced by using soft decision demodulation at the receiver. The reliability information supplied by the soft decision demodulator is used to point out likely transmission errors, especially in the most significant PCM bits. When a likely transmission error is identified, the corresponding PCM word is rejected by the receiver and replaced by a predictor estimate or an interpolation estimate if delayed decisions are used. We have analyzed soft decision demodulation schemes for standard PCM encoded speech signals transmitted over the Gaussian channel with coherent PSK (phase shift keying). A signal to noise ratio gain in E_{b}/N_{0} of the order of 1-2 dB is obtained at low input signal levels. The gain depends on the performance of the predictor or, alternatively, the interpolator. No modifications of the transmitter are required to obtain this improvement. The suggested soft decision schemes are optional at the receiver. The comparisons are made with hard decision demodulation.

Journal ArticleDOI
TL;DR: Waveforms of the response of the delta modulators to channel errors are given, and performance data that show the relationship between channel errors and word intelligibility are included.
Abstract: Two algorithms for voice encoding are described. One, the modified Abate, is a simplified version that is the type designed for the Space Shuttle. Waveforms of the response of the delta modulators to channel errors are given, and performance data that show the relationship between channel errors and word intelligibility are included. An analytic derivation yielding a comparison between PCM and adaptive delta modulation with respect to channel errors is also given.

Journal ArticleDOI
TL;DR: An adaptive coding system capable of representing audio frequency signals in a wide variety of digital code formats and sine wave signal-to-noise ratio measurements demonstrate important relationships among the main code classifications.
Abstract: We describe an adaptive coding system capable of representing audio frequency signals in a wide variety of digital code formats. To provide adaptive quantization, the step size is adjusted by voltage controlled amplifiers operating with analog control signals. Sine wave signal-to-noise ratio measurements demonstrate important relationships among the main code classifications: pcm, apcm, dpcm, adpcm. The multipurpose coder has proved a useful laboratory aid in the design of special purpose coders. After adjustment and evaluation of parameters using the multipurpose coder one proceeds to an economical, specialized design.

PatentDOI
TL;DR: A method of communicating digital speech data to a speech synthesis circuit is described, in which the data is compressed to the order of 1000-1200 bits per second and is preferably stored in a memory coupled to the speech synthesis circuit.
Abstract: A method of communicating Digital Speech Data to a speech synthesis circuit. The data is compressed to the order of 1000-1200 bits per second for normal human speech. The speech synthesis circuit utilizes linear predictive coding techniques for producing high quality speech or other sounds. The data is preferably stored in a memory which is coupled to the speech synthesis circuit. The data has variable frame lengths; in the disclosed embodiment, four different frame lengths are described having frame lengths from four bits to forty-nine bits. The memory stores the variable frame length data and communicates the same to the speech synthesis circuit in response to certain control signals.

Proceedings ArticleDOI
10 Apr 1978
TL;DR: This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is a highly reliable source of information when the proposed noise reduction method is used.
Abstract: Preservation of both the spectral distribution and the periodicity of speech signals is essential in speech processing. This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is a highly reliable source of information when the noise reduction method proposed in this paper is used. Also reported are comparisons of several pitch extraction methods with extensive experimental data, based on which a pitch extraction method suited for noisy speech signals is proposed.

Dissertation
01 Aug 1978
TL;DR: Sc.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1978.
Abstract: Thesis. 1978. Sc.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.

Proceedings ArticleDOI
01 Apr 1978
TL;DR: The pitch predictor is not useful on balance and should be eliminated, and the residual should be quantized with no clipping and encoded using a variable-length code, which seems to be adequate for all speech and all conditions.
Abstract: We report on the results of research to code speech at 16 kbps under the condition that the quality of the transmitted speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long distance telephone lines. The rigorous requirements of this work led to a new outlook on adaptive predictive coding. We have found that the pitch predictor is not useful on balance and should be eliminated, and that the residual should be quantized with no clipping and encoded using a variable-length code. A single coding scheme seems to be adequate for all speech and all conditions. In addition, the adaptive predictive coding system has been modified to include a noise spectral shaping filter that effectively eliminates the perception of background granular noise.

Journal ArticleDOI
TL;DR: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of the traffic capacity of PCM links by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.
Abstract: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of the traffic capacity of PCM links. This is obtained, while keeping the transmission quality very close to normal PCM standards, by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.

Patent
14 Mar 1978
TL;DR: In this paper, the frequency range of each speech channel is broken into sub-channels and each of these is considered separately for operational activity, and composite speech signals are then formed from the active frequency subchannels of individual speech channels and these are transmitted with coding signals indicative of their composition.
Abstract: To transmit a number of individual speech channels over a smaller number of transmission channels, the frequency range of each speech channel is broken into sub-channels and each of these is considered separately for operational activity. Composite speech signals are then formed from the active frequency sub-channels of the individual speech channels and these are transmitted with coding signals indicative of their composition.

Proceedings ArticleDOI
01 Apr 1978
TL;DR: Preliminary tests indicate that the proposed least mean-square adaptive filtering approach improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.
Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been preliminarily shown to remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

Journal ArticleDOI
TL;DR: The basic aspects of the design of these four operations particularly as they apply to low bit‐rate adaptive transform coding are reviewed.
Abstract: Frequency domain techniques for speech coding have recently received considerable attention. The basic concept of these methods is to divide the speech into frequency components by a filter bank (subband coding) or by a suitable transform (transform coding) and then encode them using adaptive PCM. Four basic operations are involved in the design of these coders: (1) the type of transform or filter bank (analysis/synthesis), (2) the adaptive quantizer design (quantization theory), (3) the choice of bit allocation used by the quantizers (noise shaping and auditory masking), and (4) the control of the step‐size of the quantizers (spectral estimation). This paper briefly reviews the basic aspects of the design of these four operations particularly as they apply to low bit‐rate adaptive transform coding.
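
The bit allocation step, item (3) in the abstract above, is often introduced through the classical log-variance rule: each frequency component receives the average number of bits plus half the base-2 logarithm of the ratio between its variance and the geometric mean of all variances. The sketch below implements that textbook rule only; it is not claimed to be the allocation used in the paper, and `allocate_bits` is a hypothetical name.

```python
import math


def allocate_bits(variances, average_bits):
    """Classical log-variance bit allocation.

    Returns one real-valued bit count per component; values may be
    fractional or negative and are not rounded or clamped here.
    """
    logs = [math.log2(v) for v in variances]
    mean_log = sum(logs) / len(logs)  # log2 of the geometric mean
    return [average_bits + 0.5 * (lv - mean_log) for lv in logs]
```

For variances 16, 4, 1 and 0.25 with an average of 2 bits per component, the rule assigns 3.5, 2.5, 1.5 and 0.5 bits, which still sum to 8. A practical coder would round these values to integers and clamp negative allocations to zero, redistributing the remainder.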