
Showing papers on "Speech coding published in 1975"


Journal ArticleDOI
01 Apr 1975
TL;DR: This paper presents several digital signal processing methods for representing speech, including simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques.
Abstract: This paper presents several digital signal processing methods for representing speech. Included among the representations are simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques. The advantages and disadvantages of each of these representations for various speech processing applications are discussed.

238 citations


Journal ArticleDOI
TL;DR: Improvements in excess of analytical estimates suggest that tree coding methods perform better with real-life sources than previously thought.
Abstract: Recently developed methods of tree source coding with a fidelity criterion are applied to speech coding. We first demonstrate that tree codes are inherent in A-D speech converters of the waveform-following type and point to ordinary and adaptive delta modulation and differential pulse code modulation (DPCM) as examples. Insights of coding theory improve these trees at low rates; we offer two new code classes, one obtained by smoothing the DPCM tree and one using the rate-distortion theory of autoregressive sources. Using these codes, we study the performance of a simple synchronous tree-searching algorithm, called the M-algorithm, that maintains a small fixed number of paths in contention. Code trees at 1 and 2 bits/sample, used to encode actual speech at 8, 10, and 16 kbits/s, yield improved dynamic range and channel-error resistance, and a 4-8 dB improvement in mean-square error (mse) over ordinary single-path-searched DPCM. These improvements in excess of analytical estimates suggest that tree coding methods perform better with real-life sources than previously thought.
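
As a concrete illustration of the M-algorithm search described above, the sketch below encodes a signal against a toy binary DPCM-style code tree, keeping only the M lowest-distortion paths at each depth. The predictor coefficient, step size, and path count are illustrative assumptions, not values from the paper.

    import numpy as np

    def m_algorithm_encode(x, M=8, a=0.9, delta=0.2):
        """Breadth-first M-algorithm search of a binary DPCM-style code tree (sketch).

        Each branch extends a path by one reconstructed sample
        x_hat = a * prev + sign * delta; only the M paths with the lowest
        accumulated squared error survive at every depth.
        """
        # Each path is (accumulated squared error, last reconstruction, bit list).
        paths = [(0.0, 0.0, [])]
        for sample in x:
            candidates = []
            for err, prev, bits in paths:
                for bit, sign in ((0, -1.0), (1, 1.0)):
                    recon = a * prev + sign * delta
                    candidates.append((err + (sample - recon) ** 2, recon, bits + [bit]))
            candidates.sort(key=lambda p: p[0])   # keep the M best paths in contention
            paths = candidates[:M]
        best_err, _, best_bits = paths[0]
        return best_bits, best_err

    if __name__ == "__main__":
        t = np.arange(200) / 8000.0
        x = 0.5 * np.sin(2 * np.pi * 300 * t)     # toy stand-in for a speech segment
        bits, err = m_algorithm_encode(x)
        print(len(bits), "bits, mse =", err / len(x))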

143 citations


Journal ArticleDOI
Chong Un, D. Magill
TL;DR: The concept of the RELP vocoder combines the advantages of linear predictive coding (LPC) and voice-excited vocoding and is robust in any operating environment.
Abstract: In this paper we present a new vocoder called the residual-excited linear prediction (RELP) vocoder. The concept of the RELP vocoder combines the advantages of linear predictive coding (LPC) and voice-excited vocoding. In the RELP system, vocal-tract modeling is done by the LPC technique, and the LPC residual signal is used as the excitation signal. After low-pass filtering, the residual signal is coded by adaptive delta modulation and is spectrally flattened before being fed into the LPC synthesizer. The transmission rate typically ranges from 6 to 9.6 kbits/s; the synthetic speech in this range is quite good. As the transmission rate is lowered, the synthetic speech quality degrades very gradually. Since no pitch extraction is required, the vocoder is robust in any operating environment.
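
A minimal floating-point sketch of the residual-excited idea: frame-wise LPC analysis, computation of the residual through the inverse filter, band-limiting of the residual, and resynthesis through the all-pole filter. The filter order, frame length, and low-pass cutoff are assumed values, and the adaptive delta modulation and spectral flattening stages are omitted.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import butter, lfilter

    def lpc(frame, order=10):
        """LPC coefficients a_1..a_p by the autocorrelation method (sketch)."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

    def relp_frame(frame, fs=8000, order=10, cutoff=800.0):
        a = lpc(frame, order)
        A = np.concatenate(([1.0], -a))                  # analysis filter A(z)
        residual = lfilter(A, [1.0], frame)              # LPC residual
        b_lp, a_lp = butter(4, cutoff / (fs / 2))        # band-limit the residual
        excitation = lfilter(b_lp, a_lp, residual)       # the part that would be coded and sent
        return lfilter([1.0], A, excitation)             # resynthesize through 1/A(z)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        t = np.arange(240) / 8000.0
        frame = np.sin(2 * np.pi * 150 * t) + 0.01 * rng.standard_normal(240)
        print(relp_frame(frame).shape)                   # (240,)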

92 citations


Journal ArticleDOI
TL;DR: An improved system for speech digitization using adaptive differential pulse-code modulation (ADPCM) is described, which uses an adaptive predictor, an adaptive quantizer, and a variable-length source coding scheme to achieve a 4-5 dB increase in signal-to-noise ratio over previous ADPCM systems.
Abstract: An improved system for speech digitization using adaptive differential pulse-code modulation (ADPCM) is described. The system uses an adaptive predictor, an adaptive quantizer, and a variable-length source coding scheme to achieve a 4-5 dB increase in signal-to-noise ratio over previous ADPCM systems. The increase can be used to improve speech quality at moderate data rates on the order of 16 kbits/s, or to retain the same quality and reduce the data rate to 9.6 kbits/s. The latter alternative permits the use of narrow-band channels. The implementation complexity is on the same order as that of other ADPCM systems.
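
As a rough sketch of the building blocks named above, the code below pairs a fixed first-order predictor with a Jayant-style adaptive step-size quantizer at 2 bits/sample. The multipliers, predictor coefficient, and limits are illustrative assumptions, not the paper's design, and the variable-length source coding stage is omitted.

    import numpy as np

    # Step-size multipliers for the two magnitude levels (assumed values).
    MULT = {0: 0.9, 1: 1.6}

    def adpcm(x, a=0.85, step0=0.02):
        """2-bit ADPCM: fixed first-order predictor, Jayant-style adaptive quantizer."""
        step, pred = step0, 0.0
        recon = []
        for sample in x:
            e = sample - pred                              # prediction error
            mag = 0 if abs(e) < step else 1                # inner or outer quantizer level
            sign = 1.0 if e >= 0 else -1.0
            q = sign * (0.5 if mag == 0 else 1.5) * step   # quantized error (2-bit code: sign, mag)
            y = pred + q                                   # local reconstruction
            recon.append(y)
            pred = a * y                                   # predict the next sample
            step = min(max(step * MULT[mag], 1e-4), 1.0)   # adapt the step size
        return np.array(recon)

    if __name__ == "__main__":
        t = np.arange(800) / 8000.0
        x = 0.3 * np.sin(2 * np.pi * 200 * t)
        y = adpcm(x)
        print("s/n = %.1f dB" % (10 * np.log10(np.sum(x**2) / np.sum((x - y)**2))))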

63 citations


Patent
07 Jul 1975
TL;DR: In this article, a plurality of specialized digital signal processing techniques are employed to analyze four speech channels in parallel in real time and to multiplex the speech frame parameters of the channels into a single data output channel for transmission through a suitable medium.
Abstract: Method and apparatus for speech analysis and synthesis adapted for analyzing and multiplexing speech signals from a plurality of voice-grade telephone lines for further transmission through a single voice-grade telephone line. A plurality of specialized digital signal processing techniques are employed to analyze four speech channels in parallel in real time and to multiplex the speech frame parameters of the channels into a single data output channel for transmission through a suitable medium. The received data channel is demultiplexed and the speech frame parameters for the individual channels are used to synthesize, in parallel, the four speech signals. Certain of the digital processing techniques exploit the characteristics of speech signals to shorten conventional signal-processing time, while other processing techniques are essentially statistical analyses of speech used to resolve ambiguities, particularly in making the voiced/unvoiced decision for a frame of analyzed speech data.

60 citations


Journal ArticleDOI
TL;DR: The performance limits, as given by the signal-to-noise ratio (s/n), are described for different speech-encoding schemes including adaptive quantization and (linear) adaptive prediction schemes.
Abstract: In this paper, the performance limits, as given by the signal-to-noise ratio (s/n), are described for different speech-encoding schemes, including adaptive quantization and (linear) adaptive prediction schemes. The comparison is made on the basis of computer simulations using 8-kHz-sampled speech signals of one speaker. Different bit rates (two to five bits per sample) have been used. A three-bit-per-sample PCM scheme with a nonadaptive μ-law (μ = 100) quantizer leads to an s/n value of approximately 9 dB. A maximum s/n value of approximately 25 dB has been reached using an encoding scheme that includes both adaptive quantization and adaptive prediction. Entropy coding of the quantizer output symbols leads to an additional gain in s/n of nearly 3 dB.
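
The 9-dB figure quoted above comes from a simulation of non-adaptive three-bit μ-law PCM; a hedged sketch of that kind of measurement is shown below. The test signal (Laplacian samples standing in for 8-kHz-sampled speech) and the loading are assumptions, so the number it prints will only be in the same general range.

    import numpy as np

    def mu_law_pcm(x, bits=3, mu=100.0):
        """Non-adaptive mu-law PCM: compress, quantize uniformly, expand (sketch)."""
        xmax = np.max(np.abs(x))
        y = np.sign(x) * np.log1p(mu * np.abs(x) / xmax) / np.log1p(mu)   # compress to [-1, 1]
        step = 2.0 / 2 ** bits                                            # uniform mid-rise quantizer
        yq = np.clip((np.floor(y / step) + 0.5) * step, -1 + step / 2, 1 - step / 2)
        return np.sign(yq) * (xmax / mu) * np.expm1(np.abs(yq) * np.log1p(mu))  # expand

    def snr_db(x, xq):
        return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.laplace(scale=0.1, size=8000)   # stand-in for 8-kHz-sampled speech amplitudes
        print("3-bit mu-law PCM: %.1f dB s/n" % snr_db(x, mu_law_pcm(x)))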

59 citations


Journal ArticleDOI
TL;DR: The signal-to-noise ratios of different speech-encoding schemes have been measured in the case where the channel contains errors and upper bounds of the improvements that can be reached with error protection of the most significant bits are included.
Abstract: The signal-to-noise ratios of different speech-encoding schemes have been measured in the case where the channel contains errors. Those types and probabilities of errors have been considered that are of interest for mobile telephone applications. Most encoding schemes use an adaptive three-bit quantizer with an explicit transmission of the step-size information. A scheme with an adaptive prediction algorithm has also been studied. It has been assumed in all cases that the side information about the quantizer step size and the predictor coefficients is transmitted in an error-protected format. Measurements were made by simulating the coding schemes and the noisy channel on a digital computer. The results include upper bounds of the improvements that can be reached with error protection of the most significant bits.

26 citations



Journal ArticleDOI
TL;DR: In this paper, a mathematical formulation for each of several zero-crossing feature extraction techniques is derived and related (where possible) to each of the other zero-crossing methods.
Abstract: Zero-crossing analysis techniques have long been applied to speech analysis, to automatic speech recognition, and to many other signal-processing and pattern-recognition tasks. In this paper, a mathematical formulation for each of several zero-crossing feature extraction techniques is derived and related (where possible) to each of the other zero-crossing methods. Based upon this mathematical formulation, a physical interpretation of each analysis technique is given, along with a discussion of the properties of each method. It is shown that four of these methods are descriptions of a short-time waveform in which essentially the same information is preserved; each turns out to be a particular normalization of a count of zero-crossing intervals. The effects of the various forms of normalization are discussed. A fifth method is shown to be a different type of measure, one which preserves information concerning the duration of zero-crossing intervals rather than their absolute number. Although reference is made to how each of the zero-crossing methods has been applied to automatic speech recognition, an attempt is made to enumerate general characteristics of each technique so as to make the mathematical analysis generally applicable.
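
To make the zero-crossing measures concrete, the sketch below extracts zero-crossing positions from a short-time frame and forms both a count-based measure (the zero-crossing rate) and a duration-preserving measure (the intervals between crossings). The frame length and sampling rate are assumed values.

    import numpy as np

    def zero_crossing_features(frame, fs=8000):
        """Return the zero-crossing rate and the zero-crossing intervals (in samples)."""
        signs = np.sign(frame)
        signs[signs == 0] = 1.0                          # treat exact zeros as positive
        crossings = np.where(np.diff(signs) != 0)[0]     # sample index just before each crossing
        zcr = len(crossings) * fs / len(frame)           # count-based measure: crossings per second
        intervals = np.diff(crossings)                   # duration-based measure: interval lengths
        return zcr, intervals

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        t = np.arange(256) / 8000.0
        frame = np.sin(2 * np.pi * 500 * t) + 0.1 * rng.standard_normal(256)
        zcr, intervals = zero_crossing_features(frame)
        print("%.0f crossings/s, mean interval %.1f samples" % (zcr, intervals.mean()))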

25 citations


Patent
14 Nov 1975
TL;DR: In this paper, a speech analyzer and synthesizer features a digital adaptive linear predictor, using a recursive (rather than transversal) filter in a negative feedback loop which develops both feedforward and feedback filter coefficients.
Abstract: A speech analyzer and synthesizer features a digital adaptive linear predictor, using a recursive (rather than transversal) filter in a negative feedback loop which develops both feedforward and feedback filter coefficients. An input circuit is responsive to an input speech signal and to a first synthesized speech signal for developing an error signal. An output circuit is responsive to the error signal and to first state signals for developing multiplexed speech data signals. The multiplexed speech data signals are fed back, demultiplexed and applied to a first recursive filter to control the development of the first synthesized speech signal and the first state signals by the first recursive filter. The multiplexed speech data signals from the output circuit are also transmitted to a receiver which demultiplexes and applies the demultiplexed received speech data signals to a second recursive filter to control the development of a second synthesized speech signal by the second recursive filter. This second synthesized speech signal is then converted into an output speech signal which substantially sounds like the input speech signal.

20 citations


Patent
07 Oct 1975
TL;DR: A plurality of speech channels uses only one speech analyzer-synthesizer by time-multiplexing and demultiplexing (sampling and processing) the speech channels sequentially.
Abstract: A plurality of speech channels uses only one speech analyzer-synthesizer by time-multiplexing and demultiplexing (sampling and processing) the speech channels sequentially. On the transmission side, speech signals from a plurality of channels are multiplexed by a pulse code modulation system; partial autocorrelation coefficients and an excitation signal, which constitute the feature parameters of the speech, are extracted from the multiplexed signals by a speech analyzer as digital outputs corresponding to the respective speech signals; and the extracted feature parameters are multiplexed again and then transmitted to the receiving side. On the receiving side, the received multiplexed signal of the feature parameters is applied to a speech synthesizer on a time-division basis to reproduce a multiplexed pulse code modulation signal of the speech wave, and the reproduced signal is distributed among the respective channels.

Journal ArticleDOI
TL;DR: The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling.
Abstract: This paper considers a sequential strategy for acoustic-phonetic speech analysis. Each analysis process is applied to an appropriately labeled speech segment and results in a possible sub-segmentation of the original segment. The segments resulting from the analysis are labeled according to the analysis results. The advantages of the strategy are that no more segments are considered than those actually differentiated by the analysis steps. The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling. The analysis sequence yields a structure for the syllabic units of the speech signal that may be used to retrieve similar syllabic units for detailed comparison.

Journal ArticleDOI
TL;DR: A real-time computer implementation of a fourth-order adaptive predictive coder (APC) that transmits speech at 6400 bits/s, achieves an intelligibility score of 87 percent on the diagnostic rhyme test (DRT), and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
Abstract: This paper describes a real-time computer implementation of a fourth-order adaptive predictive coder (APC) which transmits speech at 6400 bits/s. To reduce the demands on the central processor, the terminal uses the average magnitude difference function (AMDF) for pitch extraction and calculates the predictor coefficients and reflection coefficients assuming stationarity of the input data. With 16-bit fixed-point arithmetic processing, the coder achieves an intelligibility score of 87 percent on the diagnostic rhyme test (DRT) and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
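
The average magnitude difference function used here for pitch extraction is easy to state: for each candidate lag, the frame is compared with a delayed copy of itself and the average absolute difference is taken; the deepest valley marks the pitch period. The sketch below assumes an 8-kHz sampling rate and a 50-400 Hz search range, which are typical values rather than figures from the paper.

    import numpy as np

    def amdf_pitch(frame, fs=8000, fmin=50, fmax=400):
        """Estimate pitch with the average magnitude difference function (AMDF)."""
        lag_min, lag_max = int(fs / fmax), int(fs / fmin)
        amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag]))
                         for lag in range(lag_min, lag_max + 1)])
        best_lag = lag_min + int(np.argmin(amdf))        # deepest valley = pitch period
        return fs / best_lag

    if __name__ == "__main__":
        t = np.arange(512) / 8000.0
        frame = np.sign(np.sin(2 * np.pi * 120 * t))     # crude 120-Hz voiced-like frame
        print("estimated pitch: %.1f Hz" % amdf_pitch(frame))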

Journal ArticleDOI
M. Knudsen
TL;DR: The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s and introduces a new autocorrelation scheme with several valuable properties.
Abstract: The autocorrelation method for linear-predictive coding of speech [1] has been implemented in real time on the SPS-41, a commercially available system composed of three dissimilar microprocessors working in parallel. Using user-written microcode, one processor performs I/O and master control, the second handles loop indexing and counting, and the third does the actual arithmetic on data. Such parallelism allows 2 × 10^6 I/O operations and 4 × 10^6 multiplications per second, but actually realizing this potential requires fresh approaches to some old algorithms. Most important is a new autocorrelation scheme with several valuable properties. Using 16-bit fixed-point single-precision arithmetic to accumulate autocorrelation sums and invert the autocorrelation matrix presents problems which have been solved reasonably well. The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s.
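
The chain described above (a 256-sample frame reduced to 14 predictor coefficients and then to a 128-point log power spectrum) can be sketched in floating point as follows. The real-time SPS-41 program works in 16-bit fixed point and uses a specialized autocorrelation scheme that this sketch does not attempt to reproduce.

    import numpy as np

    def levinson_durbin(r, order):
        """Solve the LPC normal equations from autocorrelations r[0..order]."""
        a, err = np.zeros(order), r[0]
        for i in range(order):
            k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err   # reflection coefficient
            a[:i] = a[:i] - k * a[:i][::-1]
            a[i] = k
            err *= 1.0 - k * k
        return a, err

    def lpc_log_spectrum(frame, order=14, nfft=256):
        """256-sample frame -> 14 LPC coefficients -> 128-point log power spectrum."""
        w = frame * np.hamming(len(frame))
        r = np.correlate(w, w, mode="full")[len(frame) - 1:]
        a, gain = levinson_durbin(r, order)
        A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)    # A(e^jw) sampled on the unit circle
        return 10 * np.log10(gain / np.abs(A[:nfft // 2]) ** 2)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        t = np.arange(256) / 8000.0
        frame = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
                 + 0.01 * rng.standard_normal(256))
        print(lpc_log_spectrum(frame).shape)                  # (128,)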


25 Aug 1975
TL;DR: An ultra-high-performance programmable speech processor, consisting mainly of a custom-designed 55-ns microcomputer, has been designed and constructed.
Abstract: An ultra-high-performance programmable speech processor, consisting mainly of a custom-designed 55-ns microcomputer, has been designed and constructed. To date, five real-time speech compression programs have been implemented and evaluated.

Journal ArticleDOI
R. Becker, F. Poza
TL;DR: This paper describes the acoustic processing in a syntactically guided natural-language speech understanding system in which, through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream.
Abstract: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system. A major characteristic of this system is that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream. The purpose of the acoustic processor is to verify or reject these hypotheses. This verification is done in two stages. First, digital filtering is done to classify each 10-ms segment as one of ten primitive classes. If the proposed word is consistent with the pattern of primitive classes at the corresponding point in the acoustic stream, further analysis is done using linear predictive coding and other digital filters. The results of this analysis are used to segment the acoustic signal and to further classify the voiced segments. Because this segmentation and classification can be tailored for each word, difficult analysis problems caused by coarticulation between adjacent sounds can be successfully solved. When combined with a sophisticated linguistic processor, these acoustical processing methods yield correct understanding of natural language utterances.

01 Mar 1975
TL;DR: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time; the detection method selected is intrinsically independent of the spectrum characteristics of the communication channel or tape being monitored.
Abstract: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time. Two approaches to reducing auditory fatigue were studied: (1) automatic detection of speech and (2) automatic enhancement of the S/N of speech. The first approach was aimed at reducing the amount of time spent simply listening for speech to occur. After examining several methods of detecting speech, we selected a method that is intrinsically independent of the spectrum characteristics of the communication channel or tape being monitored, of the speech characteristics of the talker, and of the language. The technique proved capable of detecting speech in wideband noise at an S/N of -6 dB. Its major disadvantage appears to be that the complexity of the required computations demands the use of a computer to implement the method.

Journal ArticleDOI
TL;DR: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
Abstract: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.

Journal ArticleDOI
TL;DR: Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance, which confirm the superior speech quality attainable by this method.
Abstract: The first‐formant region of the speech spectrum must be characterized with a high degree of accuracy in order to attain good speech quality and naturalness in a narrow‐band speech processing system. This consideration led to the concept of using linear predictive coding for the first‐formant region, vocoder channels for the upper spectrum, and detection and coding of the excitation function, to establish a narrow‐band voice digitizer. This concept has advantages in the tradeoff of speech quality versus data rate, in comparison with the use of either channel vocoding or linear prediction alone. Such a triple‐function voice coder (TRIVOC) has been implemented and investigated using a CSP‐30 Digital Signal Processor. Processing algorithms have been established to permit comparison of various combinations of baseband and vocoder channels, LPC coefficients, and data rates ranging from 2400 to 4800 bits per second. Subjective results have confirmed the superior speech quality attainable by this method. Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance.

Proceedings ArticleDOI
01 Dec 1975
TL;DR: Both differential pulse code modulation (DPCM) and adaptive predictive coding (APC) systems have been used somewhat successfully for low data rate digital voice transmission.
Abstract: Both differential pulse code modulation (DPCM) and adaptive predictive coding (APC) systems have been used somewhat successfully for low data rate digital voice transmission. A DPCM system has as its goal the removal of signal redundancy prior to transmission by a linear prediction of the incoming signal with a weighted combination of past signal estimates. The error in the prediction process is then quantized and transmitted to the receiver. An identical prediction loop is used at the receiver to reinsert the redundancy, and hence, to reconstruct the speech signal. Both the quantizer and predictor may or may not be adaptive.
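
The prediction loop described in this abstract can be written down in a few lines: the encoder quantizes the prediction error and runs a local copy of the decoder, so the receiver's identical loop reconstructs the same signal from the transmitted codes. This is a minimal first-order, fixed-quantizer sketch under assumed parameter values, not either system discussed in the paper.

    import numpy as np

    def dpcm(x, a=0.9, step=0.05, bits=3):
        """Minimal DPCM codec: first-order predictor, uniform quantizer."""
        qmax = 2 ** (bits - 1) - 1
        pred_enc = pred_dec = 0.0
        decoded = []
        for sample in x:
            e = sample - pred_enc                                    # prediction error (encoder)
            code = int(np.clip(np.round(e / step), -qmax, qmax))     # transmitted integer code
            pred_enc = a * (pred_enc + code * step)                  # encoder's local reconstruction loop
            # --- receiver: identical prediction loop reinserts the redundancy ---
            y = pred_dec + code * step
            decoded.append(y)
            pred_dec = a * y
        return np.array(decoded)

    if __name__ == "__main__":
        t = np.arange(800) / 8000.0
        x = 0.4 * np.sin(2 * np.pi * 250 * t)
        y = dpcm(x)
        print("mse:", np.mean((x - y) ** 2))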

Journal ArticleDOI
TL;DR: Methods are described by which one high-speed DM codec can be time-shared over a large number of channels, with step-size adaptation determined by the recent past history of the coded digital signal.
Abstract: Current practice for multichannel delta modulation (DM) terminals uses one DM codec per channel-end. This paper describes methods by which one high-speed DM codec can be timeshared over a large number of channels. The methods are also applied to multichannel differential PCM (DPCM) terminals. In both cases the time-shared codecs include step-size adaptation determined by the recent past history of the coded digital signal. A method for digital conversions between multichannel linear PCM and adaptive DPCM (ADPCM) formats is also described.
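
A sketch of step-size adaptation driven by the recent history of the coded bit stream, as applied per channel in the time-shared codec described above: two equal bits in a row suggest slope overload and grow the step, two differing bits suggest granular noise and shrink it. The multiplier values and limits are illustrative assumptions; time sharing amounts to keeping one such state record (step size, last bit, last output) per channel.

    import numpy as np

    def adaptive_dm(x, step0=0.01, grow=1.5, shrink=0.66, step_min=1e-3, step_max=0.5):
        """Adaptive delta modulation: 1 bit/sample, step size set by recent bit history."""
        step, recon, last_bit = step0, 0.0, 0
        bits, out = [], []
        for sample in x:
            bit = 1 if sample >= recon else 0
            recon += step if bit else -step                 # staircase approximation
            # Adaptation uses only the coded bits, so the decoder can track the
            # step size without any side information.
            step = step * grow if bit == last_bit else step * shrink
            step = min(max(step, step_min), step_max)
            last_bit = bit
            bits.append(bit)
            out.append(recon)
        return bits, np.array(out)

    if __name__ == "__main__":
        t = np.arange(1600) / 8000.0
        x = 0.3 * np.sin(2 * np.pi * 300 * t)
        bits, y = adaptive_dm(x)
        print("s/n = %.1f dB" % (10 * np.log10(np.sum(x**2) / np.sum((x - y)**2))))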

ReportDOI
10 Jan 1975
TL;DR: This study, using the homomorphic vocoder algorithm as a research vehicle, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal.
Abstract: A problem basic to the development of all digital telecommunication systems is the efficient digital encoding of speech signals for transmission. Many speech digitization algorithms have been proposed. This study, using the homomorphic vocoder algorithm as a research vehicle, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal. The results reported are fundamental in that they are applicable to any speech digitization algorithm. The resolutions required for typical speaker situations are summarized.