
Showing papers on "Speech coding" published in 1980


Journal ArticleDOI
TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Abstract: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data. The basic properties of the algorithm are discussed and demonstrated by examples. Quite general distortion measures and long blocklengths are allowed, as exemplified by the design of parameter vector quantizers of ten-dimensional vectors arising in Linear Predictive Coded (LPC) speech compression with a complicated distortion measure arising in LPC analysis that does not depend only on the error vector.

7,935 citations
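The design procedure summarized in the abstract above can be illustrated with a minimal sketch of iterative codebook refinement from a training sequence. This is not the authors' code: it assumes a plain squared-error distortion (the paper allows much more general measures), and the names `design_codebook`, `nearest`, and `centroid` are hypothetical.

```python
def squared_error(x, y):
    # Assumed distortion measure; the paper permits far more general ones.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest(codebook, x):
    # Index of the codeword with minimum distortion to x.
    return min(range(len(codebook)), key=lambda i: squared_error(x, codebook[i]))

def centroid(vectors):
    # For squared error, the minimum-distortion point of a cell is its mean.
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def design_codebook(training, codebook, tol=1e-6, max_iter=100):
    codebook = [tuple(c) for c in codebook]
    prev = float("inf")
    avg = prev
    for _ in range(max_iter):
        # Partition the training sequence by nearest codeword.
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            i = nearest(codebook, x)
            cells[i].append(x)
            total += squared_error(x, codebook[i])
        avg = total / len(training)
        # Stop once the average distortion no longer improves appreciably.
        if prev - avg <= tol * max(avg, 1e-12):
            break
        prev = avg
        # Replace each codeword by the centroid of its cell (keep it if the cell is empty).
        codebook = [centroid(cell) if cell else c for cell, c in zip(cells, codebook)]
    return codebook, avg
```

Starting from a rough initial codebook, each pass can only lower (or keep) the average distortion on the training data, which is why a simple relative-improvement threshold serves as the stopping rule here.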


Journal ArticleDOI
TL;DR: In this paper, a spectral decomposition of a frame of noisy speech is used to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise.
Abstract: One way of enhancing speech in an additive acoustic noise environment is to perform a spectral decomposition of a frame of noisy speech and to attenuate a particular spectral line depending on how much the measured speech plus noise power exceeds an estimate of the background noise. Using a two-state model for the speech event (speech absent or speech present) and using the maximum likelihood estimator of the magnitude of the speech spectrum results in a new class of suppression curves which permits a tradeoff of noise suppression against speech distortion. The algorithm has been implemented in real time in the time domain, exploiting the structure of the channel vocoder. Extensive testing has shown that the noise can be made imperceptible by proper choice of the suppression factor.

854 citations
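The idea of attenuating each spectral line according to how far the measured power exceeds the noise estimate can be sketched as below. Note the substitution: this uses a generic power-subtraction gain, not the maximum-likelihood suppression curves of the paper, and the function names and the `floor` parameter are assumptions made for illustration.

```python
import math

def suppression_gain(measured_power, noise_power, floor=0.0):
    # Gain applied to one spectral line. `floor` is a hypothetical lower
    # bound on the retained power fraction, trading noise suppression
    # against speech distortion.
    if measured_power <= 0.0:
        return math.sqrt(floor)
    ratio = 1.0 - noise_power / measured_power
    return math.sqrt(max(ratio, floor))

def suppress(power_spectrum, noise_estimate, floor=0.0):
    # One gain per spectral line of a frame.
    return [suppression_gain(p, n, floor)
            for p, n in zip(power_spectrum, noise_estimate)]
```

Lines whose power is at or below the noise estimate are attenuated to the floor, while lines well above the noise pass almost unchanged.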


Journal ArticleDOI
TL;DR: The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies and is introduced in a nonrigorous form.
Abstract: With rare exception, all presently available narrow-band speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). This paper presents a new approach called vector quantization. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with 15 to 20 fewer bits/frame than that required for the optimized scalar quantizing approaches presently in use. The vector quantizing approach is shown to be a mathematically and computationally tractable method which builds upon knowledge obtained in linear prediction analysis studies. This paper introduces the theory in a nonrigorous form, along with practical results to date and an extensive list of research topics for this new area of speech coding.

754 citations


Journal ArticleDOI
TL;DR: It is argued that the Itakura-Saito and related distortions are well-suited computationally, mathematically, and intuitively for such applications.
Abstract: Several properties, interrelations, and interpretations are developed for various speech spectral distortion measures. The principal results are 1) the development of notions of relative strength and equivalence of the various distortion measures both in a mathematical sense corresponding to subjective equivalence and in a coding sense when used in minimum distortion or nearest neighbor speech processing systems; 2) the demonstration that the Itakura-Saito and related distortion measures possess a property similar to the triangle inequality when used in nearest neighbor systems such as quantization and cluster analysis; and 3) that the Itakura-Saito and normalized model distortion measures yield efficient computation algorithms for generalized centroids or minimum distortion points of groups or clusters of speech frames, an important computation in both classical cluster analysis techniques and in algorithms for optimal quantizer design. We also argue that the Itakura-Saito and related distortions are well-suited computationally, mathematically, and intuitively for such applications.

409 citations
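For readers unfamiliar with the Itakura-Saito distortion named in the abstract above, the following sketch evaluates its usual discrete form, the average over frequency bins of P/Q - ln(P/Q) - 1. The function name and the assumption that both power spectra are sampled on the same frequency grid with strictly positive values are illustrative choices, not taken from the paper.

```python
import math

def itakura_saito(p, q):
    # p, q: power spectra on the same frequency grid, all values > 0.
    total = 0.0
    for pk, qk in zip(p, q):
        r = pk / qk
        total += r - math.log(r) - 1.0
    return total / len(p)
```

The measure is zero only when the two spectra coincide and is not symmetric in its arguments, which is one reason properties such as the triangle-inequality-like result discussed in the abstract are not automatic.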


Journal ArticleDOI
01 Apr 1980
TL;DR: This paper focuses on the design and analysis of adaptive predictors for differential encoders; differential encoding with adaptive quantization and adaptive prediction is one of the most promising approaches to high-quality, highly intelligible speech at 6 to 16 kb/s.
Abstract: The design of speech coders that produce high-quality, highly intelligible speech at 6 to 16 kb/s while retaining robustness to background and transmission impairments is an area of current research interest. Differential encoding structures employing adaptive quantization and adaptive prediction constitute one of the most promising approaches to achieving these design objectives. This paper focuses on the design and analysis of adaptive predictors for differential encoders. Several differential encoding systems, including adaptive predictive coding, differential pulse-code modulation, noise feedback coding, direct feedback coding, and prediction error coding, are described and related. Adaptive quantizers are briefly discussed, and quantitative and qualitative indicators of speech coder performance are defined. The channel model, the speech model, and the research problem statements used in the design of differential encoders and adaptive predictors are presented. The nomenclature and theory of forward and backward adaptive prediction are developed, and several new backward adaptive algorithms based on various assumptions are presented. A detailed survey of theoretical and simulation results on adaptive prediction for speech differential encoders is given, and the effects of background and transmission impairments on these systems are discussed. Finally, the impact of adaptive predictors on rate distortion theory motivated coders is indicated. Numerous areas for future research are highlighted.

139 citations


Journal ArticleDOI
TL;DR: Several control methodologies are described, leading to an end-to-end feedback approach that achieves stable operation and efficient utilization of network resources by adaptively matching transmitted voice bit rates to prevailing network conditions.
Abstract: Integrated packet-switched networks have potential for providing improved performance by dynamically sharing transmission bandwidths between various users and user types, but new flow control methods are needed to deal with packetized voice traffic. This paper describes a packet voice flow control concept based on embedded speech coding. Results are presented from a computer simulation study of the technique in the context of a multilink wideband packet speech network. Several control methodologies are described, leading to an end-to-end feedback approach that achieves stable operation and efficient utilization of network resources by adaptively matching transmitted voice bit rates to prevailing network conditions. Issues in the design of embedded speech coding algorithms are reviewed and a candidate structure based on channel vocoding principles is presented, along with the subjective results of some preliminary listening tests.

125 citations


Proceedings ArticleDOI
B. Atal, M. Schroeder
01 Apr 1980
TL;DR: This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.
Abstract: Adaptive predictive coding of speech signals at bit rates lower than 10 kbits/sec often requires the use of 2-level (1 bit) quantization of the samples of the prediction residual. Such a coarse quantization of the prediction residual can produce audible quantizing noise in the reproduced speech signal at the receiver. This paper describes a new method of quantization for improving the speech quality. The improvement is obtained by center clipping the prediction residual and by fine quantization of the high-amplitude portions of the prediction residual. The threshold of center clipping is adjusted to provide encoding of the prediction residual at a specified bit rate. This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.

113 citations
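The center-clipping step described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' coder: it zeroes low-amplitude residual samples and picks the threshold so that at most a chosen fraction of samples survives, using that fraction as a stand-in for the specified bit rate. The fine quantization of the surviving samples is omitted, and all names are hypothetical.

```python
def center_clip(residual, threshold):
    # Samples with magnitude at or below the threshold are set to zero.
    return [r if abs(r) > threshold else 0.0 for r in residual]

def threshold_for_fraction(residual, keep_fraction):
    # Choose the threshold so that at most keep_fraction of the samples
    # exceed it (an assumed proxy for hitting a target bit rate).
    mags = sorted((abs(r) for r in residual), reverse=True)
    keep = int(len(mags) * keep_fraction)
    if keep >= len(mags):
        return 0.0
    return mags[keep]
```

Because most residual samples are zero after clipping, they can be coded very cheaply, which is what makes average rates below 1 bit/sample reachable while the high-amplitude portions are still represented accurately.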


Proceedings ArticleDOI
09 Apr 1980
TL;DR: For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with fifteen to twenty fewer bits per frame than that required for optimized scalar quantizing approaches presently in use.
Abstract: With rare exception, all presently available narrowband speech coding systems implement scalar quantization (independent quantization) of the transmission parameters (such as reflection coefficients or transformed reflection coefficients in LPC systems). In this paper a new approach called Vector Quantization is presented. For very low data rates, realistic experiments have shown that vector quantization can achieve a given level of average distortion with fifteen to twenty fewer bits per frame than that required for optimized scalar quantizing approaches presently in use.

107 citations


Proceedings ArticleDOI
01 Apr 1980
TL;DR: The development of a digital encoding system designed to exploit the limited detection ability of the auditory system is described; by dynamically shaping the encoding error spectrum as a function of the input speech signal, the error is masked by the speech.
Abstract: The development of a digital encoding system designed to exploit the limited detection ability of the auditory system is described. By dynamically shaping the encoding error spectrum as a function of the input speech signal, the error is masked by the speech. Psychoacoustic experiments and results from the literature provide a basis for determining the system parameters that ensure that the error is inaudible. The encoder is a multi-channel system, each channel approximately of critical bandwidth. The input signal is filtered into 17 frequency channels via the quadrature mirror filter technique. Each channel is then coded using block-companding adaptive PCM. For 4.1 kHz bandwidth speech, the differential threshold of the encoding degradation occurs at a bit rate of 34.4 kbps. At 16 kbps, the encoder produces toll quality speech output.

103 citations
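The per-channel coding step named in the abstract above, block-companding adaptive PCM, can be illustrated with a generic sketch: each block is normalized by its own peak magnitude and then uniformly quantized, so the step size adapts block by block. The details below (peak normalization, a symmetric uniform quantizer, the function names) are assumptions and are not taken from the paper.

```python
def block_compand(block, bits):
    # Returns the block peak (sent as side information) and integer codes.
    peak = max(abs(s) for s in block)
    if peak == 0.0:
        return 0.0, [0] * len(block)
    levels = 2 ** (bits - 1) - 1  # symmetric range -levels..+levels
    codes = [round(s / peak * levels) for s in block]
    return peak, codes

def block_expand(peak, codes, bits):
    # Decoder side: rescale the codes by the transmitted peak.
    levels = 2 ** (bits - 1) - 1
    return [c * peak / levels for c in codes]
```

In a multi-channel coder of the kind described, one such quantizer would run per frequency channel, with `bits` chosen per channel.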


DOI
01 Feb 1980
TL;DR: In this article, the authors present a full specification of all the essential features of the JSRU vocoder configuration, with comments on the reasons for the design decisions and reference to supporting research where appropriate.
Abstract: During the period from 1956 to 1966 the UK Government's Joint Speech Research Unit was conducting research into channel vocoders, culminating in a laboratory-built design suitable for evaluation by potential users over digital transmission networks at 2400 bit/s. The success of the basic vocoder design was such that it has since been engineered in various forms for widespread operational use, using different technologies as they have evolved. In view of the JSRU vocoder's continued competitiveness with other narrow-band speech coding techniques, such as linear predictive coding, this paper has been written to give a full specification of all the essential features of the vocoder configuration, with comments on the reasons for the design decisions and reference to supporting research where appropriate. The two most important factors contributing to this vocoder's successful performance are the use of narrow-band single-resonant circuits for the synthesis filters and the use of differential coding between channels in the digitisation process.

77 citations


Journal ArticleDOI
TL;DR: A signal model based more directly upon the physics of speech generation is proposed and implemented, and parametric control of the synthesis model is implemented by an adaptive procedure that minimizes the spectral difference between a human speech input and the synthetic output of the model.
Abstract: A traditional model of the speech signal has provided the underpinning of vocoder technology since the inception of analysis/synthesis telephony. The model is a first‐order approximation to human speech generation in which the source of vocal sound and the resonant acoustic system are treated as linear, separable elements. This source‐system model cannot properly account for a number of acoustic factors now known to exist in speech generation. We propose and implement here a signal model based more directly upon the physics of speech generation. We also implement parametric control of the synthesis model by an adaptive procedure that minimizes the spectral difference between a human speech input and the synthetic output of the model. The adapted parameters constitute a low bit‐rate representation of the input human speech. We test a preliminary form of the system by computer simulation and demonstrate that in simple initial trials the signal model is able to adapt in a realistic manner.

Journal ArticleDOI
TL;DR: Recordings from auditory-nerve fibers indicate that several factors are necessary to characterize the response of a single fiber to speech sounds, including the spectrogram and level of the stimulus, the characteristic frequency and threshold of the fiber.
Abstract: Recordings from auditory‐nerve fibers indicate that several factors are necessary to characterize the response of a single fiber to speech sounds. These are the spectrogram and level of the stimulus, the characteristic frequency and threshold of the fiber. The responses of single fibers to tones are seen as changes in the instantaneous discharge rate which can be measured in various ways. Two measures, average rate and synchronization index, have been defined for tone stimuli and have been shown to behave differently under a variety of conditions. The work with tones can lead to simplified displays of activity in model neurons, which can be used to formulate general ideas on speech coding at the nerve level.

Patent
15 Dec 1980
TL;DR: Speech compaction/replay apparatus is described for real-time monitoring of speech and filtering out periods of relative silence from a recording of the speech; the recording also contains synchronization and time code information.
Abstract: Speech compaction/replay apparatus for real-time monitoring of speech and filtering out periods of relative silence from a recording of the speech. The recording also contains synchronization and time code information to ensure that, on replay and in terms of real time, the audio output essentially replicates the analog speech input. The apparatus and technique minimize the amount of storage media required to store the speech.

Patent
14 May 1980
TL;DR: A coding and decoding system for video signals includes means at the transmitter for applying sine wave amplitude modulation of approximate line frequency to the aural carrier to prevent the chrominance subcarrier from providing receiver synchronization as mentioned in this paper.
Abstract: A coding and decoding system for video signals includes means at the transmitter for applying sine wave amplitude modulation of approximate line frequency to the aural carrier to prevent the chrominance subcarrier from providing receiver synchronization.

Book ChapterDOI
01 Jan 1980
TL;DR: This paper surveys sequential filter adaptation techniques and some applications for transversal FIR, lattice and recursive filters, which span a wide spectrum of possible performance/complexity tradeoffs.
Abstract: Over the past few years a number of new adaptive filter algorithms have been developed and applied to meet demands for faster convergence and better tracking properties than earlier techniques could offer. Applications include adaptive channel equalization, adaptive predictive speech coding and on-line system identification. This paper surveys sequential filter adaptation techniques and some applications for transversal FIR, lattice and recursive filters. The available techniques fit into two main categories: (1) gradient-type methods (exemplified by the well-known LMS algorithm) in which successive corrections to adaptive system parameters are only correct in an average sense, and (2) recursive least-squares methods, which continuously provide the solution to a numerical optimization problem, given all the preceding data. The available techniques span a wide spectrum of possible performance/complexity tradeoffs.
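As a concrete instance of the gradient-type category mentioned in the abstract above, here is a minimal sketch of the LMS update for a transversal FIR filter used in a system identification setting. The function name, argument layout, and step size `mu` are illustrative assumptions.

```python
def lms_identify(x, d, num_taps, mu):
    # x: input samples, d: desired (reference) samples, mu: step size.
    w = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        history = [xn] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # filter output
        e = dn - y                                      # instantaneous error
        # Each correction follows a noisy gradient estimate, so it is
        # only correct in an average sense.
        w = [wi + mu * e * hi for wi, hi in zip(w, history)]
        errors.append(e)
    return w, errors
```

Recursive least-squares methods replace the scalar step size with a running estimate of the inverse input correlation matrix, which buys faster convergence at higher cost per sample.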

PatentDOI
Carl J. May, Jr.
TL;DR: In this article, a speech detector uses a signal classifier to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise, and a level estimator uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels.
Abstract: A speech detector uses a signal classifier (19) to identify portions of a representation of the average magnitude of a group of signal samples indicative of either speech or noise. A controller (33) in the signal classifier follows a four state sequence using appropriate time constants for signal measures in a variety of signal conditions in defining the speech and noise portions of the representation. A level estimator (21) uses selectively obtained signal measures from the defined portions of the representation to provide adaptively variable decision levels. A speech definer (16) compares the representation to a first decision level and the signal samples to a higher decision level to indicate the occurrence of speech signal activity when either decision level is exceeded. In a two way transmission arrangement, a receive trunk speech detector uses a stretcher (133) to prevent adaptation of the transmit speech detector thresholds when echo signals are present.

Journal ArticleDOI
TL;DR: It is shown that the buffer control can be modeled as a second-order control system and, under adverse parameter settings, the system can be unstable, which may lead to a better understanding of other buffer control problems in variable rate transmission or packet systems.
Abstract: In this paper we examine the use of variable rate coding concepts for TASI and packet speech transmission systems. The paper is divided into three major parts. In the first part, the theoretical performance of variable rate coding is analyzed for multiple user (TASI) applications. Potential gains are experimentally determined from two-party telephone conversation data for up to 12 shared conversations on a channel. In the second part of the paper, the buffer control mechanism for a dynamic buffer scheme for coupling a variable rate coder to a fixed rate (or slowly varying rate) channel is analyzed. It is shown that the buffer control can be modeled as a second-order control system and, under adverse parameter settings, the system can be unstable. By an appropriate design and parameter setting, the buffer control can be stabilized. The insight developed from this particular buffer control mechanism may also lead to a better understanding of other buffer control problems in variable rate transmission or packet systems. In the third part of the paper, a practical method is analyzed for implementing a variable rate ADPCM system for multiple user applications. Examples of computer simulations of the system are presented.

PatentDOI
TL;DR: In this paper, an improved apparatus for the linear predictive coding of human speech is presented, in which the speech is sampled through the use of analog filters, and the LPC computations are performed with respect to such samples using digital techniques.
Abstract: Improved apparatus for the linear predictive coding of human speech in which the speech is sampled through the use of analog filters and the linear predictive coding computations are performed with respect to such samples using digital techniques. The filters are MOS switched capacitor filters which can be implemented on a silicon chip together with the digital circuitry. Specific circuits for implementing two different linear predictive coding speech analysis techniques are disclosed.

Journal ArticleDOI
Chong Un, Hyeong Gi Lee
TL;DR: This paper presents a new method of voiced/unvoiced/silence discrimination of speech based on the results of counting bit alternations of the bit stream from linear delta modulation of the speech signal and zero crossings of a band-pass filtered output of the decoded LDM signal.
Abstract: This paper presents a new method of voiced/unvoiced/silence discrimination of speech. The decision algorithm is based on the results of counting bit alternations of the bit stream from linear delta modulation (LDM) of the speech signal and zero crossings of a band-pass filtered output of the decoded LDM signal. Computer simulation of the system with real speech has yielded accurate results. Economical realization of the discriminator hardware using standard integrated circuits is also considered.
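One of the two features used by the decision algorithm above, the count of bit alternations in the LDM bit stream, can be computed with the sketch below. It covers only a textbook linear delta modulator and the alternation count. The decision thresholds and the zero-crossing feature of the paper are not reproduced, and the function names are hypothetical.

```python
def ldm_encode(samples, step):
    # Linear delta modulation: one bit per sample, fixed step size.
    estimate = 0.0
    bits = []
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def count_alternations(bits):
    # Number of positions where consecutive bits differ.
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)
```

An idle (silent) input makes the modulator hunt around zero and produces an alternating 1010... pattern, whereas a large, slowly varying signal produces long runs of equal bits, so the alternation count carries information about signal level and slope.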

Patent
14 May 1980
TL;DR: An audio and video signal coding system in which the normal sync pulses in the horizontal and vertical blanking intervals are suppressed, and clock and control data as well as digital audio data is inserted therein is described in this paper.
Abstract: An audio and video signal coding system in which the normal sync pulses in the horizontal and vertical blanking intervals are suppressed, and clock and control data as well as digital audio data is inserted therein. The data, both audio information and clock data in the horizontal blanking interval is further distorted by the application of a periodic waveform and/or voltage level enhancement of certain portions of the horizontal blanking interval.

Journal ArticleDOI
TL;DR: An adaptive bit allocation scheme is introduced here, in order to replace the usual form of a fixed distribution of the bit rate among the sub-bands, and highly intelligible reproduction of speech is possible at bit rates below 7 kb/s.
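To make the contrast with a fixed distribution of bits concrete, the sketch below shows one common greedy form of adaptive bit allocation across sub-bands. It is an assumed textbook scheme, not necessarily the one used in the paper: each bit goes to the band with the largest remaining energy, and each assigned bit is taken to reduce that band's quantization noise by about 6 dB (a factor of 4 in power).

```python
def allocate_bits(band_energies, total_bits):
    # band_energies: short-time energy estimate per sub-band (assumed input).
    remaining = list(band_energies)
    bits = [0] * len(remaining)
    for _ in range(total_bits):
        # Give the next bit to the band that currently needs it most.
        k = max(range(len(remaining)), key=lambda i: remaining[i])
        bits[k] += 1
        remaining[k] /= 4.0  # roughly 6 dB less noise per extra bit
    return bits
```

Re-running the allocation on every frame lets the bit distribution follow the changing speech spectrum instead of being fixed in advance.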

Proceedings ArticleDOI
09 Apr 1980
TL;DR: A conversational-mode speech-understanding system is described which enables its user to make airline reservations and obtain timetable information through a spoken dialog; it is structured as a three-level hierarchy consisting of an acoustic word recognizer, a syntax analyzer, and a semantic processor.
Abstract: We describe a conversational mode speech understanding system which enables its user to make airline reservations and obtain timetable information through a spoken dialog. The system is structured as a three level hierarchy consisting of an acoustic word recognizer, a syntax analyzer and a semantic processor. The semantic level controls an audio response system making two way speech communication possible. The system is highly robust and operates on-line in a few times real time on a laboratory minicomputer. The speech communication channel is a standard telephone set connected to the computer by an ordinary dialed-up line.


Journal ArticleDOI
TL;DR: A series of computer experiments is reported, aimed at increasing understanding of the sufficiency of the short-time amplitude spectrum for speech coding and at examining how bandpass segments of the speech spectrum might be represented parametrically.
Abstract: We report a series of computer experiments aimed to increase our understanding about the sufficiency of the short‐time amplitude spectrum for speech coding, and to examine how bandpass segments of the speech spectrum might be represented parametrically. For this purpose we utilize the absolute value of the short‐time Fourier transform and the time‐derivative of the short‐time phase, evaluated at frequency intervals chosen according to auditory criteria. We analyze and digitally encode these parameters. We find that a frequency resolution corresponding to contiguous 1/6‐octave bands, spanning the range 200 to 3200 Hz, is a perceptually satisfactory design, and permits digital coding of respectable quality at transmission rates in the range 20 to 16 K bits/s. We also find that a combination of subband coding and short‐time spectrum coding leads to comparable results and provides added economy in processing.

Proceedings ArticleDOI
R. Viswanathan, W. Russell, A. Higgins, M. Berouti, John Makhoul
01 Apr 1980
TL;DR: This paper considers the optimization of speech quality of adaptive predictive coding (APC) systems for transmission over a synchronous 16 kb/s channel, and presents several optimized APC systems.
Abstract: This paper considers the optimization of speech quality of adaptive predictive coding (APC) systems for transmission over a synchronous 16 kb/s channel. Among the important issues included in this on-going optimization study are: comparative evaluation of several methods for adaptive coding of the APC residual; comparative testing of several methods for adaptive shaping of the spectrum of the quantization noise; and optimization of various parameter values and their bit allocation. In addition to reporting the results from this study, we report on the occurrence of "limit cycles" or regions of excessive quantization noise build-up, offer an explanation for their cause in terms of the feedback gain (or "loop gain") of the APC transmitter, and present experimental results obtained using several remedial means for this problem. Also, we report on the relative properties of different APC coder configurations. Based on the results of our optimization study to date, we present several optimized APC systems.

Journal ArticleDOI
TL;DR: A metric-first tree coding algorithm, similar to the stack algorithm, is used to encode voiced speech and it is shown how to optimize the algorithm's performance with respect to its storage, execution time, and number of tree branches searched per symbol released as output.
Abstract: A metric-first tree coding algorithm, similar to the stack algorithm, is used to encode voiced speech. From experimental evidence, it is shown how to optimize the algorithm's performance with respect to the algorithm's storage, execution time, and number of tree branches searched per symbol released as output. For each of these, the optimal parameterizing of the algorithm differs markedly. Similarities are pointed out between our results for speech and earlier theoretical results for the binary i.i.d. source with Hamming distortion measure. Comparisons to the M algorithm are made.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: Preliminary results obtained with a speech encoding scheme which combines time-domain harmonic compression (TDHC) and CVSD for digital transmission of speech signals at 7.2 kbit/s suggest that the reconstructed speech has a quality close to that achieved by a 14.4 kbit/s plain CVSD.
Abstract: In this presentation we report on preliminary results obtained with a speech encoding scheme which combines time-domain harmonic compression (TDHC) and CVSD for digital transmission of speech signals at 7.2 kbit/s. The harmonic compression at the transmitter and the harmonic expansion at the receiver are performed with the recently developed time-domain harmonic scaling (TDHS) algorithms. By using a frequency-scaling factor of 2 and the encoding of the frequency-divided signal with a 7.2 kbit/s CVSD coder, the reconstructed speech has a quality close to that achieved by a 14.4 kbit/s plain CVSD. The complexity of the proposed system is significantly lower than more elaborate waveform coders or vocoders which can operate at 7.2 kbit/s. Issues involved in the proposed combination are discussed and computer simulation results are presented.

Journal ArticleDOI
TL;DR: Kalman backward adaptive predictor coefficient identification is combined with a modified pitch-compensating quantizer (MPCQ) to produce a high-performance adaptive differential pulse code modulation (ADPCM) system for operation at data rates of 12-16 kbits/s.
Abstract: Kalman backward adaptive predictor coefficient identification is combined with a modified pitch-compensating quantizer (MPCQ) to produce a high-performance adaptive differential pulse code modulation (ADPCM) system for operation at data rates of 12-16 kbits/s. The Kalman/MPCQ system is compared to an ADPCM system using a Kalman algorithm and robust Jayant quantization and to a system with a fixed-tap predictor and MPCQ. The performance indicators are signal-to-quantization noise ratio (SNR), sound spectrogram analyses, and formal subjective listening tests. The SNR comparisons indicate that the Kalman/MPCQ system has the highest SNR, followed by the fixed-tap/MPCQ system, and then the Kalman/robust Jayant system. Subjective listening test results show that the Kalman/MPCQ system is preferred over the fixed-tap/MPCQ system 100 percent of the time and over the Kalman/robust Jayant system 80 percent of the time. Kalman adaptation thus provides an important perceptual effect not evident in the SNRs. The previously catastrophic effects of transmission errors on backward adaptive prediction are eliminated by simple ADPCM system modifications that do not affect the SNR or subjective quality of the output in the absence of errors for the five sentences studied. The problem of tandeming with a linear predictive coder (LPC) is investigated by using LPC processed speech as input to the three ADPCM systems and by using the output of the three ADPCM systems as input to an LPC analysis algorithm. For the LPC to ADPCM connection, the two systems with the MPCQ produce good quality output speech, while the system with robust Jayant quantization exhibits a fading phenomenon. For the ADPCM into LPC analysis, all three systems produce speech of approximately the same quality, with the fixed-tap system being slightly noisier.
Using a distance measure proposed by Itakura, the predictor coefficients computed from the three ADPCM system outputs are compared with the predictor coefficients calculated from the uncontaminated speech. According to this distance measure, the coefficients computed from the Kalman/MPCQ system output are much closer to the desired coefficients than are those computed by the other two systems.

Proceedings ArticleDOI
01 Apr 1980
TL;DR: Experiments discussed here show that LPC synthesis is generally very close to natural speech in the high frequency region and that most of the degradation is in the low frequency region.
Abstract: Results of past studies on the quality problems of LPC speech are reviewed. The causes of the quality problems are found to lie within the basic model assumptions as well as inaccuracies in LPC analysis and errors introduced in pitch and voicing detection and parameter quantization. Experiments discussed here show that LPC synthesis is generally very close to natural speech in the high frequency region and that most of the degradation is in the low frequency region (approximately less than 1500 Hz).

Proceedings ArticleDOI
09 Apr 1980
TL;DR: Improvements on the classical model of speech are presented which produce speech significantly better than that of currently available systems: a two-source decomposition of speech and an efficient encoding of the prediction residuals of the two components.
Abstract: This paper presents improvements on the classical model which produce speech that is significantly better than that of currently available systems. The first major improvement results from treating speech as a two source phenomenon that can be separated for parallel but independent analysis/synthesis. This two component decomposition is accomplished by making use of the quasi-periodic nature of 'voiced' speech. The second major improvement in bit compression and robustness of operation results from an efficient encoding of the prediction residuals of the two components. The key step is to encode the residual of the periodic component by picking out and transmitting the essential information for only one cycle (pitch period) of the residual.