Showing papers on "Speech coding" published in 1981


Journal ArticleDOI
TL;DR: Perceptual considerations indicate that packet lengths most robust to losses are in the range 16-32 ms, irrespective of whether interpolation is used or not, whereas tolerable P_L values can be as high as 2 to 5 percent without interpolation and 5 to 10 percent with interpolation.
Abstract: We have studied the effects of random packet losses in digital speech systems based on 12-bit PCM and 4-bit adaptive DPCM coding. The effects are a function of packet length B and probability of packet loss P_L. We have also studied the benefits of an odd-even sample-interpolation procedure that mitigates these effects (at the cost of increased decoding delay). The procedure is based on arranging a 2B-block of codewords into two B-sample packets, an odd-sample packet and an even-sample packet. If one of these packets is lost, the odd (or even) samples of the 2B-block are estimated from the even (or odd) samples by means of adaptive interpolation. Perceptual considerations indicate that packet lengths most robust to losses are in the range 16-32 ms, irrespective of whether interpolation is used or not. With these packet lengths, tolerable P_L values, which are strictly input-speech-dependent, can be as high as 2 to 5 percent without interpolation and 5 to 10 percent with interpolation. These observations are based on a computer simulation with three sentence-length speech inputs, and on informal listening tests.

254 citations
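
The odd-even packetization described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the paper uses adaptive interpolation, whereas this sketch substitutes a plain two-neighbour average, and the function names `split_odd_even` and `reconstruct` are hypothetical.

```python
def split_odd_even(block):
    """Split a 2B-sample block into an even-index packet and an odd-index packet."""
    return block[0::2], block[1::2]


def reconstruct(even, odd, block_len):
    """Rebuild a block of block_len samples; a lost packet is passed as None."""
    if even is None and odd is None:
        raise ValueError("both packets lost; nothing to interpolate from")
    out = [0.0] * block_len
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    if even is not None and odd is not None:
        return out
    # Index parity of the missing samples: 0 if the even packet was lost, else 1.
    first_missing = 0 if even is None else 1
    for i in range(first_missing, block_len, 2):
        # Assumption: simple average of the received neighbours (non-adaptive).
        neighbours = [out[j] for j in (i - 1, i + 1) if 0 <= j < block_len]
        out[i] = sum(neighbours) / len(neighbours)
    return out
```

Because every missing sample sits between received samples of the other parity, a single lost packet degrades the block smoothly instead of leaving a gap of B consecutive samples. The price is that the decoder must wait for both packets of a 2B-block, which is the added decoding delay the abstract mentions.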


Journal ArticleDOI
TL;DR: An information theory approach to the theory and practice of linear predictive coded speech compression systems is developed and it is shown that a traditional LPC system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is a minimum discrimination information between a speech process model and an observed frame of actual speech.
Abstract: An information theory approach to the theory and practice of linear predictive coded (LPC) speech compression systems is developed. It is shown that a traditional LPC system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is a minimum discrimination information between a speech process model and an observed frame of actual speech. This distortion measure is used in an algorithm for computer-aided design of block source codes subject to a fidelity criterion to obtain a 750-bits/s speech compression system that resembles an LPC system but has a much lower rate, a larger memory requirement, and requires no on-line LPC analysis. Quantitative and informal subjective comparisons are made among our system and LPC systems.

217 citations


01 Jun 1981

108 citations


Journal ArticleDOI
TL;DR: This paper explores the use of the Bell Laboratories digital signal processing integrated circuit for digitally encoding speech or audio signals based on the sub-band coding technique, and considers some general issues involved in implementing multirate signal processing algorithms of this type on the digital signal processor.
Abstract: This paper explores the use of the Bell Laboratories digital signal processing integrated circuit for digitally encoding speech or audio signals based on the sub-band coding technique. Sub-band coding represents a next level in algorithmic complexity over that of adaptive differential pulse-code modulation, discussed in a companion paper, and it has a corresponding advantage in performance. We discuss the details of a real-time, two-band sub-band coding implementation on the digital signal processor. We then comment on how this approach can be extended to more than two band designs for greater bit rate compression capability. In connection with this, we also consider some general issues involved in implementing multirate signal processing algorithms of this type on the digital signal processor.

71 citations


Journal ArticleDOI
R. Cox, R. Crochiere
TL;DR: It is shown that the choice of the homomorphic "side-information" model leads to a convenient form of the ATC algorithm for real-time block processing using array processing techniques.
Abstract: Adaptive transform coding (ATC) has recently been proposed as a technique for speech coding at bit rates in the range of 9.6-16 kbits/s. In this paper we report on two new developments: 1) the use of a homomorphic vocoder model for the "side-information" channel in ATC and 2) a real-time simulation of ATC on an array processing computer. It is shown that the choice of the homomorphic "side-information" model leads to a convenient form of the ATC algorithm for real-time block processing using array processing techniques. It is also shown that the log spectrum output of the homomorphic model is in a convenient form for input to both the bit assignment algorithm in ATC (which becomes a straightforward quantization operation) and the quantization of the transform coefficients (which may be done in the log domain). An array processor simulation of this form of the algorithm has been implemented and it serves as a highly useful and convenient tool for studying the ATC algorithm in real time in a Fortran programming environment. It has allowed us, for the first time, to perform actual telephone conversations over a transform coder. The quality of this ATC algorithm was found to be essentially equivalent to that of a previous version using an LPC vocoder model for the side information.

51 citations


PatentDOI
Jonathan Allen
TL;DR: A system for interpolating digital data signals to a frequency band above analog speech signals in a common transmission channel is disclosed and is compatible with digital signal processing techniques using Fast Fourier Transform Technology in conjunction with solid state logic elements.
Abstract: A system for interpolating digital data signals to a frequency band above analog speech signals in a common transmission channel is disclosed. The system utilizes short time frequency analysis techniques to determine the cutoff frequency of the speech signal. Data signals temporarily held in storage within the system are thereafter modulated into an unused frequency band of the transmission channel above that needed for speech signals. The combined speech and data signals in the system are sent to a receiver which relays the respective speech and data signals to their appropriate locations. This system is compatible with digital signal processing techniques using Fast Fourier Transform Technology in conjunction with solid state logic elements.

35 citations


Journal ArticleDOI
TL;DR: The SBC/HS system emerges as a particularly attractive method for speech encoding at the data rate of 9.6 kbits/s, since its quality is comparable to that of ATC/HS (or SBC at 16 kbits/s), yet its complexity is lower than that of ATC and the system is amenable to real-time hardware implementation using current technology.
Abstract: In this study an approach for improving the performance of waveform coders, based on coding a frequency scaled speech signal, is examined and subjectively evaluated for specific subband and transform coding systems. The recently developed simple and efficient time-domain harmonic scaling (TDHS) algorithms are used to frequency scale the speech signal. The underlying frequency-domain model of the pitch-adaptive TDHS algorithms provides insight and guidelines for their use in this application, as outlined in this work. The subjective evaluation is based on an A-B comparison test involving 12 listeners and shows a meaningful improvement in quality for the waveform coders used at low bit rates. In particular, subband coding (SBC) combined with TDHS (SBC/HS) at 9.6 kbits/s was found to provide a quality equivalent to that of SBC alone at 16 kbits/s, i.e., a bit-rate advantage of about 7 kbits/s was realized. For the speech specific adaptive transform coder (ATC) used, the combined system (ATC/HS) achieves a bit-rate advantage of 4 kbits/s at 7.2 kbits/s. The SBC/HS system emerges as a particularly attractive method for speech encoding at the data rate of 9.6 kbits/s since its quality is comparable to that of ATC/HS (or SBC at 16 kbits/s). Yet, its complexity is lower than ATC and the system is amenable to real-time hardware implementation using current technology.

28 citations


Proceedings ArticleDOI
01 Apr 1981
TL;DR: Non-instantaneous, tree-coding methods that allow the attainment of even lower bit rates (near the theoretical rate-distortion limit) with the precise optimum noise spectrum are described.
Abstract: In previous papers on digital coding we have stressed the importance of taking proper account of the masking properties of the human ear in order to minimize the subjective loudness of the quantizing noise. The resulting optimal quantizing noise spectrum is in general not flat and requires the use of noise-shaping filters. This masking of the quantizing noise by the speech signal itself has allowed us to use very low bit rates (less than 1 bit/sample for the prediction residual in an adaptive predictive coder) while maintaining high speech quality. However, if the low bit rates are realized by a (coarse) instantaneous quantizer, the quantizing error is not white and the noise-shaping filter (in the feedback loop around the quantizer) does not produce the intended noise spectrum. In this paper, we therefore describe non-instantaneous, tree-coding methods that allow the attainment of even lower bit rates (near the theoretical rate-distortion limit) with the precise optimum noise spectrum.

21 citations


Journal ArticleDOI
TL;DR: A low-rate waveform coder for speech compression is designed using techniques from universal source coding, fake process tree encoding, and linear predictive coding to yield a fidelity that compares well with the best existing adaptive-waveform coder of the same rate.
Abstract: A low-rate (about one bit per sample) waveform coder for speech compression is designed using techniques from universal source coding, fake process tree encoding, and linear predictive coding (LPC). The system does not require on-line adaptation or LPC analysis, yet it yields a fidelity that compares well with the best existing adaptive-waveform coder of the same rate.

19 citations


Journal ArticleDOI
TL;DR: The relative frequency of path switching for the single symbol release rule is investigated for the (M,L) and truncated Viterbi tree search algorithms, various search depths, and different code generators.
Abstract: Tree coding of speech has been investigated by several workers. Virtually all of these investigations have involved incremental tree coding in that no matter how deep the tree is searched, only a single path map symbol is released at a time. As noted by Gray, even if a good long-term fit is found, the first step in the fit may be a poor one, thus yielding large sample distortions. Hence, it is important to stay on a path long enough to achieve the promised long-term distortion value. The relative frequency of path switching for the single symbol release rule is investigated for the (M,L) and truncated Viterbi tree search algorithms, various search depths, and different code generators. In addition, two multiple symbol release rules are investigated. One rule releases a fixed number of path symbols at a time, while the other rule releases a variable number of path symbols, the exact number depending on how many symbols are required for the average sample distortion to be less than or equal to the L -depth path average distortion. Speech sources are considered exclusively.

19 citations


Proceedings ArticleDOI
01 Jan 1981
TL;DR: The panel will explore the many voices of the new IC speech synthesizers, including: 'rules'-generated speech, synthesis-by-analysis and Mozer waveform encoding.
Abstract: The panel will explore the many voices of the new IC speech synthesizers, including: 'rules'-generated speech, synthesis-by-analysis and Mozer waveform encoding. Advantages of analog sampled-data filters versus digital filters, linear prediction (LPC) versus formant encoding and ways to beat the quality/bit rate trade-offs will also be covered.

Journal ArticleDOI
TL;DR: A review of the progress made on cochlear implants worldwide is presented, contrasting the results obtained in different labs on similar tests: threshold detection, pitch and loudness scaling, chronaxie, and difference limen tests.
Abstract: The cochlear implant has recently seemed useful enough to be considered by some as a clinical procedure. This paper presents a review of the progress made on cochlear implants worldwide. Data are presented which contrast the results obtained in different labs on similar tests: threshold detection, pitch and loudness scaling, chronaxie, and difference limen tests. Models to account for some of these results are given. Four centers (San Francisco, Vienna, Salt Lake City, and Melbourne) have reported surprisingly good speech comprehension scores. We discuss the psychoacoustic characteristics which may relate to these scores and demonstrate that many implant patients show results which are expected quite typically in cochlear dysfunction. Thus the limits on speech processing in the hard‐of‐hearing may apply as well to implant patients, regardless of the method of speech coding. We also stress the lack of evidence confirming the assumed greater usefulness of multi‐channel implants over single channel devices. Finally, a careful look is given to the risks involved in these procedures, specifically those of bone and tumor growth, device replacement, and the psychological effects from device failure.

Journal ArticleDOI
Chung Un, Hwang Lee, Joo Song
TL;DR: A reasonably complete account of an improved adaptive delta modulation system called hybrid companding delta modulation (HCDM) that is far superior to continuously variable slope DM (CVSD) or constant factor DM (CFDM) is presented.
Abstract: We present a reasonably complete account of an improved adaptive delta modulation (ADM) system called hybrid companding delta modulation (HCDM). The HCDM system, which is far superior to continuously variable slope DM (CVSD) or constant factor DM (CFDM), is particularly advantageous for speech coding. It employs both syllabic and instantaneous companding schemes. Performance analysis of the system has been done and verified by computer simulation. In deriving the mathematical formula for HCDM granular noise, a new method based on amplitude distribution is proposed. Optimization of the system parameter values by simulation is also discussed. In addition, an efficient method of hardware implementation is considered.
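
For readers unfamiliar with adaptive delta modulation, the following Python sketch shows a generic one-bit-memory adaptive delta modulator in the spirit of the constant factor DM that the abstract uses as a baseline. It is not the HCDM system, whose syllabic and instantaneous companding rules are not given in the abstract. The multipliers `p` and `q`, the step limits, and the function names are all illustrative assumptions.

```python
def _adapt(step, bit, prev_bit, p, q, min_step, max_step):
    """Grow the step when successive bits agree, shrink it when they differ."""
    step *= p if bit == prev_bit else q
    return min(max(step, min_step), max_step)


def adm_encode(samples, step0=0.1, p=1.5, q=0.66, min_step=0.01, max_step=10.0):
    """Return (bits, approx): the +1/-1 bit stream and the encoder's local estimate."""
    bits, approx = [], []
    estimate, step, prev_bit = 0.0, step0, 1
    for x in samples:
        bit = 1 if x >= estimate else -1
        step = _adapt(step, bit, prev_bit, p, q, min_step, max_step)
        estimate += bit * step
        bits.append(bit)
        approx.append(estimate)
        prev_bit = bit
    return bits, approx


def adm_decode(bits, step0=0.1, p=1.5, q=0.66, min_step=0.01, max_step=10.0):
    """Rebuild the staircase approximation from the bit stream alone."""
    out = []
    estimate, step, prev_bit = 0.0, step0, 1
    for bit in bits:
        step = _adapt(step, bit, prev_bit, p, q, min_step, max_step)
        estimate += bit * step
        out.append(estimate)
        prev_bit = bit
    return out
```

The step size is derived only from the transmitted bits, so the decoder can track the encoder without any side information, provided both start from the same state and no bit errors occur.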

Proceedings ArticleDOI
01 Apr 1981
TL;DR: An algorithm for the design of locally optimum vector quantizers relative to a distortion measure is used to design and simulate vector quantizers for both real sampled speech and for speech-like waveforms produced by a tenth order autoregressive random process with matching autocorrelation.
Abstract: An algorithm for the design of locally optimum vector quantizers relative to a distortion measure is used to design and simulate vector quantizers for both real sampled speech and for speech-like waveforms produced by a tenth order autoregressive random process with matching autocorrelation. Both squared-error and a weighted squared error were considered. The experimental results were compared with performance bounds from rate distortion theory based on the autoregressive model.
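
A common way to obtain a locally optimum vector quantizer is to alternate a nearest-neighbour partition of a training set with a centroid update. The Python sketch below assumes that iteration with the squared-error measure only. The abstract does not spell out the algorithm or its initialization, so the random initial codebook, the fixed iteration count, and all function names here are assumptions.

```python
import random


def sq_error(a, b):
    """Squared-error distortion between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the codeword with minimum distortion to vec."""
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i], vec))


def design_codebook(training, size, iterations=20, seed=0):
    """Alternate nearest-neighbour partitioning and centroid updates."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, size)]
    for _ in range(iterations):
        cells = [[] for _ in range(size)]
        for vec in training:
            cells[nearest(codebook, vec)].append(vec)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def average_distortion(codebook, vectors):
    """Mean distortion when each vector is replaced by its nearest codeword."""
    total = sum(sq_error(codebook[nearest(codebook, v)], v) for v in vectors)
    return total / len(vectors)
```

Neither step can increase the average distortion on the training set, which is why the result is only locally optimum: the final codebook depends on the starting point.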

Journal ArticleDOI
TL;DR: A differential pulse code modulation (DPCM) system having adaptive quantization with forward (AQF) transmission of step-size, and second-order predictors that are adaptive and operate on the locally decoded speech signal, is proposed.
Abstract: A differential pulse code modulation (DPCM) system having adaptive quantization with forward (AQF) transmission of step-size, and second-order predictors that are adaptive and operate on the locally decoded speech signal, is proposed. For a transmission rate of 40 kbits/s, a block size of 256 speech samples, the DPCM-AQF system using the sequential gradient estimation predictor (SGEP) has segmental signal-to-noise ratio (SNR) gains of 3 and 9 dB compared to the stochastic approximation predictor (SAP) and the leaky integrator, respectively. The dynamic range of the DPCM-AQF using SGEP for an SNR of 35 dB is 30 dB, and it is insensitive to block size (<512). When transmission errors are introduced, it has a higher SNR than that achieved with the leaky integrator for bit error rates <0.08 percent.
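
Segmental SNR, the figure of merit quoted in this abstract, is commonly computed by averaging per-block SNR values expressed in dB. The Python sketch below follows that common definition. The block length used for the measurement, the decision to skip blocks with zero signal or zero noise energy, and the function name are assumptions, not details taken from the paper.

```python
import math


def segmental_snr_db(reference, decoded, block_size):
    """Average the per-block SNR (in dB) over all full blocks."""
    values = []
    for start in range(0, len(reference) - block_size + 1, block_size):
        ref = reference[start:start + block_size]
        dec = decoded[start:start + block_size]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        # Assumption: blocks with no signal or no error are left out of the mean.
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no block with measurable signal and noise energy")
    return sum(values) / len(values)
```

Averaging in the dB domain gives quiet blocks the same weight as loud ones, so the measure is not dominated by high-energy segments the way a single long-term SNR would be.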

PatentDOI
Leon W. Cox
TL;DR: The speech synthesizer is capable of electronically synthesizing human speech from coded speech data including parameters as stored either in a solid state memory on a permanent basis or alternatively as temporarily stored in another memory, wherein the coded speech data is made available from an external source, such as a central processing unit of a commercial or home-type computer, as coupled to the speech synthesizer.
Abstract: Speech synthesizer and a computer system having the speech synthesizer operably coupled thereto to provide speech capability for the computer system. The speech synthesizer is capable of electronically synthesizing human speech from coded speech data including parameters as stored either in a solid state memory on a permanent basis or alternatively as temporarily stored in another memory, wherein the coded speech data is made available from an external source, such as a central processing unit of a commercial or home-type computer, as coupled to the speech synthesizer. The speech synthesizer may be in the form of a speech module including a speech synthesizer processor for converting coded speech data into digital speech signals in combination with a mode selector which selectively applies either the coded speech data from a read-only-memory within the speech module or the coded speech data obtained from the external source to the speech synthesizer processor in response to a control signal provided by the external source for determining which of the two alternative operating modes will be employed in a given instance. The computer system is provided with speech capability by including the speech module as a component thereof in combination with a computer input device, the central processing unit of the computer, and an audio amplifier and speaker connected to a digital-to-analog converter of the speech module so as to generate audible human speech from the digital speech signals provided by the speech synthesizer processor of the speech module.

Patent
24 Jul 1981
TL;DR: In this article, a speech path memory having both forward and backward time switches has addresses corresponding to accommodated lines; and a speech signal and a non-speech signal (a signal indicating an on-hook or off-hook state and other control signals) are transmitted over a highway between each line module and a switch module.
Abstract: In a time division switching system of the time-space-time (T-S-T) arrangement, a speech path memory having both forward and backward time switches has addresses corresponding to accommodated lines; and a speech signal and a non-speech signal (a signal indicating an on-hook or off-hook state and other control signals) are transmitted over a highway between each line module and a switch module. When a speech channel is busy, the speech signal is written into the speech path memory, read out from it, and transmitted to a remote station. When the speech channel is idle, the non-speech signal is likewise written into the speech path memory; in this case, however, either the non-speech signal is read out into a signal processor, or a non-speech signal is written into the speech path memory from the signal processor, or a non-speech signal written into the speech path memory from the signal processor is read out onto a line. With this arrangement, a memory for the exclusive use of the non-speech signal, which is required in the prior art for non-speech signal transmission, can be dispensed with, and transmission control of the control signals of various line modules can be simplified. The speech path memory is provided with a speech signal storage area and a non-speech signal storage area, by which it is possible to transmit a large amount of information as the non-speech signal in accordance with the requirements of various line modules.

Patent
30 Apr 1981
TL;DR: The spectrum of the original signal, extending over the frequency range 0-4000 Hz, is divided into 16 sub-bands, each containing samples coded to 12 bits, and the samples of the 16 sub-bands are requantized using BCPCM with a dynamically variable bit rate.
Abstract: A method of encoding an original speech signal. The spectrum of the original signal, extending over the frequency range 0-4000 Hz, is divided into 16 sub-bands, each containing samples coded to 12 bits. The samples of the 16 sub-bands are requantized using BCPCM with a dynamically variable bit rate. To do this, the stream of sub-band samples is divided into fixed-length blocks, and each block is processed to derive two scale factors whose difference is analyzed to classify the block as transient or stationary. For a stationary block, a single scale factor is retained, while for a transient block both scale factors are retained. The samples of the 16 sub-bands are then requantized dynamically, with an overall number of bits that depends on the transient or stationary nature of the block.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: A new approach to speech digitization at mediumband bit rates of 9.6 to 16 Kb/s is described, based on a combination of Time Domain Harmonic Scaling and an Adaptive Residual Coder.
Abstract: This paper describes the study of a new approach to speech digitization at mediumband bit rates of 9.6 to 16 Kb/s. The technique is based on a combination of Time Domain Harmonic Scaling and an Adaptive Residual Coder. Computer simulation studies have shown that this technique is able to produce excellent quality speech at the bit rates in question.

PatentDOI
Kazuhiko Maeba
TL;DR: Prestored volume level data in a speech synthesis system controls the reference voltage to the resistor-divider network of the D/A converter to provide a volume-controlled output speech signal.
Abstract: Prestored volume level data in a speech synthesis system controls the reference voltage to the resistor-divider network of the D/A converter to provide a volume-controlled output speech signal.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: The application of a new source coding scheme called Modulo-PCM with side information to speech coding is studied and the performance characteristics of an adaptive and a non-adaptive scheme are evaluated using two speech utterances.
Abstract: The application of a new source coding scheme called Modulo-PCM with side information to speech coding is studied. The performance characteristics of an adaptive and a non-adaptive scheme are evaluated using two speech utterances.

Proceedings ArticleDOI
B. Atal, J. Remde
01 Apr 1981
TL;DR: A split-band adaptive predictive coding system for digital transmission of speech signals is described, in which dividing the prediction residue signal into many frequency bands results in more accurate pitch prediction, particularly at low frequencies.
Abstract: We describe a split-band adaptive predictive coding system for digital transmission of speech signals. In this system, the prediction residue signal obtained after spectral prediction is filtered into 2 or more frequency bands. Each of the filtered signals is reduced further by pitch prediction and is quantized by a 15-level noise feedback quantizer. The input to the quantizer is severely center-clipped to produce a quantized signal with low entropy. The division of the prediction residue signal into many frequency bands results in more accurate pitch prediction - particularly, at low frequencies. The split-band system uses separate quantizers for each frequency band. The step size of the quantizer and the center-clipping threshold can thus be adjusted to optimize speech quality in each band.


Book
01 Jan 1981
TL;DR: Three new techniques for designing and simulating low rate speech compression systems based on vector quantization (VQ) are described, combining ideas from the first two to obtain a residual-excited linear predictive (RELP) speech compression system using VQ in both model selection and residual digitization.
Abstract: Three new techniques for designing and simulating low rate speech compression systems based on vector quantization (VQ) are described. The first is a rate-distortion speech coder that resembles a linear predictive coded (LPC) speech compression system, but has much lower rate (under 800 bits per second (bps)) and a much larger memory requirement. The encoder performs a minimum distortion rule using the Itakura-Saito distortion measure. The speech quality provided at such low rate is comparable to that of 2400 bps and 4800 bps standard LPC systems. The second system is a waveform coder consisting of a minimum (weighted and unweighted) mean-square error VQ of one or two bits per sample (6500 and 13000 bps, respectively). It can be considered as a multidimensional pulse code modulation (PCM) system. The speech quality provided is considered at least as good as that of other standard waveform coders. The third system combines ideas from the first two to obtain a residual-excited linear predictive (RELP) speech compression system using VQ in both model selection and residual digitization. The working rates of our RELP system are 7000 and 13500 bps providing, among the RELP systems that we know of, the best speech quality.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: Most impressive of the several interesting and useful results reported in the paper is that the final optimized, robust coder design yields speech quality that degrades only slightly as the bit-error rate is increased from 0% to 1%.
Abstract: This paper presents the results of our optimization study of 16 kb/s APC coders operating over noisy channels. With the objective of achieving good speech quality and a robust coder performance for channel bit-error rates of up to 1%, we have investigated a number of issues including: the tradeoff between voice-data bandwidth and error-protection bandwidth; the amount of error protection for individual transmission parameters; comparison of several residual coding methods; the relative performance of the two ways of sequencing the spectral and pitch predictors; use of folded binary code for encoding the quantized residual; and smoothing of the decoded residual. Most impressive of the several interesting and useful results reported in the paper is that the final optimized, robust coder design yields speech quality that degrades only slightly as the bit-error rate is increased from 0% to 1%.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: This paper discusses a form of non-linear prediction, namely, the prediction of the phase of speech signals, based upon a new treatment of the classical speech production model within a short-time analysis/synthesis framework.
Abstract: Prediction plays a key role in many signal processing applications. Linear prediction has, in particular, been extremely useful to the development of digital speech processing techniques and applications. There is, however, a growing need for improved forms of prediction. We discuss, in this paper, a form of non-linear prediction, namely, the prediction of the phase of speech signals. This study is conducted within a short-time analysis/synthesis framework and is based upon a new treatment of the classical speech production model. Experimental data are presented confirming the theoretical results. Finally, the application of phase prediction to low-bit-rate, high-quality coding is discussed.


Proceedings ArticleDOI
01 Apr 1981
TL;DR: The paper addresses the problem of the judicious choice of distance measures and the design of a good dictionary and experimental results are provided for dictionary coding of the signal using a distance measure combining time and frequency domain features.
Abstract: The paper presents a new approach to the problem of speech coding in the range of 4 to 8 kbits/sec along with experimental tests performed on a MAP 200 array processor. The basic idea consists of using a dictionary of K indexed waveforms of fixed duration T (where T lies between 1 and 4 ms). The input signal is broken down into blocks of duration T. For each segment the Dictionary is searched for that prototype waveform which is, with respect to some distance measure, the closest to the input block. The digital representation of the signal is thereafter the index of the prototype. The paper addresses the problem of the judicious choice of distance measures and the design of a good dictionary. Experimental results are provided for dictionary coding of the signal using a distance measure combining time and frequency domain features. Experimental results are further presented for the coding of the LPC residual.
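
The dictionary scheme in this abstract fixes a simple bit budget: if only the prototype index is sent for each block of duration T, a dictionary of K waveforms costs log2(K)/T bits per second. The Python sketch below works through that arithmetic and a nearest-prototype encoder. It assumes fixed-length indices with no other side information and uses plain squared error as the distance, whereas the paper combines time and frequency domain features. All function names are hypothetical.

```python
import math


def index_bits(rate_bps, block_ms):
    """Bits available per block when only the dictionary index is transmitted."""
    return rate_bps * block_ms / 1000.0


def dictionary_size(rate_bps, block_ms):
    """Largest power-of-two dictionary size K that fits the bit budget."""
    return 2 ** math.floor(index_bits(rate_bps, block_ms))


def encode(signal, dictionary, block_len):
    """Replace each full block by the index of its closest prototype."""
    indices = []
    for start in range(0, len(signal) - block_len + 1, block_len):
        block = signal[start:start + block_len]
        # Assumption: squared error stands in for the paper's distance measure.
        distances = [sum((a - b) ** 2 for a, b in zip(block, proto))
                     for proto in dictionary]
        indices.append(distances.index(min(distances)))
    return indices


def decode(indices, dictionary):
    """Concatenate the prototypes selected by the received indices."""
    return [sample for k in indices for sample in dictionary[k]]
```

For example, 4 kbits/s with T = 2 ms leaves 8 bits per block, that is K = 256 prototypes. The same budget grows quickly with T, which is one reason the choice of block duration and the dictionary design are tightly coupled.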

Proceedings ArticleDOI
01 Apr 1981
TL;DR: This paper presents some results of an experimental study of the use of segmental preclassification as a technique for improving the performance of objective speech quality measures.
Abstract: This paper presents some results of an experimental study of the use of segmental preclassification as a technique for improving the performance of objective speech quality measures. In all such measures tested, the distorted speech was first divided into segments, and each segment was classified by an objective classification procedure into one of four classes: "silence," "fricative," "vocalic," and "nasal." Separate objective measures were then computed for each class, and a final overall objective quality measure was computed as a weighted sum of the individual classified measures. Figures-of-merit for the objective measures tested were computed using correlation analysis between a data base of objective quality measures computed using the classified measures and subjective speech quality measures across the same set of distorted and coded speech data. The subjective quality test which was used was the DAM test.

Proceedings ArticleDOI
R. Cox, D. Malah
01 Apr 1981
TL;DR: The recently developed time domain harmonic scaling (TDHS) algorithm has been found to be the basis for an effective enhancement technique and a class of windows for its implementation is established.
Abstract: Periodically structured noise is noise which occurs randomly but with a fixed or slowly varying period. The noise periodicity is usually due to some underlying process, such as block processing of the speech where discontinuities between successive blocks result. This type of noise permeates the entire speech spectrum and is not removable by standard filtering techniques. The recently developed time domain harmonic scaling (TDHS) algorithm has been found to be the basis for an effective enhancement technique. In this paper we discuss the underlying theory of this technique and establish a class of windows for its implementation. As an example, the frame rate noise of adaptive transform coding was perceptually reduced using this technique. Results from a subjective testing experiment using ATC coded speech with bit rates of 7.2 to 16 Kb/s indicated an improvement in quality equivalent to an increase in code rate of 2.4 to 3 Kb/s for speech originally coded at 7.2 to 12 Kb/s.