
Showing papers on "Speech coding published in 1984"


Journal Article
TL;DR: During the past few years several design algorithms have been developed for a variety of vector quantizers and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes.
Abstract: A vector quantizer is a system for mapping a sequence of continuous or discrete vectors into a digital sequence suitable for communication over or storage in a digital channel. The goal of such a system is data compression: to reduce the bit rate so as to minimize communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity of the data. The mapping for each vector may or may not have memory in the sense of depending on past actions of the coder, just as in well established scalar techniques such as PCM, which has no memory, and predictive quantization, which does. Even though information theory implies that one can always obtain better performance by coding vectors instead of scalars, scalar quantizers have remained by far the most common data compression system because of their simplicity and good performance when the communication rate is sufficiently large. In addition, relatively few design techniques have existed for vector quantizers. During the past few years several design algorithms have been developed for a variety of vector quantizers and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes. It is the purpose of this article to survey some of these design techniques and their applications.

2,743 citations
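
The mapping described in the abstract above can be illustrated with a memoryless nearest-neighbor quantizer. This is a minimal sketch, not one of the design algorithms the survey covers: the codebook values, the squared-error distortion, and the function names `quantize`, `encode`, and `decode` are assumptions chosen for illustration.

```python
def quantize(vector, codebook):
    """Return the index of the codeword with minimum squared error."""
    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))


def encode(vectors, codebook):
    """Map each input vector to a codebook index (the digital sequence)."""
    return [quantize(v, codebook) for v in vectors]


def decode(indices, codebook):
    """Look up the reproduction vector for each received index."""
    return [codebook[i] for i in indices]


# Hypothetical 2-dimensional codebook with 4 entries (2 bits per vector).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
signal = [(0.9, 1.2), (0.1, -0.2), (-0.8, 0.7)]

indices = encode(signal, codebook)
reconstruction = decode(indices, codebook)
print(indices, reconstruction)
```

With four codewords of dimension two, each vector costs 2 bits, or 1 bit per sample. A practical codebook would be trained on data rather than written by hand.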


Journal ArticleDOI
TL;DR: Several algorithms are presented for the design of shape-gain vector quantizers based on a training sequence of data or a probabilistic model, and their performance is compared to that of previously reported vector quantization systems.
Abstract: Memory and computation requirements imply fundamental limitations on the quality that can be achieved in vector quantization systems used for speech waveform coding and linear predictive voice coding (LPC). One approach to reducing storage and computation requirements is to organize the set of reproduction vectors as the Cartesian product of a vector codebook describing the shape of each reproduction vector and a scalar codebook describing the gain or energy. Such shape-gain vector quantizers can be applied both to waveform coding using a quadratic-error distortion measure and to voice coding using an Itakura-Saito distortion measure. In each case, the minimum distortion reproduction vector can be found by first selecting a shape codeword, and then, based on that choice, selecting a gain codeword. Several algorithms are presented for the design of shape-gain vector quantizers based on a training sequence of data or a probabilistic model. The algorithms are used to design shape-gain vector quantizers for both the waveform coding and voice coding application. The quantizers are simulated, and their performance is compared to that of previously reported vector quantization systems.

305 citations
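
The two-step search described in the abstract above (shape first, then gain) can be sketched for the quadratic-error case. The sketch assumes unit-norm shape codewords and positive gains, under which the best shape is the one with the largest inner product with the input and the best gain is the one closest to that inner product. The codebook values and function names are hypothetical, and the Itakura-Saito case is not covered.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shape_gain_encode(x, shapes, gains):
    """Pick the shape index first, then the gain index given that shape.

    Assumes every shape has unit norm and every gain is positive, so that
    ||x - g*s||^2 = ||x||^2 - 2*g*(x.s) + g^2 is minimized by maximizing
    x.s and then choosing g nearest to x.s.
    """
    i = max(range(len(shapes)), key=lambda k: dot(x, shapes[k]))
    target = dot(x, shapes[i])
    j = min(range(len(gains)), key=lambda k: (gains[k] - target) ** 2)
    return i, j


def shape_gain_decode(i, j, shapes, gains):
    """Reproduction vector is gain times shape."""
    return [gains[j] * s for s in shapes[i]]


r = 1.0 / math.sqrt(2.0)
shapes = [(1.0, 0.0), (0.0, 1.0), (r, r), (r, -r)]   # unit-norm shapes
gains = [0.5, 1.0, 2.0, 4.0]                          # positive gains

x = (1.9, 2.2)
i, j = shape_gain_encode(x, shapes, gains)
print(i, j, shape_gain_decode(i, j, shapes, gains))
```

The product codebook here has 4 x 4 = 16 reproduction vectors, but the search touches only 4 shapes and 4 gains, which is the storage and computation saving the abstract refers to.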


Patent
Harald Höge1
12 Sep 1984
TL;DR: In this article, a method of determining speech spectra for automatic speech recognition and speech coding is presented, wherein a codebook tree is used, namely an arrangement of codebook spectra which is orientated with binary branching and thus can be addressed in binary-coded fashion, with levels which contain 2**L spectra, characterized in that in an analysis stage (A) which is supplied with a speech signal which is to be analysed, firstly, preferably using a signal processor (S), the respective spectral parameter set is determined, that each obtained parameter set comprising parameter values
Abstract: 1. A method of determining speech spectra for automatic speech recognition and speech coding, wherein a codebook tree is used, namely an arrangement of codebook spectra which is orientated with binary branching and thus can be addressed in binary-coded fashion, with levels which contain 2**L spectra, characterized in that in an analysis stage (A) which is supplied with a speech signal which is to be analysed, firstly, preferably using a signal processor (S), the respective spectral parameter set is determined, that each obtained parameter set comprising parameter values Pi is binary-coded in a coding stage (C) whereby each parameter value Pi is assigned a code Ci , that all the codes ({Ci }) which have been formed and which are to be evaluated are combined to form an overall code (Cg ), that the overall code (Cg ) is used as addressing signal for a read-only store ROM, namely a Hash-ROM (H), where the store content of the Hash-ROM (H) comprises the numbers, assigned to the required spectrum, of the codebook spectra in a K-th level of the codebook tree, that the numbers of the codebook spectra are used as an indicator of the codebook store section which contains the codebook tree from the K-th level, and that when entering into the K-th level of the codebook tree has been facilitated in this way, a final determination of the codebook spectrum in the L-th level of the codebook tree is carried out by means of a comparator unit (V) in the form of a tree search.

168 citations


Proceedings ArticleDOI
Sharad Singhal1, B. S. Atal2
19 Mar 1984
TL;DR: This paper focuses on problems encountered in attempting to maintain speech quality while synthesizing speech using multi-pulse excitation at lower bit rates.
Abstract: The multi-pulse excitation model provides a method for producing natural-sounding speech at medium to low bit rates. Multi-pulse analysis obtains the all-pole filter excitation by minimizing a spectrally-weighted mean-squared error between the original and synthetic speech signals. Although the method provides high quality speech around 10 kbits/sec, speech quality suffers if the bit rate is lowered. In this paper, we focus on problems encountered in attempting to maintain speech quality while synthesizing speech using multi-pulse excitation at lower bit rates.

163 citations


Patent
02 Apr 1984
TL;DR: In this paper, a technique for transmitting an entire analog speech signal (S(t)) and a modulated data signal (D(t)) over a transmission channel (20) such as a common analog telephone speech channel was proposed.
Abstract: A technique for transmitting an entire analog speech signal (S(t)) and a modulated data signal (D(t)) over a transmission channel (20) such as a common analog telephone speech channel. The present technique multiplexes the entire modulated data signal within the normal analog speech signal frequency band where the speech is present and its signal power density characteristic is at a low level. Separation of the speech and data signals at the receiver (30) is effected by recovering the modulation carrier frequency (fc) and demodulating (33) the receiver signal (X(t)) to recover the data signal. The data signal is then remodulated (34) with the recovered carrier and is convolved with an arbitrary channel impulse response in an adaptive filter (35) whose output signal is subtracted (37) from the received composite data and speech signal (X(t)) to generate the recovered speech signal (S(t)). To improve the recovered speech signal, a least mean square algorithm (36) is used to update the arbitrary channel impulse response output signal of the adaptive filter (35).

151 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: Harmonic Coding is synthesized in the time domain, as a superimposition of "harmonics" whose instantaneous frequency varies continuously along an interpolation curve, within each frame, so that fast pitch variations can be tracked with no difficulty.
Abstract: The Harmonic Coding concept has already shown its potential for efficiently coding speech. Previous implementations have used a frame rate of one every 16 ms. This was mainly due to the fact that, with longer frames, even a nonstationary spectral model (of low order) cannot reproduce the zones of fast-varying pitch with the desirable quality. However, the high framing rate is a limitation, since it implies that fewer bits will be available for encoding each frame. A solution for this problem has been devised: the signal is synthesized in the time domain, as a superimposition of "harmonics" whose instantaneous frequency varies continuously along an interpolation curve, within each frame. In this way, fast pitch variations can be tracked with no difficulty. Experimental results are presented, confirming these facts. The integration of this synthesis scheme in a speech coder is discussed.

98 citations


Patent
12 Oct 1984
TL;DR: In this paper, the authors describe a speech encoding apparatus that includes means for analyzing and encoding the spoken version of the message to be encoded and means for combining the codes of the corresponding written message with the code of the spoken message, and for generating a combination code containing the data of the duration and pitch of the allophones of the coded message.
Abstract: A speech encoding apparatus characterized in that it includes means (2) for analyzing and encoding the spoken version of the message to be encoded and means (3) for combining the codes of the corresponding written message with the codes of the spoken message, and for generating a combination code containing the data of the duration and pitch of the allophones of the coded message.

73 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC.
Abstract: A procedure is suggested for improving LPC speech quality. The central theme is to introduce a parametric model of voiced excitation - a glottal source model. In the analysis this allows for a different method than the AR-estimation used in conventional LPC. Here, a method known as AR-X-estimation is used. A complete analysis and coding method is presented. It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC. The glottal LPC-vocoder does significantly improve synthesis quality as compared to standard LPC. It should be emphasized, however, that the glottal vocoder requires high quality speech as input, recorded in a phase linear system. Moreover, the computational complexity is high.

56 citations



Proceedings ArticleDOI
01 Mar 1984
TL;DR: A computationally efficient formulation is derived for both covariance and correlation type analyses for multipulse coding of speech, ranging from a purely sequential one to one which reoptimizes pulse amplitudes at each step.
Abstract: This paper discusses the analysis techniques used to derive the excitation waveform for multipulse coding of speech. A computationally efficient formulation is derived for both covariance and correlation type analyses. These methods differ in the way block edges are treated. Several methods for pulse amplitude and position determination are given, ranging from a purely sequential one to one which reoptimizes pulse amplitudes at each step. It is shown that the reoptimization scheme has a nested structure that allows a reduction in the computations. An efficient method for pulse position coding is given. This method can essentially achieve the entropy limit for randomly placed pulses. Experimental results are given for typical configurations including computational requirements and speech quality assessments.

48 citations
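
A purely sequential pulse search of the kind mentioned in the abstract above might look like the following sketch. It is an illustration under stated assumptions, not the paper's formulation: it omits perceptual weighting and amplitude reoptimization, handles block edges by truncating the impulse response at the frame boundary, and the filter coefficients and all function names are hypothetical.

```python
def impulse_response(a, length):
    """Impulse response of the all-pole filter 1 / (1 - sum_k a[k] z^-k)."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                value += ak * h[n - k]
        h.append(value)
    return h


def multipulse_search(target, h, num_pulses):
    """Greedy search: place one pulse at a time, never revisiting old ones."""
    n = len(target)

    def xcorr(residual, m):
        # Correlation of the residual with the response to a pulse at m.
        return sum(residual[k] * h[k - m] for k in range(m, n))

    def energy(m):
        # Energy of the (frame-truncated) response to a pulse at m.
        return sum(h[k - m] ** 2 for k in range(m, n))

    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        # The location that removes the most error energy from the residual.
        best = max(range(n), key=lambda m: xcorr(residual, m) ** 2 / energy(m))
        amplitude = xcorr(residual, best) / energy(best)
        pulses.append((best, amplitude))
        for k in range(best, n):
            residual[k] -= amplitude * h[k - best]
    return pulses, residual


frame_length = 40
h = impulse_response([1.2, -0.6], frame_length)   # hypothetical stable filter
target = [0.0] * frame_length
for location, amplitude in [(3, 1.0), (20, -0.5)]:
    for k in range(location, frame_length):
        target[k] += amplitude * h[k - location]

pulses, residual = multipulse_search(target, h, num_pulses=2)
print(pulses, sum(r * r for r in residual))
```

Because each pulse is chosen without revisiting earlier ones, overlapping responses can leave a nonzero residual. Reoptimizing the amplitudes at each step, as the abstract describes, addresses this at extra cost.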


Proceedings ArticleDOI
19 Mar 1984
TL;DR: A Hierarchical Vector Quantization scheme that can operate on "supervectors" of dimensionality in the hundreds of samples is introduced and Gain normalization and dynamic codebook allocation are used in coding both feature vectors and the final data subvectors.
Abstract: This paper introduces a Hierarchical Vector Quantization (HVQ) scheme that can operate on "supervectors" of dimensionality in the hundreds of samples. HVQ is based on a tree-structured decomposition of the original supervector into a large number of low dimensional vectors. The supervector is partitioned into subvectors, the subvectors into minivectors and so on. The "glue" that links subvectors at one level to the parent vector at the next higher level is a feature vector that characterizes the correlation pattern of the parent vector and controls the quantization of lower level feature vectors and ultimately of the final descendant data vectors. Each component of a feature vector is a scalar parameter that partially describes a corresponding subvector. The paper presents a three level HVQ for which the feature vectors are based on subvector energies. Gain normalization and dynamic codebook allocation are used in coding both feature vectors and the final data subvectors. Simulation results demonstrate the effectiveness of HVQ for speech waveform coding at 9.6 and 16 Kb/s.

Journal ArticleDOI
TL;DR: The system results in a substantial improvement compared with the conventional full-band APC system in regard to SNR performance and predictor loop stability, and can provide speech quality subjectively equivalent to 6 bit log-PCM at 9.6 kbits/s.
Abstract: Adaptive predictive coding with dynamic bit allocation is presented for speech encoding at low to medium bit rates (6.4 kbits/s to 16 kbits/s). In this system, a split-band predictive coding scheme and a bit allocation scheme are employed in order to remove the redundancies due to a periodic concentration of the prediction residual energy, as well as the nonuniform nature of the speech spectrum. Quantization bits are dynamically allocated, both over the subbands (in the frequency domain) and over the subintervals (in the time domain), in accordance with the distribution of the residual energies in the time-frequency domain. Optimum bit allocation is derived based on the mean square error criterion on the speech waveform. The SNR gain is presented as the sum of the spectral SNR gain G_f, equivalent to the prediction gain, and the temporal SNR gain G_t. Although G_t is much smaller than G_f, temporal bit allocation greatly improves the actual SNR performance of the APC system to more than the value expected from its SNR gain in the bit rate range of less than 2 bits/sample. A study on the segmental SNR performance for various coder designs shows that the coder design using three subbands, four subintervals, and a fourth-order predictor in each subband is most appropriate for speech encoding in the bit rate range of 6.4 kbits/s to 16 kbits/s. This system is evaluated in terms of the segmental SNR and subjective speech quality. The results show that the system results in a substantial improvement compared with the conventional full-band APC system in regard to SNR performance and predictor loop stability. It is also shown that this system can provide speech quality subjectively equivalent to 7 bit log-PCM at 16 kbits/s, and to 6 bit log-PCM at 9.6 kbits/s.

PatentDOI
TL;DR: In this paper, the quantization step is determined according to the fundamental step size which provides the statistical variance, equal to one, to the quantized signal and/or the power of the residual signal.
Abstract: A speech signal coding system comprises a prediction filter coupled with an output of a quantizer for prediction of a signal. A subtractor provides the difference between an input signal and an output of the prediction filter. A quantizer quantizes the residual signal, which is the difference provided by the subtractor. The quantizer is improved by adaptively adjusting step size for quantization. Thus, the coded outputs, according to the present invention, are the parameter information of the prediction filter, quantized output of the residual signal, and step information for quantization. The quantization step is determined according to the fundamental step size which provides the statistical variance, equal to one, to the quantized signal, and/or the power of the residual signal. Because of an efficient encoding with an adaptive control of the quantization step, the bandwidth for transmission of the coded signal in a communication system or transmission rate of coded speech signal is minimized. Excellent speech is reproduced through a narrow band channel, or low bit rate digital channel like 16 kbits/second digital channel.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: Experiments involving SVQ in coding the speech baseband residual are described which show in particular that subband coding does not contribute any quality improvement when SVQ is used.
Abstract: This paper concerns a new Quantization Scheme for efficient encoding of waveforms below or about one bit per sample. This technique is then applied to the encoding of baseband residual signals to demonstrate the feasibility of (baseband) Residual Excited LPC at 2400 bit/sec. The technique called "Spherical Vector Quantization" is described in which a block of n consecutive samples is quantized as a vector. The magnitude of the vector is transmitted independently of the vector's orientation. This vector's orientation is vector quantized using a codebook which can be seen as representing a set of N points on a unit hypersphere. The cases for blocks of n = 8 and 24 are discussed which make use of results by Conway and Sloane on regular point lattices. For n = 8, the algorithm is detailed which solves both the problem of finding the closest point on the hypersphere and the problem of determining the index of that point. Experiments involving SVQ in coding the speech baseband residual are described which show in particular that subband coding does not contribute any quality improvement when SVQ is used.

Journal ArticleDOI
TL;DR: An overview of adaptive prediction in differential pulse code modulation systems used for speech encoding at 16 to 32 kilobits/sec (kbps) is presented, and comparative performance results of several backward adaptive algorithms and predictor structures are discussed.
Abstract: An overview of adaptive prediction in differential pulse code modulation (DPCM) systems used for speech encoding at 16 to 32 kilobits/sec (kbps) is presented. Features of the paper include a discussion of both infinite impulse response and finite impulse response predictors and a development of the various predictor implementation structures, such as the direct or transversal form, the lattice form, and the cascade of second order sections form. Differences between forward and backward adaptation are described, and comparative performance results of several backward adaptive algorithms and predictor structures are discussed.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: Simulation results demonstrate that vector quantization offers a distinct perceptual improvement compared with scalar quantization of the same subband signals and side information for the same total bit rate.
Abstract: Vector quantization (VQ) is examined as a technique to enhance performance in subband coding of speech at 9.6 kb/s. The set of short-term subband power levels is vector quantized, providing low-rate side information to control the coding of the subband signals. Each subband signal is then vector quantized with variable size codebooks that are dynamically assigned by the quantized side information. Two versions are described, a 7-band coder and a 14-band coder. Simulation results demonstrate that vector quantization offers a distinct perceptual improvement compared with scalar quantization of the same subband signals and side information for the same total bit rate.

Proceedings ArticleDOI
Roland Wilson1
01 Mar 1984
TL;DR: A new class of predictive coding algorithms, based on the quad-tree image representation, is described, and data compression schemes based on these algorithms have been found to produce acceptable images at rates as low as 0.25 bit/pixel.
Abstract: A new class of predictive coding algorithms, based on the quad-tree image representation, is described. Data compression schemes based on these algorithms have been found to produce acceptable images (peak-rms SNR > 30dB) at rates as low as 0.25 bit/pixel.

Patent
Shigeru Ono1
02 Jul 1984
TL;DR: In this paper, an improved excitation signal in a low bit-rate coding device for coding a discrete speech signal sequence into an output code sequence for use in exciting a synthesizing filter, an autocorrelation function of an impulse response calculated by using a parameter sequence representative of a spectral envelope of the segment and a cross-correlation function between the segments and the impulse response are used to produce a sequence of excitation pulses by successively deciding locations and amplitudes of the pulses with the location of a currently processed pulse decided by the use of the locations and the ampl
Abstract: An improved excitation signal in a low bit-rate coding device for coding a discrete speech signal sequence into an output code sequence for use in exciting a synthesizing filter, an autocorrelation function of an impulse response calculated for the synthesizing filter by using a parameter sequence representative of a spectral envelope of the segment and a cross-correlation function between the segment and the impulse response are used to produce a sequence of excitation pulses by successively deciding locations and amplitudes of the pulses with the location of a currently processed pulse decided by the use of the locations and the amplitudes of previously processed pulses and with renewal of the previously processed pulse amplitudes carried out concurrently with decision of the currently processed pulse amplitude by the use of the previously and currently processed pulse locations. Alternatively, the currently processed pulse location and the previously and currently processed pulse amplitudes are decided by the use of the previously processed pulse locations. The parameter and the excitation pulse sequences are coded and then combined into the output code sequence. The correlation functions are preferably calculated with the segment and the impulse response weighted by weights dependent on the parameter sequence. The segment may be a frame of the speech signal sequence or a subframe of a constant or variable length.

Proceedings ArticleDOI
K. Oh1, C. Un
19 Mar 1984
TL;DR: It has been found that for pitch detection of noisy speech the algorithms that use an AMDF or an autocorrelation function yield relatively better performance than the others.
Abstract: Results of a performance comparison study of eight pitch extraction algorithms for noisy as well as clean speech are presented. These algorithms are the autocorrelation method with center clipping, the autocorrelation method with modified center clipping, the simplified inverse filter tracking (SIFT) method, the average magnitude difference function (AMDF) method, the pitch detection method based on LPC inverse filtering and AMDF, the data reduction method, the parallel processing method and the cepstrum method. It has been found that for pitch detection of noisy speech the algorithms that use an AMDF or an autocorrelation function yield relatively better performance than the others. A pitch detector that uses center clipped speech as an input signal is effective in pitch extraction of noisy speech. In general, preprocessing such as LPC inverse filtering or center clipping of input speech yields remarkable improvement in pitch detection.
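
Of the eight methods listed in the abstract above, the AMDF is the simplest to illustrate. The sketch below is a bare-bones version on a clean synthetic signal: it has no center clipping, inverse filtering, or voicing decision, and the sampling rate, lag range, test signal, and function names are assumptions made for the example rather than settings from the study.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and its lagged copy."""
    count = len(frame) - lag
    return sum(abs(frame[k] - frame[k + lag]) for k in range(count)) / count


def estimate_pitch_lag(frame, min_lag, max_lag):
    """Return the lag (in samples) at which the AMDF is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))


sample_rate = 8000          # Hz, assumed
fundamental = 100.0         # Hz, so the true period is 80 samples
frame = [
    math.sin(2 * math.pi * fundamental * k / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * fundamental * k / sample_rate)
    for k in range(400)
]

# Search lags corresponding to roughly 54 Hz to 400 Hz.
lag = estimate_pitch_lag(frame, min_lag=20, max_lag=147)
print(lag, sample_rate / lag)
```

The AMDF dips toward zero at multiples of the pitch period, so the upper lag limit is kept below two periods here to avoid picking a multiple. Real speech needs more careful handling of such pitch-doubling errors.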

Journal ArticleDOI
TL;DR: In this article, the authors used the maximum likelihood (ML) method to derive a spectral matching criterion for autoregressive (i.e., all-pole) random processes.
Abstract: Itakura and Saito [1] used the maximum likelihood (ML) method to derive a spectral matching criterion for autoregressive (i.e., all-pole) random processes. In this paper, their results are generalized to periodic processes having arbitrary model spectra. For the all-pole model, Kay's [2] covariance domain solution to the recursive ML (RML) problem is cast into the spectral domain and used to obtain the RML solution for periodic processes. When applied to speech, this leads to a method for solving the joint pitch and spectrum envelope estimation problems. It is shown that if the number of frequency power measurements greatly exceeds the model order, then the RML algorithm reduces to a pitch-directed, frequency domain version of linear predictive (LP) spectral analysis. Experiments on a real-time vocoder reveal that the RML synthetic speech has the quality of being heavily smoothed.

PatentDOI
TL;DR: In this article, a speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters.
Abstract: A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.

Patent
11 May 1984
TL;DR: In this article, Markov models are applied to quantized speech parameters to represent their time behavior in a probabilistic manner, which is accomplished by representing the quantised speech parameters as finite state machines having predetermined matrices of transitional probabilities from which the conditional probabilities as to the quantization of successive speech data frames are established.
Abstract: Method and system for encoding digital speech information to characterize spoken human speech with an optimally reduced speech data rate while retaining speech quality in the audible reproduction of the encoded digital speech information. Markov modeling is applied to quantized speech parameters to represent their time behavior in a probabilistic manner. This is accomplished by representing the quantized speech parameters as finite state machines having predetermined matrices of transitional probabilities from which the conditional probabilities as to the quantized speech parameter values of successive speech data frames are established. The probabilistic description as so obtained is then used to represent the respective quantized values of the speech parameters by a digital code through Huffman coding in which digital codewords of variable length represent the quantized speech parameter values in accordance with their probability of occurrence such that more probable quantized values are assigned digital codewords of a shorter bit length while less probable quantized values are assigned digital codewords of a longer bit length.
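
The combination of a transition-probability matrix with Huffman coding described in the abstract above can be sketched as one Huffman table per previous quantized value. This is an illustrative reading, not the patented method: the four-level quantizer, the transition probabilities, and the names `huffman_code`, `encode`, and `decode` are all hypothetical.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()   # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code0.items()}
        merged.update({s: "1" + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical transition probabilities P(current value | previous value).
TRANSITIONS = {
    0: {0: 0.70, 1: 0.20, 2: 0.05, 3: 0.05},
    1: {0: 0.10, 1: 0.60, 2: 0.20, 3: 0.10},
    2: {0: 0.10, 1: 0.20, 2: 0.60, 3: 0.10},
    3: {0: 0.05, 1: 0.05, 2: 0.20, 3: 0.70},
}
TABLES = {prev: huffman_code(row) for prev, row in TRANSITIONS.items()}


def encode(values, initial_state=0):
    """Code each frame's value with the table selected by the previous value."""
    bits, prev = [], initial_state
    for value in values:
        bits.append(TABLES[prev][value])
        prev = value
    return "".join(bits)


def decode(bits, initial_state=0):
    """Invert encode() by tracking the same state sequence."""
    values, prev, pos = [], initial_state, 0
    while pos < len(bits):
        inverse = {word: sym for sym, word in TABLES[prev].items()}
        end = pos + 1
        while bits[pos:end] not in inverse:
            end += 1
            if end > len(bits):
                raise ValueError("truncated or invalid bit stream")
        prev = inverse[bits[pos:end]]
        values.append(prev)
        pos = end
    return values


frames = [0, 0, 1, 1, 2, 3, 3, 0]
bitstream = encode(frames)
print(bitstream, len(bitstream), decode(bitstream) == frames)
```

A fixed-length code would spend 2 bits per frame on a four-level parameter. The conditional tables spend fewer bits whenever a parameter tends to stay near its previous value.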

Proceedings ArticleDOI
01 Mar 1984
TL;DR: It is demonstrated that the inclusion of a pitch detector significantly improves the perceived quality of the synthetic speech and a modification of the original algorithm is described, resulting in a lower complexity, and a speech quality close to the results obtained with the original algorithms.
Abstract: We report on the results obtained from simulations of the Multi-Pulse Excitation Coder as proposed by Atal and Remde [4]. We investigated the effects of the different analysis parameters on the resulting synthetic speech signals, using objective and subjective tests. We compared the sub-optimal solutions proposed in [4] with another sub-optimal solution based on an orthogonalization of the solution space and found that the originally proposed solutions are reasonable choices. We demonstrate that the inclusion of a pitch detector significantly improves the perceived quality of the synthetic speech. We also describe a modification of the original algorithm, resulting in a lower complexity, and a speech quality close to the results obtained with the original algorithm.

Journal ArticleDOI
TL;DR: This paper presents a method of incorporating LPC spectral shape and energy into the code-book entries of the vector quantizer using a distortion measure for comparing two LPC vectors that uses the weighted sum of an LPC shape distortion and a log energy distortion.
Abstract: The theory of vector quantization (VQ) of linear predictive coding (LPC) coefficients has established a wide variety of techniques for quantizing LPC spectral shape to minimize overall spectral distortion. Such vector quantizers have been widely used in the areas of speech coding and speech recognition. The conventional vector quantizer utilizes only spectral shape information and essentially disregards the energy or gain term associated with the optimal LPC fit to the signal being modeled. In this paper we present a method of incorporating LPC spectral shape and energy into the code-book entries of the vector quantizer. To do this, we postulate a distortion measure for comparing two LPC vectors that uses the weighted sum of an LPC shape distortion and a log energy distortion. Based on this combined distortion measure, we have designed and studied vector quantizers of several sizes for use in isolated word speech recognition experiments. We found that a fairly significant correlation exists between LPC shape and signal energy. Hence, an LPC shape combined with energy vector quantizer with a given distortion requires far fewer code-book entries than one in which LPC shape and energy are quantized separately. Based on isolated word recognition tests on both a 10-digit and a 129-word airlines vocabulary, we found improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: A system for speech analysis and enhancement which combines signal processing and symbolic processing in a closely coupled manner and attempts to reconstruct the original speech waveform using symbolic processing to help model the signal and to guide reconstruction.
Abstract: This paper describes a system for speech analysis and enhancement which combines signal processing and symbolic processing in a closely coupled manner. The system takes as input both a noisy speech signal and a symbolic description of the speech signal. The system attempts to reconstruct the original speech waveform using symbolic processing to help model the signal and to guide reconstruction. The system uses various signal processing algorithms for parameter estimation and reconstruction.

Journal ArticleDOI
TL;DR: A fixed-tap differential pulse code modulation (DPCM) system with a robust backward-adaptive Jayant quantizer is investigated for speech encoding at 16-40 kbits/s using binary phase shift keying over an additive white Gaussian noise channel.
Abstract: A fixed-tap differential pulse code modulation (DPCM) system with a robust backward-adaptive Jayant quantizer is investigated for speech encoding at 16-40 kbits/s using binary phase shift keying over an additive white Gaussian noise channel. The performance of this system becomes unacceptable as the channel bit error rate (P_{b}) approaches 10^{-2}. Using high-rate, long constraint length, self-orthogonal convolutional codes, the DPCM system performance is much improved for 10^{-4} depending on the transmitted data rate. The use of high-rate (n - 1)/n, n = 2, 3, 4, and 5 codes minimizes the number of bits allocated to channel coding, and decoding complexity is reduced by employing self-orthogonal codes which admit threshold decoding. Subjectively, while there is additional quantization noise with channel coding, the irritating popping and squeaking sounds due to channel errors are eliminated.
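
The backward-adaptive quantizer named in the abstract above adapts its step size from the previously transmitted codeword, so the decoder can track the step size without side information. The sketch below shows a 2-bit version applied directly to samples (no DPCM predictor and no channel coding). The multiplier values, step limits, and the leakage exponent `beta` used for robustness are illustrative assumptions, not parameters taken from the paper.

```python
import math

MULTIPLIERS = (0.8, 1.6)        # inner level shrinks the step, outer level grows it
STEP_MIN, STEP_MAX = 0.01, 10.0


def next_step(step, level, beta=1.0):
    """Update the step size from the last magnitude level.

    beta = 1.0 gives the plain one-word-memory rule; beta slightly below 1
    is assumed here as the leakage that lets encoder and decoder step sizes
    reconverge after a channel error.
    """
    step = (step ** beta) * MULTIPLIERS[level]
    return min(max(step, STEP_MIN), STEP_MAX)


def encode(samples, step=0.1, beta=1.0):
    """Return (sign, level) codewords, 2 bits per sample."""
    codes = []
    for x in samples:
        sign = 0 if x >= 0 else 1
        level = 0 if abs(x) < step else 1
        codes.append((sign, level))
        step = next_step(step, level, beta)
    return codes


def decode(codes, step=0.1, beta=1.0):
    """Rebuild samples, repeating the encoder's step-size updates."""
    output = []
    for sign, level in codes:
        magnitude = (level + 0.5) * step
        output.append(-magnitude if sign else magnitude)
        step = next_step(step, level, beta)
    return output


samples = [math.sin(2 * math.pi * k / 40) for k in range(80)]
codes = encode(samples)
reconstruction = decode(codes)
print(reconstruction[:5])
```

Because the update depends only on transmitted codewords, a single bit error changes the decoder's step size from that point on, which is why the abstract pairs the quantizer with a robust variant and channel coding.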

Proceedings ArticleDOI
01 Mar 1984
TL;DR: This study was prompted by direct observation of the structure of the excitation, where patterns of pulses may be found which are associated with phase-correcting mechanisms of the LPC impulse response, and developed a new multipulse technique based on a special ARMA model formed by cascading an all-pole with an all-pass network.
Abstract: Multipulse LPC, as the model proposed by Atal and Remde is often called, has been directed towards 9.6 kb/s speech coding. However, at such a bit rate, the speech quality is not yet generally acceptable. This paper has a double purpose: one is to investigate the role of short-time phase in multipulse LPC, and the other is to look for different modelling structures to be used with this method. This study was prompted by direct observation of the structure of the excitation, where patterns of pulses may be found which are associated with phase-correcting mechanisms of the LPC impulse response. Consequently, a new multipulse technique was developed, based on a special ARMA model formed by cascading an all-pole with an all-pass network. This new model will be referred to as the MAPAP (Multipulse All-Pole All-Pass) method. A second model was tried in which the synthetic speech is formed by a combination of several MAPAP signals. We therefore call it the "multichannel multipulse" method. The potential advantages of both the single and multichannel models seem rather promising.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: This paper presents a method of incorporating LPC spectral shape and energy into the codebook entries of the vector quantizer, and finds improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.
Abstract: The theory of vector quantization (VQ) of linear predictive coding (LPC) coefficients has established a wide variety of techniques for quantizing LPC spectral shape to minimize overall spectral distortion. Such vector quantizers have been widely used in the areas of speech coding and speech recognition. The conventional vector quantizer utilizes only spectral shape information and essentially disregards the energy or gain term associated with the optimal LPC fit to the signal being modelled. In this paper we present a method of incorporating LPC spectral shape and energy into the codebook entries of the vector quantizer. To do this we postulate a distortion measure for comparing two LPC vectors which uses a weighted sum of an LPC shape distortion and a log energy distortion. Based on this combined distortion measure we have designed and studied vector quantizers of several sizes for use in isolated word speech recognition experiments. We have found that a fairly significant correlation exists between LPC shape and signal energy; hence a combined LPC shape plus energy vector quantizer with a given distortion requires far fewer codebook entries than one in which LPC shape and energy are quantized separately. Based on isolated word recognition tests on both a 10-digit and a 129 word airlines vocabulary, we have found improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.
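
The combined distortion measure described in this abstract, a weighted sum of a shape distortion and a log energy distortion, can be sketched as follows. The abstract does not give the exact form, so the details here are assumptions for illustration only: the squared Euclidean distance stands in for the LPC shape distortion, the energy term is an absolute difference in dB, and the names `combined_distortion`, `nearest_codeword`, and `weight` are hypothetical.

```python
import math


def combined_distortion(shape_a, energy_a, shape_b, energy_b, weight=0.1):
    """Weighted sum of a shape distortion and a log-energy distortion."""
    # Stand-in shape term: squared Euclidean distance between feature vectors.
    # (Assumption: the paper's LPC shape distortion is not reproduced here.)
    shape_term = sum((x - y) ** 2 for x, y in zip(shape_a, shape_b))
    # Log-energy term: absolute energy difference expressed in dB.
    energy_term = 10.0 * abs(math.log10(energy_a) - math.log10(energy_b))
    return shape_term + weight * energy_term


def nearest_codeword(shape, energy, codebook, weight=0.1):
    """Index of the (shape, energy) codebook entry with the least distortion."""
    return min(
        range(len(codebook)),
        key=lambda i: combined_distortion(
            shape, energy, codebook[i][0], codebook[i][1], weight
        ),
    )
```

With two entries that share a shape but differ in energy, for example `[([0, 0], 1.0), ([0, 0], 100.0)]`, a frame with energy 50 is assigned to the second entry, which a shape-only quantizer could not distinguish from the first.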

Proceedings ArticleDOI
19 Mar 1984
TL;DR: For the applications of speech synthesis from speech model parameters, time-scale modification of clean speech, speech enhancement by spectral subtraction, and helium speech enhancement, significant improvement is not gained by using the LSEE-MSTFTM algorithm.
Abstract: In this paper, speech synthesis directly from the processed Short-Time Fourier Transform Magnitude (STFTM) using the LSEE-MSTFTM algorithm [6,7] is compared to more conventional algorithms for several speech processing applications. For the applications considered, the most improvement occurs for time-scale modification of multiple speaker speech and noisy speech since these input signals are not well modeled by the analysis/synthesis system used for comparison. However, for the applications of speech synthesis from speech model parameters, time-scale modification of clean speech, speech enhancement by spectral subtraction, and helium speech enhancement, significant improvement is not gained by using the LSEE-MSTFTM algorithm. Significantly better results are not obtained since a good STFT phase estimate is available and employed in the conventional approaches to these applications.

Proceedings ArticleDOI
01 Mar 1984
TL;DR: A further application for time-alignment algorithms is described, in which replacement dialogue for a film soundtrack may be automatically synchronized to reference dialogue recorded during filming, in a digital signal processing system that uses a DP algorithm.
Abstract: A number of applications exist in basic speech research for Dynamic Programming (DP) algorithms that can produce accurate time registration data for aligning one speech signal with a similar speech signal. In this paper, a further application for time-alignment algorithms is described, in which replacement dialogue for a film soundtrack may be automatically synchronized to reference dialogue recorded during filming. This is being carried out in a digital signal processing system that uses a DP algorithm capable of aligning utterances of indeterminate length accurately and efficiently in real-time. The main features of this system and the DP algorithm will be described.
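
For readers unfamiliar with DP time alignment, a minimal textbook dynamic time warping sketch is given below. It is not the algorithm of this paper, which aligns utterances of indeterminate length in real time; it assumes both sequences are fully available, uses scalar features, and the function name `dtw_align` is hypothetical.

```python
def dtw_align(ref, new, dist=lambda a, b: abs(a - b)):
    """Return (total cost, path) aligning `new` to `ref` by dynamic programming."""
    n, m = len(ref), len(new)
    inf = float("inf")
    # cost[i][j] = least cost of aligning ref[:i] with new[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], new[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both signals
                cost[i - 1][j],      # advance the reference only
                cost[i][j - 1],      # advance the new signal only
            )
    # Backtrack from the end to recover the frame-to-frame registration.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

Each `(i, j)` pair in the returned path says that frame `j` of the new signal corresponds to frame `i` of the reference, which is the kind of time registration data a synchronization system would use to stretch or compress the replacement signal.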