Showing papers on "Linear predictive coding" published in 1984

Proceedings Article•DOI•

Line spectrum pair (LSP) and speech data compression

F.K. Soong¹, Biing-Hwang Juang¹•Institutions (1)

19 Mar 1984

TL;DR: An expression for spectral sensitivity with respect to single LSP frequency deviation is derived such that some insight on their quantization effects can be obtained and results on multi-pulse LPC using LSP for spectral information compression are presented.

Abstract: Line Spectrum Pair (LSP) was first introduced by Itakura [1,2] as an alternative LPC spectral representation. It was found that this new representation has such interesting properties as (1) all zeros of LSP polynomials are on the unit circle, (2) the corresponding zeros of the symmetric and anti-symmetric LSP polynomials are interlaced, and (3) the reconstructed LPC all-pole filter preserves its minimum phase property if (1) and (2) are kept intact through a quantization procedure. In this paper we prove all these properties via a "phase function." The statistical characteristics of LSP frequencies are investigated by analyzing a speech data base. In addition, we derive an expression for spectral sensitivity with respect to single LSP frequency deviation such that some insight on their quantization effects can be obtained. Results on multi-pulse LPC using LSP for spectral information compression are finally presented.
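
Properties (1) and (2) can be checked numerically. The sketch below is an illustration only, not the paper's phase-function proof. It assumes the common convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, with P(z) = A(z) + z^-(p+1) A(z^-1) (symmetric) and Q(z) = A(z) - z^-(p+1) A(z^-1) (anti-symmetric). The function names and the grid-plus-bisection root search are assumptions introduced here.

```python
import math

def lsp_polynomials(a):
    """a = [1, a1, ..., ap]; return coefficients of P(z) and Q(z) in powers of z^-1."""
    ext = list(a) + [0.0]               # A(z), padded to degree p+1
    rev = [0.0] + list(a)[::-1]         # z^-(p+1) * A(1/z)
    p_poly = [x + y for x, y in zip(ext, rev)]   # symmetric polynomial
    q_poly = [x - y for x, y in zip(ext, rev)]   # anti-symmetric polynomial
    return p_poly, q_poly

def unit_circle_zeros(coeffs, symmetric, grid=4096):
    """Zeros in the open interval (0, pi), found by sign changes plus bisection.

    On the unit circle a symmetric polynomial reduces to a real cosine sum
    (times a pure phase factor), and an anti-symmetric one to a real sine sum.
    """
    half = (len(coeffs) - 1) / 2.0
    trig = math.cos if symmetric else math.sin

    def f(w):
        return sum(c * trig((half - k) * w) for k, c in enumerate(coeffs))

    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    zeros = []
    for lo, hi in zip(ws, ws[1:]):
        if f(lo) * f(hi) < 0:
            for _ in range(60):          # bisection refinement
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
    return zeros

def lsp_frequencies(a):
    """Return (zeros of P, zeros of Q) as angles in radians, excluding z = +1 and z = -1."""
    p_poly, q_poly = lsp_polynomials(a)
    return unit_circle_zeros(p_poly, True), unit_circle_zeros(q_poly, False)

def is_interlaced(wp, wq):
    """True if the sorted zeros alternate between P and Q."""
    tagged = sorted([(w, "P") for w in wp] + [(w, "Q") for w in wq])
    labels = [t for _, t in tagged]
    return all(x != y for x, y in zip(labels, labels[1:]))
```

For the stable second-order example A(z) = 1 - 0.9 z^-1 + 0.64 z^-2, `lsp_frequencies([1.0, -0.9, 0.64])` gives one zero of P near 0.889 rad and one zero of Q near 1.297 rad, so the two sets interlace.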

506 citations

Journal Article•DOI•

Product code vector quantizers for waveform and voice coding

M. J. Sabin¹, Robert M. Gray¹•Institutions (1)

Stanford University¹

01 Jun 1984-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Several algorithms are presented for the design of shape-gain vector quantizers based on a training sequence of data or a probabilistic model, and their performance is compared to that of previously reported vector quantization systems.

Abstract: Memory and computation requirements imply fundamental limitations on the quality that can be achieved in vector quantization systems used for speech waveform coding and linear predictive voice coding (LPC). One approach to reducing storage and computation requirements is to organize the set of reproduction vectors as the Cartesian product of a vector codebook describing the shape of each reproduction vector and a scalar codebook describing the gain or energy. Such shape-gain vector quantizers can be applied both to waveform coding using a quadratic-error distortion measure and to voice coding using an Itakura-Saito distortion measure. In each case, the minimum distortion reproduction vector can be found by first selecting a shape codeword, and then, based on that choice, selecting a gain codeword. Several algorithms are presented for the design of shape-gain vector quantizers based on a training sequence of data or a probabilistic model. The algorithms are used to design shape-gain vector quantizers for both the waveform coding and voice coding applications. The quantizers are simulated, and their performance is compared to that of previously reported vector quantization systems.
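
The two-step search described in the abstract can be sketched for the quadratic-error case. This is a minimal illustration under stated assumptions: shape codewords have unit norm, gain codewords are positive, and the tiny codebooks are made up for the example. The function names are hypothetical, and the Itakura-Saito variant and the design algorithms are not covered.

```python
def shape_gain_encode(x, shapes, gains):
    """Pick the shape first, then the gain, for squared-error distortion.

    With unit-norm shapes s and positive gains g,
    ||x - g*s||^2 = ||x||^2 - 2*g*(x . s) + g^2,
    so the best shape maximizes x . s regardless of g, and the best gain
    is then the codeword closest to that inner product.
    """
    dots = [sum(a * b for a, b in zip(x, s)) for s in shapes]
    i = max(range(len(shapes)), key=dots.__getitem__)
    j = min(range(len(gains)), key=lambda k: (gains[k] - dots[i]) ** 2)
    return i, j

def shape_gain_decode(i, j, shapes, gains):
    """Reproduction vector: selected gain times selected shape."""
    return [gains[j] * c for c in shapes[i]]

# Hypothetical codebooks, for illustration only.
SHAPES = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]   # each has unit Euclidean norm
GAINS = [0.5, 1.0, 2.0, 4.0]                     # positive scalars
```

The product structure is the point of the design: storage is `len(SHAPES) + len(GAINS)` codewords, while the number of distinct reproduction vectors is `len(SHAPES) * len(GAINS)`.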

305 citations

Proceedings Article•DOI•

Improving performance of multi-pulse LPC coders at low bit rates

Sharad Singhal¹, B. S. Atal²•Institutions (2)

Bell Labs¹, AT&T²

19 Mar 1984

TL;DR: This paper focuses on problems encountered in attempting to maintain speech quality while synthesizing speech using multi-pulse excitation at lower bit rates.

Abstract: The multi-pulse excitation model provides a method for producing natural-sounding speech at medium to low bit rates. Multi-pulse analysis obtains the all-pole filter excitation by minimizing a spectrally-weighted mean-squared error between the original and synthetic speech signals. Although the method provides high quality speech around 10 kbits/sec, speech quality suffers if the bit rate is lowered. In this paper, we focus on problems encountered in attempting to maintain speech quality while synthesizing speech using multi-pulse excitation at lower bit rates.

163 citations

Patent•

Simultaneous transmission of speech and data over an analog channel

Randy David Nash¹, Wai-Choong Wong¹•Institutions (1)

Bell Labs¹

02 Apr 1984

TL;DR: A technique is described for transmitting an entire analog speech signal (S(t)) and a modulated data signal (D(t)) over a transmission channel (20), such as a common analog telephone speech channel.

Abstract: A technique for transmitting an entire analog speech signal (S(t)) and a modulated data signal (D(t)) over a transmission channel (20) such as a common analog telephone speech channel. The present technique multiplexes the entire modulated data signal within the normal analog speech signal frequency band where the speech is present and its signal power density characteristic is at a low level. Separation of the speech and data signals at the receiver (30) is effected by recovering the modulation carrier frequency (fc) and demodulating (33) the received signal (X(t)) to recover the data signal. The data signal is then remodulated (34) with the recovered carrier and is convolved with an arbitrary channel impulse response in an adaptive filter (35) whose output signal is subtracted (37) from the received composite data and speech signal (X(t)) to generate the recovered speech signal (S(t)). To improve the recovered speech signal, a least mean square algorithm (36) is used to update the arbitrary channel impulse response output signal of the adaptive filter (35).
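
The subtract-and-adapt step can be sketched with a standard LMS update. This is a simplified baseband illustration, not the patented receiver: carrier recovery, demodulation, and remodulation are skipped, the data signal is assumed to be already known at the receiver, and the channel taps, step size, and function name are hypothetical.

```python
import math
import random

def lms_cancel(received, data, num_taps, mu):
    """Estimate the channel seen by the data signal and subtract its contribution.

    received : composite signal, data filtered by an unknown channel plus speech
    data     : the data signal as recovered at the receiver (assumed known here)
    Returns the residual (the recovered speech estimate) and the final tap weights.
    """
    weights = [0.0] * num_taps
    recovered = []
    for n in range(len(received)):
        # Most recent data samples, newest first; zeros before the start.
        taps = [data[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        estimate = sum(w * d for w, d in zip(weights, taps))
        error = received[n] - estimate          # what is left after cancellation
        # LMS update: move the weights along the instantaneous error gradient.
        weights = [w + mu * error * d for w, d in zip(weights, taps)]
        recovered.append(error)
    return recovered, weights

def make_demo(length=5000, seed=0):
    """Synthetic test signals: a +/-1 data stream, a weak tone standing in for speech."""
    rng = random.Random(seed)
    channel = [0.8, -0.3, 0.1]                  # hypothetical channel impulse response
    data = [rng.choice((-1.0, 1.0)) for _ in range(length)]
    speech = [0.05 * math.sin(0.02 * n) for n in range(length)]
    received = [
        speech[n] + sum(h * data[n - k] for k, h in enumerate(channel) if n - k >= 0)
        for n in range(length)
    ]
    return channel, data, speech, received
```

Calling `lms_cancel(received, data, 3, 0.01)` on the signals from `make_demo()` drives the weights toward the channel taps, so the residual approaches the speech component once the filter has converged.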

151 citations

Proceedings Article•DOI•

Variable-frequency synthesis: An improved harmonic coding scheme

Luís B. Almeida¹, Fernando M. Silva•Institutions (1)

Instituto Superior Técnico¹

01 Mar 1984

TL;DR: In this Harmonic Coding scheme the signal is synthesized in the time domain, as a superimposition of "harmonics" whose instantaneous frequency varies continuously along an interpolation curve, within each frame, so that fast pitch variations can be tracked with no difficulty.

Abstract: The Harmonic Coding concept has already shown its potential for efficiently coding speech. Previous implementations have used a frame rate of one every 16 ms. This was mainly due to the fact that, with longer frames, even a nonstationary spectral model (of low order) cannot reproduce the zones of fast-varying pitch with the desirable quality. However, the high framing rate is a limitation, since it implies that fewer bits will be available for encoding each frame. A solution for this problem has been devised: the signal is synthesized in the time domain, as a superimposition of "harmonics" whose instantaneous frequency varies continuously along an interpolation curve, within each frame. In this way, fast pitch variations can be tracked with no difficulty. Experimental results are presented, confirming these facts. The integration of this synthesis scheme in a speech coder is discussed.

98 citations

Proceedings Article•DOI•

A glottal LPC-vocoder

Per Hedelin¹•Institutions (1)

Chalmers University of Technology¹

01 Mar 1984

TL;DR: It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC.

Abstract: A procedure is suggested for improving LPC speech quality. The central theme is to introduce a parametric model of voiced excitation - a glottal source model. In the analysis this allows for a different method than the AR-estimation used in conventional LPC. Here, a method known as AR-X-estimation is used. A complete analysis and coding method is presented. It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC. The glottal LPC-vocoder does significantly improve synthesis quality as compared to standard LPC. It should be emphasized, however, that the glottal vocoder requires high quality speech as input, recorded in a phase linear system. Moreover, the computational complexity is high.

56 citations

Journal Article•DOI•

Speech-quality assessment methods for speech-coding systems

Nobuhiko Kitawaki, Masaaki Honda, Kenzo Itoh

01 Oct 1984-IEEE Communications Magazine

50 citations

Proceedings Article•DOI•

Efficient computation and encoding of the multipulse excitation for LPC

M. Berouti¹, H. Garten, Peter Kabal, P. Mermelstein•Institutions (1)

Bell-Northern Research¹

01 Mar 1984

TL;DR: A computationally efficient formulation is derived for both covariance and correlation type analyses for multipulse coding of speech, and several methods for pulse amplitude and position determination are given, ranging from a purely sequential one to one which reoptimizes pulse amplitudes at each step.

Abstract: This paper discusses the analysis techniques used to derive the excitation waveform for multipulse coding of speech. A computationally efficient formulation is derived for both covariance and correlation type analyses. These methods differ in the way block edges are treated. Several methods for pulse amplitude and position determination are given, ranging from a purely sequential one to one which reoptimizes pulse amplitudes at each step. It is shown that the reoptimization scheme has a nested structure that allows a reduction in the computations. An efficient method for pulse position coding is given. This method can essentially achieve the entropy limit for randomly placed pulses. Experimental results are given for typical configurations including computational requirements and speech quality assessments.

48 citations

Proceedings Article•DOI•

The harmonic magnitude suppression (HMS) technique for intelligibility enhancement in the presence of interfering speech

B. Hanson, D. Wong

01 Mar 1984

TL;DR: Algorithms based on spectral subtraction are developed for improving the intelligibility of speech that has been interfered by a second talker's voice, and significant gain in intelligibility for low signal-to-noise ratio conditions is achieved.

Abstract: Algorithms based on spectral subtraction are developed for improving the intelligibility of speech that has been interfered by a second talker's voice. A number of new properties of spectral subtraction are shown, including the effects of phase on the output speech intelligibility, and the choice of magnitude spectral differences for best results. A harmonic extraction algorithm is also developed. Results of formal testing on the final system show that significant gain in intelligibility for low signal-to-noise ratio conditions is achieved.

45 citations

Patent•DOI•

Speech recognition method including biased principal components

Peter F. Brown¹•Institutions (1)

ExxonMobil¹

27 Mar 1984-Journal of the Acoustical Society of America

TL;DR: In this article, a speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters.

Abstract: A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. The speech processing circuitry establishes overlapping time durations for generating the acoustic parameters and further employs a sinc-Kaiser smoothing function in combination with a folding technique for providing a discrete Fourier transform. The Fourier spectra are transformed using a biased principal component analysis which optimizes the across class variance. The template matching and cost processing circuitries provide distributed processing, on demand, of the acoustic parameters for generating through a dynamic programming technique the recognition decision.

42 citations

Patent•

Method and means of determining coefficients for linear predictive coding

Ira A. Gerson¹•Institutions (1)

Motorola¹

28 Dec 1984

TL;DR: An improved method and means of determining reflection coefficients that characterize an electrical signal that obtains characteristics of an all-zero inverse lattice filter. The reflection coefficients are obtained by filtering the signal, sampling the filtered signal, obtaining the elements of a correlation array from the samples, and initializing values of arrays of forward residuals, backward residuals, and cross correlation of residuals.

Abstract: An improved method and means of determining reflection coefficients that characterize an electrical signal that obtains characteristics of an all-zero inverse lattice filter. The reflection coefficients are obtained by filtering the signal, sampling the filtered signal, obtaining the elements of a correlation array from the samples, initializing values of arrays of forward residuals, backward residuals, and cross correlation of residuals, combining array elements to obtain a first reflection coefficient, removing from the forward, backward and cross-correlation arrays the effect of the first reflection coefficient, calculating from the revised arrays a second coefficient, and repeating the calculations to the desired order. In a second embodiment of the present invention, samples are selected from the digitized signal and multiplied by a windowing function. The windowed samples are used to derive values of an autocorrelation array which eliminates the need for both forward and backward arrays as in the first embodiment of the invention.
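
The general idea behind the second embodiment, deriving reflection coefficients from the autocorrelation of windowed samples, can be sketched with the textbook Levinson-Durbin recursion. This is a stand-in, not the patented procedure. The Hamming window, the sign convention (prediction-error filter A(z) = 1 + a1 z^-1 + ...), and the function names are assumptions made for the illustration.

```python
import math

def windowed_autocorrelation(frame, order):
    """Hamming-window the frame, then return autocorrelation lags 0..order."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, ks, err): the prediction-error filter a = [1, a1, ..., ap],
    the reflection coefficients ks, and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # m-th reflection coefficient
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]  # order-update of the predictor
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)                # error energy shrinks at each order
    return a, ks, err
```

For autocorrelation values r = [1, 0.5, 0.25], which match a first-order process, `levinson_durbin(r, 2)` returns a first reflection coefficient of -0.5 and a second of 0, with a residual energy of 0.75.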

Proceedings Article•DOI•

Spectral envelope sampling and interpolation in linear predictive analysis of speech

Hynek Hermansky, Hidehiko Fujisaki¹, Y. Sato²•Institutions (2)

University of Tokyo¹, Fujitsu²

19 Mar 1984

TL;DR: It is shown, on analyses of both synthetic and natural speech, that the averaged parabolic approximation between harmonic peaks of voiced speech spectrum reduces the sensitivity of the LP analysis to changes in the fundamental frequency Fo and to noise.

Abstract: In spite of its extensive use, speech analysis based on linear prediction (LP) is liable to various causes of inaccuracy. This paper presents a novel approach to improve the accuracy in the estimation of the voiced speech production model based on the LP method. The presented method uses interpolation between spectral points which are least influenced by artifacts in the spectral analysis and by noise in the signal. We show, on analyses of both synthetic and natural speech, that the averaged parabolic approximation between harmonic peaks of voiced speech spectrum reduces the sensitivity of the LP analysis to changes in the fundamental frequency Fo and to noise. The method is well suited for combination with the Spectral Transform LP method, previously proposed by the authors [1].

Proceedings Article•DOI•

Fully vector-quantized subband coding with adaptive codebook allocation

Allen Gersho¹, T. Ramstad, I. Versvik•Institutions (1)

University of California, Santa Barbara¹

01 Mar 1984

TL;DR: Simulation results demonstrate that vector quantization offers a distinct perceptual improvement compared with scalar quantization of the same subband signals and side information for the same total bit rate.

Abstract: Vector quantization (VQ) is examined as a technique to enhance performance in subband coding of speech at 9.6 kb/s. The set of short-term subband power levels is vector quantized, providing low-rate side information to control the coding of the subband signals. Each subband signal is then vector quantized with variable size codebooks that are dynamically assigned by the quantized side information. Two versions are described, a 7-band coder and a 14-band coder. Simulation results demonstrate that vector quantization offers a distinct perceptual improvement compared with scalar quantization of the same subband signals and side information for the same total bit rate.

Journal Article•DOI•

An enhanced LPC vocoder with no voiced/unvoiced switch

Soon Kwon, A. Goldberg

01 Aug 1984-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A modified LPC system (LPC plus) which requires no voiced/unvoiced switch at the synthesizer is presented in this paper and produces synthesized speech which is more natural and intelligible than that produced by conventional LPC.

Abstract: A modified LPC system (LPC plus) which requires no voiced/unvoiced switch at the synthesizer is presented in this paper. The excitation functions of the synthesizer filter are modeled to be the sum of the conventional pulse and noise sources. The mixture ratio is estimated from the LPC residual error signal, and this parameter controls the amplitudes of the pulse and noise sources. Since the V/UV switch is eliminated, this system produces robust speech in highly noisy environments, while a conventional LPC system produces degraded speech due to voicing errors. In addition, this technique has been applied to the speech of two simultaneous speakers and produces synthesized speech which is more natural and intelligible than that produced by conventional LPC.

Proceedings Article•DOI•

A performance comparison of pitch extraction algorithms for noisy speech

K. Oh¹, C. Un•Institutions (1)

KAIST¹

19 Mar 1984

TL;DR: It has been found that, for pitch detection of noisy speech, the algorithms that use an AMDF or an autocorrelation function perform relatively better than the others.

Abstract: Results of a performance comparison study of eight pitch extraction algorithms for noisy as well as clean speech are presented. These algorithms are the autocorrelation method with center clipping, the autocorrelation method with modified center clipping, the simplified inverse filter tracking (SIFT) method, the average magnitude difference function (AMDF) method, the pitch detection method based on LPC inverse filtering and AMDF, the data reduction method, the parallel processing method and the cepstrum method. It has been found that, for pitch detection of noisy speech, the algorithms that use an AMDF or an autocorrelation function perform relatively better than the others. A pitch detector that uses center clipped speech as an input signal is effective in pitch extraction of noisy speech. In general, preprocessing such as LPC inverse filtering or center clipping of input speech yields remarkable improvement in pitch detection.
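
As a reminder of what the AMDF method computes, here is a minimal sketch of a bare AMDF pitch-lag estimate. It omits the preprocessing the study compares (center clipping, LPC inverse filtering) and any voicing decision. The function names and the lag search range are assumptions made for the illustration.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def amdf_pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the deepest AMDF valley.

    For a periodic signal the AMDF dips toward zero at the pitch period,
    so the minimizing lag is taken as the period estimate in samples.
    """
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

def synthetic_voiced_frame(period, length):
    """Test signal: a fundamental plus a second harmonic with the given period."""
    return [
        math.sin(2.0 * math.pi * i / period) + 0.5 * math.sin(4.0 * math.pi * i / period)
        for i in range(length)
    ]
```

For `synthetic_voiced_frame(50, 400)` searched over lags 20 to 80, `amdf_pitch_lag` returns 50; at an assumed 8 kHz sampling rate that corresponds to 160 Hz.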

Journal Article•DOI•

Maximum likelihood spectral estimation and its application to narrow-band speech coding

R.J. McAulay¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Apr 1984-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The maximum likelihood (ML) spectral matching criterion that Itakura and Saito derived for autoregressive (i.e., all-pole) random processes is generalized to periodic processes having arbitrary model spectra.

Abstract: Itakura and Saito [1] used the maximum likelihood (ML) method to derive a spectral matching criterion for autoregressive (i.e., all-pole) random processes. In this paper, their results are generalized to periodic processes having arbitrary model spectra. For the all-pole model, Kay's [2] covariance domain solution to the recursive ML (RML) problem is cast into the spectral domain and used to obtain the RML solution for periodic processes. When applied to speech, this leads to a method for solving the joint pitch and spectrum envelope estimation problems. It is shown that if the number of frequency power measurements greatly exceeds the model order, then the RML algorithm reduces to a pitch-directed, frequency domain version of linear predictive (LP) spectral analysis. Experiments on a real-time vocoder reveal that the RML synthetic speech has the quality of being heavily smoothed.

Patent•DOI•

Speech recognition training method

James K. Baker¹, John W. Klovstad¹, Chin-hui Lee¹, Kalyan Ganesan¹•Institutions (1)

ExxonMobil¹

27 Mar 1984-Journal of the Acoustical Society of America

Abstract: A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.

Patent•

Method and system for encoding digital speech information

Panagiotis E. Papamichalis¹•Institutions (1)

Texas Instruments¹

11 May 1984

TL;DR: Markov models are applied to quantized speech parameters to represent their time behavior in a probabilistic manner. The quantized speech parameters are represented as finite state machines having predetermined matrices of transitional probabilities, from which the conditional probabilities of the quantized speech parameter values of successive speech data frames are established.

Abstract: Method and system for encoding digital speech information to characterize spoken human speech with an optimally reduced speech data rate while retaining speech quality in the audible reproduction of the encoded digital speech information. Markov modeling is applied to quantized speech parameters to represent their time behavior in a probabilistic manner. This is accomplished by representing the quantized speech parameters as finite state machines having predetermined matrices of transitional probabilities from which the conditional probabilities as to the quantized speech parameter values of successive speech data frames are established. The probabilistic description as so obtained is then used to represent the respective quantized values of the speech parameters by a digital code through Huffman coding in which digital codewords of variable length represent the quantized speech parameter values in accordance with their probability of occurrence such that more probable quantized values are assigned digital codewords of a shorter bit length while less probable quantized values are assigned digital codewords of a longer bit length.
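
The combination of a transition matrix with variable-length codes can be sketched as one Huffman table per previous quantized value. This is an illustration of the general idea only, not the patented system. The three-level quantizer, the transition probabilities, the assumed start state, and all function names are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a dict symbol -> probability; returns symbol -> bit string."""
    if len(probs) == 1:
        return {s: "0" for s in probs}
    tie = count()                       # unique tie-breaker so dicts are never compared
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)     # two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def build_tables(transitions):
    """One code table per previous value, built from that row of the transition matrix."""
    return {prev: huffman_code(row) for prev, row in transitions.items()}

def encode(symbols, tables, start_state):
    """Code each value with the table selected by the previous value."""
    prev, bits = start_state, []
    for s in symbols:
        bits.append(tables[prev][s])
        prev = s
    return "".join(bits)

def decode(bits, tables, start_state):
    """Invert encode(); works because each table is prefix-free."""
    inverse = {p: {b: s for s, b in t.items()} for p, t in tables.items()}
    prev, out, word = start_state, [], ""
    for bit in bits:
        word += bit
        if word in inverse[prev]:
            prev = inverse[prev][word]
            out.append(prev)
            word = ""
    return out

# Hypothetical transition probabilities P(current | previous) for a 3-level parameter.
TRANSITIONS = {
    0: {0: 0.7, 1: 0.2, 2: 0.1},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {0: 0.1, 1: 0.2, 2: 0.7},
}
```

With these tables, the sequence `[0, 0, 1, 1, 2, 2, 2, 0]` starting from state 0 encodes to 11 bits, compared with 16 bits for a fixed 2-bit code, because the likely "stay in the same state" transitions get 1-bit codewords.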

Proceedings Article•DOI•

Experimental evaluation of different approaches to the multi-pulse coder

P. Kroon¹, E. Deprettere•Institutions (1)

Delft University of Technology¹

01 Mar 1984

TL;DR: It is demonstrated that the inclusion of a pitch detector significantly improves the perceived quality of the synthetic speech, and a modification of the original algorithm is described, resulting in a lower complexity and a speech quality close to the results obtained with the original algorithm.

Abstract: We report on the results obtained from simulations of the Multi-Pulse Excitation Coder as proposed by Atal and Remde [4]. We investigated the effects of the different analysis parameters on the resulting synthetic speech signals, using objective and subjective tests. We compared the sub-optimal solutions proposed in [4] with another sub-optimal solution based on an orthogonalization of the solution space and found that the originally proposed solutions are reasonable choices. We demonstrate that the inclusion of a pitch detector significantly improves the perceived quality of the synthetic speech. We also describe a modification of the original algorithm, resulting in a lower complexity, and a speech quality close to the results obtained with the original algorithm.

Journal Article•DOI•

A vector quantizer combining energy and LPC parameters and its application to isolated word recognition

Lawrence R. Rabiner¹, Man Mohan Sondhi¹, Stephen E. Levinson¹•Institutions (1)

Bell Labs¹

06 May 1984-AT&T Bell Laboratories technical journal

TL;DR: This paper presents a method of incorporating LPC spectral shape and energy into the code-book entries of the vector quantizer using a distortion measure for comparing two LPC vectors that uses the weighted sum of an LPC shape distortion and a log energy distortion.

Abstract: The theory of vector quantization (VQ) of linear predictive coding (LPC) coefficients has established a wide variety of techniques for quantizing LPC spectral shape to minimize overall spectral distortion. Such vector quantizers have been widely used in the areas of speech coding and speech recognition. The conventional vector quantizer utilizes only spectral shape information and essentially disregards the energy or gain term associated with the optimal LPC fit to the signal being modeled. In this paper we present a method of incorporating LPC spectral shape and energy into the code-book entries of the vector quantizer. To do this, we postulate a distortion measure for comparing two LPC vectors that uses the weighted sum of an LPC shape distortion and a log energy distortion. Based on this combined distortion measure, we have designed and studied vector quantizers of several sizes for use in isolated word speech recognition experiments. We found that a fairly significant correlation exists between LPC shape and signal energy. Hence, an LPC shape combined with energy vector quantizer with a given distortion requires far fewer code-book entries than one in which LPC shape and energy are quantized separately. Based on isolated word recognition tests on both a 10-digit and a 129-word airlines vocabulary, we found improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.

Proceedings Article•DOI•

Knowledge based speech analysis and enhancement

C. Myers¹, Alan V. Oppenheim, Randall Davis, W. Dove•Institutions (1)

Massachusetts Institute of Technology¹

01 Mar 1984

TL;DR: A system for speech analysis and enhancement which combines signal processing and symbolic processing in a closely coupled manner and attempts to reconstruct the original speech waveform using symbolic processing to help model the signal and to guide reconstruction.

Abstract: This paper describes a system for speech analysis and enhancement which combines signal processing and symbolic processing in a closely coupled manner. The system takes as input both a noisy speech signal and a symbolic description of the speech signal. The system attempts to reconstruct the original speech waveform using symbolic processing to help model the signal and to guide reconstruction. The system uses various signal processing algorithms for parameter estimation and reconstruction.

Proceedings Article•DOI•

A study on short-time phase and multipulse LPC

Isabel Trancoso¹, R. Garcia-Gomez, José Tribolet•Institutions (1)

Instituto Superior Técnico¹

01 Mar 1984

TL;DR: This study was prompted by direct observation of the structure of the excitation, where patterns of pulses may be found which are associated with phase-correcting mechanisms of the LPC impulse response, and it developed a new multipulse technique based on a special ARMA model formed by cascading an all-pole with an all-pass network.

Abstract: Multipulse LPC, as the model proposed by Atal and Remde is often called, has been directed towards 9.6 kb/s speech coding. However, at such a bit rate, the speech quality is not yet generally acceptable. This paper has a double purpose: one is to investigate the role of short-time phase in multipulse LPC; the other is to look for different modelling structures to be used with this method. This study was prompted by direct observation of the structure of the excitation, where patterns of pulses may be found which are associated with phase-correcting mechanisms of the LPC impulse response. Consequently, a new multipulse technique was developed, based on a special ARMA model formed by cascading an all-pole with an all-pass network. This new model will be referred to as the MAPAP (Multipulse All-Pole All-Pass) method. Another one was tried in which the synthetic speech is formed by combination of several MAPAP signals. We therefore denoted it the "multichannel multipulse" method. The potential advantages of both single and multichannel models seem rather promising.

Proceedings Article•DOI•

A speech direction finder

D. Fischell¹, C. Coker•Institutions (1)

Bell Labs¹

19 Mar 1984

TL;DR: The speech direction finder described here is a relatively simple device based on an off-the-shelf microcomputer which can provide the direction to a talker to within 3 degrees of azimuth angle on a single spoken syllable.

Abstract: The speech direction finder described here is a relatively simple device based on an off-the-shelf microcomputer. It can provide the direction to a talker to within 3 degrees of azimuth angle on a single spoken syllable, will only respond to speech, and when used with Wallace linear array microphones can provide this at distances of 50 feet or more. There are numerous applications for the device which may enhance the quality of audio and video teleconferences.

Proceedings Article•DOI•

A vector quantizer incorporating both LPC shape and energy

Lawrence R. Rabiner¹, Man Mohan Sondhi, Stephen E. Levinson•Institutions (1)

Bell Labs¹

01 Mar 1984

TL;DR: This paper presents a method of incorporating LPC spectral shape and energy into the codebook entries of the vector quantizer, and finds improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.

Abstract: The theory of vector quantization (VQ) of linear predictive coding (LPC) coefficients has established a wide variety of techniques for quantizing LPC spectral shape to minimize overall spectral distortion. Such vector quantizers have been widely used in the areas of speech coding and speech recognition. The conventional vector quantizer utilizes only spectral shape information and essentially disregards the energy or gain term associated with the optimal LPC fit to the signal being modelled. In this paper we present a method of incorporating LPC spectral shape and energy into the codebook entries of the vector quantizer. To do this we postulate a distortion measure for comparing two LPC vectors which uses a weighted sum of an LPC shape distortion and a log energy distortion. Based on this combined distortion measure we have designed and studied vector quantizers of several sizes for use in isolated word speech recognition experiments. We have found that a fairly significant correlation exists between LPC shape and signal energy; hence a combined LPC shape plus energy vector quantizer with a given distortion requires far fewer codebook entries than one in which LPC shape and energy are quantized separately. Based on isolated word recognition tests on both a 10-digit and a 129 word airlines vocabulary, we have found improvements in recognition accuracy by using the VQ with both LPC shape and energy over that obtained using a VQ with LPC shape alone.

Proceedings Article•DOI•

Speech synthesis from short-time Fourier transform magnitude and its application to speech processing

[...]

D. Griffin¹, D. Deadrick, Jae S Lim•Institutions (1)

Massachusetts Institute of Technology¹

19 Mar 1984

TL;DR: For the applications of speech synthesis from speech model parameters, time-scale modification of clean speech, speech enhancement by spectral subtraction, and helium speech enhancement, significant improvement is not gained by using the LSEE-MSTFTM algorithm.

Abstract: In this paper, speech synthesis directly from the processed Short-Time Fourier Transform Magnitude (STFTM) using the LSEE-MSTFTM algorithm [6,7] is compared to more conventional algorithms for several speech processing applications. For the applications considered, the most improvement occurs for time-scale modification of multiple speaker speech and noisy speech since these input signals are not well modeled by the analysis/synthesis system used for comparison. However, for the applications of speech synthesis from speech model parameters, time-scale modification of clean speech, speech enhancement by spectral subtraction, and helium speech enhancement, significant improvement is not gained by using the LSEE-MSTFTM algorithm. Significantly better results are not obtained since a good STFT phase estimate is available and employed in the conventional approaches to these applications.

Proceedings Article•DOI•

Use of dynamic programming for automatic synchronization of two similar speech signals

[...]

P. Bloom

01 Mar 1984

TL;DR: A further application for time-alignment algorithms is described, in which replacement dialogue for a film soundtrack may be automatically synchronized to reference dialogue recorded during filming, in a digital signal processing system that uses a DP algorithm.

Abstract: A number of applications exist in basic speech research for Dynamic Programming (DP) algorithms that can produce accurate time registration data for aligning one speech signal with a similar speech signal. In this paper, a further application for time-alignment algorithms is described, in which replacement dialogue for a film soundtrack may be automatically synchronized to reference dialogue recorded during filming. This is being carried out in a digital signal processing system that uses a DP algorithm capable of aligning utterances of indeterminate length accurately and efficiently in real-time. The main features of this system and the DP algorithm will be described.
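
For readers unfamiliar with DP time alignment, a minimal sketch of textbook dynamic time warping follows. It is not the real-time algorithm of the paper, whose details are not given here; it assumes two short, non-empty sequences of scalar features, and `dtw_path` and its default absolute-difference distance are hypothetical choices.

```python
def dtw_path(ref, new, dist=lambda a, b: abs(a - b)):
    """Return (total_cost, path) aligning `new` to `ref`.

    `path` is a list of (ref_index, new_index) pairs from start to end.
    """
    n, m = len(ref), len(new)
    inf = float("inf")
    # cost[i][j] = best cost of aligning ref[:i] with new[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], new[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1])      # advance new only
    # Backtrack from the end to recover the time registration.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path

total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

The returned path is the time registration data: each pair says which frame of the replacement signal lines up with which frame of the reference. This version needs both complete signals and quadratic memory, so it does not address the real-time, indeterminate-length requirement the abstract mentions.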

Proceedings Article•DOI•

An endpoint detector for LPC speech using residual error look-ahead for vector quantization applications

[...]

Chieh Tsao¹, Robert M. Gray•Institutions (1)

Stanford University¹

01 Mar 1984

TL;DR: An end-point detector for LPC speech using squared prediction error look-ahead and automatic/manual threshold determination is described, which is relatively immune to transient pulses and various low-level noises, yet preserves low-level speech sounds such as weak fricatives to a significant extent under moderate noise conditions.

Abstract: An end-point detector for LPC speech using squared prediction error look-ahead and automatic/manual threshold determination is described. The detector is algorithmically simple, computationally efficient, and uses only one decision parameter. Preliminary tests indicate that it is relatively immune to transient pulses and various low-level noises, yet preserves low-level speech sounds such as weak fricatives to a significant extent under moderate noise conditions. Tests indicate that 93.8% of automatically determined endpoints agree to within two frames of manually determined endpoints. The detector is especially suitable for use in vector-quantization based LPC systems, where the squared prediction error is easily available.

Proceedings Article•DOI•

Low bit rate speech enhancement using a new method of multiple impulse excitation

[...]

A. Parker¹, S. Alexander, H. Trussell•Institutions (1)

North Carolina State University¹

01 Mar 1984

TL;DR: A simple method is presented for extracting the amplitudes and locations of a multiple impulse excitation model which allows a more accurate recomputation of the autoregressive coefficients based upon incorporating the multipulse excitation.

Abstract: One of the sources of degradation in LPC-synthesized speech is the mechanical quality due to a single impulse excitation per pitch period. This paper presents a simple method for extracting the amplitudes and locations of a multiple impulse excitation model. These multipulse parameters are obtained very easily from the autoregressive (LPC) residual. Additionally, a method is developed which allows a more accurate recomputation of the autoregressive coefficients based upon incorporating the multipulse excitation.
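
The abstract does not spell out the extraction rule, so the following is only a naive illustration of the general idea of reading pulse locations and amplitudes off the LPC residual: inverse-filter the signal with the predictor coefficients, then keep the largest-magnitude residual samples. The functions `lpc_residual` and `pick_pulses`, and the largest-magnitude rule itself, are assumptions and not the paper's method.

```python
def lpc_residual(x, a):
    # Prediction error e[n] = x[n] - sum_i a[i] * x[n-1-i],
    # treating samples before the start of x as zero.
    p = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
            for n in range(len(x))]

def pick_pulses(residual, count):
    # Keep the `count` largest-magnitude residual samples as
    # (location, amplitude) pairs, sorted by location.
    idx = sorted(range(len(residual)),
                 key=lambda n: abs(residual[n]), reverse=True)[:count]
    return sorted((n, residual[n]) for n in idx)

# Toy signal: two impulses (amplitudes 1 and -2) through 1 / (1 - 0.5 z^-1).
x = [1.0, 0.5, 0.25, -1.875, -0.9375, -0.46875]
residual = lpc_residual(x, [0.5])
pulses = pick_pulses(residual, 2)
```

On this toy signal the residual is exactly the two-impulse excitation, so both pulses are recovered; real speech residuals are noisier, and the coefficient recomputation step the abstract mentions is not attempted here.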

Book Chapter•DOI•

Linear Predictive Coding

[...]

Alan Bundy¹, Lincoln Wallen•Institutions (1)

University of Edinburgh¹

01 Jan 1984

TL;DR: If one approximates the vocal tract as a series of fixed length tubes (which is equivalent to representing it as an all-pole digital filter) it becomes possible to predict successive samples of the speech wave as linear combinations of previous samples.

Abstract: If one approximates the vocal tract as a series of fixed length tubes (which is equivalent to representing it as an all-pole digital filter) it becomes possible to predict successive samples of the speech wave as linear combinations of previous samples. The coefficients in the linear combination characterize the shape of the vocal tract. A sequence of sets of coefficients can be used to characterize the changing shape of the vocal tract over time. This representation is widely used because of the particularly efficient algorithms associated with it.
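
As a concrete illustration of predicting a sample from a linear combination of previous samples, here is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, which is one of the efficient algorithms commonly associated with LPC. The chapter itself does not name an algorithm here, so this choice, the function names, and the sign convention (prediction = sum of `a[i] * x[n-1-i]`) are assumptions.

```python
def autocorrelation(x, max_lag):
    # r[k] = sum over n of x[n] * x[n-k], for k = 0..max_lag.
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, k, err): predictor coefficients, reflection
    coefficients, and final prediction error energy. Requires r[0] > 0.
    """
    a = [0.0] * order
    k = []
    err = r[0]
    for m in range(order):
        # Correlation not yet explained by the current predictor.
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        km = acc / err
        new_a = a[:]
        new_a[m] = km
        for i in range(m):
            new_a[i] = a[i] - km * a[m - 1 - i]
        a = new_a
        err *= (1.0 - km * km)
        k.append(km)
    return a, k, err

# Autocorrelation of a first-order process with coefficient 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the recursion finds that only the first coefficient is needed (`a[0]` is 0.5 and `a[1]` is 0), matching the first-order structure of the data. Running it frame by frame gives the sequence of coefficient sets the abstract describes.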

Patent•

Arrangement for equalizing the variable attenuation of a signal on a communication line

[...]

Dimitrios P. Prezas¹, Nancy M. Saraf¹•Institutions (1)

Bell Labs¹

05 Mar 1984

TL;DR: A digital arrangement is disclosed that uses linear predictive coding to equalize, over a desired frequency spectrum, the variable attenuation of a voice-frequency message signal transmitted on a communication line.

Abstract: Disclosed is a digital arrangement utilizing linear predictive coding for equalizing over a desired frequency spectrum the variable attenuation of a voice-frequency message signal transmitted on a communication line. The arrangement comprises a digital signal processor, program memories for storing program instruction sets, and a data memory for storing samples of a spectrally white test signal that has been variably attenuated by the line. Under control of one instruction set that incorporates linear predictive coding, the processor uses the stored test samples to calculate the reflection coefficients of the line that characterize the variable attenuation of a signal on the line. Under the control of the other instruction set, the processor functions as a digital inverse filter employing the calculated reflection coefficients for equalizing over the desired frequency spectrum the variable attenuation of a voice-frequency message signal transmitted on the line.
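
One common way to build an inverse filter directly from reflection coefficients is an FIR lattice, sketched below. The abstract does not say which filter structure the patent uses, so the lattice form, the sign convention, and the name `lattice_inverse_filter` are assumptions made for illustration.

```python
def lattice_inverse_filter(x, k):
    """Filter x through an FIR lattice defined by reflection coefficients k.

    Stage update (assumed sign convention):
        f_m[n] = f_{m-1}[n] - k_m * b_{m-1}[n-1]
        b_m[n] = b_{m-1}[n-1] - k_m * f_{m-1}[n]
    The output is the final forward error f_M[n].
    """
    delayed_b = [0.0] * len(k)  # b_{m-1}[n-1] held for each stage
    out = []
    for sample in x:
        f = b = float(sample)
        for m, km in enumerate(k):
            prev_b = delayed_b[m]
            delayed_b[m] = b          # keep b_{m-1}[n] for the next sample
            f_next = f - km * prev_b
            b = prev_b - km * f
            f = f_next
        out.append(f)
    return out

# A decaying signal x[n] = 0.5**n, as if shaped by 1 / (1 - 0.5 z^-1).
shaped = [1.0, 0.5, 0.25, 0.125]
flattened = lattice_inverse_filter(shaped, [0.5])
```

Here the single reflection coefficient 0.5 undoes the shaping and returns a unit impulse, which is the equalizing behaviour the abstract attributes to the inverse filter. In the arrangement described, the coefficients would come from LPC analysis of the stored test samples rather than being chosen by hand.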
