
Showing papers on "Linear predictive coding published in 1989"


Book
01 Jan 1989
TL;DR: This book presents the principal characteristics of speech, speech production models, speech analysis and analysis-synthesis systems, linear predictive coding (LPC) analysis, speech coding, speech synthesis, speech recognition, and future directions of speech processing.
Abstract: Principal characteristics of speech; speech production models; speech analysis and analysis-synthesis systems; linear predictive coding (LPC) analysis; speech coding; speech synthesis; speech recognition; future directions of speech processing. Appendices: convolution and the z-transform; vector quantization algorithm; neural nets.

307 citations


Journal ArticleDOI
D. Mansour, Biing-Hwang Juang
TL;DR: It is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm, and a family of distortion measures based on the projection between two cepstral vectors is proposed; these measures have the same computational efficiency as the band-pass cepstral distortion measure.
Abstract: Consideration is given to the formulation of speech similarity measures, a fundamental component in recognizer designs, that are robust to the change of ambient conditions. The authors focus on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using some common models for noisy speech, they show analytically that additive white noise reduces the norm (length) of the LPC cepstral vectors. Empirical observations on the parameter histograms not only confirm the analytical results through the use of noise models but further reveal that at a given (global) signal-to-noise ratio (SNR), the norm reduction on cepstral vectors with larger norms is generally less than on vectors with smaller norms, and that lower order coefficients are more affected than higher order terms. In addition, it is found that the orientation (or direction) of the cepstral vector is less susceptible to noise perturbation than the vector norm. As a consequence of the above results, a family of distortion measures based on the projection between two cepstral vectors is proposed. The new measures have the same computational efficiency as the band-pass cepstral distortion measure. >
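
The projection idea above is easy to sketch. The Python fragment below (illustrative, not the authors' exact formulation) converts LPC coefficients to cepstra with the standard recursion and computes one plausible member of the proposed family of measures: one minus the normalized projection (the cosine) between the two cepstral vectors. The function names and the input convention (LPC coefficients without the leading 1) are assumptions.

```python
import numpy as np

def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of 1/A(z) for A(z) = 1 + a[0]z^-1 + ... + a[p-1]z^-p,
    using the standard LPC-to-cepstrum recursion."""
    p = len(a)
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def projection_distortion(c_ref, c_test):
    """1 - cos(angle between the cepstral vectors): emphasises orientation,
    which the paper finds more noise-robust than the vector norm."""
    denom = np.linalg.norm(c_ref) * np.linalg.norm(c_test) + 1e-12
    return 1.0 - np.dot(c_ref, c_test) / denom
```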

166 citations


Journal ArticleDOI
TL;DR: The authors use the diagnostic acceptability measure (DAM) to evaluate the speech quality of the latest 2400-b/s linear-predictive coder (LPC) with a noise suppressor at the front end, using a spectral subtraction technique for noise suppression.
Abstract: Numerous noise-suppression techniques have been developed for operating at the front end of low-bit-rate digital voice terminals. Some of these techniques have been evaluated by standardized intelligibility tests such as the diagnostic rhyme test (DRT). It is well known that the use of a noise suppressor seldom improves the DRT score even though listeners have had the impression that speech quality was enhanced. Unfortunately, noise suppressors have only occasionally been evaluated by standardized quality tests. The authors supplement quality test data for reference purposes. They use the diagnostic acceptability measure (DAM) to evaluate speech quality of the latest 2400-b/s linear-predictive coder (LPC) with a noise suppressor at the front end. They used a spectral subtraction technique for noise suppression. Ten different sets of noisy speech recorded at actual military platforms (such as a helicopter, tank, turboprop, helicopter carrier, or jeep) were input sources. The magnitude of the DAM improvement is substantial: as much as six points on the average, which is large enough to upgrade speech quality somewhat. >
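
As background for the kind of front end evaluated here, a minimal overlap-add spectral subtractor can be written with numpy alone. This is a generic magnitude-domain subtractor with a spectral floor, not the specific preprocessor used in the study; the frame length, hop, oversubtraction factor, and floor values are illustrative.

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, frame_len=256, hop=128, alpha=1.0, floor=0.02):
    """Overlap-add spectral subtraction. noise_mag is an estimate of the noise
    magnitude spectrum (frame_len // 2 + 1 bins)."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)  # subtract with floor
        out[start:start + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase)) * window
        norm[start:start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```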

163 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs, and it is demonstrated that such gains are unavailable with white noise assumption Kalman and Wiener filters.
Abstract: A report is presented on experiments using a colored-noise assumption Kalman filter to enhance speech additively contaminated by colored noise, such as helicopter noise and jeep noise, with a particular application to linear predictive coding (LPC) of noisy speech. The results indicate that the colored-noise Kalman filter provides a significant gain in SNR, a clear improvement in the sound spectrogram, and an audible improvement in output speech quality. The authors demonstrate that such gains are unavailable with white noise assumption Kalman and Wiener filters. The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs. >

132 citations


PatentDOI
Kazunori Ozawa
TL;DR: A speech analysis and synthesis system operates to determine a sound source signal for the entire interval of each speech unit which is to be used for speech synthesis, according to a spectrum parameter obtained from each speech unit based on cepstrum, as discussed by the authors.
Abstract: A speech analysis and synthesis system operates to determine a sound source signal for the entire interval of each speech unit which is to be used for speech synthesis, according to a spectrum parameter obtained from each speech unit based on cepstrum. The sound source signal and the spectrum parameter are stored for each speech unit. Speech is synthesized according to the spectrum parameter while controlling prosody of the sound source signal. The spectrum of the synthesized speech is compensated through filtering based on cepstrum.

130 citations


Journal ArticleDOI
TL;DR: The proposed pitch estimation only needs a very short frame length and gives accurate results at voice onset, and the results show that the algorithm works in noise-free, noisy, and very noisy signals for vowels as well as for voiced consonants.
Abstract: The authors present an automatic and reliable algorithm for determining glottal closure instant (GCI). As a byproduct, nonstationary fundamental period estimation is achieved. The computation includes twelve-pole speech linear-prediction analysis, cross correlation, and both direct and inverse fast Fourier transforms (or a convolution). Maximum-likelihood epoch determination is used as the basis for locating GCIs, and the Hilbert transformation is applied to improve performance and reliability. A description of the system and the voiced/unvoiced/mixed (V/UV/M) decision procedure is given. The results show that the algorithm works in noise-free, noisy, and very noisy signals for vowels as well as for voiced consonants. The proposed pitch estimation only needs a very short frame length and gives accurate results at voice onset.
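
A heavily simplified version of this processing chain (12-pole LPC inverse filtering followed by peak picking on the Hilbert envelope of the residual) is sketched below; the paper's maximum-likelihood epoch determination is replaced by a plain threshold-and-distance peak picker, and all thresholds are illustrative.

```python
import numpy as np
from scipy.signal import lfilter, hilbert, find_peaks

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i + 1] = a_prev[1:i + 1] + k * a_prev[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a

def glottal_closure_instants(frame, fs, order=12):
    """Crude GCI estimate: LPC inverse filtering, then peak picking on the
    Hilbert envelope of the residual."""
    a = lpc_coeffs(frame * np.hamming(len(frame)), order)
    residual = lfilter(a, [1.0], frame)            # inverse filter A(z)
    envelope = np.abs(hilbert(residual))
    min_period = int(fs / 400)                     # assume F0 below 400 Hz
    peaks, _ = find_peaks(envelope, height=0.4 * envelope.max(), distance=min_period)
    return peaks                                   # sample indices of candidate GCIs
```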

126 citations


Journal ArticleDOI
Sharad Singhal, B. S. Atal
TL;DR: The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity, and it is found that speech quality depends on the pulse rate and that female speech requires a higher pulse rate than male speech.
Abstract: Although the multipulse model is conceptually simple, the problem of locating the pulses is computationally complex. The authors discuss the basic multipulse model and describe a procedure to compute the excitation with optimally adjusted amplitudes. The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity. The authors find that speech quality depends on the pulse rate. They also find that for the same quality, female speech requires a higher pulse rate than male speech. The pitch dependence can be reduced and speech quality improved for high-pitched speakers by incorporating long delay prediction in the multipulse model. >
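
The flavour of such a pulse search can be illustrated with a greedy analysis-by-synthesis sketch: pulses are placed one at a time where the residual correlates best with the synthesis filter's impulse response, and all amplitudes are re-optimized by least squares after each placement. This is a simplified stand-in for the authors' procedure; `lpc_a` is assumed to be the full LPC polynomial [1, a1, ..., ap].

```python
import numpy as np
from scipy.signal import lfilter

def multipulse_excitation(target, lpc_a, n_pulses):
    """Greedy multipulse search with joint amplitude re-optimization."""
    target = np.asarray(target, dtype=float)
    n = len(target)
    h = lfilter([1.0], lpc_a, np.r_[1.0, np.zeros(n - 1)])  # impulse response of 1/A(z)

    def shifted(m):
        return np.r_[np.zeros(m), h[:n - m]]                # impulse response delayed by m

    H = np.zeros((n, 0))
    positions, amps = [], np.zeros(0)
    residual = target.copy()
    for _ in range(n_pulses):
        corr = np.array([np.dot(residual, shifted(m)) for m in range(n)])
        m_best = int(np.argmax(np.abs(corr)))
        positions.append(m_best)
        H = np.column_stack([H, shifted(m_best)])
        amps, *_ = np.linalg.lstsq(H, target, rcond=None)   # re-optimize all amplitudes
        residual = target - H @ amps
    return positions, amps
```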

119 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present results on the comparative performance of nonuniform scalar quantizers using three different LPC (linear predictive coding) representations: the arcsine of reflection coefficients, the log area ratios, and the line spectral frequencies.
Abstract: The authors present results on the comparative performance of nonuniform scalar quantizers using three different LPC (linear predictive coding) representations: the arcsine of reflection coefficients, the log area ratios, and the line spectral frequencies. On comparing the spectral distortion introduced by quantizers based on these representations, it was found that the average distortion was very similar for all three, with the arcsine showing fewer large spectral errors. In a parallel study, the performance of the above LPC representations and the autocorrelation coefficients for interpolating the spectrum between adjacent time frames was investigated and revealed only small differences between the different representations. Informal listening tests with a complete 8 kb/s code-excited linear predictive (CELP) coder, incorporating both quantization and interpolation, showed no significant differences between the various LPC representations, suggesting that the random codebook for the excitation is able to compensate for small spectral deviations. >
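
For reference, the three representations compared here are simple transformations of the LPC parameters; a sketch of the conversions (reflection coefficients to arcsines and log area ratios, and the LPC polynomial to line spectral frequencies via the symmetric and antisymmetric polynomials) is shown below. The root-finding LSF routine is a textbook construction, not the quantizer design of the paper.

```python
import numpy as np

def arcsine_coefficients(k):
    """Arcsine of the reflection coefficients k (|k| < 1)."""
    return np.arcsin(k)

def log_area_ratios(k):
    """Log area ratios from the reflection coefficients."""
    return np.log((1.0 + k) / (1.0 - k))

def line_spectral_frequencies(a):
    """LSFs (radians) from A(z) with a = [1, a1, ..., ap], using
    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z),
    whose nontrivial roots interlace on the unit circle."""
    a_ext = np.r_[np.asarray(a, dtype=float), 0.0]
    P = a_ext + a_ext[::-1]
    Q = a_ext - a_ext[::-1]
    angles = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
    return np.sort(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])
```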

90 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A phonetically based segmentation of speech is performed to classify segments into five classes: onset, unvoiced, low-pass voiced, steady-state voiced, and transient voiced; for each segment class, a distinctive coding scheme based on vector excitation coding (VXC) is used.
Abstract: A phonetically based segmentation of speech is performed to classify segments into five classes: onset, unvoiced, low-pass voiced, steady-state voiced, and transient voiced. The segment lengths are constrained to an integer multiple of a unit-frame. For each segment class, a distinctive coding scheme based on vector excitation coding (VXC) is used. The maximum bit-rate is 3.6 kb/s, and a moderate coding delay of 45 ms is incurred. Performance is roughly comparable to conventional VXC/CELP (code-excited linear prediction) coding at 4.8 kb/s.

82 citations


PatentDOI
TL;DR: In this paper, unvoiced speech performance in low-rate multi-pulse coders is improved by employing a simple architecture with an output quality comparable to code-excited linear predictive (CELP) coding.
Abstract: Improved unvoiced speech performance in low-rate multi-pulse coders is achieved by employing a multi-pulse architecture that is simple in implementation but with an output quality comparable to code excited linear predictive (CELP) coding. A hybrid architecture is provided in which a stochastic excitation model that is used during unvoiced speech is also capable of modeling voiced speech by use of random codebook excitation. A modified method for calculating the gain during stochastic excitation is also provided.

73 citations


Proceedings ArticleDOI
Karl Hellwig, Peter Vary, D. Massaloux, J. P. Petit, C. Galand, M. Rosso
27 Nov 1989
TL;DR: The speech coding scheme which will be used as the standard for the European mobile radio system has been selected by the CEPT Groupe Special-Mobile (GSM) as a result of formal subjective listening tests based on the regular-pulse excitation linear predictive coding technique (RPE-LPC) combined with long-term prediction (LTP).
Abstract: The speech coding scheme which will be used as the standard for the European mobile radio system has been selected by the CEPT Groupe Special-Mobile (GSM) as a result of formal subjective listening tests. It is based on the regular-pulse excitation linear predictive coding technique (RPE-LPC) combined with long-term prediction (LTP). The solution is called the RPE-LTP codec. The codec algorithm and the error protection scheme are presented. The net bit rate is 13.0 kb/s, and the gross bit rate, including error protection, is 22.8 kb/s. The experimental implementation based on VLSI signal processors is described. The speech quality obtained with the technique considered is far superior to that obtainable with present-day analog mobile radio systems. A duplex speech codec including error protection can be implemented with two VLSI signal processors with external data memories of about 1 K*16 b.
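
To give a flavour of the regular-pulse-excitation step, the toy grid selection below decimates a residual sub-block onto several candidate pulse grids and keeps the most energetic one. The real GSM full-rate codec selects among four 13-pulse grids per 40-sample sub-block after a weighting filter; the plain energy criterion and the parameters here are a simplification.

```python
import numpy as np

def rpe_grid_select(subblock, decimation=3, n_grids=4, n_pulses=13):
    """Pick the regular-pulse grid with the highest energy from a residual
    sub-block (e.g. 40 samples); returns the grid offset and its pulses."""
    subblock = np.asarray(subblock, dtype=float)
    candidates = [subblock[offset:offset + decimation * n_pulses:decimation]
                  for offset in range(n_grids)]
    energies = [float(np.sum(c ** 2)) for c in candidates]
    best = int(np.argmax(energies))
    return best, candidates[best]
```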

Book ChapterDOI
Jinhui Chen
01 Jan 1989
TL;DR: A candidate algorithm for the new CCITT 16-kb/s speech coding standard is presented, based on backward-adaptive CELP (code-excited linear prediction) where the predictor and the excitation gain are updated by analyzing previously quantized signals.
Abstract: A candidate algorithm for the new CCITT 16-kb/s speech coding standard is presented. This algorithm is based on backward-adaptive CELP (code-excited linear prediction), where the predictor and the excitation gain are updated by analyzing previously quantized signals. The only information transmitted is the excitation vector, with a size as small as five samples, so as to achieve a one-way coding delay of less than 2 ms. With a clear channel, this 16-kb/s coder slightly outperformed the CCITT standard 32-kb/s ADPCM (adaptive differential pulse code modulation) (G.721) in speech quality as measured by the mean opinion score (MOS). With noisy channels, the coder scored slightly higher than G.721 for a bit-error rate of 10^-2 and significantly higher (by a margin of 0.5) for a bit-error rate of 10^-3.

Proceedings ArticleDOI
23 May 1989
TL;DR: It is shown that an average spectral distortion of approximately 1 dB^2 can be achieved with 21 and 25 bits/frame using the 2-D DCT and DCT-DPCM schemes, respectively, which is a noticeable improvement over the previously reported bit rates of 32 bits/frame and above.
Abstract: The intraframe and interframe correlation properties are used to develop two efficient encoding algorithms for speech line spectrum pair (LSP) parameters. The first algorithm (2-D DCT), which requires relatively large coding delays, is based on two-dimensional (time and frequency) discrete cosine transform coding techniques; the second algorithm (DCT-DPCM), which does not need any coding delay, uses one-dimensional discrete cosine transform in the frequency domain and DPCM (differential pulse-code modulation) in the time domain. The performances of these systems for different bit rates and delays are studied, and appropriate comparisons are made. It is shown that an average spectral distortion of approximately 1 dB^2 can be achieved with 21 and 25 bits/frame using the 2-D DCT and DCT-DPCM schemes, respectively. This is a noticeable improvement over the previously reported bit rates of 32 bits/frame and above.
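
The energy-compaction idea behind the 2-D DCT scheme can be illustrated with scipy: transform a block of LSP vectors over time and frequency, keep only the largest coefficients, and invert. A real coder quantizes the retained coefficients; the `keep` value below is an illustrative truncation and is unrelated to the paper's bit counts.

```python
import numpy as np
from scipy.fft import dctn, idctn

def lsp_2d_dct_roundtrip(lsp_block, keep=21):
    """Toy 2-D (time x frequency) DCT of a block of LSP vectors
    (shape: frames x LSP order), keeping only the `keep` largest coefficients."""
    block = np.asarray(lsp_block, dtype=float)
    coeffs = dctn(block, norm="ortho")
    flat = np.abs(coeffs).ravel()
    if keep >= flat.size:
        return block
    threshold = np.sort(flat)[-keep]
    mask = np.abs(coeffs) >= threshold            # retain the dominant coefficients
    return idctn(coeffs * mask, norm="ortho")
```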

Journal ArticleDOI
TL;DR: A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation and formant tracking was developed, which can surpass the performance of single-channel (acoustic-signal-based) algorithms.
Abstract: The authors describe analysis and synthesis methods for improving the quality of speech produced by D.H. Klatt's (J. Acoust. Soc. Am., vol.67, p.971-95, 1980) software formant synthesizer. Synthetic speech generated using an excitation waveform resembling the glottal volume-velocity was found to be perceptually preferred over speech synthesized using other types of excitation. In addition, listeners ranked speech tokens synthesized with an excitation waveform that simulated the effects of source-tract interaction higher in naturalness than tokens synthesized without such interaction. A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation and formant tracking was developed. The algorithms can utilize two channels of input data, i.e., speech and electroglottographic signals, and can therefore surpass the performance of single-channel (acoustic-signal-based) algorithms. The formant synthesizer was used to study some aspects of the acoustic correlates of voice quality, e.g., male/female voice conversion and the simulation of breathiness, roughness, and vocal fry.

PatentDOI
TL;DR: A speech decoder for synthesizing a speech signal from a digitized speech bit stream is described; it includes an analyzer that generates an angular frequency and magnitude for each of a plurality of sinusoidal components over a sequence of times, a random signal generator that produces a time sequence of random phase components, a phase synthesizer that generates synthesized phases for at least some of the sinusoidal components from the angular frequencies and random phase components, and a synthesizer that produces speech from the angular frequencies, magnitudes, and synthesized phases.
Abstract: A speech decoder apparatus for synthesizing a speech signal from a digitized speech bit stream of the type produced by processing speech with a speech encoder. The apparatus includes an analyzer for processing the digitized speech bit stream to generate an angular frequency and magnitude for each of a plurality of sinusoidal components representing the speech processed by the speech encoder, the analyzer generating the angular frequencies and magnitudes over a sequence of times; a random signal generator for generating a time sequence of random phase components; a phase synthesizer for generating a time sequence of synthesized phases for at least some of the sinusoidal components, the synthesized phases being generated from the angular frequencies and random phase components; and a synthesizer for synthesizing speech from the time sequences of angular frequencies, magnitudes, and synthesized phases.
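
The core synthesis step described here, a sum of sinusoids with randomly generated phases, can be sketched as follows for a single frame; frame-to-frame interpolation of frequencies, magnitudes, and phases, which a real decoder needs, is omitted, and the function name and arguments are illustrative.

```python
import numpy as np

def synthesize_frame(freqs_hz, mags, fs, n_samples, rng=None):
    """Sum-of-sinusoids synthesis with random initial phases."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n_samples) / fs
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs_hz))
    frame = np.zeros(n_samples)
    for f, m, ph in zip(freqs_hz, mags, phases):
        frame += m * np.cos(2.0 * np.pi * f * t + ph)
    return frame
```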

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors study the problem of coding spectral information in speech at bit rates in the range of 100-400 b/s using speaker-independent, phone-based recognition with hidden Markov model (HMM) phone models, and show this to be a promising approach.
Abstract: The authors study the problem of coding spectral information in speech at bit rates in the range of 100-400 b/s using speaker-independent phone-based recognition. Spectral information is coded as a sequence of phonetic events and a sequence of transitions through the corresponding hidden Markov model (HMM)-based phone models. This simple phonetic speech-coding system has been shown to be a promising approach. A simple inventory of phonemes is sufficient for capturing the bulk of the acoustic information. >

Proceedings ArticleDOI
23 May 1989
TL;DR: A zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase is described, which provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification.
Abstract: It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement and for speech coding. A description is given of a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification, for a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion. >

PatentDOI
TL;DR: In this article, an artificial intelligence system is used to decide upon the adjustment of a filter subsystem by distinguishing between noise and speech in the spectrum of the incoming signal of speech plus noise.
Abstract: A system is provided to reduce noise from a signal of speech that is contaminated by noise. The present system employs an artificial intelligence that is capable of deciding upon the adjustment of a filter subsystem by distinguishing between noise and speech in the spectrum of the incoming signal of speech plus noise. The system does this by testing the pattern of a power or envelope function of the frequency spectrum of the incoming signal. The system determines that the fast changing portions of that envelope denote speech whereas the residual is determined to be the frequency distribution of the noise power. This determination is done while examining either the whole spectrum, or frequency bands thereof, regardless of where the maximum of the spectrum lies. In another embodiment of the invention, a feedback loop is incorporated which provides incremental adjustments to the filter by employing a gradient search procedure to attempt to increase certain speech-like features in the system's output. The present system does not require consideration of minima of functions of the incoming signal or pauses in speech. Instead, the present system employs an artificial intelligence system to which is input the envelope pattern of the incoming signal of speech and noise. The present system then filters out of this envelope signal the rapidly changing variations of the envelope over fixed time windows.

PatentDOI
TL;DR: A noise reduction system used for transmission and/or recognition of speech includes a speech analyzer for analyzing a noisy speech input signal thereby converting the speech signal into feature vectors such as autocorrelation coefficients, and a neural network for receiving the feature vectors of the noisy speech signal as its input.
Abstract: A noise reduction system used for transmission and/or recognition of speech includes a speech analyzer for analyzing a noisy speech input signal thereby converting the speech signal into feature vectors such as autocorrelation coefficients, and a neural network for receiving the feature vectors of the noisy speech signal as its input. The neural network extracts from a codebook an index of prototype vectors corresponding to a noise-free equivalent to the noisy speech input signal. Feature vectors of speech are read out from the codebook on the basis of the index delivered as an output from the neural network, thereby causing the speech input to be reproduced on the basis of the feature vectors of speech read out from the codebook.

Journal ArticleDOI
TL;DR: A sinusoidal model is presented where the nonstationary nature of speech is considered by using a time-varying frequency and amplitude for each sinusoid using a suboptimal linear estimator.
Abstract: A sinusoidal model is presented where the nonstationary nature of speech is considered by using a time-varying frequency and amplitude for each sinusoid. The proposed model generalizes other sinusoidal models while still having an analytically tractable short-time spectrum. The estimation of the parameters of the sinusoids is done in the frequency domain by a suboptimal linear estimator. The experimental results obtained with the proposed model illustrate its ability to represent nonstationary speech frames. >

PatentDOI
TL;DR: In this patent, a point-by-point division of the signal by an amplitude function, obtained by lowpass filtering the magnitude of the signal, is used for pitch detection and speech coding.
Abstract: A method of processing speech signals applicable to a variety of speech processing tasks, including narrowband, mediumband and wideband coding. The speech signal is modified by a normalization process using the envelope of the speech signal such that the modified signal will have more desirable characteristics as seen by the intended processing algorithm. The modification is achieved by a point-by-point division (normalization) of the signal by an amplitude function, which is obtained by lowpass filtering the magnitude of the signal. Several examples of normalized signals are presented. Applications to pitch detection and speech coding are described.
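
The normalization this patent describes is simple in spirit: divide the signal point by point by a low-pass-filtered version of its magnitude. In the sketch below, the filter order, cutoff frequency, and flooring constant are illustrative choices, not values taken from the patent.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_normalize(x, fs, cutoff_hz=60.0, eps_ratio=1e-4):
    """Point-by-point normalization of a signal by its amplitude envelope,
    obtained by low-pass filtering the magnitude of the signal."""
    b, a = butter(2, cutoff_hz / (fs / 2.0))            # low-pass for the envelope
    envelope = filtfilt(b, a, np.abs(x))
    envelope = np.maximum(envelope, eps_ratio * np.max(np.abs(x)))  # avoid dividing by ~0
    return x / envelope, envelope
```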

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors explore the benefits of time-varying bit allocation to excitation and LPC (linear predictive coding) parameters for the case of codebook-excited LPC, finding that gains due to variable bit allocation were most noticeable in the 6.4 kb/s system, especially with female speakers.
Abstract: The authors explore the benefits of time-varying bit allocation to excitation and LPC (linear predictive coding) parameters for the case of codebook-excited LPC. The overall bit rate in the experiment was 4.8, 6.4, or 8.0 kb/s. In each case, permissible bit rates for the LPC component were 0, 24, 36, or 48 bits per frame, one of which was selected for each speech frame using a brute-force search for maximum performance. Average SNR gains over conventional time-invariant methods were modest, on the order of 1 to 2 dB, but gains for certain speech segments were as high as 3 to 5 dB. Perceptually, gains due to variable bit allocation were most noticeable in the 6.4 kb/s system, especially with female speakers. However, even in this case, the benefits of flexible bit allocation were somewhat offset by distortions due to other inadequacies in the coding algorithm.

Proceedings ArticleDOI
23 May 1989
TL;DR: The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance.
Abstract: The authors present the results of a study designed to investigate the effects of subtractive-type noise reduction algorithms on LPC-based spectral parameter estimation as related to the performance of speech processors operating with input SNRs of 15 dB and below. Subtractive noise preprocessing greatly improves the SNR, but system performance improvement is not commensurate. LPC spectral estimation is affected by the character of the residual noise which exhibits greater variance and spectral granularity than the original broadband noise. The study shows that removing less than the full amount of noise and whitening it improves spectral estimation and speech device performance. Techniques and performance results are presented. >
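
A minimal sketch of the paper's recommendation, subtracting only part of the noise estimate and replacing the remainder with a flat (spectrally white) floor, is shown below for a single frame of magnitude spectra; the fraction and floor values are illustrative.

```python
import numpy as np

def partial_subtract(noisy_mag, noise_mag, fraction=0.8, floor_fraction=0.1):
    """Subtract only a fraction of the estimated noise magnitude and impose a
    flat floor, so the LPC analysis does not fit the granular residual noise."""
    subtracted = np.maximum(noisy_mag - fraction * noise_mag, 0.0)
    white_floor = floor_fraction * float(np.mean(noise_mag))   # spectrally flat floor
    return np.maximum(subtracted, white_floor)
```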

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, a method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra.
Abstract: The major goal of this research is to reduce the discrepancy in recognition performance between normal and abnormal speech, given that reference templates were derived only from normal speech. A method is devised that uses the differences in spectral slope between linear predictive coding log magnitude spectra to weight the point-by-point energy differences between the spectra. The distances of all reference tokens of like phonemes are combined to form a smallest cumulative distance (SCD) method. When SCD is combined with the method of slope-dependent weighting (SDW), the most significant success is obtained. In terms of error rates for a fixed phoneme vector length of five, SDW+SCD is found to reduce the difference in error rate between normal and abnormal speech by approximately 50%. >

PatentDOI
TL;DR: A speaker verification system receives input speech from a speaker of unknown identity and undergoes linear predictive coding analysis and transformation to maximize separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed.
Abstract: A speaker verification system receives input speech from a speaker of unknown identity. The speech undergoes linear predictive coding (LPC) analysis and transformation to maximize separability between true speakers and impostors when compared to reference speech parameters which have been similarly transformed. The transformation incorporates an "inter-class" covariance matrix of successful impostors within a database.

Proceedings ArticleDOI
27 Nov 1989
TL;DR: The LD-VXC coder provides very good speech quality at 16 kb/s, moderate complexity, a delay of under 2 ms, and a gentle degradation of quality with transmission errors, and was submitted to the CCITT as a candidate for a future 16-kb/s speech coding standard.
Abstract: To attain a very-low-delay speech coder at 16 kb/s while maintaining a quality acceptable for the public switched telephone network, low delay vector excitation coding (LD-VXC) is introduced. Backward adaptation is used to track the spectral characteristics of the signal without requiring any buffering of the input speech, thereby allowing a very low delay to be achieved in an analysis-by-synthesis structure. The algorithm differs markedly from conventional VXC or CELP (code-excited linear prediction) coders due to the use of backward adaptive linear prediction for modeling the time-varying short- and long-term correlation of speech. The LD-VXC coder provides very good speech quality at 16 kb/s, moderate complexity, a delay of under 2 ms, and a gentle degradation of quality with transmission errors. The algorithm was submitted to the CCITT as a candidate for a future 16-kb/s speech coding standard. >

Proceedings ArticleDOI
23 May 1989
TL;DR: Novel fast optimal algorithms for finding the best sequence in the Barnes-Wall shell innovation codebook are described; this algebraic codebook makes it possible to design a CELP coder at 9.6 kb/s with good quality that is still implementable on a current digital-signal-processing chip.
Abstract: The authors present an algebraic code-excited linear prediction (CELP) speech coder where the innovation codebook comes from the first spherical code of the Barnes-Wall lattice in 16 dimensions. Novel fast optimal algorithms for finding the best sequence in this Barnes-Wall shell innovation codebook are described. This algebraic codebook makes it possible to design a CELP coder at 9.6 kb/s with good quality and still implementable on a current digital-signal-processing chip. >

Proceedings ArticleDOI
23 May 1989
TL;DR: A novel approach to narrow- and medium-band speech coding that can dynamically balance the transmission rate between the excitation and the spectral parameters is introduced, improving the subjective speech quality.
Abstract: The authors introduce a novel approach to narrow- and medium-band speech coding that can dynamically balance the transmission rate between the excitation and the spectral parameters. The coding algorithm, called multimode coding, operates several coding blocks, each of which has a different bit assignment in parallel, and selects the optimum coding block frame by frame based on an evaluation of the reproduced speech quality. This coding algorithm is applied to 4.8 and 8.0 kb/s CELP coders, and 2.0-2.4 dB of SNRseg improvement is achieved over conventional CELP coders. The spectral distortion measure is added as an evaluation function, improving the subjective speech quality. >

Proceedings ArticleDOI
23 May 1989
TL;DR: Results support the hypothesis that the higher orders of PLP contain significant speaker-specific information, with ASI performance improving rapidly up to order 8, and then far more slowly yet consistently up to order 16, and a similar pattern is seen for codebook size, with fast improvements up to size 64, with more gradual gains thereafter.
Abstract: Results of an experimental study and the optimization of features for a conventional vector-quantization codebook-based automatic speaker identification (ASI) system are presented. Standard LPC (linear predictive coding) and a perceptually weighted feature termed PLP (perceptually based linear prediction) are compared using appropriate distance measures, namely, the log-likelihood, and three cepstral variants: constant weighting, the root-power-sum, and the inverse variance. PLP features combined with a weighted cepstral measure are found to be consistently the best in a number of different digit-independent ASI experiments. Results support the hypothesis that the higher orders of PLP (>5) contain significant speaker-specific information, with ASI performance improving rapidly up to order 8, and then far more slowly yet consistently up to order 16. A similar pattern is seen for codebook size, with fast improvements up to size 64, with more gradual gains thereafter.
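
The codebook-based ASI decision rule itself is straightforward; the sketch below accumulates, per enrolled speaker, the minimum distortion of each test vector against that speaker's VQ codebook and picks the lowest average. A plain Euclidean distance stands in for the log-likelihood and weighted cepstral measures compared in the paper, and the data layout is an assumption.

```python
import numpy as np

def identify_speaker(test_features, codebooks):
    """Return the enrolled speaker whose VQ codebook gives the lowest average
    distortion. `test_features`: (n_frames, dim); `codebooks`: dict mapping
    speaker id -> (codebook_size, dim) array of code vectors."""
    test = np.asarray(test_features, dtype=float)
    scores = {}
    for speaker, codebook in codebooks.items():
        cb = np.asarray(codebook, dtype=float)
        dists = np.linalg.norm(test[:, None, :] - cb[None, :, :], axis=2)
        scores[speaker] = float(dists.min(axis=1).mean())     # nearest-code distortion
    return min(scores, key=scores.get)
```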

Proceedings ArticleDOI
B. Atal
23 May 1989
TL;DR: The author presents results on the precision that is necessary in the excitation without producing perceptible distortion in the output speech signal and an estimate of the minimum number of bits necessary for accurate reproduction of the excitation.
Abstract: A description is presented of a framework for developing compact and accurate representation of LPC (linear predictive coding) excitation. In this representation, the excitation waveform is expressed as a linear combination of the eigenvectors of the autocorrelation matrix of the LPC filter's impulse response. This representation allows a systematic study of changes in the filter excitation on the speech output. The author presents results on the precision that is necessary in the excitation without producing perceptible distortion in the output speech signal and an estimate of the minimum number of bits necessary for accurate reproduction of the excitation. >
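
The basis the author describes can be computed directly: take the impulse response of the LPC synthesis filter, form (a Toeplitz approximation to) its autocorrelation matrix, and diagonalize it. The sketch below does this and shows how an excitation frame would be truncated to its leading eigenvector coefficients; it assumes `lpc_a` is the full polynomial [1, a1, ..., ap] and is not the author's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh, toeplitz
from scipy.signal import lfilter

def excitation_eigenbasis(lpc_a, frame_len):
    """Eigenvectors of a Toeplitz approximation to the autocorrelation matrix
    of the LPC synthesis filter's impulse response, by decreasing eigenvalue."""
    h = lfilter([1.0], lpc_a, np.r_[1.0, np.zeros(frame_len - 1)])
    r = np.correlate(h, h, mode="full")[frame_len - 1:]
    eigvals, eigvecs = eigh(toeplitz(r))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def truncate_excitation(excitation, eigvecs, n_keep):
    """Keep only the first n_keep eigenvector coefficients and reconstruct,
    mimicking a reduced-precision representation of the excitation."""
    basis = eigvecs[:, :n_keep]
    coeffs = basis.T @ excitation
    return coeffs, basis @ coeffs
```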