
Showing papers on "Linear predictive coding published in 1995"


Journal ArticleDOI
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

712 citations


Book
01 Nov 1995
TL;DR: A comprehensive reference on speech coding and synthesis, with chapters including an introduction to speech coding (W.B. Kleijn and K.K. Paliwal), evaluation of speech coders (P. Kroon), and a robust algorithm for pitch tracking (RAPT) by D. Talkin.
Abstract: An introduction to speech coding, W.B. Kleijn and K.K. Paliwal speech coding standards, R.V. Cox linear-prediction based analysis-by-synthesis coding, P. Kroon and W.B. Kleijn sinusoidal coding, R.J. McAulay and T.F. Quatieri waveform interpolation for coding and synthesis, W.B. Kleijn and J. Haagen low-delay coding of speech, J.-H. Chen multimode and variable-rate coding of speech, A. Das et al wideband speech coding, J.-P. Adoul and R. Lefebvre vector quantization for speech transmission, P. Hedelin et al theory for transmission of vector quantization data, P. Hedelin et al waveform coding and auditory masking, R. Veldhuis and A. Kohlrausch quantization of LPC parameters, K.K. Paliwal and W.B. Kleijn evaluation of speech coders, P. Kroon a robust algorithm for pitch tracking (RAPT), D. Talkin time-domain and frequency-domain techniques for prosodic modification of speech, E. Moulines and W. Verhelst nonlinear processing of speech, G. Kubin an approach to text-to-speech synthesis, R. Sproat and J. Olive the generation of prosodic structure and intonation in speech synthesis, J. Terken and R. Collier computation of timing in text-to-speech synthesis, J.P.H. van Santen objective optimization in algorithms for text-to-speech synthesis, Y. Sagisaka and N. Iwahashi quality evaluation of synthesized speech, V.J. van Heuven and R. van Bezooijen.

621 citations


Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems. Discusses the variety of speech coders utilized with such new systems as MBE IMMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations


Journal ArticleDOI
TL;DR: A new mixed excitation LPC vocoder model is presented that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech.
Abstract: Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.

352 citations


Journal ArticleDOI
TL;DR: A new method based on the global phase characteristics of minimum phase signals for determining the instants of significant excitation in speech signals is proposed, which works well for all types of voiced speech in male as well as female speech but, in all cases, under noise-free conditions only.
Abstract: A new method for determining the instants of significant excitation in speech signals is proposed. In the paper, significant excitation refers primarily to the instant of glottal closure within a pitch period in voiced speech. The method is based on the global phase characteristics of minimum phase signals. The average slope of the unwrapped phase of the short-time Fourier transform of linear prediction residual is calculated as a function of time. Instants where the phase slope function makes a positive zero-crossing are identified as significant excitations. The method is discussed in a source-filter context of speech production. The method is not sensitive to the characteristics of the filter. The influence of the type, length, and position of the analysis window is discussed. The method works well for all types of voiced speech in male as well as female speech but, in all cases, under noise-free conditions only.

209 citations
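The phase-slope idea in the abstract above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' exact implementation: it assumes an idealised impulse-train residual, a Hamming window of 64 samples, and it centres each analysis window before the FFT so that the average slope of the unwrapped phase crosses zero (positive-going) when an excitation impulse sits at the window centre.

```python
import numpy as np

def phase_slope_gci(residual, win_len=64):
    """Instants of significant excitation from the average slope of the
    unwrapped STFT phase of an LP residual (illustrative sketch)."""
    half = win_len // 2
    win = np.hamming(win_len)
    n_frames = len(residual) - win_len
    slopes = np.empty(n_frames)
    for n in range(n_frames):
        # centre the analysis window so the phase slope is zero when an
        # excitation impulse sits exactly at the window centre
        frame = np.roll(residual[n:n + win_len] * win, -half)
        phase = np.unwrap(np.angle(np.fft.rfft(frame)))
        slopes[n] = np.mean(np.diff(phase))  # average phase slope
    # positive-going zero crossings of the phase-slope function
    idx = np.where((slopes[:-1] < 0) & (slopes[1:] >= 0))[0] + 1
    return slopes, idx + half  # report as sample positions (window centres)
```

For a clean impulse train the detected crossings land on the impulses; real LP residuals are noisier, which is why the paper studies the type, length, and position of the analysis window.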


Journal ArticleDOI
TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.
Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure of LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided.

182 citations
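The "sensitivity matrix" view can be checked numerically. The sketch below is a hedged stand-in, not the paper's derivation: it takes the special case where the mismatched distortion is the squared error of some nonlinear transform f of the parameter vector, in which case the sensitivity matrix is JᵀJ for the Jacobian J of f, and the quadratically weighted error matches the true distortion for small quantization errors. The transform f used here is purely hypothetical.

```python
import numpy as np

def sensitivity_matrix(f, x, h=1e-5):
    """Sensitivity matrix S = J^T J for the distortion d(x, x') = ||f(x) - f(x')||^2,
    estimated by finite differences, so that d ~ (x - x')^T S (x - x')."""
    fx = np.asarray(f(x))
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        J[:, i] = (np.asarray(f(xp)) - fx) / h
    return J.T @ J

# hypothetical parameter transform standing in for, e.g., LSP frequencies
f = lambda x: np.array([np.log(x[0]), x[0] * x[1], np.sin(x[1])])
x = np.array([2.0, 0.5])
S = sensitivity_matrix(f, x)

delta = np.array([1e-3, -2e-3])              # a small "quantization error"
d_true = np.sum((f(x + delta) - f(x)) ** 2)  # true mismatched distortion
d_quad = delta @ S @ delta                   # quadratically weighted approximation
```

The agreement between `d_true` and `d_quad` tightens as the quantization error shrinks, which is exactly the high-rate regime the paper analyzes.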


Patent
H. S. Peter Yue, Rafi Rabipour
03 May 1995
TL;DR: The subjectively annoying "swishing" or "waterfall" effects encountered in conventional LPC speech processing systems are reduced or eliminated by calculating LPC coefficients for each noise interval from the samples of that interval and of a plurality of preceding signal intervals.
Abstract: In methods and apparatus for processing speech signals comprising a plurality of successive signal intervals, each signal interval containing no speech sounds is classified as a noise interval, and LPC coefficients are calculated for each noise interval based on the samples of that noise interval and on the samples of a plurality of preceding signal intervals. When noise intervals encoded using LPC coefficients calculated as described above are reconstructed, the subjectively annoying "swishing" or "waterfall" effects encountered in conventional LPC speech processing systems are reduced or eliminated.

167 citations


PatentDOI
TL;DR: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination and the use of the system in the generation of a variety of voice effects.
Abstract: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment the encoder of the system computes the signal pitch and a parameter which is related to the relative content of voiced and unvoiced portions in the spectrum of the signal, which is expressed as a ratio Pv, defined as a voicing probability. The voiced portion of the signal spectrum, as determined by the parameter Pv, is encoded using a set of harmonically related amplitudes corresponding to the estimated pitch. The unvoiced portion of the signal is processed in a separate processing branch which uses a modified linear predictive coding algorithm. Parameters representing both the voiced and the unvoiced portions of a speech segment are combined in data packets for transmission. In the decoder, speech is synthesized from the transmitted parameters representing voiced and unvoiced portions of the speech in a reverse order. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality. Perceptually smooth transition between frames is ensured by using an overlap and add method of synthesis. Also disclosed is the use of the system in the generation of a variety of voice effects.

151 citations


Journal ArticleDOI
TL;DR: MFB cepstra significantly outperform LPC cepstra under noisy conditions; techniques using an optimal linear combination of features for data reduction were also evaluated.
Abstract: This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.

133 citations
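A control front end of the kind described above (mel filter bank, log energies, cepstral transform) can be sketched as follows. The filter count, sample rate, and cepstral order are illustrative choices, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfb_cepstra(frame, sr=8000, n_filt=20, n_ceps=12):
    """Mel-filter-bank cepstra for one windowed frame: power spectrum,
    triangular mel-spaced filters, log energies, then a DCT-II."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filt + 2))
    log_e = np.empty(n_filt)
    for i in range(n_filt):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        # triangular filter rising lo->mid and falling mid->hi
        tri = np.minimum((freqs - lo) / (mid - lo), (hi - freqs) / (hi - mid))
        log_e[i] = np.log(np.sum(spec * np.clip(tri, 0.0, 1.0)) + 1e-10)
    # DCT-II of the log filter-bank energies gives the cepstral coefficients
    n = np.arange(n_filt) + 0.5
    return np.array([np.sum(log_e * np.cos(np.pi * k * n / n_filt))
                     for k in range(n_ceps)])
```

Because the filters integrate the power spectrum over mel-spaced bands, the representation is less sensitive to narrowband noise than an LPC fit, which is consistent with the comparison reported in the paper.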


Proceedings ArticleDOI
09 May 1995
TL;DR: A new audio-coding method called transform-domain weighted interleave vector quantization (TwinVQ) is proposed, which achieves high-quality reproduction at less than 64 kbit/s; subjective tests showed its quality exceeded that of an MPEG Layer II coder at the same bitrate.
Abstract: A new audio-coding method is proposed. This method is called transform-domain weighted interleave vector quantization (TwinVQ) and achieves high-quality reproduction at less than 64 kbit/s. The method is a transform coding using modified discrete cosine transform (MDCT). There are three novel techniques in this method: flattening of the MDCT coefficients by the spectrum of linear predictive coding (LPC) coefficients; interframe backward prediction for flattening the MDCT coefficients; and weighted interleave vector quantization. Subjective evaluation tests showed that the quality of the reproduction of TwinVQ exceeded that of an MPEG Layer II coder at the same bitrate.

87 citations


Proceedings ArticleDOI
09 May 1995
TL;DR: This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated.
Abstract: Linear prediction coefficients are used to describe the power-spectrum envelope in the majority of low-bit-rate coders. The performance of quantizers for the linear-prediction coefficients is generally evaluated in terms of spectral distortion. This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated. Smoothing the evolution of the power-spectrum envelope over time increases the reconstructed speech quality. A reasonable objective is to find the smoothest path that keeps the quantized parameters within the Voronoi regions associated with the transmitted quantization index. We demonstrate increased quantizer performance by such smoothing of the line-spectral frequencies.
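A minimal sketch of the smoothing idea: average the quantized LSF trajectory over time, then clip each value back into its quantization cell so the decoder's path stays consistent with the transmitted indices. The Voronoi-region constraint is simplified here to the cell of a scalar uniform quantizer, which is a stand-in for the paper's more general constraint.

```python
import numpy as np

def smooth_lsf_track(lsf_q, step):
    """Three-frame moving average of a quantized LSF trajectory, clipped so
    every value stays inside its scalar uniform quantization cell of width
    `step` (a simplified stand-in for the Voronoi-region constraint)."""
    smooth = np.asarray(lsf_q, dtype=float).copy()
    for t in range(1, len(smooth) - 1):
        avg = (lsf_q[t - 1] + lsf_q[t] + lsf_q[t + 1]) / 3.0
        # keep the smoothed value inside the transmitted quantization cell
        smooth[t] = np.clip(avg, lsf_q[t] - step / 2.0, lsf_q[t] + step / 2.0)
    return smooth
```

On an alternating (worst-case) quantized track the smoothed path collapses toward the shared cell boundary, removing frame-to-frame jumps in the power-spectrum envelope without changing what was transmitted.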

Patent
Toshiyuki Morii
27 Nov 1995
TL;DR: A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules.
Abstract: A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules. The sample characteristic parameters and the coding distortions are statistically processed by a statistical processing unit to obtain a coding module selecting rule. Thereafter, when a speech is analyzed by the speech analyzing unit to obtain characteristic parameters, an appropriate coding module is selected by a coding module selecting unit from the coding modules according to the coding module selecting rule on condition that a coding distortion for the characteristic parameters is minimized in the appropriate coding module. Thereafter, the characteristic parameters of the speech are coded in the appropriate coding module, and a coded speech is obtained. When the coded speech is decoded, a reproduced speech is obtained. Accordingly, because an appropriate coding module can be easily selected from a plurality of coding modules according to the coding module selecting rule, any allophone occurring in a reproduced speech can be prevented at a low calculation volume.

PatentDOI
TL;DR: A method for encoding a speech signal into digital bits including the steps of dividing the speech signal into speech frames representing time intervals of the speech signal, determining voicing information for frequency bands of the speech frames, and determining spectral magnitudes representative of the magnitudes of the spectrum at determined frequencies across the frequency bands.
Abstract: A method for encoding a speech signal into digital bits including the steps of dividing the speech signal into speech frames representing time intervals of the speech signal, determining voicing information for frequency bands of the speech frames, and determining spectral magnitudes representative of the magnitudes of the spectrum at determined frequencies across the frequency bands. The method further includes quantizing and encoding the spectral magnitudes and the voicing information. The steps of determining, quantizing and encoding the spectral magnitudes are done in such a manner that the spectral magnitudes, independent of voicing information, are available for later synthesizing.

Patent
04 Apr 1995
TL;DR: In this paper, the authors exploit the synergy between operations performed by a speech rate modification system and those operations performed in a speech coding system to provide a speech-rate modification system with reduced hardware requirements.
Abstract: Synergy between operations performed by a speech-rate modification system and those operations performed in a speech coding system is exploited to provide a speech-rate modification system with reduced hardware requirements. The speech rate of an input signal is modified based on a signal representing a predetermined change in speech rate. The modified speech-rate signal is then filtered to generate a speech signal having increased short-term correlation. Modification of the input speech signal may be performed by inserting in the input speech signal a previous sequence of samples corresponding substantially to a pitch cycle. Alternatively, the input speech signal may be modified by removing from the input speech signal a sequence of samples corresponding substantially to a pitch cycle.

Journal ArticleDOI
T. Chen, H.P. Graf, Kuansan Wang
TL;DR: Speech information is utilized to improve the quality of audio-visual communications such as videotelephony and videoconferencing; in particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization.
Abstract: We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined.

Proceedings ArticleDOI
09 May 1995
TL;DR: A new set of speech feature representations for robust speech recognition in the presence of car noise is proposed, based on subband analysis of the speech signal, and the performances of the new feature representations are compared to mel scale cepstral coefficients.
Abstract: A new set of speech feature representations for robust speech recognition in the presence of car noise is proposed. These parameters are based on subband analysis of the speech signal. Line spectral frequency (LSF) representation of the linear prediction (LP) analysis in subbands and cepstral coefficients derived from subband analysis (SUBCEP) are introduced, and the performances of the new feature representations are compared to mel scale cepstral coefficients (MELCEP) in the presence of car noise. Subband analysis based parameters are observed to be more robust than the commonly employed MELCEP representations.


Patent
17 Aug 1995
TL;DR: In this paper, an apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.
Abstract: An apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.

Journal ArticleDOI
TL;DR: The authors propose three robust alternatives to conventional one-shot least-squares LP analysis, including an iterative approach leading to the weighted least absolute value solution, and reveal that the most robust method depends on the type of noise.
Abstract: Various linear predictive (LP) analysis methods are studied and compared from the points of view of robustness to noise and of application to speaker identification. The key to the success of the LP techniques is in separating the vocal tract information from the pitch information present in a speech signal even under noisy conditions. In addition to considering the conventional, one-shot weighted least-squares methods, the authors propose three other approaches with the above point as a motivation. The first is an iterative approach that leads to the weighted least absolute value solution. The second is an extension of the one-shot least-squares approach and achieves an iterative update of the weights. The update is a function of the residual and is based on minimizing a Mahalanobis distance. Third, the weighted total least-squares formulation is considered. A study of the deviations in the LP parameters is done when noise (white Gaussian and impulsive) is added to the speech. It is revealed that the most robust method depends on the type of noise. Closed-set speaker identification experiments with 20 speakers are conducted using a vector quantizer classifier trained on clean speech. The relative performance of the various LP approaches depends on the type of speech material used for testing.
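An iterative approach of the first kind (one that tends toward a least absolute value fit) can be sketched as iteratively reweighted least squares. This is a generic IRLS sketch under the standard 1/sqrt(|r|) reweighting, not the authors' exact update rule; the damping constant `eps` is an assumption added for numerical stability.

```python
import numpy as np

def lp_irls(x, order, n_iter=10, eps=1e-3):
    """LP coefficients by iteratively reweighted least squares; weights of
    1/sqrt(|r|) drive the solution toward a least-absolute-value fit of the
    prediction residual, which is less sensitive to impulsive noise."""
    N = len(x)
    # regression: x[n] ~ a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    A = np.column_stack([x[order - k - 1:N - k - 1] for k in range(order)])
    b = x[order:]
    w = np.ones_like(b)
    for _ in range(n_iter):
        a = np.linalg.lstsq(A * w[:, None], w * b, rcond=None)[0]
        r = b - A @ a
        w = 1.0 / np.sqrt(np.abs(r) + eps)  # reweight toward an L1 criterion
    return a
```

With impulsive noise in the excitation, the large residuals receive small weights, so the estimated predictor stays close to the underlying vocal-tract model.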

PatentDOI
TL;DR: A method for decoding and synthesizing a synthetic digital speech signal from digital bits of the type produced by dividing a speech signal into frames and encoding the speech signal by an MBE based encoder.
Abstract: A method for decoding and synthesizing a synthetic digital speech signal from digital bits of the type produced by dividing a speech signal into frames and encoding the speech signal by an MBE based encoder. The method includes the steps of decoding the bits to provide spectral envelope and voicing information for each of the frames, processing the spectral envelope information to determine regenerated spectral phase information for each of the frames based on local envelope smoothness, and determining from the voicing information whether frequency bands for a particular frame are voiced or unvoiced. The method further includes synthesizing speech components for voiced frequency bands using the regenerated spectral phase information, synthesizing a speech component representing the speech signal in at least one unvoiced frequency band, and synthesizing the speech signal by combining the synthesized speech components for voiced and unvoiced frequency bands.

PatentDOI
TL;DR: A method of encoding speech by analyzing a digitized speech signal to determine excitation parameters for the digitized speech signal is disclosed.
Abstract: A method of encoding speech by analyzing a digitized speech signal to determine excitation parameters for the digitized speech signal is disclosed. The method includes dividing the digitized speech signal into at least two frequency bands, determining a first preliminary excitation parameter by performing a nonlinear operation on at least one of the frequency band signals to produce a modified frequency band signal and determining the first preliminary excitation parameter using the modified frequency band signal, determining a second preliminary excitation parameter using a method different from the first method, and using the first and second preliminary excitation parameters to determine an excitation parameter for the digitized speech signal. The method is useful in encoding speech. Speech synthesized using the parameters estimated based on the invention generates high quality speech at various bit rates useful for applications such as satellite voice communication.

Patent
30 Nov 1995
TL;DR: A plurality of speech segment data units is prepared for all desired speech waveforms, and a desired pitch is obtained by overlapping the appropriate speech segment data units according to a pitch period interval.
Abstract: A method and apparatus for synthesizing speech. According to one variation of the method and apparatus, a plurality of speech segment data units is prepared for all desired speech waveforms. Speech is then synthesized by reading out from memory the appropriate speech segment data units, and a desired pitch is obtained by overlapping the appropriate speech segment data units according to a pitch period interval. According to a second variation of the method and apparatus, speech segment data units are prepared for only initial speech waveforms and first pitch waveforms, and differential waveforms. With this variation, subsequent pitch waveforms for speech synthesis are generated by combining the first pitch waveform with the corresponding differential waveform. According to a third variation of the method and apparatus, a natural speech segment channel produces natural speech segment data units in the same manner as the first variation, and a synthesized speech segment channel produces speech segment data units according to a parameter method, such as a formant method. The natural speech segments and synthesized speech segments are then mixed to produce synthesized speech.

Patent
30 May 1995
TL;DR: A pitch estimation device and method utilizing a multi-resolution approach to estimate a pitch lag value of input speech; the system determines the LPC residual of the speech and samples the residual.
Abstract: A pitch estimation device and method utilizing a multi-resolution approach to estimate a pitch lag value of input speech. The system includes determining the LPC residual of the speech and sampling the LPC residual. A discrete Fourier transform is applied and the result is squared. A lowpass filtering step is carried out and a DFT on the squared amplitude is then performed to transform the LPC residual samples into another domain. An initial pitch lag can then be found with lower resolution. After getting the low-resolution pitch lag estimate, a refinement algorithm is applied to get a higher-resolution pitch lag. The refinement algorithm is based on minimizing the prediction error in the time domain. The refined pitch lag then can be used directly in the speech coding.
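The coarse-then-refine structure can be illustrated with a simplified stand-in: a coarse lag taken from the residual autocorrelation (rather than the patent's transform-domain route), refined by minimising a one-tap long-term prediction error in the time domain, as the patent's refinement step does. The lag range and search radius are illustrative assumptions.

```python
import numpy as np

def pitch_lag(residual, lo=20, hi=160, refine=3):
    """Coarse pitch lag from the autocorrelation of an (LP) residual,
    refined by minimising the one-tap long-term prediction error."""
    r = np.correlate(residual, residual, mode='full')[len(residual) - 1:]
    coarse = lo + int(np.argmax(r[lo:hi]))
    best, best_err = coarse, np.inf
    for lag in range(max(lo, coarse - refine), min(hi, coarse + refine) + 1):
        x, y = residual[lag:], residual[:-lag]
        g = np.dot(x, y) / (np.dot(y, y) + 1e-12)  # optimal pitch-gain tap
        err = np.sum((x - g * y) ** 2)             # time-domain prediction error
        if err < best_err:
            best, best_err = lag, err
    return best
```

The two-stage search keeps the expensive fine comparison confined to a few candidate lags around the coarse estimate, which is the point of the multi-resolution design.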

Proceedings ArticleDOI
09 May 1995
TL;DR: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases.
Abstract: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation. New features of the scheme include classified vector quantization of the spectral envelope of LPC residuals with a weighted distortion measure. The improvement in performance obtained by classifying codebooks based on a voiced/unvoiced (V/UV) decision is shown. Sequences of the short-term RMS power of the time domain waveforms are also vector quantized and transmitted for unvoiced signals. A fast synthesis algorithm for voiced signals using an FFT is also presented, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases. Informal listening tests indicate that, in combination with a known LSP quantization technique, this residual coding scheme provides good communication quality at a total bit rate of less than 2.0 kbps.

Proceedings ArticleDOI
05 Sep 1995
TL;DR: A new speech enhancement procedure is presented which allows a more relevant spectrum without the need for a priori knowledge of the noise intensity, and it is shown that this approach can greatly increase the recognition rate.
Abstract: It is known that noise can significantly decrease the performance of a speech recognition system. To solve this problem, many speech processing algorithms have been developed. Most of them assume that the noise level is constant, or is to be evaluated in the course of the algorithm. The paper addresses a particular kind of noise: the type introduced by pre-emphasis of the speech signal. A new speech enhancement procedure is presented which allows a more relevant spectrum without the need for a priori knowledge of the noise intensity. It is shown that this approach can greatly increase the recognition rate.

PatentDOI
TL;DR: In this article, the pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass filtered speech waveform for one pitch period.
Abstract: A system that synchronously segments a speech waveform using pitch period and a center of the pitch waveform. The pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass filtered speech waveform for one pitch period. The speech waveform can then be represented by one or more of such pitch waveforms or segments during speech compression, reconstruction or synthesis. The pitch waveform can be modified by frequency enhancement/filtering, waveform stretching/shrinking in speech synthesis or speech disguise. The utterance rate can also be controlled to speed up or slow down the speech.

BookDOI
01 Jan 1995
TL;DR: An edited volume spanning speech coding, speech recognition, speaker recognition, and text-to-speech synthesis, including the use of pitch prediction in speech coding and current methods in continuous speech recognition.
Abstract: Contributors. Preface. Part 1: Speech Coding. 1. The use of pitch prediction in speech coding R.P. Ramachandran. 2. Vector quantization of linear predictor coefficients J.S. Collura. 3. Linear predictive analysis by synthesis coding P. Kroon, W.B. Kleijn. 4. Waveform interpolation J. Haagen, W.B. Kleijn. 5. Variable rate speech coding V. Cuperman, P. Lupini. Part 2: Speech Recognition. 6. Word spotting J.R. Rohlicek. 7. Speech recognition using neural networks S.V. Kosonocky. 8. Current methods in continuous speech recognition P.S. Gopalakrishnan. 9. Large vocabulary isolated word recognition V. Gupta, M. Lennig. 10. Recent developments in robust speech recognition B.H. Juang. 11. How do humans process and recognize speech? J.B. Allen. Part 3: Speaker Recognition. 12. Data fusion techniques for speaker recognition K.R. Farrell, R.J. Mammone. 13. Speaker recognition over telephone channels Y.-H. Kao, et al. Part 4: Text to speech synthesis. 14. Approaches to improve automatic speech synthesis D. O'Shaughnessy. Part 5: Applications of Models. 15. Microphone array for hands-free voice communication in a car S. Oh, V. Viswanathan. 16. The pitch mode modulation model and its application in speech processing O. Ghitza. 17. Auditory models and human performance in tasks related to speech coding and speech recognition O. Ghitza. 18. Applications of wavelets to speech processing: a case study of a Celp coder J. Ooi, V. Viswanathan. Index.

PatentDOI
TL;DR: A speech encoding/decoding method calculates a short-term prediction error of an input speech signal that is divided on a time axis into blocks, represents the short-term prediction residue by a synthesized sine wave and a noise, and encodes a frequency spectrum of each of the synthesized sine wave and the noise to encode the speech signal.
Abstract: A speech encoding/decoding method calculates a short-term prediction error of an input speech signal that is divided on a time axis into blocks, represents the short-term prediction residue by a synthesized sine wave and a noise and encodes a frequency spectrum of each of the synthesized sine wave and the noise to encode the speech signal. The speech encoding/decoding method decodes the speech signal on a block basis and finds a short-term prediction residue waveform by sine wave synthesis and noise synthesis of the encoded speech signal. The speech encoding/decoding method then synthesizes the time-axis waveform signal based on the short-term prediction residue waveform of the encoded speech signal.

Journal ArticleDOI
TL;DR: The authors derive a 30 bit two-quantizer scheme for line spectral frequencies (LSFs) which achieves performance equivalent to a 36 bit scalar quantizer, formulate a new adaptation algorithm for the vector quantizer, and apply a dynamic programming search to both quantizers.
Abstract: An important problem in speech coding is the quantization of linear predictive coefficients (LPCs) with the smallest possible number of bits while maintaining robustness to a large variety of speech material and transmission media. Since direct quantization of LPCs is known to be unsatisfactory, the authors consider this problem for an equivalent representation, namely, the line spectral frequencies (LSFs). To achieve an acceptable level of distortion a scalar quantizer for LSFs requires a 36 bit codebook. The authors derive a 30 bit two-quantizer scheme which achieves a performance equivalent to this scalar quantizer. The two-quantizer format consists of both a vector and a scalar quantizer such that for each input, the better quantizer is used. The vector quantizer is designed from a training set that reflects the joint density (for coding efficiency) and which ensures coverage (for robustness). The scalar quantizer plays a pivotal role in dealing better with regions of the space that are sparsely covered by its vector quantizer counterpart. A further reduction of 1 bit is obtained by formulating a new adaptation algorithm for the vector quantizer and doing a dynamic programming search for both quantizers. The method of adaptation takes advantage of the ordering of the LSFs and imposes no overhead in memory requirements. The dynamic programming search is feasible due to the ordering property. Subjective tests in a speech coder reveal that the 29 bit scheme produces equivalent perceptual quality to that when the parameters are unquantized.
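The ordering property that the adaptation and dynamic-programming search rely on is intrinsic to LSFs: for a minimum-phase LP analysis filter, the line spectral frequencies are distinct and strictly increasing in (0, π). A sketch via polynomial roots, illustrative rather than a production LSF routine:

```python
import numpy as np

def lpc_to_lsf(A):
    """Line spectral frequencies (radians, in (0, pi)) of an LP analysis
    filter with coefficients A = [1, a1, ..., ap], via polynomial roots."""
    A = np.asarray(A, dtype=float)
    P = np.append(A, 0.0) + np.append(0.0, A[::-1])  # sum polynomial
    Q = np.append(A, 0.0) - np.append(0.0, A[::-1])  # difference polynomial
    lsf = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one angle per conjugate pair, dropping trivial roots at z = +/-1
        lsf.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(lsf))
```

Because the quantized parameters must respect this monotonic ordering, a dynamic programming search over per-coefficient candidates only has to consider increasing paths, which is what makes it tractable.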

Patent
Toru Nakahara
30 Oct 1995
TL;DR: In this paper, a speech coder is provided for converting an analog speech signal to a digital speech signal, and a storage level detector determines the amount of a data signal waiting for transmission from a non-telephonic communications apparatus.
Abstract: In a mobile telephone set, a speech coder is provided for converting an analog speech signal to a digital speech signal. A voice activity sensor determining whether the analog speech signal is present or not. A storage level detector determines the amount of a data signal waiting for transmission from a non-telephonic communications apparatus. A switching control logic determines the rate for the speech coder according to the outputs of the voice activity sensor and the storage level detector. When the analog speech signal is present and the amount of the waiting data signal is zero, the data rate is set at a high rate and a digital speech signal from the speech coder is transmitted along with an indication of the high data rate. When the analog speech signal is present and the amount of the waiting data signal is non-zero, the data rate of the speech coder is set at a rate which is lower than the high data rate and is variable in accordance with the amount of the waiting data signal and the digital speech signal from the speech coder and the data signal from the non-telephonic communications apparatus are transmitted along with an indication of the variable data rate. When the analog speech signal is absent and the amount of the waiting data signal is non-zero, the data signal from the apparatus is exclusively transmitted.