
Showing papers on "Linear predictive coding published in 1995"


Journal ArticleDOI
TL;DR: The survey indicates that the essential points in noisy speech recognition consist of incorporating time and frequency correlations, giving more importance to high SNR portions of speech in decision making, exploiting task-specific a priori knowledge both of speech and of noise, using class-dependent processing, and including auditory models in speech processing.

712 citations


Book
01 Nov 1995
TL;DR: A comprehensive reference on speech coding and synthesis, with chapters including an introduction to speech coding (W.B. Kleijn and K.K. Paliwal), evaluation of speech coders (P. Kroon), and a robust algorithm for pitch tracking (RAPT) by D. Talkin.
Abstract: An introduction to speech coding, W.B. Kleijn and K.K. Paliwal speech coding standards, R.V. Cox linear-prediction based analysis-by-synthesis coding, P. Kroon and W.B. Kleijn sinusoidal coding, R.J. McAulay and T.F. Quatieri waveform interpolation for coding and synthesis, W.B. Kleijn and J. Haagen low-delay coding of speech, J.-H. Chen multimode and variable-rate coding of speech, A. Das et al wideband speech coding, J.-P. Adoul and R. Lefebvre vector quantization for speech transmission, P. Hedelin et al theory for transmission of vector quantization data, P. Hedelin et al waveform coding and auditory masking, R. Veldhuis and A. Kohlrausch quantization of LPC parameters, K.K. Paliwal and W.B. Kleijn evaluation of speech coders, P. Kroon a robust algorithm for pitch tracking (RAPT), D. Talkin time-domain and frequency-domain techniques for prosodic modification of speech, E. Moulines and W. Verhelst nonlinear processing of speech, G. Kubin an approach to text-to-speech synthesis, R. Sproat and J. Olive the generation of prosodic structure and intonation in speech synthesis, J. Terken and R. Collier computation of timing in text-to-speech synthesis, J.P.H. van Santen objective optimization in algorithms for text-to-speech synthesis, Y. Sagisaka and N. Iwahashi quality evaluation of synthesized speech, V.J. van Heuven and R. van Bezooijen.

621 citations


Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems. Discusses the variety of speech coders utilized with such new systems as MBE IMMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations


Journal ArticleDOI
TL;DR: A new mixed excitation LPC vocoder model is presented that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech.
Abstract: Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.

352 citations


Journal ArticleDOI
TL;DR: A new method based on the global phase characteristics of minimum phase signals for determining the instants of significant excitation in speech signals is proposed, which works well for all types of voiced speech in male as well as female speech but, in all cases, under noise-free conditions only.
Abstract: A new method for determining the instants of significant excitation in speech signals is proposed. In the paper, significant excitation refers primarily to the instant of glottal closure within a pitch period in voiced speech. The method is based on the global phase characteristics of minimum phase signals. The average slope of the unwrapped phase of the short-time Fourier transform of linear prediction residual is calculated as a function of time. Instants where the phase slope function makes a positive zero-crossing are identified as significant excitations. The method is discussed in a source-filter context of speech production. The method is not sensitive to the characteristics of the filter. The influence of the type, length, and position of the analysis window is discussed. The method works well for all types of voiced speech in male as well as female speech but, in all cases, under noise-free conditions only.

209 citations
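The phase-slope idea in the abstract above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' exact implementation: it assumes an idealised impulse-train residual, a Hamming window of 64 samples, and it centres each analysis window before the FFT so that the average slope of the unwrapped phase crosses zero (positive-going) when an excitation impulse sits at the window centre.

```python
import numpy as np

def phase_slope_gci(residual, win_len=64):
    """Instants of significant excitation from the average slope of the
    unwrapped STFT phase of an LP residual (illustrative sketch)."""
    half = win_len // 2
    win = np.hamming(win_len)
    n_frames = len(residual) - win_len
    slopes = np.empty(n_frames)
    for n in range(n_frames):
        # centre the analysis window so the phase slope is zero when an
        # excitation impulse sits exactly at the window centre
        frame = np.roll(residual[n:n + win_len] * win, -half)
        phase = np.unwrap(np.angle(np.fft.rfft(frame)))
        slopes[n] = np.mean(np.diff(phase))  # average phase slope
    # positive-going zero crossings of the phase-slope function
    idx = np.where((slopes[:-1] < 0) & (slopes[1:] >= 0))[0] + 1
    return slopes, idx + half  # report as sample positions (window centres)
```

For a clean impulse train the detected crossings land on the impulses; real LP residuals are noisier, which is why the paper studies the type, length, and position of the analysis window.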


Journal ArticleDOI
TL;DR: A theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures is presented, and the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems is described.
Abstract: The paper presents a theoretical analysis of high-rate vector quantization (VQ) systems that use suboptimal, mismatched distortion measures, and describes the application of the analysis to the problem of quantizing the linear predictive coding (LPC) parameters in speech coding systems. First, it is shown that in many high-rate VQ systems the quantization distortion approaches a simple quadratically weighted error measure, where the weighting matrix is a "sensitivity matrix" that is an extension of the concept of the scalar sensitivity. The approximate performance of VQ systems that train and quantize using mismatched distortion measures is derived, and is used to construct better distortion measures. Second, these results are used to determine the performance of LPC vector quantizers, as measured by the log spectral distortion (LSD) measure, which have been trained using other error measures, such as mean-squared (MSE) or weighted mean-squared error (WMSE) measures of LPC parameters, reflection coefficients and transforms thereof, and line spectral pair (LSP) frequencies. Computationally efficient algorithms for computing the sensitivity matrices of these parameters are described. In particular, it is shown that the sensitivity matrix for the LSP frequencies is diagonal, implying that a WMSE measure of LSP frequencies converges to the LSD measure in high-rate VQ systems. Experimental results to support the theoretical performance estimates are provided.

182 citations
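The "sensitivity matrix" view can be checked numerically. The sketch below is a hedged stand-in, not the paper's derivation: it takes the special case where the mismatched distortion is the squared error of some nonlinear transform f of the parameter vector, in which case the sensitivity matrix is JᵀJ for the Jacobian J of f, and the quadratically weighted error matches the true distortion for small quantization errors. The transform f used here is purely hypothetical.

```python
import numpy as np

def sensitivity_matrix(f, x, h=1e-5):
    """Sensitivity matrix S = J^T J for the distortion d(x, x') = ||f(x) - f(x')||^2,
    estimated by finite differences, so that d ~ (x - x')^T S (x - x')."""
    fx = np.asarray(f(x))
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        J[:, i] = (np.asarray(f(xp)) - fx) / h
    return J.T @ J

# hypothetical parameter transform standing in for, e.g., LSP frequencies
f = lambda x: np.array([np.log(x[0]), x[0] * x[1], np.sin(x[1])])
x = np.array([2.0, 0.5])
S = sensitivity_matrix(f, x)

delta = np.array([1e-3, -2e-3])              # a small "quantization error"
d_true = np.sum((f(x + delta) - f(x)) ** 2)  # true mismatched distortion
d_quad = delta @ S @ delta                   # quadratically weighted approximation
```

The agreement between `d_true` and `d_quad` tightens as the quantization error shrinks, which is exactly the high-rate regime the paper analyzes.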


Patent
H. S. Peter Yue, Rafi Rabipour
03 May 1995
TL;DR: The subjectively annoying "swishing" or "waterfall" effects encountered in conventional LPC speech processing systems are reduced or eliminated by calculating LPC coefficients for each noise interval from the samples of that interval and of a plurality of preceding signal intervals.
Abstract: In methods and apparatus for processing speech signals comprising a plurality of successive signal intervals, each signal interval containing no speech sounds is classified as a noise interval, and LPC coefficients are calculated for each noise interval based on the samples of that noise interval and on the samples of a plurality of preceding signal intervals. When noise intervals encoded using LPC coefficients calculated as described above are reconstructed, the subjectively annoying "swishing" or "waterfall" effects encountered in conventional LPC speech processing systems are reduced or eliminated.

167 citations


PatentDOI
TL;DR: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination and the use of the system in the generation of a variety of voice effects.
Abstract: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment the encoder of the system computes the signal pitch and a parameter which is related to the relative content of voiced and unvoiced portions in the spectrum of the signal, which is expressed as a ratio Pv, defined as a voicing probability. The voiced portion of the signal spectrum, as determined by the parameter Pv, is encoded using a set of harmonically related amplitudes corresponding to the estimated pitch. The unvoiced portion of the signal is processed in a separate processing branch which uses a modified linear predictive coding algorithm. Parameters representing both the voiced and the unvoiced portions of a speech segment are combined in data packets for transmission. In the decoder, speech is synthesized from the transmitted parameters representing voiced and unvoiced portions of the speech in a reverse order. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality. Perceptually smooth transition between frames is ensured by using an overlap and add method of synthesis. Also disclosed is the use of the system in the generation of a variety of voice effects.

151 citations


Journal ArticleDOI
TL;DR: MFB cepstra significantly outperform LPC cepstra under noisy conditions; techniques using an optimal linear combination of features for data reduction were also evaluated.
Abstract: This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control mel filter bank (MFB) based cepstral front end in clean speech and with speech degraded by noise and spectral variability, using the TI-105 isolated word database. MFB recognition error rates ranged from 0.5 to 26.9% in noise, depending on the SNR, and auditory models provided error rates as much as four percentage points lower. With speech degraded by linear filtering, MFB error rates ranged from 0.5 to 3.1%, and the reduction in error rates provided by auditory models was less than 0.5 percentage points. Some earlier studies that demonstrated considerably more improvement with auditory models used linear predictive coding (LPC) based control front ends. This paper shows that MFB cepstra significantly outperform LPC cepstra under noisy conditions. Techniques using an optimal linear combination of features for data reduction were also evaluated.

133 citations
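A control front end of the kind described above (mel filter bank, log energies, cepstral transform) can be sketched as follows. The filter count, sample rate, and cepstral order are illustrative choices, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfb_cepstra(frame, sr=8000, n_filt=20, n_ceps=12):
    """Mel-filter-bank cepstra for one windowed frame: power spectrum,
    triangular mel-spaced filters, log energies, then a DCT-II."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filt + 2))
    log_e = np.empty(n_filt)
    for i in range(n_filt):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        # triangular filter rising lo->mid and falling mid->hi
        tri = np.minimum((freqs - lo) / (mid - lo), (hi - freqs) / (hi - mid))
        log_e[i] = np.log(np.sum(spec * np.clip(tri, 0.0, 1.0)) + 1e-10)
    # DCT-II of the log filter-bank energies gives the cepstral coefficients
    n = np.arange(n_filt) + 0.5
    return np.array([np.sum(log_e * np.cos(np.pi * k * n / n_filt))
                     for k in range(n_ceps)])
```

Because the filters integrate the power spectrum over mel-spaced bands, the representation is less sensitive to narrowband noise than an LPC fit, which is consistent with the comparison reported in the paper.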


Proceedings ArticleDOI
09 May 1995
TL;DR: A new audio-coding method called transform-domain weighted interleave vector quantization (TwinVQ) is proposed, which achieves high-quality reproduction at less than 64 kbit/s; subjective tests showed its quality exceeded that of an MPEG Layer II coder at the same bitrate.
Abstract: A new audio-coding method is proposed. This method is called transform-domain weighted interleave vector quantization (TwinVQ) and achieves high-quality reproduction at less than 64 kbit/s. The method is a transform coding using modified discrete cosine transform (MDCT). There are three novel techniques in this method: flattening of the MDCT coefficients by the spectrum of linear predictive coding (LPC) coefficients; interframe backward prediction for flattening the MDCT coefficients; and weighted interleave vector quantization. Subjective evaluation tests showed that the quality of the reproduction of TwinVQ exceeded that of an MPEG Layer II coder at the same bitrate.

87 citations


Proceedings ArticleDOI
09 May 1995
TL;DR: This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated.
Abstract: Linear prediction coefficients are used to describe the power-spectrum envelope in the majority of low-bit-rate coders. The performance of quantizers for the linear-prediction coefficients is generally evaluated in terms of spectral distortion. This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated. Smoothing the evolution of the power-spectrum envelope over time increases the reconstructed speech quality. A reasonable objective is to find the smoothest path that keeps the quantized parameters within the Voronoi regions associated with the transmitted quantization index. We demonstrate increased quantizer performance by such smoothing of the line-spectral frequencies.
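A minimal sketch of the smoothing idea: average the quantized LSF trajectory over time, then clip each value back into its quantization cell so the decoder's path stays consistent with the transmitted indices. The Voronoi-region constraint is simplified here to the cell of a scalar uniform quantizer, which is a stand-in for the paper's more general constraint.

```python
import numpy as np

def smooth_lsf_track(lsf_q, step):
    """Three-frame moving average of a quantized LSF trajectory, clipped so
    every value stays inside its scalar uniform quantization cell of width
    `step` (a simplified stand-in for the Voronoi-region constraint)."""
    smooth = np.asarray(lsf_q, dtype=float).copy()
    for t in range(1, len(smooth) - 1):
        avg = (lsf_q[t - 1] + lsf_q[t] + lsf_q[t + 1]) / 3.0
        # keep the smoothed value inside the transmitted quantization cell
        smooth[t] = np.clip(avg, lsf_q[t] - step / 2.0, lsf_q[t] + step / 2.0)
    return smooth
```

On an alternating (worst-case) quantized track the smoothed path collapses toward the shared cell boundary, removing frame-to-frame jumps in the power-spectrum envelope without changing what was transmitted.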

Patent
Toshiyuki Morii
27 Nov 1995
TL;DR: A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules.
Abstract: A sample speech is analyzed by a speech analyzing unit to obtain sample characteristic parameters, and a coding distortion is calculated from the sample characteristic parameters in each of a plurality of coding modules. The sample characteristic parameters and the coding distortions are statistically processed by a statistical processing unit to obtain a coding module selecting rule. Thereafter, when a speech is analyzed by the speech analyzing unit to obtain characteristic parameters, an appropriate coding module is selected by a coding module selecting unit from the coding modules according to the coding module selecting rule on condition that a coding distortion for the characteristic parameters is minimized in the appropriate coding module. Thereafter, the characteristic parameters of the speech are coded in the appropriate coding module, and a coded speech is obtained. When the coded speech is decoded, a reproduced speech is obtained. Accordingly, because an appropriate coding module can be easily selected from a plurality of coding modules according to the coding module selecting rule, any allophone occurring in a reproduced speech can be prevented at a low calculation volume.

PatentDOI
TL;DR: A method for encoding a speech signal into digital bits including the steps of dividing the speech signal into speech frames representing time intervals of the speech signal, determining voicing information for frequency bands of the speech frames, and determining spectral magnitudes representative of the magnitudes of the spectrum at determined frequencies across the frequency bands.
Abstract: A method for encoding a speech signal into digital bits including the steps of dividing the speech signal into speech frames representing time intervals of the speech signal, determining voicing information for frequency bands of the speech frames, and determining spectral magnitudes representative of the magnitudes of the spectrum at determined frequencies across the frequency bands. The method further includes quantizing and encoding the spectral magnitudes and the voicing information. The steps of determining, quantizing and encoding the spectral magnitudes are done in such a manner that the spectral magnitudes, independent of voicing information, are available for later synthesizing.

Patent
04 Apr 1995
TL;DR: In this paper, the authors exploit the synergy between operations performed by a speech rate modification system and those operations performed in a speech coding system to provide a speech-rate modification system with reduced hardware requirements.
Abstract: Synergy between operations performed by a speech-rate modification system and those operations performed in a speech coding system is exploited to provide a speech-rate modification system with reduced hardware requirements. The speech rate of an input signal is modified based on a signal representing a predetermined change in speech rate. The modified speech-rate signal is then filtered to generate a speech signal having increased short-term correlation. Modification of the input speech signal may be performed by inserting in the input speech signal a previous sequence of samples corresponding substantially to a pitch cycle. Alternatively, the input speech signal may be modified by removing from the input speech signal a sequence of samples corresponding substantially to a pitch cycle.

Journal ArticleDOI
T. Chen, H.P. Graf, Kuansan Wang
TL;DR: Speech information is utilized to improve the quality of audio-visual communications such as videotelephony and videoconferencing; in particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization.
Abstract: We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined.

Proceedings ArticleDOI
09 May 1995
TL;DR: A new set of speech feature representations for robust speech recognition in the presence of car noise is proposed, based on subband analysis of the speech signal, and the performances of the new feature representations are compared to mel scale cepstral coefficients.
Abstract: A new set of speech feature representations for robust speech recognition in the presence of car noise is proposed. These parameters are based on subband analysis of the speech signal. Line spectral frequency (LSF) representation of the linear prediction (LP) analysis in subbands and cepstral coefficients derived from subband analysis (SUBCEP) are introduced, and the performances of the new feature representations are compared to mel scale cepstral coefficients (MELCEP) in the presence of car noise. Subband analysis based parameters are observed to be more robust than the commonly employed MELCEP representations.


Patent
17 Aug 1995
TL;DR: In this paper, an apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.
Abstract: An apparatus for monitoring signal quality in a communications link is provided which recognizes speech elements in signals received over the communications link and generates therefrom an estimate of the original speech signal, and compares the estimated signal with the actual received signal to provide an output based on the comparison.

Journal ArticleDOI
TL;DR: The authors propose three robust alternatives to conventional one-shot least-squares LP analysis, including an iterative approach leading to the weighted least absolute value solution, and reveal that the most robust method depends on the type of noise.
Abstract: Various linear predictive (LP) analysis methods are studied and compared from the points of view of robustness to noise and of application to speaker identification. The key to the success of the LP techniques is in separating the vocal tract information from the pitch information present in a speech signal even under noisy conditions. In addition to considering the conventional, one-shot weighted least-squares methods, the authors propose three other approaches with the above point as a motivation. The first is an iterative approach that leads to the weighted least absolute value solution. The second is an extension of the one-shot least-squares approach and achieves an iterative update of the weights. The update is a function of the residual and is based on minimizing a Mahalanobis distance. Third, the weighted total least-squares formulation is considered. A study of the deviations in the LP parameters is done when noise (white Gaussian and impulsive) is added to the speech. It is revealed that the most robust method depends on the type of noise. Closed-set speaker identification experiments with 20 speakers are conducted using a vector quantizer classifier trained on clean speech. The relative performance of the various LP approaches depends on the type of speech material used for testing.
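An iterative approach of the first kind (one that tends toward a least absolute value fit) can be sketched as iteratively reweighted least squares. This is a generic IRLS sketch under the standard 1/sqrt(|r|) reweighting, not the authors' exact update rule; the damping constant `eps` is an assumption added for numerical stability.

```python
import numpy as np

def lp_irls(x, order, n_iter=10, eps=1e-3):
    """LP coefficients by iteratively reweighted least squares; weights of
    1/sqrt(|r|) drive the solution toward a least-absolute-value fit of the
    prediction residual, which is less sensitive to impulsive noise."""
    N = len(x)
    # regression: x[n] ~ a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    A = np.column_stack([x[order - k - 1:N - k - 1] for k in range(order)])
    b = x[order:]
    w = np.ones_like(b)
    for _ in range(n_iter):
        a = np.linalg.lstsq(A * w[:, None], w * b, rcond=None)[0]
        r = b - A @ a
        w = 1.0 / np.sqrt(np.abs(r) + eps)  # reweight toward an L1 criterion
    return a
```

With impulsive noise in the excitation, the large residuals receive small weights, so the estimated predictor stays close to the underlying vocal-tract model.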

PatentDOI
TL;DR: A method for decoding and synthesizing a synthetic digital speech signal from digital bits of the type produced by dividing a speech signal into frames and encoding the speech signal by an MBE based encoder.
Abstract: A method for decoding and synthesizing a synthetic digital speech signal from digital bits of the type produced by dividing a speech signal into frames and encoding the speech signal by an MBE based encoder. The method includes the steps of decoding the bits to provide spectral envelope and voicing information for each of the frames, processing the spectral envelope information to determine regenerated spectral phase information for each of the frames based on local envelope smoothness, and determining from the voicing information whether frequency bands for a particular frame are voiced or unvoiced. The method further includes synthesizing speech components for voiced frequency bands using the regenerated spectral phase information, synthesizing a speech component representing the speech signal in at least one unvoiced frequency band, and synthesizing the speech signal by combining the synthesized speech components for voiced and unvoiced frequency bands.

PatentDOI
TL;DR: A method of encoding speech by analyzing a digitized speech signal to determine excitation parameters for the digitized speech signal is disclosed.
Abstract: A method of encoding speech by analyzing a digitized speech signal to determine excitation parameters for the digitized speech signal is disclosed. The method includes dividing the digitized speech signal into at least two frequency bands, determining a first preliminary excitation parameter by performing a nonlinear operation on at least one of the frequency band signals to produce a modified frequency band signal and determining the first preliminary excitation parameter using the modified frequency band signal, determining a second preliminary excitation parameter using a method different from the first method, and using the first and second preliminary excitation parameters to determine an excitation parameter for the digitized speech signal. The method is useful in encoding speech. Speech synthesized using the parameters estimated based on the invention generates high quality speech at various bit rates useful for applications such as satellite voice communication.

Patent
30 Nov 1995
TL;DR: A plurality of speech segment data units is prepared for all desired speech waveforms, and a desired pitch is obtained by overlapping the appropriate speech segment data units according to a pitch period interval.
Abstract: A method and apparatus for synthesizing speech. According to one variation of the method and apparatus, a plurality of speech segment data units is prepared for all desired speech waveforms. Speech is then synthesized by reading out from memory the appropriate speech segment data units, and a desired pitch is obtained by overlapping the appropriate speech segment data units according to a pitch period interval. According to a second variation of the method and apparatus, speech segment data units are prepared for only initial speech waveforms and first pitch waveforms, and differential waveforms. With this variation, subsequent pitch waveforms for speech synthesis are generated by combining the first pitch waveform with the corresponding differential waveform. According to a third variation of the method and apparatus, a natural speech segment channel produces natural speech segment data units in the same manner as the first variation, and a synthesized speech segment channel produces speech segment data units according to a parameter method, such as a formant method. The natural speech segments and synthesized speech segments are then mixed to produce synthesized speech.

Patent
30 May 1995
TL;DR: A pitch estimation device and method utilizing a multi-resolution approach to estimate a pitch lag value of input speech; the system determines the LPC residual of the speech and samples the residual.
Abstract: A pitch estimation device and method utilizing a multi-resolution approach to estimate a pitch lag value of input speech. The system includes determining the LPC residual of the speech and sampling the LPC residual. A discrete Fourier transform is applied and the result is squared. A lowpass filtering step is carried out and a DFT on the squared amplitude is then performed to transform the LPC residual samples into another domain. An initial pitch lag can then be found with lower resolution. After getting the low-resolution pitch lag estimate, a refinement algorithm is applied to get a higher-resolution pitch lag. The refinement algorithm is based on minimizing the prediction error in the time domain. The refined pitch lag then can be used directly in the speech coding.
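The coarse-then-refine structure can be illustrated with a simplified stand-in: a coarse lag taken from the residual autocorrelation (rather than the patent's transform-domain route), refined by minimising a one-tap long-term prediction error in the time domain, as the patent's refinement step does. The lag range and search radius are illustrative assumptions.

```python
import numpy as np

def pitch_lag(residual, lo=20, hi=160, refine=3):
    """Coarse pitch lag from the autocorrelation of an (LP) residual,
    refined by minimising the one-tap long-term prediction error."""
    r = np.correlate(residual, residual, mode='full')[len(residual) - 1:]
    coarse = lo + int(np.argmax(r[lo:hi]))
    best, best_err = coarse, np.inf
    for lag in range(max(lo, coarse - refine), min(hi, coarse + refine) + 1):
        x, y = residual[lag:], residual[:-lag]
        g = np.dot(x, y) / (np.dot(y, y) + 1e-12)  # optimal pitch-gain tap
        err = np.sum((x - g * y) ** 2)             # time-domain prediction error
        if err < best_err:
            best, best_err = lag, err
    return best
```

The two-stage search keeps the expensive fine comparison confined to a few candidate lags around the coarse estimate, which is the point of the multi-resolution design.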

Proceedings ArticleDOI
09 May 1995
TL;DR: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases.
Abstract: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation. New features of the scheme include classified vector quantization of the spectral envelope of LPC residuals with a weighted distortion measure. The improvement in performance obtained by classifying codebooks based on a voiced/unvoiced (V/UV) decision is shown. Sequences of the short-term RMS power of the time domain waveforms are also vector quantized and transmitted for unvoiced signals. A fast synthesis algorithm for voiced signals using an FFT is also presented, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases. Informal listening tests indicate that, in combination with a known LSP quantization technique, this residual coding scheme provides good communication quality at a total bit rate of less than 2.0 kbps.

Proceedings ArticleDOI
05 Sep 1995
TL;DR: A new speech enhancement procedure is presented which allows a more relevant spectrum without the need for a priori knowledge of the noise intensity, and it is shown that this approach can greatly increase the recognition rate.
Abstract: It is known that noise can significantly decrease the performance of a speech recognition system. To solve this problem, many speech processing algorithms have been developed. Most of them assume that the noise level is constant, or is to be evaluated in the course of the algorithm. The paper addresses a particular kind of noise: the type introduced by pre-emphasis of the speech signal. A new speech enhancement procedure is presented which allows a more relevant spectrum without the need for a priori knowledge of the noise intensity. It is shown that this approach can greatly increase the recognition rate.

PatentDOI
TL;DR: In this article, the pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass filtered speech waveform for one pitch period.
Abstract: A system that synchronously segments a speech waveform using pitch period and a center of the pitch waveform. The pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass filtered speech waveform for one pitch period. The speech waveform can then be represented by one or more of such pitch waveforms or segments during speech compression, reconstruction or synthesis. The pitch waveform can be modified by frequency enhancement/filtering, waveform stretching/shrinking in speech synthesis or speech disguise. The utterance rate can also be controlled to speed up or slow down the speech.

BookDOI
01 Jan 1995
TL;DR: An edited volume spanning speech coding, speech recognition, speaker recognition, and text-to-speech synthesis, including the use of pitch prediction in speech coding and current methods in continuous speech recognition.
Abstract: Contributors. Preface. Part 1: Speech Coding. 1. The use of pitch prediction in speech coding R.P. Ramachandran. 2. Vector quantization of linear predictor coefficients J.S. Collura. 3. Linear predictive analysis by synthesis coding P. Kroon, W.B. Kleijn. 4. Waveform interpolation J. Haagen, W.B. Kleijn. 5. Variable rate speech coding V. Cuperman, P. Lupini. Part 2: Speech Recognition. 6. Word spotting J.R. Rohlicek. 7. Speech recognition using neural networks S.V. Kosonocky. 8. Current methods in continuous speech recognition P.S. Gopalakrishnan. 9. Large vocabulary isolated word recognition V. Gupta, M. Lennig. 10. Recent developments in robust speech recognition B.H. Juang. 11. How do humans process and recognize speech? J.B. Allen. Part 3: Speaker Recognition. 12. Data fusion techniques for speaker recognition K.R. Farrell, R.J. Mammone. 13. Speaker recognition over telephone channels Y.-H. Kao, et al. Part 4: Text to speech synthesis. 14. Approaches to improve automatic speech synthesis D. O'Shaughnessy. Part 5: Applications of Models. 15. Microphone array for hands-free voice communication in a car S. Oh, V. Viswanathan. 16. The pitch mode modulation model and its application in speech processing O. Ghitza. 17. Auditory models and human performance in tasks related to speech coding and speech recognition O. Ghitza. 18. Applications of wavelets to speech processing: a case study of a Celp coder J. Ooi, V. Viswanathan. Index.

PatentDOI
TL;DR: A speech encoding/decoding method calculates a short-term prediction error of an input speech signal that is divided on a time axis into blocks, represents the short-term prediction residue by a synthesized sine wave and a noise, and encodes a frequency spectrum of each of the synthesized sine wave and the noise to encode the speech signal.
Abstract: A speech encoding/decoding method calculates a short-term prediction error of an input speech signal that is divided on a time axis into blocks, represents the short-term prediction residue by a synthesized sine wave and a noise and encodes a frequency spectrum of each of the synthesized sine wave and the noise to encode the speech signal. The speech encoding/decoding method decodes the speech signal on a block basis and finds a short-term prediction residue waveform by sine wave synthesis and noise synthesis of the encoded speech signal. The speech encoding/decoding method then synthesizes the time-axis waveform signal based on the short-term prediction residue waveform of the encoded speech signal.

Journal ArticleDOI
TL;DR: The authors derive a 30 bit two-quantizer scheme for line spectral frequencies (LSFs) which achieves performance equivalent to a 36 bit scalar quantizer, formulate a new adaptation algorithm for the vector quantizer, and apply a dynamic programming search to both quantizers.
Abstract: An important problem in speech coding is the quantization of linear predictive coefficients (LPCs) with the smallest possible number of bits while maintaining robustness to a large variety of speech material and transmission media. Since direct quantization of LPCs is known to be unsatisfactory, the authors consider this problem for an equivalent representation, namely, the line spectral frequencies (LSFs). To achieve an acceptable level of distortion a scalar quantizer for LSFs requires a 36 bit codebook. The authors derive a 30 bit two-quantizer scheme which achieves a performance equivalent to this scalar quantizer. The two-quantizer format consists of both a vector and a scalar quantizer such that for each input, the better quantizer is used. The vector quantizer is designed from a training set that reflects the joint density (for coding efficiency) and which ensures coverage (for robustness). The scalar quantizer plays a pivotal role in dealing better with regions of the space that are sparsely covered by its vector quantizer counterpart. A further reduction of 1 bit is obtained by formulating a new adaptation algorithm for the vector quantizer and doing a dynamic programming search for both quantizers. The method of adaptation takes advantage of the ordering of the LSFs and imposes no overhead in memory requirements. The dynamic programming search is feasible due to the ordering property. Subjective tests in a speech coder reveal that the 29 bit scheme produces equivalent perceptual quality to that when the parameters are unquantized.
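The ordering property that the adaptation and dynamic-programming search rely on is intrinsic to LSFs: for a minimum-phase LP analysis filter, the line spectral frequencies are distinct and strictly increasing in (0, π). A sketch via polynomial roots, illustrative rather than a production LSF routine:

```python
import numpy as np

def lpc_to_lsf(A):
    """Line spectral frequencies (radians, in (0, pi)) of an LP analysis
    filter with coefficients A = [1, a1, ..., ap], via polynomial roots."""
    A = np.asarray(A, dtype=float)
    P = np.append(A, 0.0) + np.append(0.0, A[::-1])  # sum polynomial
    Q = np.append(A, 0.0) - np.append(0.0, A[::-1])  # difference polynomial
    lsf = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # keep one angle per conjugate pair, dropping trivial roots at z = +/-1
        lsf.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(lsf))
```

Because the quantized parameters must respect this monotonic ordering, a dynamic programming search over per-coefficient candidates only has to consider increasing paths, which is what makes it tractable.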

Patent
Toru Nakahara
30 Oct 1995
TL;DR: In this paper, a speech coder is provided for converting an analog speech signal to a digital speech signal, and a storage level detector determines the amount of a data signal waiting for transmission from a non-telephonic communications apparatus.
Abstract: In a mobile telephone set, a speech coder is provided for converting an analog speech signal to a digital speech signal. A voice activity sensor determining whether the analog speech signal is present or not. A storage level detector determines the amount of a data signal waiting for transmission from a non-telephonic communications apparatus. A switching control logic determines the rate for the speech coder according to the outputs of the voice activity sensor and the storage level detector. When the analog speech signal is present and the amount of the waiting data signal is zero, the data rate is set at a high rate and a digital speech signal from the speech coder is transmitted along with an indication of the high data rate. When the analog speech signal is present and the amount of the waiting data signal is non-zero, the data rate of the speech coder is set at a rate which is lower than the high data rate and is variable in accordance with the amount of the waiting data signal and the digital speech signal from the speech coder and the data signal from the non-telephonic communications apparatus are transmitted along with an indication of the variable data rate. When the analog speech signal is absent and the amount of the waiting data signal is non-zero, the data signal from the apparatus is exclusively transmitted.