
Showing papers on "Code-excited linear prediction" published in 2000


PatentDOI
TL;DR: A method and apparatus for first training and then recognizing speech use subband cepstral features to improve string recognition accuracy for speech inputs.
Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.

107 citations


Book
01 Jan 2000
TL;DR: A textbook on speech coding, running from speech production, speech analysis, and linear prediction through analysis-by-synthesis (including CELP), mixed-excitation, and perceptual speech coding.
Abstract (table of contents):
INTRODUCTION
SPEECH PRODUCTION: The Speech Chain; Articulation; Source-Filter Model
SPEECH ANALYSIS TECHNIQUES: Sampling and the Speech Waveform; Systems and Filtering; z Transform; Fourier Transform; Discrete Fourier Transform; Windowing Signal Segments
LINEAR PREDICTION VOCAL TRACT MODELING: Sound Propagation in the Vocal Tract; Estimation of LP Parameters; Transformations of LP Parameters for Quantization; Examples of LP Modeling
PITCH EXTRACTION: Autocorrelation Pitch Extraction; Cepstral Pitch Extraction; Frequency-Domain Error Minimization; Pitch Tracking
AUDITORY INFORMATION PROCESSING: The Basilar Membrane: A Spectrum Analyzer; Critical Bands; Thresholds of Audibility and Detectability; Monaural Masking
QUANTIZATION AND WAVEFORM CODERS: Uniform Quantization; Nonlinear Quantization; Adaptive Quantization; Vector Quantization
QUALITY EVALUATION: Objective Measures; Subjective Measures; Perceptual Objective Measures
VOICE CODING CONCEPTS: Channel Vocoder; Formant Vocoders; The Sinusoidal Speech Coder; Linear Prediction Vocoder
LINEAR PREDICTION ANALYSIS BY SYNTHESIS: Analysis by Synthesis; Estimation of Excitation; Multi-Pulse Linear Prediction Coder; Regular Pulse Excited LP Coder; Code Excited Linear Prediction Coder
MIXED EXCITATION CODING: Multi-Band Excitation Vocoder; Mixed Excitation Linear Prediction Coder; Split Band LPC Coder; Harmonic Vector Excitation Coder; Waveform Interpolation Coding
PERCEPTUAL SPEECH CODING: Auditory Processing of Speech; Perceptual Coding Considerations; Research in Perceptual Speech Coding
APPENDIX: RELATED INTERNET SITES
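The chapter on linear prediction vocal tract modeling covers estimation of LP parameters. The sketch below shows the standard autocorrelation method with the Levinson-Durbin recursion. It is not code from the book. The function names are illustrative, and the frame is assumed to be windowed beforehand.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of an (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err), where a[k-1] is the predictor coefficient a_k in
    x_hat[n] = sum_{k=1}^{p} a_k x[n-k], and err is the final prediction
    error energy.
    """
    a = [0.0] * (order + 1)  # a[0] unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err
```

The recursion solves the p x p Toeplitz normal equations in O(p^2) operations. Each intermediate k is the reflection coefficient for that stage.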

83 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: Subjective tests show that the proposed bandwidth-scalable coding scheme based on the G.729 standard as a base layer coder achieves better performance than the 16 kbit/s MPEG-4 CELP with bandwidth scalability.
Abstract: This paper proposes a bandwidth-scalable coding scheme based on the G.729 standard as a base layer coder. In the scheme, according to the channel conditions, the output speech of the decoder can be selected to be narrowband (4-kHz bandwidth) or wideband (8-kHz bandwidth). The proposed scheme consists of two layers: base and enhancement. The base coder uses the G.729 algorithm to encode narrowband speech. The enhancement coder is based on a full-band CELP model and it encodes wideband speech while making use of the available base layer information. Two bandwidth-scalable coders are designed: one is scalable with the 8 kbit/s G.729 base coder and another with the 6.4 kbit/s G.729 (Annex D) base coder. Subjective tests show that, for wideband speech, the proposed coders at 16 kbit/s achieve better performance than the 16 kbit/s MPEG-4 CELP with bandwidth scalability.

62 citations


Proceedings ArticleDOI
17 Sep 2000
TL;DR: A speech/music discrimination procedure for multi-mode wideband coding is described that is suitable for combined speech and audio coding; the resulting coder shows improved performance compared with single-mode encoding.
Abstract: We propose in this paper a general solution for combined speech and audio coding. In particular, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected, and kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance when compared to single-mode encoding.
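The rule of holding the decision until a low-energy frame arrives can be shown in a short sketch. The sketch assumes a separate per-frame classifier; the paper's second-order statistics of discriminant parameters are not reproduced. It also assumes a hypothetical mean-square energy threshold. Only the gating of the decision is shown.

```python
def gated_speech_music_decisions(frames, raw_labels, energy_threshold, initial="speech"):
    """Hold the speech/music decision except on low-energy frames.

    frames: list of sample lists; raw_labels: per-frame output ("speech" or
    "music") of some separate classifier, assumed to be given.
    """
    decision = initial
    decisions = []
    for frame, raw in zip(frames, raw_labels):
        energy = sum(s * s for s in frame) / len(frame)  # mean-square energy
        if energy < energy_threshold:
            decision = raw  # switching is allowed only at low-energy frames
        decisions.append(decision)
    return decisions
```

Restricting switches to low-energy frames avoids changing the coding mode in the middle of an active speech or music segment.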

56 citations


01 Jan 2000
TL;DR: It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec parameters it is possible to reduce the effect of coding on recognition, and weighted acoustic modeling is introduced as an alternative to the method based on average distortion information.
Abstract: The growth of cellular telephony combined with recent advances in speech recognition technology results in sizeable potential opportunities for mobile speech recognition applications. Classic robustness techniques that have been previously proposed for speech recognition yield only limited improvements against the degradation introduced by the idiosyncrasies of mobile networks. These sources of degradation include distortion introduced by the speech codec as well as artifacts arising from channel errors and discontinuous transmission. In this thesis we focus on characterizing the distortion introduced to the speech signal by the speech codec and we propose methods for reducing the detrimental effect of coding on recognition accuracy. The initial focus of this thesis is on the full rate GSM codec (FR-GSM). We propose a method to generate recognition features directly from codec parameters. It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec parameters it is possible to reduce the effect of coding on recognition. The later parts of this work are related to weighted acoustic modeling for robust speech recognition. The motivation for this approach is based on the observation that not all phones in a GSM-coded corpus are distorted to the same extent due to coding. We first establish a set of phonetic distortion classes through an analysis of the distribution of the log spectral distortion introduced to each phone by the codec. These classes are then employed to estimate an optimal weighted combination of acoustic models according to the average distortion encountered by the class. A relative reduction of almost 70% of the degradation introduced by the GSM codec was achieved using this method. The technique of weighted acoustic modeling based on instantaneous distortion is introduced as an alternative to the method based on average distortion information. 
When the extent of cepstral distortion introduced by coding is known, weighted acoustic modeling provides a reduction of about 50% in the word error rate introduced by concurrent GSM and CELP. We propose two methods to estimate the instantaneous distortion information: one based on recoding sensitivity and another based on long-term predictability. Due to the nonlinear relation between the time and the log-spectral domain, the proposed estimates of the instantaneous distortion do not perform as well as algorithms based on knowledge of cepstral distortion. However, we show that employing the proposed instantaneous distortion information estimates can help obtain the best recognition results established in the baseline conditions while using only 50% of the baseline Gaussian density computations.
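The thesis groups phones by the log spectral distortion the codec introduces. A common way to measure that distortion between two all-pole (LP) spectra is the RMS difference of their log power spectra in dB. The sketch below assumes that definition, a uniform frequency grid, and the sign convention A(z) = 1 - sum a_k z^-k. It is not the thesis's own code.

```python
import cmath
import math


def lp_power_spectrum(a, n_points=128, gain=1.0):
    """Sample gain^2 / |A(e^jw)|^2 on [0, pi), with A(z) = 1 - sum_k a_k z^-k."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(gain ** 2 / abs(A) ** 2)
    return spectrum


def log_spectral_distortion(p_ref, p_test):
    """RMS difference, in dB, between two sampled power spectra."""
    diffs = [10 * math.log10(r) - 10 * math.log10(t) for r, t in zip(p_ref, p_test)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

A pure gain change shifts every log value equally. Doubling the gain therefore gives a constant distortion of 10*log10(4), about 6.02 dB.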

47 citations


Patent
16 Feb 2000
TL;DR: A codec (coder and decoder) performs LP analysis and LP synthesis on the full wideband speech signal. In the excitation search part of the coder (the codeword search, in the case of CELP), the signal is divided into a lower band and a higher band. The lower band is searched using a decimated target signal, obtained by filtering the input speech through a wideband LP analysis filter and then decimating it.
Abstract: A codec (coder and decoder) in which LP analysis and LP synthesis of a full wideband speech signal is performed, and, in an excitation search part of the coder (searching for a codeword in case of CELP), the signal is divided into a lower band and a higher band with the lower band searched using a decimated target signal obtained by decimating the input speech signal after filtering it through a wideband LP analysis filter. White noise is optionally used for the higher band excitation. In the decoder, the lower band excitation is first interpolated, and then the two excitations (lower band and higher band) are added together and filtered through a wideband LP synthesis filter. Thus, an LP encoding is provided in which the sampling rate used for the search for a lower band excitation is less than the wideband sampling rate used in the LP analysis and synthesis.
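The patent's decoder path can be sketched as follows: interpolate the lower-band excitation, add a higher-band excitation, and filter the sum through one wideband LP synthesis filter. The sketch makes several assumptions. It uses an integer interpolation factor with simple linear interpolation rather than a proper anti-imaging filter. Unshaped white noise stands in for the higher-band excitation. The function names are hypothetical, and the encoder-side decimated search is not shown.

```python
import random


def upsample_linear(x, factor):
    """Interpolate a lower-band signal by an integer factor (linear; holds the last sample)."""
    out = []
    for i, cur in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else cur
        for j in range(factor):
            out.append(cur + (nxt - cur) * j / factor)
    return out


def lp_synthesis(excitation, a):
    """All-pole synthesis 1/A(z): y[n] = e[n] + sum_k a_k y[n-k], zero initial state."""
    history = [0.0] * len(a)  # history[0] holds y[n-1]
    y = []
    for e in excitation:
        v = e + sum(ak * yk for ak, yk in zip(a, history))
        y.append(v)
        history = [v] + history[:-1]
    return y


def decode_subframe(lower_band_excitation, a_wideband, noise_gain, factor=2, seed=0):
    """Hypothetical decoder step: interpolate, add a noise excitation, synthesize."""
    rng = random.Random(seed)
    low = upsample_linear(lower_band_excitation, factor)
    # Simplification: unfiltered white noise stands in for the higher-band excitation.
    high = [noise_gain * rng.gauss(0.0, 1.0) for _ in low]
    return lp_synthesis([l + h for l, h in zip(low, high)], a_wideband)
```

Only the interpolated lower band and the noise are added before synthesis, so a single wideband LP synthesis filter shapes both bands, as the abstract describes.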

44 citations


01 Jan 2000

44 citations


Proceedings ArticleDOI
17 Sep 2000
TL;DR: Preliminary subjective test results indicate that the CELP technique provides very high quality speech that meets or exceeds all requirements for both the ETSI/3GPP and ITU-T wideband speech standardization efforts.
Abstract: Recent standardization efforts have provided motivation for research in the area of wideband speech coding (50 Hz to 7 kHz). Both the European Telecommunications Standards Institute (ETSI) in conjunction with the Third Generation Partnership Project (3GPP), and the ITU-T are currently in the process of evaluating candidate algorithms at bit rates from around 12 kbps to 32 kbps. This paper describes a code-excited linear prediction (CELP) technique used in the Motorola candidate algorithm that is scalable to a wide range of bit rates, thereby allowing the use of a ubiquitous speech model. Preliminary subjective test results indicate that the technique provides very high quality speech that meets or exceeds all requirements for both the ETSI/3GPP and ITU-T wideband speech standardization efforts.

42 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: Applied to the ITU-T G.729 ACELP 8 kb/s speech coding standard, both interpolation- and repetition-based techniques outperform standard concealment in informal listening tests.
Abstract: This paper describes new techniques for concealing frame erasures for CELP-based speech coders. Two main approaches were followed: interpolative, where both past and future information are used to reconstruct the missing data, and repetition-based, where no future information is required. Key features of the repetition-based approach include improved muting, pitch delay jittering, and LPC bandwidth expansion. The interpolative approach can be employed in voice over IP scenarios at no extra cost in terms of delay. Applied to the ITU-T G.729 ACELP 8 kb/s speech coding standard, both interpolation- and repetition-based techniques outperform standard concealment in informal listening tests.
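Three of the repetition-based features named above (muting, LPC bandwidth expansion, and pitch delay jitter) can be sketched as parameter updates applied to the last good frame. The expansion a_k -> a_k * gamma^k is the usual bandwidth-expansion technique. The constants gamma, mute, and jitter are hypothetical, not the paper's values.

```python
import random


def bandwidth_expand(a, gamma):
    """Replace a_k with a_k * gamma**k, widening the formant bandwidths of 1/A(z)."""
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def conceal_lost_frame(last_a, last_pitch, last_gain, n_lost,
                       gamma=0.98, mute=0.8, jitter=1, seed=None):
    """Repetition-based parameters for the n_lost-th consecutive lost frame.

    gamma, mute and jitter are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    a = bandwidth_expand(last_a, gamma ** n_lost)       # stronger expansion as losses accumulate
    gain = last_gain * mute ** n_lost                   # progressive muting
    pitch = last_pitch + rng.randint(-jitter, jitter)   # pitch delay jitter
    return a, pitch, gain
```

Raising gamma and the muting factor to the power n_lost makes long erasure bursts fade toward a smoother, quieter signal instead of repeating the same frame indefinitely.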

32 citations


Patent
28 Dec 2000
TL;DR: A method is presented for improving the processing time and speech quality of G.723.1 and reducing the bit rate in a CELP (Code Excited Linear Prediction) voice coder (vocoder).
Abstract: A method for improving the processing time and speech quality of G.723.1 and reducing the bit rate in a CELP (Code Excited Linear Prediction) voice coder (vocoder) includes:
- a method of searching the MP-MLQ fixed codebook through bit predetermination, comprising the steps of generating a target vector with amplitude, reducing the time needed to search for an optimal pulse array through the bit predetermination, and searching all of the pulses if two errors have identical values;
- a formant post-filtering method that extracts a reflection coefficient of a slope compensation filter to apply multi-degree slope compensation;
- a pitch post-filtering method including an energy-level standardization step and a step of generating a signal approximating the average energy level;
- a VAD algorithm using energy, pitch gain, and LSP distance;
- a method of reducing the processing time of G.723.1, improving speech quality, and reducing the bit rate by using decision logic when setting a SID frame for voice-inactive intervals.
A CELP vocoder using one of these methods is also described.

30 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: It is shown that Hi-BIN offers a low bit-rate representation of the higher band and is backwards compatible with existing narrowband speech coding systems.
Abstract: In this paper, an encoding technique called Hi-BIN (High Band Injection), which can be combined with any narrowband coder to achieve good quality wideband speech, is described. The principle behind this technique is to model frequencies above 4 kHz by noise with an appropriate spectral shape. This simple way of injecting synthetic noise in the higher frequencies gives surprisingly good quality when compared to very widely used computationally intensive waveform coding techniques such as CELP. We show that Hi-BIN offers a low bit-rate representation of the higher band and is backwards compatible with existing narrowband speech coding systems.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: With CELP-based diversity schemes, transmission schemes that allocate bandwidth resources among diversity stages during congestion give significantly better performance than schemes that use no diversity during congestion, for the same bandwidth usage.
Abstract: Diversity schemes include information about packet n in future packets or send information about packet n via separate paths. If packet n is lost, it is reconstructed from information included in future packets or information received via separate paths. This paper presents CELP-based diversity schemes for voice over packet applications. The diversity schemes reduce the impact of packet losses while being efficient in terms of both bandwidth requirement and computational complexity. With our diversity schemes, transmission schemes that allocate bandwidth resources among diversity stages during congestion give significantly better performance than schemes that use no diversity during congestion, for the same bandwidth usage.
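The diversity idea can be illustrated with a generic scheme. Each packet n carries its own primary frame plus a secondary (for example, lower-rate) copy of frame n-1, so a single lost packet can be recovered from its successor. This is a hypothetical one-stage scheme with placeholder frame payloads. It does not reproduce the paper's CELP-based diversity stages or its bandwidth allocation.

```python
def packetize(primary, secondary):
    """Packet n carries primary[n] plus the redundant copy secondary[n - 1]."""
    packets = []
    for n, frame in enumerate(primary):
        redundant = secondary[n - 1] if n > 0 else None
        packets.append({"seq": n, "primary": frame, "redundant": redundant})
    return packets


def reconstruct(received, n_frames):
    """Rebuild the frame sequence; None marks frames left to ordinary concealment."""
    by_seq = {p["seq"]: p for p in received}
    frames = []
    for n in range(n_frames):
        if n in by_seq:
            frames.append(by_seq[n]["primary"])
        elif n + 1 in by_seq and by_seq[n + 1]["redundant"] is not None:
            frames.append(by_seq[n + 1]["redundant"])  # recovered from the next packet
        else:
            frames.append(None)
    return frames
```

Recovery costs one frame of extra delay, because the receiver must wait for packet n+1 before giving up on frame n.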

Patent
05 Dec 2000
TL;DR: CELP type encoding devices with a noise codebook that can be searched in two modes according to the linear predictive analysis results, the pitch gain, and the pitch cycle, all of which are obtained by analyzing the input voice.
Abstract: A CELP type voice encoding device and a CELP type encoding device. Both devices have a noise codebook that can be searched in two modes according to the linear predictive analysis results, the pitch gain, and the pitch cycle, all of which are obtained by analyzing the input voice. In addition, the number of pulses forming a noise code vector is switched between two cases: a first case, where the variation in pitch cycle throughout consecutive sub-frames is small, and a second case, where that variation is not small.

Proceedings ArticleDOI
S.A. Ramprashad
17 Sep 2000
TL;DR: This paper investigates the application of cross-channel prediction in the framework of stereophonic code excited linear predictive (CELP) coding.
Abstract: One step towards more realistic speech communication is the move from monophonic to stereophonic sound transmission. Stereophonic speech coding has been explored in the past with the use of cross-channel "cancellation" prediction combined with ADPCM. While this solution and other multi-channel audio techniques can be applied to speech coding, there are some fundamental reasons that may limit their use in newer speech coding technologies. This paper investigates the application of cross-channel prediction in the framework of stereophonic code excited linear predictive (CELP) coding.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Alignment-phase encoding and zero-phase equalization reduce switching artifacts between the MELP and CELP modes. In zero-phase equalization, the phase component of the CELP target signal is removed, making the target waveform more similar to the MELP-synthesized speech.
Abstract: The paper describes a hybrid multi-modal codec in which MELP and CELP coders are used for different speech regions. Three modes are used: strongly voiced, weakly voiced, and unvoiced. The weakly voiced mode includes transitions and plosives; it is used when neither strong voicing nor an unvoiced region is clearly identified. In the strongly voiced mode the MELP coder is used, while in the weakly voiced and unvoiced modes the CELP coder is employed. To limit switching artifacts between the coders, the alignment phase is estimated and transmitted in the MELP mode, making the original and MELP-synthesized speech time-synchronous. Additionally, in zero-phase equalization, the phase component of the CELP target signal is removed, making the target waveform more similar to the MELP-synthesized speech. These two techniques, alignment-phase encoding and zero-phase equalization, greatly reduce switching artifacts in MELP/CELP transition regions. Formal listening test results show that the 4 kb/s hybrid coder can achieve speech quality equivalent to 32 kb/s ADPCM.
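The three-way mode decision can be sketched with a single voicing measure. The paper does not state its classifier here. This sketch assumes the normalized correlation at a known pitch lag and two hypothetical thresholds. Alignment-phase encoding and zero-phase equalization are not shown.

```python
import math


def normalized_correlation(x, lag):
    """Normalized correlation between x[n] and x[n-lag]; near 1 for strongly periodic frames."""
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    e0 = sum(x[n] ** 2 for n in range(lag, len(x)))
    e1 = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
    return num / math.sqrt(e0 * e1) if e0 > 0 and e1 > 0 else 0.0


def select_mode(voicing, strong=0.8, weak=0.4):
    """Map a voicing score to (mode, coder); the thresholds are hypothetical."""
    if voicing >= strong:
        return "strongly_voiced", "MELP"
    if voicing <= weak:
        return "unvoiced", "CELP"
    return "weakly_voiced", "CELP"  # neither clearly voiced nor clearly unvoiced
```

The middle band between the two thresholds plays the role of the weakly voiced mode, which catches transitions and plosives that fit neither extreme.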


Proceedings ArticleDOI
17 Sep 2000
TL;DR: A new, efficient algorithm for quantizing the spectral information of a pitch-synchronous CELP (PSCELP) speech coder. The decoder uses linear interpolation to recover the spectral parameters of the individual pitch periods used in the pitch-synchronous reconstruction of the speech signal.
Abstract: A new, efficient algorithm for quantizing the spectral information of a pitch-synchronous CELP (PSCELP) speech coder is proposed. LPC analysis in the PSCELP is carried out once per pitch period. Direct quantization of the pitch-synchronous LSF vectors would lead to a variable-rate codec, which is inconsistent with the objective of a fixed-rate speech coder operating at 4 kb/s. Hence, a linear trajectory of LSF vectors is selected that can be encoded by one LSF vector every 20 ms. This conversion exploits the high correlation of the LSF parameters between successive pitch periods to achieve joint quantization. A coding rate of 1.2 kb/s is achieved for the LSF information with no noticeable degradation. The proposed algorithm employs linear interpolation at the decoder to recover the spectral parameters for the individual pitch periods used in the pitch-synchronous reconstruction of the speech signal. Comparative simulation results show that this algorithm performs comparably to linear-interpolation quantization of LSFs in a time-synchronous CELP coder.
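The decoder-side step, recovering one LSF vector per pitch period from vectors sent every 20 ms, amounts to linear interpolation along time. Here is a minimal sketch, assuming the pitch-period times are known at the decoder. The names are hypothetical, and the encoder's trajectory selection and quantization are not shown.

```python
def interpolate_lsf(lsf_prev, lsf_curr, t_prev, t_curr, pitch_times):
    """Linearly interpolate LSF vectors at the given pitch-period times.

    lsf_prev and lsf_curr are the decoded vectors at times t_prev and t_curr
    (e.g. 20 ms apart); pitch_times are hypothetical pitch-period centres.
    """
    out = []
    for t in pitch_times:
        w = (t - t_prev) / (t_curr - t_prev)
        w = min(max(w, 0.0), 1.0)  # clamp to the segment between the two vectors
        out.append([(1 - w) * p + w * c for p, c in zip(lsf_prev, lsf_curr)])
    return out
```

Each result is a convex combination of two ordered LSF vectors, so the interpolated vectors stay ordered and the corresponding synthesis filters remain stable.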

Proceedings ArticleDOI
17 Sep 2000
TL;DR: In this article, an LSP quantization design method for bandwidth scalable coders such as the MPEG-4 CELP coder is presented, where the LSP parameters are quantized using both interframe and intra-frame predictors.
Abstract: This paper presents an LSP quantization design method for bandwidth scalable coders such as the MPEG-4 CELP coder. In the enhancement layer of these coders, the LSP parameters are quantized using both interframe and intraframe predictors. The proposed design algorithm enables us to jointly optimize these predictors. Objective and subjective test results show that the quantizer obtained with the proposed algorithm provides better performance than that used in the MPEG-4 CELP.

Proceedings ArticleDOI
17 Sep 2000
TL;DR: A novel post-processing technique is proposed to improve the coding quality of CELP under background noise. It adaptively smooths both the spectral envelope and the energy of the estimated excitation signal to reduce their temporal fluctuations, which cause the perceptual degradation.
Abstract: This paper proposes a novel post-processing technique to improve the coding quality of CELP under background noise. It adaptively smooths both the spectral envelope and the energy of the estimated excitation signal to reduce their temporal fluctuations, which cause the perceptual degradation. The excitation signal is calculated using the synthesized signal and the spectral parameters given from the decoder. Thus, the proposed post-processing is performed separately from the decoder. The smoothing is applied only in non-speech periods, and the smoothing strength is controlled depending on the characteristics of the synthesized signal to avoid degradation in speech and non-stationary noise periods. Subjective test results show that the proposed post-processing improves the degradation mean opinion score (DMOS) by 0.2 to 0.4 for noisy speech signals coded by the GSM adaptive multi-rate (AMR) codec.
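The gating and recursive smoothing can be sketched on decoded parameters. The sketch assumes per-frame VAD flags, one energy value and one LSF vector per frame, and a fixed hypothetical smoothing factor alpha. The paper instead adapts the smoothing strength to the signal. It also smooths the spectral envelope and the energy of an excitation re-estimated from the synthesized signal.

```python
def smooth_background(frames, is_speech, alpha=0.9):
    """Recursively smooth (energy, lsf) parameters during non-speech frames only.

    frames: list of (energy, lsf_list); is_speech: per-frame VAD flags (assumed given).
    alpha is a fixed, hypothetical smoothing factor (the paper adapts the strength).
    """
    state = None
    smoothed = []
    for (energy, lsf), speech in zip(frames, is_speech):
        if speech or state is None:
            state = (energy, list(lsf))  # speech frames pass through unchanged
        else:
            state = (alpha * state[0] + (1 - alpha) * energy,
                     [alpha * s + (1 - alpha) * x for s, x in zip(state[1], lsf)])
        smoothed.append(state)
    return smoothed
```

Resetting the state on speech frames keeps the smoothing from smearing speech onsets, which mirrors the paper's restriction to non-speech periods.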

Proceedings ArticleDOI
24 Sep 2000
TL;DR: A cross-correlation technique is described that, for certain speech samples, obtains pitch information more accurately than the auto-correlation method while requiring less computation.
Abstract: To generate high quality speech using the linear predictive coding (LPC) technique, a method for detecting pitch contour is critical since the human ear is sensitive to small pitch variation in speech. The auto-correlation method, though simple to implement with digital signal processors (DSPs), can result in perceptible unnaturalness. This paper describes the cross-correlation technique that can be used to obtain the pitch information more accurately than the auto-correlation method for certain speech samples. Experimental results illustrate the pitch contour detected using both techniques. In general, the cross-correlation method generates less error than the auto-correlation method for pitch determination in an LPC scheme while having the advantage of requiring less computation.
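Here is a minimal sketch contrasting the two estimators under one common formulation, assumed here rather than taken from the paper's exact definitions. The short-time autocorrelation sums over a single frame, so the number of terms shrinks as the lag grows. The cross-correlation compares a fixed-length frame with a segment shifted by the lag, so every lag uses the same number of products.

```python
def autocorrelation_pitch(x, min_lag, max_lag):
    """Lag maximising the short-time autocorrelation sum_{i<N-k} x[i] x[i+k]."""
    n = len(x)
    best_lag, best = min_lag, float("-inf")
    for k in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + k] for i in range(n - k))  # fewer terms as k grows
        if r > best:
            best, best_lag = r, k
    return best_lag


def cross_correlation_pitch(x, frame_len, min_lag, max_lag):
    """Lag maximising sum_{i<frame_len} x[i] x[i+k]; needs len(x) >= frame_len + max_lag."""
    if len(x) < frame_len + max_lag:
        raise ValueError("signal too short for the requested lag range")
    best_lag, best = min_lag, float("-inf")
    for k in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + k] for i in range(frame_len))  # same number of terms at every lag
        if r > best:
            best, best_lag = r, k
    return best_lag
```

On a periodic sawtooth with period 8 samples, both estimators return a lag of 8. The cross-correlation scores lags 8 and 16 equally, and the strict comparison keeps the shorter one.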

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Modifications to the pitch estimation, LP analysis/synthesis and post filtering stages of the original MELP model are discussed to achieve a reasonably good subjective quality for the decoded speech while maintaining a low operating bit rate.
Abstract: This paper presents our study on the feasibility and effectiveness of using the MELP (mixed excitation linear prediction) model for coding wideband (7 kHz) speech signals at a transmission bit rate of 8 kbps. In order to achieve a reasonably good subjective quality for the decoded speech while maintaining a low operating bit rate at the same time, modifications to the pitch estimation, LP analysis/synthesis and post filtering stages of the original MELP model are discussed. Informal listening tests show that the subjective quality of the decoded speech of the proposed coder is rated to be slightly better than the MPEG-4 CELP coder operating at 14.4 kbps for both male and female utterances. The subjective quality of the decoded female utterances from the proposed coder operating at 8.4 kbps is rated to be comparable to that produced by the ITU G.722 coder operating at 48 kbps.

Patent
21 Dec 2000
TL;DR: In this article, a method and system for improving performance of an echo canceller with low additional complexity is presented, which discloses using information from internal variables of Code Excited Linear Prediction (CELP) based codecs in a digital communication network to significantly improve the rate of convergence of the echo cancellation.
Abstract: A method and system for improving performance of an echo canceller with low additional complexity. Specifically, the present invention discloses using information from internal variables of Code Excited Linear Prediction (CELP) based codecs in a digital communication network to significantly improve the rate of convergence of the echo canceller. In one embodiment of the present invention, an error signal associated with a voice signal is filtered by a transversal filter using Linear Predictive Coding (LPC) coefficients to provide filter transfer functions for a Filtered-X Least Mean Squares algorithm. Additionally, an adaptive filter applies the Filtered-X Least Mean Squares algorithm using the pre-filtered voice signal available in a CELP-based decoder to create a synthetic echo signal, which is subtracted from the echo signal for attenuation.

PatentDOI
TL;DR: The reception terminal receives a code series from the communication path, and the separator splits it into a speech code series and text information. The decoded speech parameters and the prosody generated from the text are input to the synthesizer and synthesized into speech sound.
Abstract: The reception terminal receives a code series from the communication path. The separator separates the code series into a speech code series and text information. The synthesizer decodes the speech code series into a pitch period, an LSP coefficient, and code numerals to reproduce the speech sound in the CELP system. The language analyzer converts the text information into pronunciation and accent information. The prosody generator then adds prosody information, such as phoneme duration and pitch pattern. The LSP coefficient and code numerals suitable for each phoneme are read from the segment database. These, together with the pitch frequency from the prosody information, are input to the synthesizer and synthesized into speech sound.

Patent
Gao Yang
30 Jun 2000
TL;DR: In this paper, a bi-directional pitch enhancement system for speech coding systems is proposed, which employs forward pitch enhancement and backward pitch enhancement to maintain a high perceptual quality in reproduced speech.
Abstract: A bi-directional pitch enhancement system for speech coding systems. As speech data applications continue to operate in areas having intrinsic bandwidth limitations, the perceptual quality of reproduced speech data in typical speech coding systems suffers significantly. The present invention employs forward pitch enhancement and backward pitch enhancement to maintain a high perceptual quality in reproduced speech. If desired, the backward pitch enhancement is generated using the forward pitch enhancement itself with the backward pitch enhancement being a mirror image of the forward pitch enhancement that was previously generated. Alternatively, in other embodiments of the invention, the backward pitch enhancement is generated independent of the forward pitch enhancement. The backward pitch enhancement is usually performed on the fixed codebook in code excited linear prediction (CELP) or is performed as post-processing in the decoder.
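The abstract does not give formulas. The sketch below assumes the familiar fixed-codebook pitch-sharpening form c[n] += beta * c[n - T] for the forward direction. The backward direction is taken as its mirror image, c[n] += beta * c[n + T]. T (pitch lag in samples) and beta (enhancement factor) are hypothetical inputs, and this is not the patent's implementation.

```python
def forward_pitch_enhance(c, T, beta):
    """Forward enhancement: c[n] += beta * c[n - T], applied recursively from the start."""
    out = list(c)
    for n in range(T, len(out)):
        out[n] += beta * out[n - T]
    return out


def backward_pitch_enhance(c, T, beta):
    """Backward enhancement: c[n] += beta * c[n + T], applied recursively from the end."""
    out = list(c)
    for n in range(len(out) - 1 - T, -1, -1):
        out[n] += beta * out[n + T]
    return out
```

Running the backward version on a vector equals running the forward version on the reversed vector and reversing the result. This is one concrete reading of the abstract's statement that the backward enhancement can be a mirror image of the forward one.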

01 Jan 2000
TL;DR: In this paper, an energy weighted interpolation technique was proposed to improve the performance of low bit rate speech coders by interpolating the linear predictive coefficients with different representations (LSF, RC, LAR, AC).
Abstract: Speech coding algorithms have different dimensions of performance. Among them, speech quality and average bit rate are the most important performance aspects. The purpose of the research is to improve the speech quality within the constraint of a low bit rate. Most of the low bit rate speech coders employ linear predictive coding (LPC) that models the short-term spectral information as an all-pole filter. The filter coefficients are called linear predictive (LP) coefficients. The LP coefficients are obtained from standard linear prediction analysis, based on blocks of input samples. In transition segments, a large variation in energy and spectral characteristics can occur in a short time interval. Therefore, the LP coefficients can change considerably between consecutive blocks. Abrupt changes in the LP parameters in adjacent blocks can introduce clicks in the reconstructed speech. Interpolation of the filter coefficients results in a smooth variation of the interpolated coefficients as a function of time. Thus, the interpolation of the LP coefficients in the adjacent blocks provides improved quality of the synthetic speech without using additional information for transmission. The research focuses on developing algorithms for interpolating the linear predictive coefficients with different representations (LSF, RC, LAR, AC). The LP analysis has been simulated, and its performance has been compared by varying the parameters (LP order, frame length, window offset, window length). Experiments have been performed on the subframe length and the choice of representation of LP coefficients for interpolation. Simulation results indicate that speech quality can be improved by an energy-weighted interpolation technique.

Journal ArticleDOI
TL;DR: This paper presents a scalable coder with three bit rates (8, 14.1 and 24 kbit/s). Transform coding is used at the highest rate, which both improves narrowband quality and extends the band to 50-7000 Hz.
Abstract: This paper presents a scalable coder with three bit rates (8, 14.1 and 24 kbit/s). For the two embedded lower rates, which operate in the telephone bandwidth, CELP coding techniques are used. For the highest rate, which both improves narrowband quality and extends the band to 50-7000 Hz, transform coding techniques are used. The main applications concern transmission over networks with no guaranteed QoS.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper presents a dispersed-pulse codebook for CELP coders. It generates an excitation vector by convolving dispersion vectors with the signed pulses of an algebraic codevector. The coding distortion with this codebook is shown to be smaller than that with an algebraic codebook.
Abstract: This paper presents a dispersed-pulse codebook for CELP coders. This codebook generates an excitation vector by convolving dispersion vectors with the signed pulses of an algebraic codevector. The dispersion vectors are obtained through training so that the coding distortion is reduced. An objective evaluation shows that the coding distortion with this codebook is smaller than that with an algebraic codebook. The dispersed-pulse codebook is applied to a 4 kb/s CELP coder. Subjective evaluation results show that (1) the fundamental performance of the 4 kb/s coder is equivalent to that of the G.726 32 kb/s coder, and (2) the performance of the 4 kb/s coder under some error and background-noise conditions is equivalent to that of the G.729 8 kb/s coder.
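The excitation construction can be sketched as a sum of shifted, signed copies of a dispersion vector, one per pulse, truncated to the subframe. The sketch assumes a single shared dispersion vector; the paper's trained vectors, possibly one per pulse, are not reproduced. With the dispersion vector [1.0], it reduces to a plain algebraic codevector.

```python
def dispersed_excitation(positions, signs, dispersion, length):
    """Convolve signed unit pulses with a dispersion vector, truncated to `length` samples."""
    exc = [0.0] * length
    for pos, sign in zip(positions, signs):
        for k, h in enumerate(dispersion):
            if pos + k < length:
                exc[pos + k] += sign * h  # each pulse contributes a shifted, signed copy
    return exc
```

Spreading each pulse's energy over several samples is what lets trained dispersion vectors lower the coding distortion relative to bare pulses at the same bit cost.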

Proceedings ArticleDOI
05 Jun 2000
TL;DR: A method of incorporating simultaneous masking into the calculation of the linear predictor coefficients requires only a modest increase in computational complexity and results in a filter that better models the formants of the input speech spectrum.
Abstract: Whilst linear prediction is the cornerstone of most modern speech coders, few of these coders incorporate the perceptual characteristics of hearing into the calculation of the linear predictor coefficients (LPCs). This paper proposes a method of incorporating simultaneous masking into the calculation of the LPCs. This modification requires only a modest increase in computational complexity and results in the linear predictor removing more perceptually important information from the input speech signal. This results in a filter that better models the formants of the input speech spectrum. The net effect is that an improvement in quality is achieved for a given bit rate or alternately a bit rate reduction can be achieved while maintaining perceived quality. These results have been confirmed through subjective listening tests.

Proceedings ArticleDOI
01 Jun 2000
TL;DR: An efficient representation of the stochastic codebook component using a pulse density of one pulse per 2 ms and signed magnitudes specified by 2 bits per pulse-pair is introduced and speech quality comparable to 8 kb/s G.729 is achieved.
Abstract: An important step toward achieving a high-quality 4 kb/s speech codec is reducing the coding-rate of the stochastic codebook component to near 2 kb/s. The increased reconstruction error in the residual that such low-rate quantization implies motivates the search for techniques that reduce the perceptibility of the errors in the reconstructed signal. Pitch-synchronous estimation of the linear-prediction filter and pitch-synchronous updating of the adaptive codebook reduce the coefficient-estimation error and increase the relative contribution of the adaptive codebook component to the synthesized signal, thereby reducing audible noise. However, pitch synchronous analysis normally results in a variable-rate coder. To obtain a fixed-rate representation, we introduce an efficient representation of the stochastic codebook component using a pulse density of one pulse per 2 ms and signed magnitudes specified by 2 bits per pulse-pair. The resulting reconstructions are evaluated for CELP coders corresponding to classical and generalized-pitch-predictor designs. In both cases speech quality comparable to 8 kb/s G.729 is achieved.

Journal ArticleDOI
TL;DR: A joint position and amplitude search algorithm is proposed for algebraic multipulse codebooks to be used in code-excited linear predictive (CELP) coders.
Abstract: A joint position and amplitude search algorithm is proposed for algebraic multipulse codebooks to be used in code-excited linear predictive (CELP) coders. The joint search complexity is below one quarter that of the focused search and ranks below those of the G.729A and IS-641-A coders. Listening tests indicate an equivalence in perceived quality.
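For reference, algebraic codebook searches maximize the criterion (d^T c)^2 / (c^T Phi c), where d = H^T x is the backward-filtered target and Phi = H^T H. The sketch below evaluates this criterion exhaustively over one pulse per track and both signs. It is the brute-force baseline whose cost the focused and joint searches are designed to avoid, not the proposed joint algorithm. The track layouts are hypothetical.

```python
import itertools


def convolution_matrix(h, n):
    """Lower-triangular n x n matrix H with H[i][j] = h[i - j] (truncated impulse response)."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)] for i in range(n)]


def search_pulses(target, h, tracks):
    """Exhaustive search over one pulse per track with +/-1 signs.

    Maximises (d^T c)^2 / (c^T Phi c) with d = H^T x and Phi = H^T H, keeping
    only candidates with positive correlation so that the signs are meaningful.
    """
    n = len(target)
    H = convolution_matrix(h, n)
    d = [sum(H[i][j] * target[i] for i in range(n)) for j in range(n)]
    phi = [[sum(H[k][i] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    best, best_score = None, 0.0
    for positions in itertools.product(*tracks):
        if len(set(positions)) < len(positions):
            continue  # two pulses on the same sample are not allowed here
        for signs in itertools.product((1, -1), repeat=len(positions)):
            corr = sum(s * d[p] for p, s in zip(positions, signs))
            energy = sum(si * sj * phi[pi][pj]
                         for pi, si in zip(positions, signs)
                         for pj, sj in zip(positions, signs))
            if corr > 0 and energy > 0 and corr * corr / energy > best_score:
                best, best_score = (positions, signs), corr * corr / energy
    return best
```

With K tracks of M positions, this loop evaluates M^K * 2^K candidates. That growth is why practical coders, and the joint search described above, prune the search instead of enumerating it.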