scispace - formally typeset

Showing papers on "Code-excited linear prediction" published in 2003


Monograph · DOI
18 Apr 2003

257 citations


Patent
Milan Jelinek
18 Dec 2003
TL;DR: In this article, a method and device for dequantizing linear prediction parameters in variable bit-rate sound signal decoding is proposed, in which at least one quantization index and information about the classification of the corresponding sound signal frame are received, a prediction error vector is recovered, a prediction vector is reconstructed, and a linear prediction parameter vector is produced in response to the recovered prediction error vector and the reconstructed prediction vector.
Abstract: The present invention relates to a method and device for quantizing linear prediction parameters in variable bit-rate sound signal coding, in which an input linear prediction parameter vector is received, a sound signal frame corresponding to the input linear prediction parameter vector is classified, a prediction vector is computed, the computed prediction vector is removed from the input linear prediction parameter vector to produce a prediction error vector, and the prediction error vector is quantized. Computation of the prediction vector comprises selecting one of a plurality of prediction schemes in relation to the classification of the sound signal frame, and processing the prediction error vector through the selected prediction scheme. The present invention further relates to a method and device for dequantizing linear prediction parameters in variable bit-rate sound signal decoding, in which at least one quantization index and information about classification of a sound signal frame corresponding to the quantization index are received, a prediction error vector is recovered by applying the index to at least one quantization table, a prediction vector is reconstructed, and a linear prediction parameter vector is produced in response to the recovered prediction error vector and the reconstructed prediction vector. Reconstruction of the prediction vector comprises processing the recovered prediction error vector through one of a plurality of prediction schemes depending on the frame classification information.
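The encoder/decoder symmetry described in this abstract can be illustrated with a minimal sketch. It assumes LSF vectors, three hypothetical frame classes, a first-order moving-average predictor whose coefficient depends on the class, and a uniform scalar quantizer standing in for the patent's quantization tables. None of these values come from the patent.

```python
# Minimal sketch of class-switched predictive quantization of an LP
# parameter vector (here, LSFs in radians). The class names, predictor
# coefficients, mean vector, and uniform quantizer are all hypothetical.

# Hypothetical per-class prediction coefficients: stationary voiced frames
# lean heavily on the past; transient frames use no prediction at all.
PREDICTOR_COEFF = {"voiced": 0.65, "unvoiced": 0.3, "transient": 0.0}
MEAN_LSF = [0.3, 0.6, 1.0, 1.4, 1.8, 2.2, 2.6, 2.9]  # hypothetical mean (rad)
STEP = 0.02  # hypothetical uniform quantizer step (rad)


def predict(prev_error_q, frame_class):
    """First-order moving-average prediction from the previous frame's
    quantized prediction error; the scheme is selected by frame class."""
    a = PREDICTOR_COEFF[frame_class]
    return [m + a * e for m, e in zip(MEAN_LSF, prev_error_q)]


def encode(lsf, frame_class, prev_error_q):
    """Return quantization indices and the quantized prediction error
    (kept by the encoder as predictor state for the next frame)."""
    pred = predict(prev_error_q, frame_class)
    error = [x - p for x, p in zip(lsf, pred)]   # remove the prediction
    indices = [round(e / STEP) for e in error]   # quantize the error
    return indices, [i * STEP for i in indices]


def decode(indices, frame_class, prev_error_q):
    """Recover the prediction error, rebuild the prediction, and add them."""
    error_q = [i * STEP for i in indices]
    pred = predict(prev_error_q, frame_class)
    return [p + e for p, e in zip(pred, error_q)], error_q


# Usage: encoder and decoder stay in sync because both update the same state.
state = [0.0] * 8
lsf = [0.31, 0.62, 0.98, 1.43, 1.79, 2.21, 2.58, 2.93]
idx, enc_state = encode(lsf, "voiced", state)
lsf_q, dec_state = decode(idx, "voiced", state)
```

Because the decoder receives the frame class alongside the indices, it can select the same predictor as the encoder. The reconstruction error is then bounded by half the quantizer step.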

51 citations


Patent
08 Jan 2003
TL;DR: In this paper, a method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec is proposed. It includes processing a source codec input CELP bitstream to unpack one or more CELP parameters from the input bitstream and interpolating one or more of the unpacked CELP parameters.
Abstract: A method for transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input CELP bitstream to unpack one or more CELP parameters from it. If one or more destination codec parameters (frame size, subframe size, and/or sampling rate) differ from the corresponding source codec parameters, the method interpolates one or more of the unpacked CELP parameters from the source codec format to the destination codec format. The method includes encoding the one or more CELP parameters for the destination codec and processing a destination CELP bitstream by at least packing the one or more CELP parameters for the destination codec.
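When source and destination subframe sizes differ, interpolation means re-expressing a per-subframe parameter track on a new time grid. The sketch below is one simple way to do this. It assumes linear interpolation between source subframe centers, with end values held constant. This is an illustrative choice; the patent does not specify its interpolation rule.

```python
import bisect


def subframe_centers(n_subframes, subframe_ms):
    """Center time (ms) of each subframe, measured from the frame start."""
    return [(i + 0.5) * subframe_ms for i in range(n_subframes)]


def interpolate_track(values, src_sub_ms, dst_sub_ms, n_dst):
    """Linearly interpolate one parameter per source subframe onto the
    destination subframe grid, holding the end values outside the range."""
    src_t = subframe_centers(len(values), src_sub_ms)
    out = []
    for t in subframe_centers(n_dst, dst_sub_ms):
        if t <= src_t[0]:
            out.append(values[0])
        elif t >= src_t[-1]:
            out.append(values[-1])
        else:
            k = bisect.bisect_right(src_t, t) - 1  # src_t[k] <= t < src_t[k+1]
            w = (t - src_t[k]) / (src_t[k + 1] - src_t[k])
            out.append((1 - w) * values[k] + w * values[k + 1])
    return out


# Four 5 ms source subframes mapped onto two 10 ms destination subframes.
print(interpolate_track([40, 42, 44, 46], 5.0, 10.0, 2))  # → [41.0, 45.0]
```

For quantities such as pitch lag, the interpolated values would still have to be re-quantized to the destination codec's lag resolution before packing.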

50 citations


Patent
Ari Heikkinen, Ari Lakaniemi
11 Feb 2003
TL;DR: In this article, a speech decoder is presented that responds to a synchronization delay adjustment request by executing a time-warping operation that either lengthens or shortens the duration of a speech frame.
Abstract: A device is disclosed that makes packetized and encoded speech data audible to a listener, as is a method for operating the device. The device includes a unit for generating a synchronization request for reducing an amount of synchronization delay, and further includes a speech decoder that is responsive to the synchronization delay adjustment request for executing a time-warping operation for one of lengthening or shortening a duration of a speech frame. In one embodiment the speech decoder comprises a code excited linear prediction (CELP) speech decoder, and the CELP decoder time-warping operation is applied to a reconstructed excitation signal u(k) to derive a time-warped reconstructed signal u_w(k). The time-warped reconstructed signal u_w(k) is input to a Linear Predictor (LP) synthesis filter to derive a CELP decoder time-warped output signal ŷ_w(k). In another embodiment the speech decoder comprises a parametric speech decoder, in which an adaptation of the frame length N results in the use of a modified frame length N_w.
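The CELP embodiment warps the excitation u(k) first and only then passes it through the LP synthesis filter. The sketch below shows that ordering. It assumes A(z) = 1 + Σ a_i z^-i and uses naive linear-interpolation resampling as a stand-in warp. A practical decoder would more likely insert or remove whole pitch periods, and the patent's actual warping method is not described here.

```python
def time_warp(u, new_len):
    """Naive stand-in warp: resample excitation u to new_len samples by
    linear interpolation (not pitch-synchronous)."""
    n = len(u)
    if new_len == 1:
        return [u[0]]
    scale = (n - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * scale
        i = int(pos)
        frac = pos - i
        nxt = u[min(i + 1, n - 1)]
        out.append((1 - frac) * u[i] + frac * nxt)
    return out


def lp_synthesis(excitation, a, memory=None):
    """All-pole filter 1/A(z): y(k) = u(k) - sum_{i=1..p} a_i * y(k-i).
    `a` holds a_1..a_p; `memory` holds y(k-1)..y(k-p) from the last frame."""
    p = len(a)
    hist = list(memory) if memory else [0.0] * p  # hist[0] is y(k-1)
    y = []
    for x in excitation:
        val = x - sum(a[i] * hist[i] for i in range(p))
        y.append(val)
        hist = [val] + hist[:-1]
    return y


# Lengthen a 4-sample excitation to 7 samples, then synthesize.
u_w = time_warp([0.0, 1.0, 2.0, 3.0], 7)
y_w = lp_synthesis(u_w, [-0.5])
```

Warping before synthesis keeps the LP filter state continuous across frames, because only the excitation duration changes.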

47 citations


Reference Entry · DOI
15 Apr 2003
TL;DR: An overview of the most widely used algorithms, standards, and applications of wideband and narrowband speech coding, including perceptually transparent multi-rate and embedded coding, is presented.
Abstract: In this chapter, we present an overview of the most widely used algorithms, standards, and applications of wideband and narrowband speech coding. Algorithms for speech coding are classified into four broad headings: (1) waveform coding techniques (including PCM, companded PCM, and DPCM), which are typically used for landline telephony, internet telephony, and secure military communications; (2) subband coding, including perceptually transparent multi-rate and embedded coding, which is mainly used for internet and digital audio applications; (3) linear predictive analysis-by-synthesis coding (LPC-AS) algorithms, including multipulse LPC, CELP, SELP, VSELP, and low-delay CELP, which are typically used for digital cellular and telephony; and (4) LPC vocoders, including advanced vocoder algorithms (e.g., MELP, MBE, and PWI), which are used for applications such as secure telephony and satellite telephony. Applications in areas such as voice over IP (VoIP) and digital cellular are emerging and require a speech coder to gracefully adapt to rapidly changing channel conditions, a need that is met by embedded and multirate speech coders associated with joint source-channel coding algorithms. Measures of speech coder perceptual quality include subjective measures of intelligibility (DRT and DALT) and naturalness (MOS and DAM), as well as objective measures such as segmental SNR, Bark spectral distortion, PSQM, and PESQ. Speech coding standards are set by organizations including the ITU (for landline telephony), MPEG (for multimedia applications), ETSI (for European digital cellular), TIA (for U.S. digital cellular), and DDVPC (for United States military applications). Keywords: speech coding; PCM; subband coding; CELP; LPC; digital cellular; multimedia; voice over IP; mean opinion score (MOS)
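Segmental SNR, one of the objective measures listed above, averages per-frame SNRs instead of computing one global SNR. As a result, quiet frames count as much as loud ones. The sketch below is a minimal version of this measure. It assumes 160-sample frames, skips silent frames, and clamps each frame's SNR to [-10, 35] dB. These are common conventions, not values taken from this chapter.

```python
import math


def segmental_snr(clean, coded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Average per-frame SNR (dB) between clean and coded speech.
    Frame SNRs are clamped to [floor_db, ceil_db]; silent frames are skipped."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        deg = coded[start:start + frame_len]
        signal = sum(s * s for s in ref)
        noise = sum((s - c) ** 2 for s, c in zip(ref, deg))
        if signal == 0.0:
            continue  # no reference energy: frame SNR is undefined
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(min(max(snr, floor_db), ceil_db))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

For example, a coded signal that is uniformly 10% smaller than the reference has a noise-to-signal energy ratio of 0.01, which gives about 20 dB in every frame.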

40 citations


Patent
05 Feb 2003
TL;DR: In this article, a data processing apparatus capable of obtaining high-quality sound data from CELP-decoded speech is proposed; it can be applied to mobile phones for transmitting and receiving speech data.
Abstract: The present invention relates to a data processing apparatus capable of obtaining high-quality sound data. A tap generation section 121 generates a prediction tap used for a process in a prediction section 125. It does so by extracting, from speech data obtained by decoding coded data with a CELP method, decoded speech samples in a predetermined positional relationship with the subject data of interest, and by extracting an I code located in a subframe according to the position of the subject data in the subject subframe. Similarly to the tap generation section 121, a tap generation section 122 generates a class tap used for a process in a classification section 123. The classification section 123 performs classification on the basis of the class tap, and a coefficient memory 124 outputs a tap coefficient corresponding to the classification result. The prediction section 125 performs a linear prediction computation by using the prediction tap and the tap coefficient and outputs high-quality decoded speech data. The present invention can be applied to mobile phones for transmitting and receiving speech.
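The classify-then-predict pipeline above (class tap, then coefficient memory, then linear prediction) can be sketched in a few lines. The sketch makes several assumptions. It uses a 1-bit ADRC-style class code, which the abstract does not name. It uses only decoded speech samples as taps, omitting the I code. The coefficient table is hypothetical; in practice it would be learned offline for each class.

```python
def classify(class_tap):
    """1-bit ADRC-style class code: one bit per tap, set when the tap is at
    or above the midpoint of the tap's dynamic range."""
    lo, hi = min(class_tap), max(class_tap)
    mid = (lo + hi) / 2
    code = 0
    for x in class_tap:
        code = (code << 1) | (1 if x >= mid else 0)
    return code


def predict_sample(prediction_tap, class_tap, coeff_memory):
    """Look up the tap coefficients for the class and apply them linearly."""
    w = coeff_memory[classify(class_tap)]
    return sum(wi * xi for wi, xi in zip(w, prediction_tap))


# Hypothetical coefficient memory keyed by class code.
coeff_memory = {2: [0.25, 0.5, 0.25], 7: [0.0, 1.0, 0.0]}
print(predict_sample([0.0, 1.0, 0.0], [0.0, 1.0, 0.0], coeff_memory))  # → 0.5
```

Because the class code captures only the local waveform shape, one small linear filter per class can adapt to different signal patterns without any per-sample training at run time.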

39 citations


Patent
24 Oct 2003
TL;DR: In this paper, an apparatus and method for mapping CELP parameters between a source codec and a destination codec is presented. It consists of an LSP mapping module, an adaptive codebook mapping module coupled to the LSP mapping module, and a fixed codebook mapping module coupled to both the LSP mapping module and the adaptive codebook mapping module.
Abstract: An apparatus and method for mapping CELP parameters between a source codec and a destination codec. The apparatus includes an LSP mapping module, an adaptive codebook mapping module coupled to the LSP mapping module, and a fixed codebook mapping module coupled to the LSP mapping module and the adaptive codebook mapping module. The LSP mapping module includes an LP overflow module and an LSP parameter modification module. The adaptive codebook mapping module includes a first pitch gain codebook. The fixed codebook mapping module includes a first target processing module, a pulse search module, a fixed codebook gain estimation module, and a pulse position searching module.

39 citations


Patent
28 Jul 2003
TL;DR: In this article, a method for speech processing in a code excitation linear prediction (CELP) based speech system having a plurality of modes including at least a first mode and a consecutive second mode was proposed.
Abstract: A method for speech processing in a code excitation linear prediction (CELP) based speech system having a plurality of modes including at least a first mode and a consecutive second mode. The method includes: providing an input speech signal; dividing the speech signal into a plurality of frames; dividing at least one of the plurality of frames into sub-frames including a plurality of pulses; selecting a first number of pulses for the first mode and, for the second mode, the first number of pulses plus a second number of remaining pulses in the frame; providing a plurality of sub-modes between the first mode and the second mode; forming a base layer; forming an enhancement layer; and generating a bit stream including a basic bit stream and an enhancement bit stream, wherein the basic bit stream is used to update memory states of the speech system.

36 citations


Proceedings Article · DOI
15 Dec 2003
TL;DR: In this paper, the improved linear predictive coding (LPC) coefficients of the frame are employed in the feature extraction method, and the improved LPC feature extraction method is found to be quite efficient.
Abstract: In this paper, the improved linear predictive coding (LPC) coefficients of the frame are employed in the feature extraction method. In the proposed speech recognition system, the static LPC coefficients + dynamic LPC coefficients of the frame were employed as a basic feature. The framework of linear discriminant analysis (LDA) is used to derive an efficient, reduced-dimension parametric speech vector space for the speech recognition system. Using the continuous hidden Markov model (HMM) as the speech recognition model, the speech recognition system was successfully constructed. Experiments are performed on the isolated-word speech recognition task. It is found that the improved LPC feature extraction method is quite efficient.
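The "static + dynamic" feature in this abstract is commonly built by appending regression-based delta coefficients to each frame's static vector. The sketch below assumes the usual ±n-frame regression formula with edge frames repeated. The per-frame LPC vectors are taken as given, and the LDA projection is not shown. The paper's exact delta definition is not stated in the abstract.

```python
def delta(features, n=2):
    """Regression-based dynamic coefficients over +/- n frames:
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2),
    with the first and last frames repeated at the edges."""
    T = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(features[t])
        for k in range(1, n + 1):
            nxt = features[min(t + k, T - 1)]
            prv = features[max(t - k, 0)]
            for i in range(len(d)):
                d[i] += k * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


def static_plus_dynamic(features, n=2):
    """Concatenate each frame's static vector with its delta vector."""
    return [s + d for s, d in zip(features, delta(features, n))]
```

On a linearly increasing coefficient track, the interior deltas equal the slope exactly. The edge deltas are smaller because of the repeated-frame padding.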

29 citations


Patent
23 Oct 2003
TL;DR: In this article, a method and apparatus for DTMF detection and voice mixing in the code-excited linear prediction (CELP) parameter space, without fully decoding and reconstructing the speech signal, is presented.
Abstract: A method and apparatus for DTMF detection and voice mixing in the code-excited linear prediction (CELP) parameter space, without fully decoding and reconstructing the speech signal. The apparatus includes a Dual-Tone Multi-Frequency (DTMF) signal detection module and a multi-input mixing module. The DTMF signal detection module detects DTMF signals by computing characteristic features from the input CELP parameters and comparing them with known features of DTMF signals. The multi-input mixing module mixes multiple sets of input CELP parameters, which represent multiple voice signals, into a single set of CELP parameters. The mixing computation is performed by analyzing each set of input CELP parameters, determining the order of importance of the input sets, selecting a strategy for mixing the CELP parameters, and outputting the mixed CELP parameters. The method includes inputting one or more sets of CELP parameters and external commands, detecting DTMF tones, mixing multiple sets of CELP parameters, and outputting the DTMF signal, if detected, and the mixed CELP parameters.

28 citations


Patent
12 Mar 2003
TL;DR: In this paper, an apparatus for processing adaptive codebook pitch lag from one CELP-based standard to another CELP-based standard is presented. In it, a decision module determines the desired pitch lag from among two or more incoming subframes, and a pitch lag selection module selects it.
Abstract: An apparatus for processing adaptive codebook pitch lag from one CELP based standard to another CELP based standard. The apparatus has various modules that perform at least the functionality described herein. The apparatus includes a time-base subframe inspection module, which is adapted to associate one or more incoming subframes with an outgoing subframe of a destination codec. The apparatus also has a decision module coupled to the time-base subframe inspection module. The decision module is adapted to determine a desired pitch lag parameter from a plurality of pitch lag parameters among respective two or more incoming subframes. The apparatus has a pitch lag selection module coupled to the decision module. The pitch lag selection module is adapted to select the desired pitch lag parameter.
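The time-base association and decision steps can be sketched as follows. The sketch associates a destination subframe with every source subframe whose sample range overlaps it. It then applies a hypothetical decision rule: take the lag of the overlapping subframe with the largest adaptive codebook (pitch) gain. The abstract does not state which criterion the decision module actually uses.

```python
def select_pitch_lag(src_lags, src_gains, src_len, dst_start, dst_end):
    """Pick a lag for the destination subframe [dst_start, dst_end) (in
    samples) from the source subframes it overlaps. Hypothetical rule:
    prefer the overlapping subframe with the largest pitch gain."""
    overlapping = [
        i for i in range(len(src_lags))
        if i * src_len < dst_end and (i + 1) * src_len > dst_start
    ]
    best = max(overlapping, key=lambda i: src_gains[i])
    return src_lags[best]


# 40-sample source subframes; the destination subframe covers samples 53..105.
print(select_pitch_lag([40, 41, 60, 61], [0.9, 0.5, 0.8, 0.2], 40, 53, 106))  # → 60
```

A gain-based rule favors the subframe where the adaptive codebook contributed most, so its lag is more likely to reflect the true pitch. Other rules, such as largest overlap, are equally plausible.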

Book
01 Jan 2003
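TL;DR: This book covers speech coding from signal processing, linear prediction, and quantization fundamentals through CELP and its variants (the Federal Standard version of CELP, VSELP, low-delay CELP, algebraic CELP, and source-controlled variable bit-rate CELP) and mixed excitation linear prediction. Its appendices include properties of line spectral frequency and a review of linear algebra (orthogonality, basis, linear independence, and the Gram-Schmidt algorithm).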
Abstract: Preface. Acronyms. Notation. Introduction. Signal Processing Techniques. Stochastic Processes and Models. Linear Prediction. Scalar Quantization. Pulse Code Modulation and Its Variants. Vector Quantization. Scalar Quantization of Linear Prediction Coefficient. Linear Prediction Coding. Regular-Pulse Excitation Coders. Code-Excited Linear Prediction. The Federal Standard Version of CELP. Vector Sum Excited Linear Prediction. Low-Delay CELP. Vector Quantization of Linear Prediction Coefficient. Algebraic CELP. Mixed Excitation Linear Prediction. Source-Controlled Variable Bit-Rate CELP. Speech Quality Assessment. Appendix A: Minimum-Phase Property of the Forward Prediction-Error Filter. Appendix B: Some Properties of Line Spectral Frequency. Appendix C: Research Directions in Speech Coding. Appendix D: Linear Combiner for Pattern Classification. Appendix E: CELP: Optimal Long-Term Predictor to Minimize the Weighted Difference. Appendix F: Review of Linear Algebra: Orthogonality, Basis, Linear Independence, and the Gram-Schmidt Algorithm. Bibliography. Index.

Patent
08 Jan 2003
TL;DR: In this article, the authors propose a method to transcode a CELP-based compressed voice bitstream from a source codec to a destination codec by unpacking the parameters from the input CELP bitstream and interpolating the unpacked parameters when the destination codec parameters differ from the source codec parameters.
Abstract: Embodiments of a system and method relate to transcoding a CELP-based compressed voice bitstream from a source codec to a destination codec. The method includes processing a source codec input bitstream to unpack (1) CELP parameters from the input CELP bitstream, and may interpolate (2) the unpacked CELP parameters if a difference between destination codec parameters and source codec parameters exists. If the method maps (4) CELP parameters from the source codec format to a destination codec format, the parameter mapping strategy may be singly preset or selected (3). The method includes encoding the CELP parameters for the destination codec and processing a destination CELP bitstream by packing (7) the CELP parameters for the destination codec.

Patent
06 Nov 2003
TL;DR: In this article, a transcoding apparatus and method between CELP-based codecs using bandwidth extension is provided, which can reduce degradation of voice quality, delay and computational load, and by additionally generating information corresponding to the high band of wideband voice, enables high quality voice communications between networks having different bandwidths.
Abstract: A transcoding apparatus and method between CELP-based codecs using bandwidth extension are provided. The transcoding apparatus between CELP-based codecs using bandwidth extension comprises three components. The first is a formant parameter converter, which extracts formant parameters in a narrowband CELP format from an input narrowband bitstream and converts them into formant parameters in a wideband CELP format. The second is an excitation signal parameter converter, which converts excitation signal parameters in a narrowband CELP format of an input narrowband bitstream into excitation signal parameters in a wideband CELP format. The third is a quantizer, which quantizes, in an output CELP format, the wideband CELP format formant parameters converted in the formant parameter converter and the wideband CELP format excitation signal parameters converted in the excitation signal parameter converter. The transcoding apparatus can reduce degradation of voice quality, delay, and computational load. By additionally generating information corresponding to the high band of wideband voice, it enables high-quality voice communications between networks having different bandwidths.

Patent
17 Oct 2003
TL;DR: An apparatus and method for encoding and decoding a voice signal are described in this article; the apparatus includes an encoder configured to generate an output bitstream signal from an input voice signal.
Abstract: An apparatus and method for encoding and decoding a voice signal. The apparatus includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The CELP encoder includes a plurality of codec-specific encoder modules. Additionally, the CELP encoder includes a plurality of generic encoder modules. The CELP decoder includes a plurality of codec-specific decoder modules. Additionally, the CELP decoder includes a plurality of generic decoder modules.

Journal Article · DOI
TL;DR: The pitch-adaptive method is adopted in the design of a novel multimode variable-rate speech coder applicable to CDMA-based cellular telephony and yields excellent voice quality and intelligibility at average bit-rates in the range of 2.5-4.0 kbps.
Abstract: A novel paradigm based on pitch-adaptive windows is proposed for solving the problem of encoding the fixed codebook excitation in low bit-rate CELP coders. In this method, the nonzero excitation in the fixed codebook (FCB) is substantially localized to a set of time intervals called windows. The positions of the windows are adaptive to the pitch peaks in the linear prediction residual signal. Thus, high coding efficiency is achieved by allocating most of the available FCB bits to the perceptually important segments of the excitation signal. The pitch-adaptive method is adopted in the design of a novel multimode variable-rate speech coder applicable to CDMA-based cellular telephony. Results demonstrate that the adaptive windows method yields excellent voice quality and intelligibility at average bit-rates in the range of 2.5-4.0 kbps.
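The windowing idea can be sketched as restricting the fixed codebook search to positions near pitch peaks. The sketch below assumes a known integer pitch period. It locates one peak of |residual| per pitch-length chunk and allows pulse positions within a fixed half-width of each peak. The peak picker and window width are illustrative choices, not the paper's procedure.

```python
def pitch_peaks(residual, pitch):
    """One peak per pitch-length chunk: index of the largest |residual|."""
    peaks = []
    for start in range(0, len(residual), pitch):
        seg = residual[start:start + pitch]
        k = max(range(len(seg)), key=lambda i: abs(seg[i]))
        peaks.append(start + k)
    return peaks


def window_positions(peaks, half_width, n):
    """Allowed fixed-codebook pulse positions: the union of the windows
    [peak - half_width, peak + half_width], clipped to [0, n)."""
    allowed = set()
    for p in peaks:
        allowed.update(range(max(0, p - half_width), min(n, p + half_width + 1)))
    return sorted(allowed)


residual = [0.0] * 20
residual[3], residual[13] = 1.0, -0.9
print(window_positions(pitch_peaks(residual, 10), 1, 20))  # → [2, 3, 4, 12, 13, 14]
```

Shrinking the candidate set from all 20 positions to 6 is what lets the available bits concentrate around the pitch pulses.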

Proceedings Article · DOI
06 Apr 2003
TL;DR: The perceptual evaluation of speech quality (PESQ) and AB preference tests under various packet loss conditions verify that the proposed algorithm is superior to the concealment algorithm embedded in the G.729 standard speech coder.
Abstract: We propose a packet loss concealment algorithm for a code-excited linear prediction (CELP) speech coder. We perform a time-scale modification (TSM) using a waveform similarity overlap-add (WSOLA) technique to reconstruct the excitation signal of the lost or dropped frames. In addition, when a lost frame is classified as voiced, an adaptive codebook gain and a fixed codebook gain are estimated by a modified gain parameter re-estimation (GRE) technique. By applying these techniques, we can reduce both the quality degradation of the decoded speech and the error propagation through the adaptive codebook memory. We apply the proposed scheme to the ITU-T G.729 standard speech coder to evaluate the performance of the proposed method. The perceptual evaluation of speech quality (PESQ) and AB preference tests under various packet loss conditions verify that the proposed algorithm is superior to the concealment algorithm embedded in G.729.
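WSOLA changes duration without changing pitch. It does this by overlap-adding windowed input segments, each chosen within a small tolerance of its ideal position so that it best matches the natural continuation of the previously copied segment. The sketch below is a generic WSOLA under several assumptions: a 64-sample periodic Hann window, 50% overlap, and a ±16-sample search scored by cross-correlation. It is not the paper's concealment-specific variant.

```python
import math


def hann(n):
    """Periodic Hann window; copies spaced n/2 apart sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def wsola(x, stretch, win=64, tol=16):
    """Time-scale x by `stretch` (>1 lengthens, <1 shortens). Each output
    frame copies the input segment, within +/- tol of its ideal position,
    that best correlates with the natural continuation of the last one."""
    hop = win // 2
    w = hann(win)
    out_len = int(len(x) * stretch)
    y = [0.0] * (out_len + win)
    prev, k = None, 0
    while True:
        out_pos = k * hop
        if out_pos + win > len(y):
            break
        ideal = int(round(out_pos / stretch))
        if prev is None:
            start = ideal
        else:
            natural = prev + hop  # where the previous segment would continue
            if natural + win > len(x):
                break
            start, best = None, -math.inf
            for d in range(-tol, tol + 1):
                c = ideal + d
                if c < 0 or c + win > len(x):
                    continue
                score = sum(x[c + i] * x[natural + i] for i in range(win))
                if score > best:
                    start, best = c, score
            if start is None:
                break
        if start + win > len(x):
            break
        for i in range(win):
            y[out_pos + i] += w[i] * x[start + i]
        prev = start
        k += 1
    return y[:out_len]


# Stretch a 20-sample-period tone by 1.5x, as when covering a lost frame.
tone = [math.sin(2 * math.pi * n / 20) for n in range(800)]
stretched = wsola(tone, 1.5)
```

In a concealment setting, the stretched excitation would fill the gap left by the lost frame before LP synthesis.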

Patent
30 Oct 2003
TL;DR: In this paper, an apparatus for transcoding between code excited linear prediction (CELP) type codecs with different bandwidths is presented. It includes a formant parameter translating unit for generating output formant parameters and a formant parameter quantizing unit for receiving the output format formant parameters.
Abstract: The present invention overcomes problems of the tandem coding method, such as degradation of speech quality, increased system latency, and increased computation. An apparatus for transcoding between code excited linear prediction (CELP) type codecs with different bandwidths includes: a formant parameter translating unit for generating output formant parameters by translating formant parameters from the input CELP format to the output CELP format; a formant parameter quantizing unit for receiving the output format formant parameters and quantizing the output format formant filter coefficients; an excitation parameter translating unit for generating output excitation parameters by translating excitation parameters from the input CELP format to the output CELP format; and an excitation quantizing unit for receiving the output format excitation parameters and quantizing them.

Journal Article · DOI
TL;DR: Ratings of speech coder performance indicated that the listeners are less sensitive to coder-induced distortions with abnormal speech samples, and that MELP, FS1015 LPC and to a certain extent FS1016 CELP exhibited degraded performance with speech samples from disordered talkers.

Patent
26 Sep 2003
TL;DR: In this paper, a method and apparatus for low-complexity fixed codebook search in a sound codec based on the Code Excited Linear Prediction (CELP) coding algorithm are presented.
Abstract: There are provided a method and apparatus for fixed codebook search with low complexity used in a sound codec according to the Code Excited Linear Prediction (CELP) coding algorithm. The method includes: calculating absolute values of pulse position likelihood estimation vectors for respective pulse positions for each track in a plurality of tracks; selecting, for each track, a predetermined number of pulse positions in descending order of the absolute values of the pulse position likelihood estimation vectors; creating all possible pulse position combinations that consist of one selected pulse position per track, and conducting a full search over all of these combinations; and selecting one pulse position combination from among the searched combinations. Therefore, it is possible to significantly reduce the amount of computation required for fixed codebook search in a sound codec.
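The preselect-then-full-search procedure can be sketched using the usual ACELP criterion: maximize C²/E over pulse combinations. Here C is the correlation of the backward-filtered target d with the signed pulses, and E is the energy computed from the impulse-response correlation matrix φ. Two simplifications are assumptions: the pulse position likelihood estimate is taken to be d itself, and each pulse's sign is fixed to the sign of d at its position.

```python
import itertools


def preselect(b, track, k):
    """Keep the k positions in a track with the largest |b(n)|."""
    return sorted(track, key=lambda n: abs(b[n]), reverse=True)[:k]


def fixed_codebook_search(d, phi, tracks, k):
    """Exhaustive search over one pulse per track, restricted to the k
    preselected positions per track, maximizing C^2 / E."""
    candidates = [preselect(d, t, k) for t in tracks]
    best, best_score = None, float("-inf")
    for combo in itertools.product(*candidates):
        signs = [1.0 if d[m] >= 0 else -1.0 for m in combo]
        c = sum(s * d[m] for s, m in zip(signs, combo))
        e = sum(si * sj * phi[mi][mj]
                for si, mi in zip(signs, combo)
                for sj, mj in zip(signs, combo))
        if e > 0 and c * c / e > best_score:
            best, best_score = combo, c * c / e
    return best, best_score


# Toy example: identity phi, two interleaved tracks, 2 candidates per track.
d = [0.1, -0.9, 0.2, 0.0, 0.5, 0.05, -0.3, 0.7]
phi = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
best, score = fixed_codebook_search(d, phi, [[0, 2, 4, 6], [1, 3, 5, 7]], 2)
```

With k candidates per track and T tracks, the full search visits k^T combinations instead of (track length)^T. This is where the complexity reduction comes from.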

Proceedings Article
01 Jan 2003
TL;DR: A gradient-descent based optimization procedure is applied to the window sequence used for linear prediction (LP) analysis of the ITU-T G.729 CS-ACELP coder, and an optimization strategy is described to find the line spectral frequency (LSF) interpolation factor.
Abstract: A gradient-descent based optimization procedure is applied to the window sequence used for linear prediction (LP) analysis of the ITU-T G.729 CS-ACELP coder. By replacing the original window of the standard with the optimized versions, similar subjective quality is obtainable at reduced computational cost and/or lowered coding delay. In addition, an optimization strategy is described to find the line spectral frequency (LSF) interpolation factor.
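The sketch below shows the two quantities this paper optimizes. The first is the LP analysis window applied before autocorrelation; the standard G.729 window is reproduced to the best of our reading of the Recommendation (half a Hamming window over 200 samples, then a quarter cosine cycle over 40). The second is an interpolation factor α that blends the previous and current frames' line spectral parameters. G.729 itself interpolates in the LSP cosine domain with a fixed factor, so the generic convex combination here is a simplification.

```python
import math


def g729_lp_window():
    """240-sample G.729 LP analysis window: half Hamming, then quarter cosine."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / 399) for n in range(200)]
    w += [math.cos(2 * math.pi * (n - 200) / 159) for n in range(200, 240)]
    return w


def windowed_autocorr(x, w, order=10):
    """Autocorrelation r[0..order] of the windowed segment x * w."""
    s = [a * b for a, b in zip(x, w)]
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(order + 1)]


def interpolate_lsf(prev, curr, alpha):
    """Subframe parameters as alpha * previous + (1 - alpha) * current."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev, curr)]
```

An optimized window, as proposed in the paper, would simply replace `g729_lp_window()` with a different 240-value (or shorter) sequence. Likewise, the optimized interpolation factor replaces `alpha`.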

Journal Article · DOI
TL;DR: An efficient block-based trellis quantization (BTQ) scheme is proposed for the quantization of the line spectral frequencies (LSF) in speech coding applications. Results demonstrate that the proposed BTQ schemes outperform split-VQ, trellis coded quantization, multi-stage VQ, and the interframe schemes used in the IS-641 EFRC and GSM AMR codecs.
Abstract: An efficient block-based trellis quantization (BTQ) scheme is proposed for the quantization of the line spectral frequencies (LSF) in speech coding applications. The scheme is based on modeling the LSF intraframe dependencies with a trellis structure. The ordering property and the bounded range of LSF parameters are explicitly incorporated in the trellis model. BTQ search and design algorithms are discussed, and an efficient algorithm for index generation (finding the index of a path in the trellis) is presented. In addition, a sequential vector decorrelation technique is presented to effectively exploit the intraframe correlation of LSF parameters within the trellis. Based on the proposed block-based trellis quantizer, two intraframe schemes and one interframe scheme are proposed. Comparisons are provided to the split-VQ, the trellis coded quantization of LSF parameters, and the multi-stage VQ, as well as to the interframe schemes used in the IS-641 EFRC and the GSM AMR codec. These results demonstrate that the proposed BTQ schemes outperform the above systems.
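The key idea of building the LSF ordering property into a trellis can be illustrated with a Viterbi-style dynamic program. In this sketch, each dimension has its own small scalar codebook (the trellis states). Transitions are allowed only if the chosen levels increase, and the path with minimum total squared error wins. This is a simplified stand-in for illustration, not the paper's BTQ design, index generation, or decorrelation step.

```python
def ordered_trellis_quantize(lsf, codebooks, min_gap=0.0):
    """Choose one level per dimension minimizing total squared error,
    subject to q[i] > q[i-1] + min_gap (the LSF ordering property)."""
    INF = float("inf")
    cost = [(lsf[0] - c) ** 2 for c in codebooks[0]]  # best cost per state
    back = []                                          # back-pointers per stage
    for i in range(1, len(lsf)):
        new_cost, ptr = [], []
        for c in codebooks[i]:
            best, arg = INF, None
            for j, cp in enumerate(codebooks[i - 1]):
                if c > cp + min_gap and cost[j] < best:  # ordered transition
                    best, arg = cost[j], j
            new_cost.append(best + (lsf[i] - c) ** 2)
            ptr.append(arg)
        cost = new_cost
        back.append(ptr)
    j = min(range(len(cost)), key=cost.__getitem__)
    if cost[j] == INF:
        raise ValueError("no ordered path through the codebooks")
    idx = [j]
    for ptr in reversed(back):  # trace the surviving path backwards
        j = ptr[j]
        idx.append(j)
    idx.reverse()
    return idx, [codebooks[i][k] for i, k in enumerate(idx)]


# Independent nearest-level choice would give (0.55, 0.45), which is unordered.
idx, q = ordered_trellis_quantize([0.50, 0.52], [[0.4, 0.55], [0.45, 0.6]])
```

Enforcing the constraint inside the search guarantees that the quantized LSFs stay ordered. An ordered LSF set in turn corresponds to a stable synthesis filter.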

Patent
06 Nov 2003
TL;DR: In this paper, a CELP encoder is provided that optimizes excitation vector-related parameters in a more efficient manner than the encoders of the prior art.
Abstract: A CELP encoder is provided that optimizes excitation vector-related parameters in a more efficient manner than the encoders of the prior art. In one embodiment, a CELP encoder (400) optimizes excitation vector-related parameters (τ, β, λ, and η) based on a computed correlation matrix (Ζ'), which matrix is in turn based on a filtered first excitation vector (yτ(n)). The encoder then evaluates error minimization criteria based at least in part on a target signal (xw(n)), which target signal is based on an input signal (s(n)), and on the correlation matrix, and generates an excitation vector-related index in response to the error minimization criteria. In another embodiment, a CELP encoder (600) is provided that is capable of jointly optimizing and/or sequentially optimizing multiple excitation vector-related parameters by reference to a joint search weighting factor (μ), thereby invoking an optimal error minimization process.

Proceedings Article · DOI
06 Apr 2003
TL;DR: A novel transcoding algorithm for the adaptive multi rate (AMR) codec and the enhanced variable rate codec (EVRC) is proposed, which transcodes the parameters of one codec to the other without synthesizing the speech.
Abstract: A novel transcoding algorithm for the adaptive multi rate (AMR) codec and the enhanced variable rate codec (EVRC) is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm transcodes the parameters of one codec to the other without synthesizing the speech. The proposed algorithm decodes the parameters of the source codec from the input bitstream and, based on frame classification and mode decision, transforms them into those of the target codec in the parametric domain. Finally, the transformed parameters are encoded into a bitstream that is decodable by the target codec. The parameters transcoded by the proposed algorithm are line-spectral pair (LSP), pitch delay, fixed codevector, codebook gains, and frame energy. Evaluation results show that the proposed algorithm reduces both computational complexity and delay by 50% while producing speech quality equivalent to that produced by the tandem transcoding algorithm. The general idea is not restricted to the AMR and EVRC but is applicable to various other code-excited linear prediction (CELP) based codecs.

Journal Article · DOI
TL;DR: It is shown that a new fast converging adaptive algorithm yields a more accurate estimate of the spectral envelope of the speech spectrum by minimizing COSH distance rather than the Itakura-Saito distance measure.
Abstract: In this letter, a new discrete spectral modeling method is proposed based on a comparison of distance measure performance in an adaptive filtering context. It is shown that a new fast converging adaptive algorithm yields a more accurate estimate of the spectral envelope of the speech spectrum by minimizing COSH distance rather than the Itakura-Saito distance measure. We apply discrete spectral all-pole modeling to code-excited linear predictive coding by refining the short-term synthesis filter coefficients originally obtained by the linear prediction method. Simulation results show the enhanced harmonic and formant structure in the speech spectrum and that better speech quality is obtained.
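The difference between the two distance measures compared in this letter is easy to see numerically. The Itakura-Saito distance is asymmetric. The COSH distance is the average of the IS distance taken in both directions, which equals the mean of cosh(ln(P/Q)) − 1. The sketch below evaluates both on sampled power spectra; the spectra are arbitrary illustrative values, and no part of the letter's adaptive algorithm is shown.

```python
import math


def itakura_saito(p, q):
    """Mean of P/Q - ln(P/Q) - 1 over the sampled spectra (asymmetric)."""
    return sum(pi / qi - math.log(pi / qi) - 1 for pi, qi in zip(p, q)) / len(p)


def cosh_distance(p, q):
    """Mean of cosh(ln(P/Q)) - 1, i.e. the average of IS(P,Q) and IS(Q,P)."""
    return sum(math.cosh(math.log(pi / qi)) - 1 for pi, qi in zip(p, q)) / len(p)


p = [1.0, 2.0, 4.0]  # illustrative power spectrum samples
q = [2.0, 2.0, 1.0]
print(itakura_saito(p, q), itakura_saito(q, p), cosh_distance(p, q))
```

Because the COSH distance penalizes spectral overestimation and underestimation symmetrically, minimizing it avoids the IS measure's bias toward one side of the spectral envelope.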

Proceedings Article · DOI
06 Apr 2003
TL;DR: This work introduces the FGS feature to the code excited linear prediction (CELP) based speech coding algorithm by adjusting the amount of transmitted fixed excitation information and improves the algorithm by relaxing the constraints and re-ordering the sequence of pulses.
Abstract: General audio and video coding with fine granularity scalability (FGS) has become favored in next-generation multimedia coding standards because of its high flexibility in channel rate adaptation. However, FGS has not yet been incorporated into existing speech codecs. We introduce the FGS feature to the code excited linear prediction (CELP) based speech coding algorithm by adjusting the amount of transmitted fixed excitation information. We further improve the algorithm by relaxing the constraints and re-ordering the sequence of pulses. Achieving this requires modifications to the conventional coding algorithm, but the computational overhead is small and few modules are affected. As a consequence, developers can easily migrate an existing codec to one with the FGS advantage in a short time.
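Adjusting the amount of transmitted fixed excitation information can be pictured as sending pulses in importance order and cutting the stream at whatever bit budget the channel allows. The sketch below makes two assumptions: "importance" means largest |amplitude| first, and every pulse costs the same hypothetical number of bits. The abstract does not say how the paper re-orders its pulses.

```python
def order_pulses(pulses):
    """Hypothetical importance order: largest |amplitude| first."""
    return sorted(pulses, key=lambda p: abs(p[1]), reverse=True)


def truncate_for_rate(ordered, bits_per_pulse, budget_bits):
    """Keep the longest prefix of the ordered pulses that fits the budget."""
    return ordered[:budget_bits // bits_per_pulse]


def build_excitation(pulses, length):
    """Place (position, amplitude) pulses into a fixed-codebook vector."""
    e = [0.0] * length
    for pos, amp in pulses:
        e[pos] += amp
    return e


ordered = order_pulses([(3, 0.2), (10, -1.0), (7, 0.6)])
kept = truncate_for_rate(ordered, bits_per_pulse=6, budget_bits=13)
excitation = build_excitation(kept, 12)
```

Because the stream is ordered, any prefix is decodable. A receiver with a smaller budget simply reconstructs from fewer pulses instead of failing.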

Journal Article · DOI
N.R. Chong-White, R.V. Cox
TL;DR: The technique is applied as a preprocessor to improve the intelligibility of the mixed excitation linear prediction (MELP) coder and achieves a level of speech intelligibility comparable to the 8 kb/s G.729 Annex A coder.
Abstract: We recently proposed a technique to increase the robustness of perceptually important acoustic cues in speech, based on modification of the phoneme timing structure. We now apply the technique as a preprocessor to improve the intelligibility of the mixed excitation linear prediction (MELP) coder. The enhancement strategy, however, requires adaptations to minimize the introduction of unnatural speech characteristics that are introduced and intensified by parametric coding schemes. The enhanced 2.4 kb/s fixed-point MELP coder achieves a level of speech intelligibility comparable to the 8 kb/s G.729 Annex A coder.

01 Jan 2003
TL;DR: This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT), which uses feature extraction and ACR applied to Linear Predictive Coding residuals, followed by a wavelet-based refinement step.
Abstract: This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
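Only the "ACR applied on LPC residuals" component of MAWT is sketched here. The multi-feature and wavelet refinement stages are not described in enough detail to reproduce. The sketch computes LPC coefficients with the Levinson-Durbin recursion (assuming A(z) = 1 + Σ a_j z^-j) and inverse-filters the signal to obtain the residual. It then picks the lag that maximizes the residual's autocorrelation within an assumed 20 to 147 sample search range.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of x (no window)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin: coefficients a[0..order] of A(z), with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new = a[:]
        for j in range(1, i):
            new[j] = a[j] + k * a[i - j]
        new[i] = k
        a = new
        err *= 1 - k * k
    return a


def residual(x, a):
    """Inverse-filter x with A(z): e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def acr_pitch(e, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest residual autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda L: sum(e[n] * e[n - L] for n in range(L, len(e))))


# Impulse train with a 50-sample period (e.g. 160 Hz at 8 kHz sampling).
x = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
a = levinson(autocorr(x, 10), 10)
period = acr_pitch(residual(x, a), 20, 147)
```

Running the autocorrelation on the residual rather than the raw signal removes most of the formant structure. This makes the pitch peak easier to distinguish from formant-induced peaks.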

Journal Article · DOI
TL;DR: This paper proposes a novel approach, bandwidth-adjusted linear predictive coding (BLPC) analysis, for robust speech recognition, which estimates and adjusts the dispersion of formant bandwidths according to the maximum likelihood criterion.

Patent
24 Sep 2003
TL;DR: In this article, a combined fixed codebook searching method and apparatus used in a code excited linear prediction (CELP) speech codec is presented. It includes searching for a fixed codebook using a full search method that searches the fixed codebook at all pulse positions.
Abstract: Provided are a combined fixed codebook searching method and apparatus used in a code excited linear prediction (CELP) speech codec. The method includes: searching for a fixed codebook using a full search method that searches the fixed codebook at all pulse positions; selecting a fixed codebook searching method by counting the number of users accessing a gateway, comparing that number with a predetermined threshold, and selecting a suitable fixed codebook searching method based on the result of the comparison; searching for the fixed codebook using the selected method; and checking whether the search is complete for all tracks of the CELP speech codec. If the search is complete for all tracks, the fixed codebook search routine terminates. If a track remains to be searched, a fixed codebook searching method is selected again, taking the number of gateway users into account. Accordingly, a fixed codebook searching method is selected in consideration of the number of users accessing a gateway, enabling effective adjustment of either the sound quality or the channel capacity of the gateway.