Showing papers on "Code-excited linear prediction" published in 2008


Patent
03 Nov 2008
TL;DR: In this article, a residual signal from a Code Excited Linear Prediction (CELP)-based encoding layer is obtained, where the residual signal is a difference between an original audio signal and a reconstructed version of the original signal, which is then transformed at a Discrete Cosine Transform (DCT) type transform layer to obtain a corresponding transform spectrum.
Abstract: Codebook indices for a scalable speech and audio codec may be efficiently encoded based on anticipated probability distributions for such codebook indices. A residual signal from a Code Excited Linear Prediction (CELP)-based encoding layer may be obtained, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal may be transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum. The transform spectrum is divided into a plurality of spectral bands, where each spectral band has a plurality of spectral lines. A plurality of different codebooks are then selected for encoding the spectral bands, where each codebook is associated with a codebook index. A plurality of codebook indices associated with the selected codebooks are then encoded together to obtain a descriptor code that more compactly represents the codebook indices.

77 citations


Patent
Yuriy Reznik1, Pengjun Huang1
21 Oct 2008
TL;DR: In this paper, a scalable speech and audio codec is provided that implements combinatorial spectrum encoding, where a residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer.
Abstract: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding. A residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum having a plurality of spectral lines. The transform spectrum spectral lines are transformed using a combinatorial position coding technique. The combinatorial position coding technique includes generating a lexicographical index for a selected subset of spectral lines, where each lexicographic index represents one of a plurality of possible binary strings representing the positions of the selected subset of spectral lines. The lexicographical index represents non-zero spectral lines in a binary string in fewer bits than the length of the binary string.

55 citations
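
The lexicographic indexing described in the abstract above can be illustrated with a small sketch. It ranks a binary string (1 marks a non-zero spectral line) among all strings of the same length with the same number of ones. The function names `lex_index`, `from_lex_index`, and `index_bits`, the 16-line band, and the ordering convention (0 sorts before 1) are assumptions made for illustration; the abstract does not specify them, and the decoder is assumed to know the number of non-zero lines as side information.

```python
from math import comb

def lex_index(bits):
    """Rank of `bits` among all binary strings with the same length and
    the same number of ones, in lexicographic order (0 sorts before 1)."""
    n = len(bits)
    k = sum(bits)  # ones still to be placed
    index = 0
    for i, b in enumerate(bits):
        if b == 1:
            # Every string with a 0 here and the k remaining ones packed
            # into the n - i - 1 later positions comes before this one.
            index += comb(n - i - 1, k)
            k -= 1
    return index

def from_lex_index(index, n, k):
    """Inverse of lex_index: rebuild the binary string from its rank."""
    bits = []
    for i in range(n):
        smaller = comb(n - i - 1, k)  # strings that have a 0 at position i
        if k > 0 and index >= smaller:
            bits.append(1)
            index -= smaller
            k -= 1
        else:
            bits.append(0)
    return bits

def index_bits(n, k):
    """Bits needed to store any rank for n positions and k ones."""
    return (comb(n, k) - 1).bit_length()

# Hypothetical band: 16 spectral lines, 3 of them non-zero.
band = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rank = lex_index(band)
assert from_lex_index(rank, len(band), sum(band)) == band
print(rank, index_bits(16, 3))  # the rank fits in 10 bits instead of 16
```

Because there are only C(16, 3) = 560 possible position patterns, the rank needs 10 bits, which is the sense in which the index is shorter than the binary string itself.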


Proceedings ArticleDOI
25 Aug 2008
TL;DR: The ITU-T Embedded Variable Bit-Rate (EV-VBR) codec, being standardized by Question 9 of Study Group 16 (Q9/16) as Recommendation G.718, is presented; it is robust to significant rates of frame erasures or packet losses, and several technologies are used to encode the MDCT coefficients for best performance on both speech and music.
Abstract: This paper presents the ITU-T Embedded Variable Bit-Rate (EV-VBR) codec being standardized by Question 9 of Study Group 16 (Q9/16) as Recommendation G.718. The codec provides a scalable solution for compression of 16 kHz sampled speech and audio signals at rates between 8 kbit/s and 32 kbit/s, robust to significant rates of frame erasures or packet losses. It comprises 5 layers where higher layer bitstreams can be discarded without affecting the lower layer decoding. The core layer takes advantage of signal-classification based CELP encoding. The second layer reduces the coding error from the first layer by means of an additional pitch contribution and another algebraic codebook. The higher layers encode the weighted error signal from lower layers using MDCT transform coding. Several technologies are used to encode the MDCT coefficients for best performance both for speech and music. The codec performance is demonstrated with selected results from the ITU-T characterization test.

42 citations
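
The embedded-layer idea in the abstract above (each layer codes the error left by the layers below it, so higher layers can be dropped) can be sketched with a toy example. This is not the G.718 algorithm: plain scalar quantizers stand in for the CELP and MDCT stages, and the names `encode_layers`, `decode_layers`, and the step sizes are assumptions chosen only to show the structure.

```python
import math

def quantize(value, step):
    """Index of the nearest multiple of `step` (round half up)."""
    return math.floor(value / step + 0.5)

def encode_layers(signal, steps):
    """Each layer quantizes the residual left by all lower layers."""
    layers = []
    residual = list(signal)
    for step in steps:
        indices = [quantize(r, step) for r in residual]
        layers.append(indices)
        residual = [r - q * step for r, q in zip(residual, indices)]
    return layers

def decode_layers(layers, steps):
    """Decode however many layers were received (lowest layers first)."""
    if not layers:
        return []
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        out = [o + q * step for o, q in zip(out, indices)]
    return out

# Hypothetical input and three layers with progressively finer steps.
signal = [0.83, -1.27, 2.05, 0.40]
steps = [1.0, 0.25, 0.0625]
layers = encode_layers(signal, steps)

for received in range(1, len(layers) + 1):
    decoded = decode_layers(layers[:received], steps)
    worst = max(abs(x - y) for x, y in zip(signal, decoded))
    print(received, "layer(s): max error", worst)
```

Truncating `layers` at the receiver changes only the accuracy of the output, never the decodability of the layers that remain, which is the property the abstract describes.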


Proceedings ArticleDOI
01 Nov 2008
TL;DR: This approach uses a template of a speaker's normal phonated speech for extraction of excitation parameters such as pitch and gain, and then injects these estimated excitations into whispered signal to synthesize normal-sounding speech through the CELP codec.
Abstract: In the following paper, a method for the real-time conversion of whispers to normal phonated speech through a code excited linear prediction analysis-by-synthesis codec is discussed. This approach uses a template of a speaker's normal phonated speech for extraction of excitation parameters such as pitch and gain, and then injects these estimated excitations into whispered signal to synthesize normal-sounding speech through the CELP codec. Furthermore, since restoring pitch to whispered speech requires some considerations of quality and accuracy, spectral enhancements are required in terms of formant shifting (LSPs modification) and pitch injection based on voiced/unvoiced decision. Spectral shifting is accomplished through line-spectral pair adjustment. Implementing such methods by using the popular CELP codec allows integration of the technique with any modern speech applications and devices. Subjective testing results are presented to determine the effectiveness of the technique.

31 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: The Q9/16 codec is an embedded codec comprising 5 layers where higher layer bitstreams can be discarded without affecting the decoding of the lower layers, and has been designed with the primary objective of a high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals.
Abstract: We present the Q.EV-VBR winning candidate codec recently selected by Question 9 of Study Group 16 (Q9/16) of ITU-T as a baseline for the development of a scalable solution for wideband speech and audio compression at rates between 8 kb/s and 32 kb/s. The Q9/16 codec is an embedded codec comprising 5 layers where higher layer bitstreams can be discarded without affecting the decoding of the lower layers. The two lower layers are based on the CELP technology where the core layer takes advantage of signal classification based encoding. The higher layers encode the weighted error signal from lower layers using overlap-add transform coding. The codec has been designed with the primary objective of a high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals. The codec performance is demonstrated with selected test results.

25 citations


Patent
Tenkasi V. Ramabadran1
08 Oct 2008
TL;DR: In this paper, a communication system (100) includes devices (102, 104, 200) for transmitting and receiving digital audio, which use audio encoders (210, 804) and decoders (222, 916) such as ACELP or DCT/IDCT to compress and decompress audio.
Abstract: A communication system (100) includes devices (102, 104, 200) for transmitting and receiving digital audio. The devices use audio encoders (210, 804) and decoders (222, 916) such as ACELP or DCT/IDCT to compress and decompress audio and use arithmetic encoders (212) and decoders (220) to encode and decode the compressed audio on-the-fly (without a codebook of pre-stored codes).

22 citations


01 Jul 2008
TL;DR: A packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed which improves the quality of decoded speech under burst packet loss conditions and provides significantly better speech quality than the PLC of G.729, especially under burst packet losses.
Abstract: In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed which improves the quality of decoded speech under burst packet loss conditions. The proposed PLC algorithm is based on the reconstruction of excitation by combining voiced excitation and random excitation, where the voiced excitation is obtained from the adaptive codebook excitation scaled by a voicing probability and the random excitation is generated by permuting the previously decoded excitation. The voicing probability is estimated from the correlation using the decoded excitation and pitch of the previous frames. In addition, a linear regression-based gain amplitude is estimated and applied to the reconstructed excitation to compensate for the undesirable amplitude change under a burst packet loss condition. The proposed algorithm is implemented as a PLC algorithm for G.729 and its performance is compared with the PLC employed in G.729 by means of perceptual evaluation of speech quality (PESQ), a waveform comparison, and an A-B preference test under random and burst packet loss rates of 3% and 5%. It is shown that the proposed algorithm provides significantly better speech quality than the PLC of G.729, especially under burst packet losses.

14 citations
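
The excitation reconstruction outlined in the abstract above can be sketched roughly as follows. Only the overall structure comes from the abstract (a voiced part taken from the adaptive codebook and scaled by a voicing probability, plus a random part obtained by permuting earlier excitation). Everything else is an assumption for illustration: the names `voicing_probability` and `conceal_excitation`, the use of a normalized correlation at the pitch lag clipped to [0, 1], and the `1 - p_v` weight on the random part. The linear-regression gain correction is omitted.

```python
import random

def voicing_probability(prev_exc, pitch_lag):
    """Assumed estimator: normalized correlation of the previous excitation
    with itself delayed by the pitch lag, clipped to the range [0, 1]."""
    a = prev_exc[pitch_lag:]
    b = prev_exc[:-pitch_lag]
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    if den == 0:
        return 0.0
    return max(0.0, min(1.0, num / den))

def conceal_excitation(prev_exc, pitch_lag, frame_len, seed=0):
    """Rebuild the excitation of a lost frame from past excitation only."""
    if not 0 < pitch_lag < len(prev_exc) or frame_len > len(prev_exc):
        raise ValueError("need more excitation history than pitch_lag and frame_len")
    p_v = voicing_probability(prev_exc, pitch_lag)

    # Voiced part: adaptive-codebook style repetition at the pitch lag.
    history = list(prev_exc)
    voiced = []
    for _ in range(frame_len):
        sample = history[-pitch_lag]
        voiced.append(sample)
        history.append(sample)

    # Random part: a permutation of the most recent decoded excitation.
    noise = list(prev_exc[-frame_len:])
    random.Random(seed).shuffle(noise)

    # Assumed mixing rule: weights p_v and (1 - p_v).
    return [p_v * v + (1.0 - p_v) * r for v, r in zip(voiced, noise)]

# Hypothetical history: a perfectly periodic excitation with period 4.
history = [1.0, -2.0, 0.5, 3.0] * 5
print(conceal_excitation(history, pitch_lag=4, frame_len=8))
```

With a strongly periodic history the voiced part dominates and the pitch cycle is simply continued; with an uncorrelated history the output is mostly the shuffled excitation.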



Proceedings ArticleDOI
01 Mar 2008
TL;DR: The experimental results show that the combined codec can achieve a performance close to that of iLBC at different loss conditions but with a smaller bit-rate, and scalability is achieved by modifying the number of inserted ACELP-coded frames.
Abstract: While VoIP (voice over IP) is gaining importance in comparison with other types of telephony, packet loss remains as the main source of degradation in VoIP systems. Traditional speech codecs, such as those based on the CELP (code excited linear prediction) paradigm, can achieve low bit-rates at the cost of introducing interframe dependencies. As a result, the effect of a packet loss burst is propagated to the frames correctly received after the burst. iLBC (internet low bit-rate codec) alleviates this problem by removing the interframe dependencies at the cost of a higher bit-rate. In this paper we propose a combination of iLBC with an ACELP (algebraic CELP) codec in which a variable number of ACELP-coded frames is inserted between every two iLBC-coded frames. The experimental results show that the combined codec can achieve a performance close to that of iLBC at different loss conditions but with a smaller bit-rate. Also, scalability is achieved by modifying the number of inserted ACELP-coded frames.

11 citations


Patent
Yang Gao1
13 Jun 2008
TL;DR: In this article, a method of speech encoding comprises generating a first synthesized speech signal from a first excitation signal, weighting the first synthesized speech signal using a first error weighting filter to generate a first weighted speech signal, and generating an error signal using the first weighted speech signal and a second weighted speech signal.
Abstract: A method of speech encoding comprises generating a first synthesized speech signal from a first excitation signal, weighting the first synthesized speech signal using a first error weighting filter to generate a first weighted speech signal, generating a second synthesized speech signal from a second excitation signal, weighting the second synthesized speech signal using a second error weighting filter to generate a second weighted speech signal, and generating an error signal using the first weighted speech signal and the second weighted speech signal, wherein the first error weighting filter is different from the second error weighting filter. The method may further generate the error signal by weighting the speech signal using a third error weighting filter to generate a third weighted speech signal, and subtracting the first weighted speech signal and the second weighted speech signal from the third weighted speech signal to generate the error signal.

11 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: A technique that significantly limits the error propagation by replacing inter-frame long-term prediction with a non-predictive glottal-shape codebook is presented.
Abstract: CELP-based codecs typically rely on prediction to achieve their high coding efficiency. On the other hand, the prediction makes these codecs sensitive to frame erasures as errors propagate beyond the erased frame. We present a technique that significantly limits the error propagation by replacing inter-frame long-term prediction with a non-predictive glottal-shape codebook. The technique was implemented in the winning candidate of the EV-VBR baseline codec selection by ITU-T in March 2007. To maintain the performance in clean channels, this transition mode coding technique was used only in frames following voiced onset frames, i.e. the frames most sensitive to frame errors.

Journal Article
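TL;DR: Lossless compression of time-series signals based on linear prediction analysis encodes the prediction coefficients and the residual, using entropy codes such as the Golomb-Rice code whose length is roughly proportional to the absolute residual amplitude.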
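Abstract: Lossless compression coding of time-series signals based on linear prediction analysis works by analyzing the signal with linear prediction, encoding and transmitting the resulting linear prediction coefficients (or PARCOR coefficients) together with the prediction error (residual), and decoding them at the receiver without distortion. The approach exploits the fact that the amplitude of the residual signal produced by linear prediction analysis is usually concentrated near 0: the residual is coded with an entropy code, such as the Golomb-Rice code [1], that assigns shorter codewords to more frequent values, which keeps the total code length small. With the Golomb-Rice code, the code length assigned to a residual amplitude is roughly proportional to the absolute value of the amplitude, so the residual code length is approximated reasonably well by the sum of the absolute residual amplitudes. Consequently, the smaller the sum of absolute residual amplitudes, the smaller the code length can potentially be. However, conventional lossless compression schemes based on linear prediction analysis compute the prediction coefficients so as to minimize the sum of squared residual amplitudes, so a criterion that directly minimizes the residual code length …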
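
The link between Golomb-Rice code length and absolute residual amplitude can be seen in a short sketch. The abstract names the code but gives no details, so the zigzag mapping of signed residuals to non-negative integers, the Rice parameter `k`, and the function names below are assumptions for illustration.

```python
def zigzag(value):
    """Map a signed residual to a non-negative integer:
    0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * value if value >= 0 else -2 * value - 1

def rice_encode(value, k):
    """Golomb-Rice codeword as a bit string: unary quotient, a 0
    terminator, then the k low-order bits of the mapped value."""
    u = zigzag(value)
    quotient = u >> k
    remainder = u & ((1 << k) - 1)
    suffix = format(remainder, "b").zfill(k) if k > 0 else ""
    return "1" * quotient + "0" + suffix

def rice_length(value, k):
    """Codeword length in bits, without building the string."""
    return (zigzag(value) >> k) + 1 + k

def total_bits(residual, k):
    """Total code length of a residual sequence."""
    return sum(rice_length(v, k) for v in residual)

# Hypothetical residuals with the same length but different absolute sums.
small = [0, 1, -1, 2, 0, -2, 1, 0]
large = [0, 4, -5, 7, 0, -6, 3, 0]
for name, res in (("small", small), ("large", large)):
    print(name, sum(abs(v) for v in res), total_bits(res, 1))
```

Since the unary part grows linearly with the mapped value, the total bit count tracks the sum of absolute residuals, which is the approximation the abstract relies on.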

Patent
22 Oct 2008
TL;DR: In this paper, a scalable speech and audio codec is provided that implements combinatorial spectrum encoding, where a residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer.
Abstract: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding. A residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum having a plurality of spectral lines. The transform spectrum spectral lines are transformed using a combinatorial position coding technique. The combinatorial position coding technique includes generating a lexicographical index for a selected subset of spectral lines, where each lexicographic index represents one of a plurality of possible binary strings representing the positions of the selected subset of spectral lines. The lexicographical index represents non-zero spectral lines in a binary string in fewer bits than the length of the binary string.

Proceedings ArticleDOI
25 Aug 2008
TL;DR: This paper presents a new method for detecting and reducing the pre-echo of transform coding at low bit rates; it is implemented as an adaptive limiter at the decoder side and does not need transmission of auxiliary data.
Abstract: Pre-echo is a well-known artefact of transform coding at low bit rates. In this paper we present a new method to address this problem. The input signal is assumed to be coded in two stages: in time domain first, and then in transform domain. This is for instance the case in CELP+transform embedded coding. The first stage reconstructs a signal that is usually free of pre-echo. Therefore transform coding can exploit this reconstructed signal as side information for pre-echo detection and reduction. The proposed method is implemented as an adaptive limiter at the decoder side and does not need transmission of auxiliary data. It is part of the recently standardized ITU-T G.729.1 coder, in which it is used in two separate subbands. Experimental test results show that this method has a significant impact on quality in G.729.1 with very small complexity.

Patent
29 Feb 2008
TL;DR: In this paper, an audio encoding device and an audio decoding device are described which reduce the degradation of subjective quality of a decoded signal caused by a power mismatch in the decoded signal generated by a concealment process when a frame is lost.
Abstract: Disclosed are an audio encoding device and an audio decoding device which reduce degradation of subjective quality of a decoding signal caused by power mismatch of a decoding signal which is generated by a concealing process upon disappearance of a frame. When a frame is lost, a past encoding parameter is used to obtain a concealed LPC of the current frame and a concealed sound source parameter. A normal CELP decoding is performed from the obtained concealed sound source parameter. Correction is performed by using a conceal parameter on the obtained concealed LPC and the concealed sound source signal. The power of the corrected concealed sound source signal is adjusted to match a reference sound source power. A filter gain of the synthesis filter is adjusted so as to adjust the power of a decoded sound signal to the power of a decoded sound signal during an error-free state. Moreover, a synthesis filter gain adjusting coefficient is calculated by using an estimated normalized residual power so that a filter gain of a synthesis filter formed by using a concealed LPC is a filter gain during an error-free state.

Journal ArticleDOI
Hiroyuki Ehara1, K. Yoshida1
TL;DR: Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset, and the DT technique requires no additional algorithmic delay and would be a better choice for particular applications for which the delay has a significant impact.
Abstract: The authors present and evaluate a technique for synchronizing the internal states of a code-excited-linear-prediction (CELP) encoder and decoder after the occurrence of frame erasure. The designed technique, called "duplicated transmission (DT)," uses some redundant information for realizing synchronization. The encoder performs encoding processes twice and sends two codes for each frame. One code is encoded by an encoder that is initialized. The code is used in cases where the previous frame is erased. An onset detector is combined with the DT technique to select the frames to which the DT should be applied. Subjective test results suggest that, by introducing DT selectively, the number of DT frames is reducible by about 80% without degrading the subjective quality. Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset. The DT technique requires no additional algorithmic delay. For that reason, it would be a better choice for particular applications for which the delay has a significant impact.

Journal ArticleDOI
TL;DR: Investigations show that most low-rate (8 kbit/s and below) speech coders show bias towards non-accented English, and quality bias toward the English language is shown.
Abstract: This paper investigates the performance of speech codecs that use linear predictive coding (LPC) over different languages. Investigations show that most low-rate (8 kbit/s and below) speech coders show bias towards non-accented English. When the coders are used for heavily accented English or other languages, significant performance degradation is noted. In order to judge the performance of the most popular speech codecs (Speex and AMR), we encoded and decoded speech samples from three different languages: English, Arabic and Lithuanian. The quality of the transformed speech signals was estimated using two quality estimation techniques, the 3SQM and PESQ algorithms, according to ITU recommendations P.563 and P.862. The results showed quality bias toward the English language: the scores were higher and the performance was more stable.

Proceedings ArticleDOI
05 May 2008
TL;DR: In this study the application of CELP in AMR is examined, and a MATLAB simulation is used to observe and calculate the errors that occur in the system.
Abstract: In cellular communication, the quality of the voice output at the destination depends on the channel condition. A bad channel produces many errors in the voice output and hence degrades voice quality. AMR is used to maintain voice quality under varying channel conditions. AMR provides several bit-rate modes, from low to high, selected according to the channel condition. Low bit-rate modes are used in bad channel conditions to leave more bits for channel coding, while high bit-rate modes are used in the opposite case. Various speech (source) coding techniques, such as CELP, ACELP, and RPE-LTP, are currently used in different applications. In this study the application of CELP in AMR is examined. A MATLAB simulation is used to observe and calculate the errors that occur in the system. The difference in the resulting error in AMR using CELP is not significant: from the low bit rate (5.9 kbps) to the high bit rate (12.2 kbps), the error difference is less than 1%.

Proceedings ArticleDOI
12 May 2008
TL;DR: An efficient method for estimating frame energy of speech from enhanced variable rate coder (EVRC) bitstream for network-based speech processing applications in transcoder free operation (TrFO) environments, where speech signals are represented as speech coding parameters.
Abstract: This paper proposes an efficient method for estimating the frame energy of speech from the enhanced variable rate coder (EVRC) bitstream for network-based speech processing applications in transcoder free operation (TrFO) environments, where speech signals are represented as speech coding parameters. The energy of a speech frame is decomposed into the energy of the excitation and of the vocal tract filter, and the frame energy estimation method is derived for each component. Among the many parameters of the EVRC bitstream, the fixed codebook gain and adaptive codebook gain are used for the estimation of the excitation energy, and line spectrum pair (LSP) information is used to estimate the energy of the vocal tract filter. Experimental results demonstrated the novelty of the proposed method. The correlation coefficient between the actual and estimated frame energy can be maintained at a value of 0.994 with just 5% of the multiplicative operations of full decoding.

Patent
03 Apr 2008
TL;DR: In this paper, a layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder, and methods of CELP encoding and decoding are presented.
Abstract: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder and methods of CELP encoding and decoding. In one embodiment, the encoder includes: (1) a core layer subencoder and (2) at least one enhancement layer subencoder, at least one of the core layer subencoder and the enhancement layer subencoder having first and second adaptive codebooks and configured to retrieve a pitch lag estimate from the second adaptive codebook and perform a closed-loop search of the first adaptive codebook based on the pitch lag estimate.

Proceedings ArticleDOI
16 Mar 2008
TL;DR: The perceptual evaluation of speech quality and enhanced modified bark spectral distortion tests under various packet loss conditions confirm that the proposed algorithm is superior to the concealment algorithm embedded in G.729.
Abstract: In this paper, we propose a method for packet loss concealment (PLC) based on time scale modification for code excited linear prediction (CELP) based coders in packet networks. We perform a time scale modification (TSM) using a waveform similarity overlap-add (WSOLA) technique, which is an interpolation-based method operating entirely in the time domain, to reconstruct the excitation signal of the lost frame. We applied the proposed scheme to the standard ITU-T G.729 speech coder to evaluate the proposed method. The perceptual evaluation of speech quality (PESQ) and enhanced modified bark spectral distortion (EMBSD) tests under various packet loss conditions confirm that the proposed algorithm is superior to the concealment algorithm embedded in G.729.

Proceedings ArticleDOI
06 Aug 2008
TL;DR: A new code excited linear predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode that achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and can be used efficiently in systems that store the speech data of a particular speaker.
Abstract: A new code excited linear predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and can be used efficiently in systems, such as OGM (Outgoing message) and TTS (Text To Speech), that store the speech data of a particular speaker. In order to enhance the compression rate of the coder, a new Line Spectral Pairs (LSP) codebook is employed by using the Centroid Neural Network (CNN) algorithm. Moreover, applying the predicted pulses used in fixed codebook searching enhances the quality of the synthesized speech. In comparison with the original (traditional) AMR 7.4 kbit/s coder, the new coder shows a superior compression rate and a quality equivalent to that of the AMR coder in terms of informal subjective Mean Opinion Score (MOS) testing.

Proceedings ArticleDOI
12 May 2008
TL;DR: This paper shows that the G.729.1 extension layers are quite generic for scalable codec design in the sense that they can be applied to EFR with limited adjustments, and proposes a minor modification of the bit allocation procedure in the TDAC stage, exploiting spectral masking only for higher frequency bands.
Abstract: This paper describes a 12.2-32 kbps scalable wideband speech and audio coder interoperable with GSM enhanced full-rate (EFR). This coder, referred to as EFR-EV, is designed using the ITU-T G.729.1 multi-stage coding structure. Specifically, EFR-EV consists of three stages: a code-excited linear prediction (CELP) stage derived from EFR, time-domain bandwidth extension (TDBWE), and time-domain aliasing cancellation (TDAC). In this paper, we show that the G.729.1 extension layers (i.e. TDBWE and TDAC) are quite generic for scalable codec design in the sense that they can be applied to EFR with limited adjustments. In addition, we propose a minor modification of the bit allocation procedure in the TDAC stage, exploiting spectral masking only for higher frequency bands. The performance of EFR-EV and G.729.1 is evaluated in terms of objective/subjective quality, algorithmic delay, and complexity.

Patent
03 Apr 2008
TL;DR: In this article, a layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder, and methods of CELP encoding and decoding are presented.
Abstract: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder and methods of CELP encoding and decoding. In one embodiment, the encoder includes: (1) a core layer subencoder and (2) at least one enhancement layer subencoder having an adaptive-gain multiplier configured to apply a gain for an adaptive contribution to excitation and a fixed-gain multiplier configured to apply a gain for a fixed contribution to the excitation that is separate from the gain for the adaptive contribution.

Proceedings ArticleDOI
25 May 2008
TL;DR: The principle of the direct vector quantization (DVQ) algorithm, applied to the simulated decoder module and the codebook search module of the LD-CELP speech coding algorithm, is described; the DVQ algorithm reduced the amount of computation and improved the efficiency of the codebook search.
Abstract: This paper describes the principle of the direct vector quantization (DVQ) algorithm, which was applied to the simulated decoder module and the codebook search module of the LD-CELP speech coding algorithm. The synthesis filter in the simulated decoder module was replaced by the inverse perceptual weighting filter, removing the impulse response h(n) operation from the codebook search module. The results showed that the DVQ algorithm reduced the amount of computation and improved the efficiency of the codebook search. The number of multiplications in the energy calculator and time-reversed convolution module could be reduced by 75%, and the number of additions by 77.78%, in an adaptation cycle of four vectors (20 samples), while the SNR was equivalent to that of LD-CELP and speech quality was almost unchanged.

Proceedings Article
01 Oct 2008
TL;DR: An efficient probabilistic neural networks (PNN) model-based voice activity detection (VAD) algorithm that achieves better performance than G.729 Annex B at any noise level.
Abstract: In this paper we introduce an efficient probabilistic neural networks (PNN) model-based voice activity detection (VAD) algorithm. The inputs for PNN are code excited linear prediction coder parameters, which are stable under background noise. The PNN network output is 1 or 0 to determine the nature of the period (speech or NonSpeech). Experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level. The performance compares very favorably with Adaptive MultiRate VAD, phase 2 (AMR2).

Proceedings ArticleDOI
01 Nov 2008
TL;DR: This work proposes a new method of using one bit computation instead of the 16 bit computation in the codebook search part of the speech codec, and shows that effective codebook size and hence the computational time is reduced by 50%.
Abstract: This paper presents an algorithm for fast codebook search of code excited linear prediction (CELP) coders and its descendants. The problem of reducing the bit rate of speech while preserving the quality of speech reconstructed from such a representation has received continuous attention. Real time implementation of adaptive codebook search in code excited linear prediction (CELP) and CELP based speech coders is identified as the computationally most complex module. Thus in this work, we propose a new method of using one bit computation instead of the 16 bit computation in the codebook search part of the speech codec. The simulation results show that effective codebook size and hence the computational time is reduced by 50%.

Book ChapterDOI
20 Oct 2008
TL;DR: A novel rate-distortion (R-D) model is proposed, capturing the propagation of quantization errors in open-loop predictive coding systems, and shows that allocating rate based on the proposed R-D model provides gains compared to a straightforward rate allocation not accounting for drift.
Abstract: This paper investigates the application of open-loop coding principles in predictive coding systems. In order to cope with the drift, which is inherent in open-loop predictive coding, a novel rate-distortion (R-D) model is proposed, capturing the propagation of quantization errors in such systems. Additionally, a novel intra-frame video codec employing the transform and spatial prediction modes from H.264 is proposed. The results obtained with the proposed codec show that allocating rate based on the proposed R-D model provides gains of up to 1.9 dB compared to a straightforward rate allocation not accounting for drift. Furthermore, the proposed open-loop predictive codec provides gains of up to 2.3 dB compared to an equivalent closed-loop intra-frame video codec using the transform, prediction modes and rate-allocation from H.264. One concludes that the considered open-loop predictive coding paradigm retains the advantages of open-loop coding, and offers the possibility of further improving the compression performance in predictive coding systems.
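
The drift that the abstract attributes to open-loop predictive coding can be reproduced in a toy one-dimensional example. This is not the paper's video codec or its R-D model: a previous-sample predictor and a uniform quantizer stand in for the H.264 prediction and transform, and the names `closed_loop`, `open_loop`, and `max_error` are assumptions made for illustration.

```python
import math

def quantize(value, step):
    """Uniform quantizer: nearest multiple of `step` (round half up)."""
    return step * math.floor(value / step + 0.5)

def closed_loop(signal, step):
    """Encoder predicts from the decoder's reconstruction, so each sample's
    error is just the quantization error of its own residual."""
    recon, prev = [], 0.0
    for x in signal:
        prev = prev + quantize(x - prev, step)
        recon.append(prev)
    return recon

def open_loop(signal, step):
    """Encoder predicts from the original samples, while the decoder can only
    predict from its own reconstruction, so quantization errors accumulate."""
    recon, prev_orig, prev_recon = [], 0.0, 0.0
    for x in signal:
        residual = quantize(x - prev_orig, step)
        prev_recon = prev_recon + residual
        recon.append(prev_recon)
        prev_orig = x
    return recon

def max_error(signal, recon):
    return max(abs(x - y) for x, y in zip(signal, recon))

# Hypothetical slow ramp whose per-sample increase is below half a step.
ramp = [0.3 * n for n in range(1, 21)]
print("closed-loop max error:", max_error(ramp, closed_loop(ramp, 1.0)))
print("open-loop max error:  ", max_error(ramp, open_loop(ramp, 1.0)))
```

In the closed-loop case the error stays within half a quantizer step, while in the open-loop case every residual rounds to zero and the reconstruction falls further behind the ramp with each sample; a rate allocation that ignores this accumulation is the baseline the abstract compares against.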

01 Jan 2008
TL;DR: The proposed spectrum estimation method clearly outperforms the FFT, linear prediction and minimum variance distortionless response (MVDR) methods in terms of noise robustness.
Abstract: Stabilized weighted linear prediction (SWLP) is a recently developed method to compute stable all-pole models of speech by applying temporal weighting of the residual energy. In this study, SWLP is used for spectrum estimation in the first stage of the MFCC computation. The resulting acoustic feature representation is tested in a speech recognition front-end in simulated noisy conditions. When compared to other spectrum estimation methods as a part of the MFCC framework, the proposed spectrum estimation method clearly outperforms the FFT (periodogram), linear prediction and minimum variance distortionless response (MVDR) methods in terms of noise robustness.

Patent
17 Apr 2008
TL;DR: In this article, a speech coding method based on code-excited linear prediction (CELP) is described in which the noise level of the speech in the coding period concerned is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information.
Abstract: PROBLEM TO BE SOLVED: To reproduce high-quality speech with a small amount of data in speech coding and decoding, which compress a speech signal into a digital signal and decode it. SOLUTION: In a speech coding method based on code-excited linear prediction (CELP) speech coding, the noise level of the speech in the coding period concerned is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks 19 and 20 are used based on the evaluation result. COPYRIGHT: (C)2008,JPO&INPIT