Showing papers on "Code-excited linear prediction" published in 2008

Patent•

Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs

Yuriy Reznik¹, Pengjun Huang, Naveen B. Srinivasamurthy, Ravi Kiran Chivukula•Institutions (1)

03 Nov 2008

TL;DR: A residual signal from a Code Excited Linear Prediction (CELP)-based encoding layer, defined as the difference between an original audio signal and its reconstructed version, is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum; the codebook indices used to encode the spectral bands are then encoded together into a compact descriptor code.

Abstract: Codebook indices for a scalable speech and audio codec may be efficiently encoded based on anticipated probability distributions for such codebook indices. A residual signal from a Code Excited Linear Prediction (CELP)-based encoding layer may be obtained, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal may be transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum. The transform spectrum is divided into a plurality of spectral bands, where each spectral band has a plurality of spectral lines. A plurality of different codebooks are then selected for encoding the spectral bands, where each codebook is associated with a codebook index. A plurality of codebook indices associated with the selected codebooks are then encoded together to obtain a descriptor code that more compactly represents the codebook indices.

77 citations

Patent•

Low-complexity encoding/decoding of quantized MDCT spectrum in scalable speech and audio codecs

Yuriy Reznik¹, Pengjun Huang¹•Institutions (1)

Qualcomm¹

21 Oct 2008

TL;DR: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding, where a residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer.

Abstract: A scalable speech and audio codec is provided that implements combinatorial spectrum encoding. A residual signal is obtained from a Code Excited Linear Prediction (CELP)-based encoding layer, where the residual signal is a difference between an original audio signal and a reconstructed version of the original audio signal. The residual signal is transformed at a Discrete Cosine Transform (DCT)-type transform layer to obtain a corresponding transform spectrum having a plurality of spectral lines. The transform spectrum spectral lines are transformed using a combinatorial position coding technique. The combinatorial position coding technique includes generating a lexicographical index for a selected subset of spectral lines, where each lexicographic index represents one of a plurality of possible binary strings representing the positions of the selected subset of spectral lines. The lexicographical index represents non-zero spectral lines in a binary string in fewer bits than the length of the binary string.
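
The lexicographical index described in this abstract can be illustrated with the standard combinatorial number system. The following is a minimal sketch, not the patented method itself: the function names `position_index`, `positions_from_index`, and `index_bits` are hypothetical, and the patent may order or partition the spectral lines differently.

```python
from math import comb

def position_index(bits):
    """Lexicographic rank of `bits` among all binary strings of the same
    length that contain the same number of ones."""
    n = len(bits)
    k = sum(bits)
    index = 0
    for i, b in enumerate(bits):
        if b:
            # Every string with a 0 at this position (same prefix) sorts
            # earlier; there are C(n - i - 1, k) of them.
            index += comb(n - i - 1, k)
            k -= 1
    return index

def positions_from_index(index, n, k):
    """Inverse of position_index for strings of length n with k ones."""
    bits = []
    for i in range(n):
        zeros_first = comb(n - i - 1, k)  # strings with a 0 at position i
        if k > 0 and index >= zeros_first:
            bits.append(1)
            index -= zeros_first
            k -= 1
        else:
            bits.append(0)
    return bits

def index_bits(n, k):
    """Bits needed to store any index for length n with k ones."""
    return (comb(n, k) - 1).bit_length()

# Three non-zero spectral lines among 16 positions (illustrative values).
pattern = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
idx = position_index(pattern)
restored = positions_from_index(idx, len(pattern), sum(pattern))
```

With 16 positions and 3 non-zero lines there are 560 possible patterns, so `index_bits(16, 3)` is 10, fewer than the 16 bits of the raw binary string.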

55 citations

Proceedings Article•DOI•

ITU-T EV-VBR: A robust 8-32 kbit/s scalable coder for error prone telecommunications channels

Tommy Vaillancourt¹, Milan Jelinek¹, A. Erdem Ertan², Jacek Stachurski², Anssi Rämö³, Lasse Juhani Laaksonen³, Jon Gibbs⁴, Udar Mittal⁴, Stefan Bruhn⁵, Volodya Grancharov⁵, Masahiro Oshikiri⁶, Hiroyuki Ehara⁶, Dejun Zhang⁷, Fuwei Ma⁷, David Virette⁸, Stéphane Ragot⁸•Institutions (8)

Université de Sherbrooke¹, Texas Instruments², Nokia³, Motorola⁴, Ericsson⁵, Panasonic⁶, Huawei⁷, Orange S.A.⁸

25 Aug 2008

TL;DR: The ITU-T Embedded Variable Bit-Rate (EV-VBR) codec, being standardized by Question 9 of Study Group 16 (Q9/16) as Recommendation G.718, is presented. It is robust to significant rates of frame erasures or packet losses and uses several technologies to encode the MDCT coefficients for best performance on both speech and music.

Abstract: This paper presents the ITU-T Embedded Variable Bit-Rate (EV-VBR) codec being standardized by Question 9 of Study Group 16 (Q9/16) as Recommendation G.718. The codec provides a scalable solution for compression of 16 kHz sampled speech and audio signals at rates between 8 kbit/s and 32 kbit/s, robust to significant rates of frame erasures or packet losses. It comprises 5 layers where higher layer bitstreams can be discarded without affecting the lower layer decoding. The core layer takes advantage of signal-classification based CELP encoding. The second layer reduces the coding error from the first layer by means of an additional pitch contribution and another algebraic codebook. The higher layers encode the weighted error signal from lower layers using MDCT transform coding. Several technologies are used to encode the MDCT coefficients for best performance both for speech and music. The codec performance is demonstrated with selected results from the ITU-T Characterization test.
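
The embedded property (higher layers can be dropped without affecting lower-layer decoding) amounts to truncating each frame at a layer boundary. A minimal sketch follows; the 8 and 32 kbit/s end points come from the abstract, while the intermediate layer rates, the 20 ms frame length, and the names `layer_boundaries` and `truncate_frame` are assumptions made for illustration.

```python
# Cumulative bit rate (kbit/s) at the end of each of the five layers.
# Only the 8 and 32 kbit/s end points are stated in the abstract; the
# intermediate rates and the frame length are assumptions.
LAYER_RATES_KBPS = [8, 12, 16, 24, 32]
FRAME_MS = 20

def layer_boundaries(rates_kbps=LAYER_RATES_KBPS, frame_ms=FRAME_MS):
    """Cumulative frame size in bytes at the end of each layer."""
    # kbit/s * ms = bits per frame; divide by 8 for bytes.
    return [rate * frame_ms // 8 for rate in rates_kbps]

def truncate_frame(frame, max_kbps):
    """Drop enhancement layers so the frame fits max_kbps.
    The core layer is always kept."""
    bounds = layer_boundaries()
    keep = bounds[0]
    for rate, end in zip(LAYER_RATES_KBPS, bounds):
        if rate <= max_kbps:
            keep = end
    return frame[:keep]

full_frame = bytes(80)                     # one 32 kbit/s frame (placeholder)
core_plus_two = truncate_frame(full_frame, 16)
```

Because the layers are nested, a network node can perform this truncation without decoding or re-encoding anything.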

42 citations

Proceedings Article•DOI•

Analysis-by-synthesis method for whisper-speech reconstruction

Farzaneh Ahmadi¹, Ian McLoughlin¹, Hamid Sharifzadeh¹•Institutions (1)

Nanyang Technological University¹

01 Nov 2008

TL;DR: This approach uses a template of a speaker's normal phonated speech for extraction of excitation parameters such as pitch and gain, and then injects these estimated excitations into the whispered signal to synthesize normal-sounding speech through the CELP codec.

Abstract: In the following paper, a method for the real-time conversion of whispers to normal phonated speech through a code excited linear prediction analysis-by-synthesis codec is discussed. This approach uses a template of a speaker's normal phonated speech for extraction of excitation parameters such as pitch and gain, and then injects these estimated excitations into the whispered signal to synthesize normal-sounding speech through the CELP codec. Furthermore, since restoring pitch to whispered speech requires some considerations of quality and accuracy, spectral enhancements are required in terms of formant shifting (LSP modification) and pitch injection based on a voiced/unvoiced decision. Spectral shifting is accomplished through line-spectral pair adjustment. Implementing such methods by using the popular CELP codec allows integration of the technique with any modern speech applications and devices. Subjective testing results are presented to determine the effectiveness of the technique.

31 citations

Proceedings Article•DOI•

ITU-T G.EV-VBR baseline codec

Milan Jelinek, Tommy Vaillancourt, A.E. Ertan¹, Jacek Stachurski¹, Anssi Rämö², Lasse Juhani Laaksonen², Jon Gibbs³, Stefan Bruhn⁴•Institutions (4)

Texas Instruments¹, Nokia², Motorola³, Ericsson⁴

12 May 2008

TL;DR: The Q9/16 codec is an embedded codec comprising 5 layers where higher layer bitstreams can be discarded without affecting the decoding of the lower layers, and has been designed with the primary objective of high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals.

Abstract: We present the G.EV-VBR winning candidate codec recently selected by Question 9 of Study Group 16 (Q9/16) of ITU-T as a baseline for the development of a scalable solution for wideband speech and audio compression at rates between 8 kb/s and 32 kb/s. The Q9/16 codec is an embedded codec comprising 5 layers where higher layer bitstreams can be discarded without affecting the decoding of the lower layers. The two lower layers are based on CELP technology, where the core layer takes advantage of signal classification based encoding. The higher layers encode the weighted error signal from lower layers using overlap-add transform coding. The codec has been designed with the primary objective of high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals. The codec performance is demonstrated with selected test results.

25 citations

Patent•

Arithmetic encoding for CELP speech encoders

Tenkasi V. Ramabadran¹•Institutions (1)

Motorola¹

08 Oct 2008

TL;DR: A communication system (100) includes devices (102, 104, 200) for transmitting and receiving digital audio, which use audio encoders (210, 804) and decoders (222, 916) such as ACELP or DCT/IDCT to compress and decompress audio, and arithmetic encoders and decoders to code the compressed audio on-the-fly.

Abstract: A communication system (100) includes devices (102, 104, 200) for transmitting and receiving digital audio. The devices use audio encoders (210, 804) and decoders (222, 916) such as ACELP or DCT/IDCT to compress and decompress audio and use arithmetic encoders (212) and decoders (220) to encode and decode the compressed audio on-the-fly (without a codebook of pre-stored codes).

22 citations

A Packet Loss Concealment Algorithm Robust to Burst Packet Loss for CELP-type Speech Coders

Choong Sang Cho, Nam In Park, Hong Kook Kim

01 Jul 2008

TL;DR: A packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed which improves the quality of decoded speech under burst packet loss conditions and provides significantly better speech quality than the PLC of G.729, especially under burst packet losses.

Abstract: In this paper, a packet loss concealment (PLC) algorithm for CELP-type speech coders is proposed which improves the quality of decoded speech under burst packet loss conditions. The proposed PLC algorithm is based on the reconstruction of excitation by combining voiced excitation and random excitation, where the voiced excitation is obtained from the adaptive codebook excitation scaled by a voicing probability and the random excitation is generated by permuting the previous decoded excitation. The voicing probability is estimated from the correlation using the decoded excitation and pitch of the previous frames. In addition, a linear regression-based gain amplitude is estimated and applied to the reconstructed excitation to compensate for the undesirable amplitude change under a burst packet loss condition. The proposed algorithm is implemented as a PLC algorithm for G.729 and its performance is compared with the PLC employed in G.729 by means of perceptual evaluation of speech quality (PESQ), a waveform comparison, and an A-B preference test under random and burst packet loss rates of 3% and 5%. It is shown that the proposed algorithm provides significantly better speech quality than the PLC of G.729, especially under burst packet losses.
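
The excitation reconstruction described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: weighting the random part by one minus the voicing probability, repeating the last pitch cycle as the adaptive-codebook contribution, fitting the regression over per-frame gains, and the names `extrapolate_gain` and `conceal_excitation` are all assumptions.

```python
import random

def extrapolate_gain(past_gains):
    """Fit a least-squares line through the gains of previous frames and
    evaluate it one frame ahead (clamped at zero)."""
    n = len(past_gains)
    mean_x = (n - 1) / 2
    mean_y = sum(past_gains) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (g - mean_y) for x, g in enumerate(past_gains))
    slope = sxy / sxx if sxx else 0.0
    return max(0.0, mean_y + slope * (n - mean_x))

def conceal_excitation(prev_exc, frame_len, pitch_lag, voicing_prob, gain,
                       rng=None):
    """Rebuild the excitation of a lost frame from the previous excitation.
    prev_exc must hold at least max(frame_len, pitch_lag) samples."""
    rng = rng or random.Random(0)
    # Voiced part: periodic repetition of the last pitch cycle
    # (assumed stand-in for the adaptive codebook contribution).
    start = len(prev_exc) - pitch_lag
    voiced = [prev_exc[start + (i % pitch_lag)] for i in range(frame_len)]
    # Random part: a permutation of the previous decoded excitation.
    noise = list(prev_exc[-frame_len:])
    rng.shuffle(noise)
    return [gain * (voicing_prob * v + (1.0 - voicing_prob) * r)
            for v, r in zip(voiced, noise)]

gain = extrapolate_gain([1.0, 0.8, 0.6])   # decaying gains of past frames
frame = conceal_excitation([0.0, 1.0, 2.0, 3.0], frame_len=4, pitch_lag=2,
                           voicing_prob=1.0, gain=1.0)
```

With a voicing probability of 1 the output is purely periodic; with 0 it is a scaled permutation of the previous excitation.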

14 citations

Book Chapter•DOI•

Embedded Speech Coding: From G.711 to G.729.1

Bernd Geiser¹, Stéphane Ragot, Hervé Taddei²•Institutions (2)

Ruhr University Bochum¹, Nokia Networks²

10 Jan 2008

12 citations

Proceedings Article•DOI•

A scalable coding scheme based on interframe dependency limitation

Jose L. Carmona¹, José L. Pérez-Córdoba¹, Antonio M. Peinado¹, Angel M. Gomez¹, José A. González¹•Institutions (1)

University of Granada¹

01 Mar 2008

TL;DR: The experimental results show that the combined codec can achieve a performance close to that of iLBC at different loss conditions but with a smaller bit-rate, and scalability is achieved by modifying the number of inserted ACELP-coded frames.

Abstract: While VoIP (voice over IP) is gaining importance in comparison with other types of telephony, packet loss remains as the main source of degradation in VoIP systems. Traditional speech codecs, such as those based on the CELP (code excited linear prediction) paradigm, can achieve low bit-rates at the cost of introducing interframe dependencies. As a result, the effect of a packet loss burst is propagated to the frames correctly received after the burst. iLBC (internet low bit-rate codec) alleviates this problem by removing the interframe dependencies at the cost of a higher bit-rate. In this paper we propose a combination of iLBC with an ACELP (algebraic CELP) codec in which a variable number of ACELP-coded frames is inserted between every two iLBC-coded frames. The experimental results show that the combined codec can achieve a performance close to that of iLBC at different loss conditions but with a smaller bit-rate. Also, scalability is achieved by modifying the number of inserted ACELP-coded frames.
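
The trade-off between bit rate and the number of inserted ACELP frames can be sketched with simple arithmetic. The example below assumes equal frame durations for both codecs, the 15.2 kbit/s rate of iLBC's 20 ms mode, and a purely hypothetical 8 kbit/s ACELP rate; the paper's actual ACELP configuration is not given in the abstract, and the function names are illustrative.

```python
def frame_pattern(n_acelp, total_frames):
    """Sequence of frame types when n_acelp ACELP-coded frames are inserted
    between every two iLBC-coded frames."""
    cycle = ["iLBC"] + ["ACELP"] * n_acelp
    return [cycle[i % len(cycle)] for i in range(total_frames)]

def average_rate_kbps(n_acelp, ilbc_kbps=15.2, acelp_kbps=8.0):
    """Average bit rate of the combined stream, assuming all frames have
    the same duration (both rates are assumptions, see the text above)."""
    return (ilbc_kbps + n_acelp * acelp_kbps) / (n_acelp + 1)

pattern = frame_pattern(2, 5)
rates = [average_rate_kbps(n) for n in range(4)]
```

Each iLBC frame is decodable on its own, so after a loss burst the error propagation is bounded by the number of ACELP frames in one cycle; raising that number lowers the average rate.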

11 citations

Patent•

Method and apparatus for improved weighting filters in a CELP encoder

Yang Gao¹•Institutions (1)

Mindspeed Technologies¹

13 Jun 2008

TL;DR: A method of speech encoding generates first and second synthesized speech signals from first and second excitation signals, weights each with a different error weighting filter, and generates an error signal using the first and second weighted speech signals.

Abstract: A method of speech encoding comprises generating a first synthesized speech signal from a first excitation signal, weighting the first synthesized speech signal using a first error weighting filter to generate a first weighted speech signal, generating a second synthesized speech signal from a second excitation signal, weighting the second synthesized speech signal using a second error weighting filter to generate a second weighted speech signal, and generating an error signal using the first weighted speech signal and the second weighted speech signal, wherein the first error weighting filter is different from the second error weighting filter. The method may further generate the error signal by weighting the speech signal using a third error weighting filter to generate a third weighted speech signal, and subtracting the first weighted speech signal and the second weighted speech signal from the third weighted speech signal to generate the error signal.
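
The error signal in the second variant of the method can be written compactly. The notation below is assumed for illustration and does not come from the patent: $s$ is the input speech, $\hat{s}_1$ and $\hat{s}_2$ are the two synthesized speech signals, $w_1$, $w_2$, $w_3$ are the impulse responses of the three error weighting filters, and $*$ denotes convolution.

```latex
e(n) = (w_3 * s)(n) - (w_1 * \hat{s}_1)(n) - (w_2 * \hat{s}_2)(n),
\qquad w_1 \neq w_2
```

In a conventional CELP search all three filters would be identical; the distinguishing point in the abstract is that the first and second filters differ.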

11 citations

Proceedings Article•DOI•

Transition mode coding for source controlled CELP codecs

Vaclav Eksler¹, Milan Jelinek¹•Institutions (1)

Université de Sherbrooke¹

12 May 2008

TL;DR: A technique that significantly limits the error propagation by replacing inter-frame long-term prediction with a non-predictive glottal-shape codebook is presented.

Abstract: CELP-based codecs typically rely on prediction to achieve their high coding efficiency. On the other hand, the prediction makes these codecs sensitive to frame erasures as errors propagate beyond the erased frame. We present a technique that significantly limits the error propagation by replacing inter-frame long-term prediction with a non-predictive glottal-shape codebook. The technique was implemented in the winning candidate of the EV-VBR baseline codec selection by ITU-T in March 2007. To maintain the performance in a clean channel, this transition mode coding technique was used only in frames following voiced onset frames, i.e. the frames most sensitive to frame errors.

Journal Article•

A Linear Predictive Coding Algorithm Minimizing the Golomb-Rice Code Length of the Residual Signal

Hirokazu Kameoka, Yutaka Kamamoto, Noboru Harada, Takehiro Moriya

01 Nov 2008-The Transactions of the Institute of Electronics, Information and Communication Engineers. A

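TL;DR: Linear-prediction-based lossless coding conventionally chooses prediction coefficients by minimizing the sum of squared residuals, whereas the Golomb-Rice code length of the residual is better approximated by the sum of absolute residual amplitudes; this motivates a predictor that targets the code length directly.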

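Abstract: Lossless compression coding of time-series signals based on linear predictive analysis works by analyzing the signal with linear prediction, encoding and transmitting the resulting linear prediction coefficients (or PARCOR coefficients) together with the prediction error (residual), and decoding them without distortion at the receiver. The approach exploits the fact that the amplitude of the residual signal produced by linear predictive analysis is usually concentrated near 0: by coding the residual with an entropy code such as the Golomb-Rice code [1], which assigns shorter codes to more frequent values, the overall code length is kept small. With the Golomb-Rice code, the code length assigned to a residual amplitude is roughly proportional to the absolute value of the amplitude, so the residual code length is approximated reasonably well by the sum of the absolute residual amplitudes. Consequently, the smaller the sum of absolute residual amplitudes, the smaller the code length can potentially be made. However, conventional lossless compression schemes based on linear predictive analysis determine the prediction coefficients so as to minimize the sum of squared residual amplitudes, so a criterion that directly minimizes the residual code length ...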
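
The near-proportionality between Golomb-Rice code length and absolute residual amplitude can be checked with a few lines of code. This sketch assumes the common zigzag mapping from signed residuals to non-negative integers; the paper's exact mapping is not stated in the abstract, and the function names are illustrative.

```python
def zigzag(e):
    """Map a signed residual to a non-negative integer:
    0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... (an assumed mapping)."""
    return 2 * e if e >= 0 else -2 * e - 1

def rice_length(e, k):
    """Bits used by a Rice code with parameter k: a unary-coded quotient
    (plus its terminating bit) followed by k remainder bits."""
    return (zigzag(e) >> k) + 1 + k

def total_length(residual, k):
    """Total code length in bits for a sequence of residual samples."""
    return sum(rice_length(e, k) for e in residual)

residual = [1, -1, 2, -2, 0]
bits_k0 = total_length(residual, 0)
abs_sum = sum(abs(e) for e in residual)
```

For `k = 0` the code length of each sample is `zigzag(e) + 1`, which is about `2 * |e| + 1`, so the total length tracks the sum of absolute residuals rather than the sum of squares.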

Patent•

Scalable speech and audio encoding using combinatorial encoding of MDCT spectrum

Yuriy Reznik¹, Naveen B. Srinivasamurthy¹, Ravi Kiran Chivukula, Pengjun Huang¹•Institutions (1)

Qualcomm¹

22 Oct 2008

Proceedings Article•DOI•

Pre-echo reduction in the ITU-T G.729.1 embedded coder

Balazs Kovesi¹, Stéphane Ragot¹, Martin Gartner², Herve Taddei²•Institutions (2)

Orange S.A.¹, Siemens²

25 Aug 2008

TL;DR: This paper presents a new method to address pre-echo detection and reduction of transform coding at low bit rates, implemented as an adaptive limiter at the decoder side and does not need transmission of auxiliary data.

Abstract: Pre-echo is a well-known artefact of transform coding at low bit rates. In this paper we present a new method to address this problem. The input signal is assumed to be coded in two stages: in time domain first, and then in transform domain. This is for instance the case in CELP+transform embedded coding. The first stage reconstructs a signal that is usually free of pre-echo. Therefore transform coding can exploit this reconstructed signal as side information for pre-echo detection and reduction. The proposed method is implemented as an adaptive limiter at the decoder side and does not need transmission of auxiliary data. It is part of the recently standardized ITU-T G.729.1 coder, in which it is used in two separate subbands. Experimental test results show that this method has a significant impact on quality in G.729.1 with very small complexity.

Patent•

Audio encoding device and audio decoding device

Takuya Kawashima¹, Hiroyuki Ehara¹, Koji Yoshida¹•Institutions (1)

Panasonic¹

29 Feb 2008

TL;DR: An audio encoding device and an audio decoding device are disclosed that reduce the degradation of subjective quality of a decoded signal caused by power mismatch in the signal generated by a concealment process when a frame is lost.

Abstract: Disclosed are an audio encoding device and an audio decoding device which reduce degradation of subjective quality of a decoding signal caused by power mismatch of a decoding signal which is generated by a concealing process upon disappearance of a frame. When a frame is lost, a past encoding parameter is used to obtain a concealed LPC of the current frame and a concealed sound source parameter. A normal CELP decoding is performed from the obtained concealed sound source parameter. Correction is performed by using a conceal parameter on the obtained concealed LPC and the concealed sound source signal. The power of the corrected concealed sound source signal is adjusted to match a reference sound source power. A filter gain of the synthesis filter is adjusted so as to adjust the power of a decoded sound signal to the power of a decoded sound signal during an error-free state. Moreover, a synthesis filter gain adjusting coefficient is calculated by using an estimated normalized residual power so that a filter gain of a synthesis filter formed by using a concealed LPC is a filter gain during an error-free state.

Journal Article•DOI•

Decoder Initializing Technique for Improving Frame-Erasure Resilience of a CELP Speech Codec

Hiroyuki Ehara¹, K. Yoshida¹•Institutions (1)

Panasonic¹

01 Apr 2008-IEEE Transactions on Multimedia

TL;DR: Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset, and the DT technique requires no additional algorithmic delay and would be a better choice for particular applications for which the delay has a significant impact.

Abstract: The authors present and evaluate a technique for synchronizing the internal states of a code-excited-linear-prediction (CELP) encoder and decoder after the occurrence of frame erasure. The designed technique, called "duplicated transmission (DT)," uses some redundant information for realizing synchronization. The encoder performs encoding processes twice and sends two codes for each frame. One code is encoded by an encoder that is initialized. The code is used in cases where the previous frame is erased. An onset detector is combined with the DT technique to select the frames to which the DT should be applied. Subjective test results suggest that, by introducing DT selectively, the number of DT frames is reducible by about 80% without degrading the subjective quality. Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset. The DT technique requires no additional algorithmic delay. For that reason, it would be a better choice for particular applications for which the delay has a significant impact.

Journal Article•DOI•

Influence of languages on CELP codecs performance

Mohamad Itani¹, Šarūnas Paulikas¹•Institutions (1)

Kaunas University of Technology¹

25 Jun 2008-Information Technology and Control

TL;DR: Investigations show that most low-rate (8 kbit/s and below) speech coders are biased towards non-accented English: quality scores for English were higher and more stable than for the other languages tested.

Abstract: This paper investigates the performance of speech codecs that use linear predictive coding (LPC) over different languages. Investigations show that most low-rate (8 kbit/s and below) speech coders show bias towards non-accented English. When the coders are used for heavily accented English or other languages, significant performance degradation is noted. In order to judge the performance of the most popular speech codecs (Speex and AMR), we encoded and decoded speech samples from three different languages: English, Arabic and Lithuanian. The quality of the transformed speech signals was estimated using two quality estimation techniques, the 3SQM and PESQ algorithms, according to ITU recommendations P.563 and P.862. The results showed quality bias toward the English language: the scores were higher and the performance was more stable.

Proceedings Article•DOI•

Speech compression using CELP speech coding technique in GSM AMR

E. Pryadi¹, Kuniwati Gandi¹, Herman Kanalebe¹•Institutions (1)

University of Pelita Harapan¹

05 May 2008

TL;DR: In this study the application of CELP in AMR is observed, and a MATLAB simulation is used to observe and calculate the errors that occur in the system.

Abstract: In cellular communication technology, the quality of the voice output at the destination depends on the channel condition. A bad channel condition produces many errors in the voice output and hence degrades the voice quality. To maintain voice quality under varying channel conditions, AMR is used. AMR uses various bit-rate modes, from low to high, depending on the channel condition. Low bit-rate modes are used in bad channel conditions to allow more bits for channel coding, and high bit-rate modes in good conditions. Various speech (source) coding techniques, such as CELP, ACELP, and RPE-LTP, are used in different applications. In this study the application of CELP in AMR is observed. A MATLAB simulation is used to observe and calculate the errors that occur in the system. The difference in error produced in AMR using CELP is not significant: from the low bit rate (5.9 kbps) to the high bit rate (12.2 kbps), the error difference is less than 1%.

Proceedings Article•DOI•

Frame energy estimation based on speech codec parameters

Doh-Suk Kim¹, Binshi Cao¹, Ahmed A. Tarraf¹•Institutions (1)

Alcatel-Lucent¹

12 May 2008

TL;DR: An efficient method for estimating frame energy of speech from enhanced variable rate coder (EVRC) bitstream for network-based speech processing applications in transcoder free operation (TrFO) environments, where speech signals are represented as speech coding parameters.

Abstract: This paper proposes an efficient method for estimating the frame energy of speech from an enhanced variable rate coder (EVRC) bitstream for network-based speech processing applications in transcoder free operation (TrFO) environments, where speech signals are represented as speech coding parameters. The energy of a frame of speech is decomposed into the energy of the excitation and of the vocal tract filter, and the frame energy estimation method is derived for each component. Among the many parameters of the EVRC bitstream, the fixed codebook gain and adaptive codebook gain are used for the estimation of excitation energy, and line spectrum pair (LSP) information is used to estimate the energy of the vocal tract filter. Experimental results demonstrated the novelty of the proposed method. The correlation coefficient between the actual and estimated frame energy can be maintained at a value of 0.994 with just 5% of the multiplicative operations of full decoding.
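
The decomposition into excitation energy and vocal tract filter energy can be sketched as below. This is a rough illustration under stated assumptions, not the paper's estimator: it takes LPC coefficients as input (assuming the LSPs have already been converted), treats the adaptive and fixed codebook contributions as uncorrelated, assumes a unit-energy fixed codebook vector, and uses hypothetical function names.

```python
def synthesis_power_gain(lpc, n_taps=64):
    """Energy of the truncated impulse response of 1/A(z), where
    A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    h = []
    for n in range(n_taps):
        acc = 1.0 if n == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * h[n - k]
        h.append(acc)
    return sum(v * v for v in h)

def excitation_energy(prev_energy, pitch_gain, code_gain, code_energy=1.0):
    """Excitation energy from the two codebook gains, assuming the adaptive
    and fixed contributions are uncorrelated (an assumption)."""
    return pitch_gain ** 2 * prev_energy + code_gain ** 2 * code_energy

def estimate_frame_energy(prev_exc_energy, pitch_gain, code_gain, lpc):
    """Return (frame energy estimate, excitation energy) without
    synthesizing any speech samples."""
    exc = excitation_energy(prev_exc_energy, pitch_gain, code_gain)
    return exc * synthesis_power_gain(lpc), exc

energy, exc = estimate_frame_energy(2.0, 0.5, 1.0, [-0.5])
```

The excitation energy is returned as well so that it can be fed back as `prev_exc_energy` for the next frame.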

Patent•

Layered Code-Excited Linear Prediction Speech Encoder and Decoder in Which Closed-Loop Pitch Estimation is Performed with Linear Prediction Excitation Corresponding to Optimal Gains and Methods of Layered CELP Encoding and Decoding

Jacek Stachurski¹•Institutions (1)

Texas Instruments¹

03 Apr 2008

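TL;DR: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder, and methods of CELP encoding and decoding are presented, in which a pitch lag estimate retrieved from a second adaptive codebook guides a closed-loop search of a first adaptive codebook.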

Abstract: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder and methods of CELP encoding and decoding. In one embodiment, the encoder includes: (1) a core layer subencoder and (2) at least one enhancement layer subencoder, at least one of the core layer subencoder and the enhancement layer subencoder having first and second adaptive codebooks and configured to retrieve a pitch lag estimate from the second adaptive codebook and perform a closed-loop search of the first adaptive codebook based on the pitch lag estimate.

Proceedings Article•DOI•

Packet Loss Concealment Using Time Scale Modification for CELP Based Coders in Packet Network

F. Merazka¹•Institutions (1)

École Normale Supérieure¹

16 Mar 2008

TL;DR: Perceptual evaluation of speech quality and enhanced modified bark spectral distortion tests under various packet loss conditions confirm that the proposed algorithm is superior to the concealment algorithm embedded in G.729.

Abstract: In this paper, we propose a method for packet loss concealment (PLC) based on time scale modification for code excited linear prediction (CELP) based coders in packet networks. We perform a time scale modification (TSM) using a waveform similarity overlap-add (WSOLA) technique, which is an interpolation-based method operating entirely in the time domain, to reconstruct the excitation signal of the lost frame. We applied the proposed scheme to the ITU-T G.729 standard speech coder to evaluate the proposed method. The perceptual evaluation of speech quality (PESQ) and enhanced modified bark spectral distortion (EMBSD) tests under various packet loss conditions confirm that the proposed algorithm is superior to the concealment algorithm embedded in G.729.

Proceedings Article•DOI•

A New Vocoder based on AMR 7.4kbit/s Mode in Speaker Dependent Coding System

Vu Thi Lan Huong, Byung-Jae Min, Dong-Chul Park, Dong-Min Woo

06 Aug 2008

TL;DR: A new code excited linear predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode that achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and is efficient for systems that store the speech data of a particular speaker.

Abstract: A new code excited linear predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and is efficiently used for systems, such as OGM (Outgoing message) and TTS (Text To Speech), that store the speech data of a particular speaker. In order to enhance the compression rate of the coder, a new Line Spectral Pairs (LSP) codebook is employed by using the Centroid Neural Network (CNN) algorithm. Moreover, applying the predicted pulses used in fixed codebook searching enhances the quality of the synthesized speech. In comparison with the original (traditional) AMR 7.4 kbit/s coder, the new coder shows a superior compression rate and quality equivalent to the AMR coder in terms of informal subjective Mean Opinion Score (MOS) testing.

Proceedings Article•DOI•

An embedded variable bit-rate coder based on GSM EFR: EFR-EV

Sung-Kyo Jung¹, Stéphane Ragot¹, Claude Lamblin¹, S. Proust¹•Institutions (1)

Orange S.A.¹

12 May 2008

TL;DR: This paper shows that the G.729.1 extension layers are quite generic for scalable codec design in the sense that they can be applied to EFR with limited adjustments, and proposes a minor modification of the bit allocation procedure in TDAC stage, exploiting spectral masking only for higher frequency bands.

Abstract: This paper describes a 12.2-32 kbps scalable wideband speech and audio coder interoperable with GSM enhanced full-rate (EFR). This coder, referred to as EFR-EV, is designed using the ITU-T G.729.1 multi-stage coding structure. Specifically, EFR-EV consists of three stages: a code-excited linear prediction (CELP) stage derived from EFR, time-domain bandwidth extension (TDBWE), and time-domain aliasing cancellation (TDAC). In this paper, we show that the G.729.1 extension layers (i.e. TDBWE and TDAC) are quite generic for scalable codec design in the sense that they can be applied to EFR with limited adjustments. In addition, we propose a minor modification of the bit allocation procedure in the TDAC stage, exploiting spectral masking only for higher frequency bands. The performance of EFR-EV and G.729.1 is evaluated in terms of objective/subjective quality, algorithmic delay, and complexity.

Patent•

Layered Code-Excited Linear Prediction Speech Encoder and Decoder Having Plural Codebook Contributions in Enhancement Layers Thereof and Methods of Layered CELP Encoding and Decoding

Jacek Stachurski¹•Institutions (1)

Texas Instruments¹

03 Apr 2008

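TL;DR: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder, and methods of CELP encoding and decoding are presented, in which an enhancement layer applies separate gains to the adaptive and fixed contributions to the excitation.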

Abstract: A layered code-excited linear prediction (CELP) encoder, an Adaptive Multirate Wideband (AMR-WB) encoder and methods of CELP encoding and decoding. In one embodiment, the encoder includes: (1) a core layer subencoder and (2) at least one enhancement layer subencoder having an adaptive-gain multiplier configured to apply a gain for an adaptive contribution to excitation and a fixed-gain multiplier configured to apply a gain for a fixed contribution to the excitation that is separate from the gain for the adaptive contribution.

Proceedings Article•DOI•

Reducing the complexity of LD-CELP speech coding algorithm using direct vector quantization

Zhang Xueying¹, Zhao Qun-qun¹, Ma Zhao-yang¹•Institutions (1)

Taiyuan University of Technology¹

25 May 2008

TL;DR: The direct vector quantization (DVQ) algorithm is applied to the simulated decoder module and codebook search module of the LD-CELP speech coding algorithm; results show that DVQ decreases the amount of computation and improves the efficiency of the codebook search.


Abstract: This paper describes the principle of the direct vector quantization (DVQ) algorithm, which is applied to the simulated decoder module and codebook search module of the LD-CELP speech coding algorithm. The synthesis filter in the simulated decoder module is replaced by the inverse perceptual weighting filter, removing the impulse response h(n) operation from the codebook search module. The results show that the DVQ algorithm reduces the amount of computation and improves the efficiency of the codebook search. The number of multiplications in the energy calculator and time-reversed convolution module can be reduced by 75%, and the number of additions by 77.78%, in an adaptation cycle of four vectors (20 samples), while the SNR is equivalent to that of LD-CELP and speech quality is almost unchanged.


Proceedings Article•

A model-based voice Activity Detection algorithm using probabilistic neural networks


M. Farsinejad¹, M-Mehdi Mohammadi¹, Babak Nasersharif¹, Ahmad Akbari¹•Institutions (1)

Iran University of Science and Technology¹

01 Oct 2008

TL;DR: An efficient probabilistic neural networks (PNN) model-based voice activity detection (VAD) algorithm that achieves better performance than G.729 Annex B at any noise level.


Abstract: In this paper, we introduce an efficient probabilistic neural network (PNN) model-based voice activity detection (VAD) algorithm. The inputs to the PNN are code-excited linear prediction coder parameters, which are stable under background noise. The PNN output is 1 or 0, indicating whether the period is speech or non-speech. Experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level. The performance compares very favorably with Adaptive MultiRate VAD, phase 2 (AMR2).
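As a rough illustration of the 1/0 decision described above, the following sketch implements a generic PNN as a Parzen-window classifier with Gaussian kernels. The feature vectors, the smoothing width `sigma`, and the function names are assumptions made for illustration; the paper's actual CELP-derived features and network configuration are not given in the abstract.

```python
import math


def class_score(x, examples, sigma):
    """Average Gaussian kernel response of x against one class's training examples."""
    total = 0.0
    for e in examples:
        dist2 = sum((xi - ei) ** 2 for xi, ei in zip(x, e))
        total += math.exp(-dist2 / (2.0 * sigma ** 2))
    return total / len(examples)


def pnn_vad(x, speech_examples, nonspeech_examples, sigma=0.3):
    """Return 1 (speech) or 0 (non-speech) by comparing the two class scores."""
    speech = class_score(x, speech_examples, sigma)
    nonspeech = class_score(x, nonspeech_examples, sigma)
    return 1 if speech > nonspeech else 0


# Hypothetical 2-D feature vectors standing in for coder parameters.
speech_train = [[1.0, 1.0], [1.2, 0.9]]
nonspeech_train = [[0.0, 0.0], [0.1, -0.1]]
decision = pnn_vad([1.1, 1.0], speech_train, nonspeech_train)
```

A PNN of this kind needs no iterative training: the stored examples act as the pattern layer, and `sigma` is the only parameter to tune.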


Proceedings Article•DOI•

A fast adaptive codebook search method for speech coding


C.G. Kiran, K. Rajeev

01 Nov 2008

TL;DR: This work proposes a new method that uses one-bit computation instead of 16-bit computation in the codebook search part of the speech codec, and shows that the effective codebook size, and hence the computational time, is reduced by 50%.


Abstract: This paper presents an algorithm for fast codebook search in code-excited linear prediction (CELP) coders and their descendants. The problem of reducing the bit rate of speech while preserving the quality of the speech reconstructed from such a representation has received continuous attention. In real-time implementations of CELP and CELP-based speech coders, the adaptive codebook search is identified as the most computationally complex module. In this work, we therefore propose a new method that uses one-bit computation instead of 16-bit computation in the codebook search part of the speech codec. The simulation results show that the effective codebook size, and hence the computational time, is reduced by 50%.
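The abstract does not spell out how the one-bit computation is used, so the sketch below shows only one plausible reading: a cheap pre-selection that compares sign bits of the target and each candidate, keeps the best half, and then runs a conventional full-precision search on the survivors. All function names, the `keep_fraction` parameter, and the positive-correlation constraint are assumptions, not the authors' method.

```python
def sign_bits(x):
    """Reduce each sample to a single bit: 1 for non-negative, 0 for negative."""
    return [1 if v >= 0 else 0 for v in x]


def sign_agreement(a_bits, b_bits):
    """Count positions where the sign bits agree (a one-bit stand-in for correlation)."""
    return sum(1 for p, q in zip(a_bits, b_bits) if p == q)


def preselect(target, candidates, keep_fraction=0.5):
    """Keep the candidate indices whose sign patterns best match the target."""
    t_bits = sign_bits(target)
    order = sorted(range(len(candidates)),
                   key=lambda i: sign_agreement(t_bits, sign_bits(candidates[i])),
                   reverse=True)
    keep = max(1, int(len(candidates) * keep_fraction))
    return sorted(order[:keep])


def full_search(target, candidates, indices):
    """Full-precision search maximizing corr^2 / energy over the given indices."""
    best, best_score = None, -1.0
    for i in indices:
        c = candidates[i]
        corr = sum(t * v for t, v in zip(target, c))
        energy = sum(v * v for v in c)
        if energy == 0 or corr <= 0:  # assumption: only positive gains are allowed
            continue
        score = corr * corr / energy
        if score > best_score:
            best, best_score = i, score
    return best


# Hypothetical target and four candidate vectors, for illustration only.
target = [1.0, -1.0, 1.0, -1.0]
candidates = [[1, -1, 1, -1], [-1, 1, -1, 1], [1, 1, 1, 1], [1, -1, -1, -1]]
survivors = preselect(target, candidates)
best_index = full_search(target, candidates, survivors)
```

With `keep_fraction=0.5`, the expensive search visits half of the candidates, which is consistent with a 50% smaller effective codebook but is not a reproduction of the reported result.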


Book Chapter•DOI•

Applying Open-Loop Coding in Predictive Coding Systems


Adrian Munteanu¹, Frederik Verbist¹, Jan Cornelis¹, Peter Schelkens¹•Institutions (1)

Vrije Universiteit Brussel¹

20 Oct 2008

TL;DR: A novel rate-distortion (R-D) model is proposed, capturing the propagation of quantization errors in open-loop predictive coding systems, and shows that allocating rate based on the proposed R-D model provides gains compared to a straightforward rate allocation not accounting for drift.


Abstract: This paper investigates the application of open-loop coding principles in predictive coding systems. In order to cope with the drift, which is inherent in open-loop predictive coding, a novel rate-distortion (R-D) model is proposed, capturing the propagation of quantization errors in such systems. Additionally, a novel intra-frame video codec employing the transform and spatial prediction modes from H.264 is proposed. The results obtained with the proposed codec show that allocating rate based on the proposed R-D model provides gains of up to 1.9 dB compared to a straightforward rate allocation not accounting for drift. Furthermore, the proposed open-loop predictive codec provides gains of up to 2.3 dB compared to an equivalent closed-loop intra-frame video codec using the transform, prediction modes and rate-allocation from H.264. One concludes that the considered open-loop predictive coding paradigm retains the advantages of open-loop coding, and offers the possibility of further improving the compression performance in predictive coding systems.
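The drift mentioned in this abstract can be seen in a toy one-dimensional example. The sketch assumes a first-order predictor (the previous sample) and a uniform quantizer; it is not the chapter's R-D model or its video codec, and all names are hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer: snap value to the nearest multiple of step."""
    return round(value / step) * step


def decode(residuals):
    """Decoder: predict each sample from the previously reconstructed one."""
    recon, prev = [], 0.0
    for r in residuals:
        prev = prev + r
        recon.append(prev)
    return recon


def encode_open_loop(signal, step):
    """Open-loop: the encoder predicts from the ORIGINAL previous sample."""
    residuals, prev = [], 0.0
    for x in signal:
        residuals.append(quantize(x - prev, step))
        prev = x
    return residuals


def encode_closed_loop(signal, step):
    """Closed-loop: the encoder predicts from the RECONSTRUCTED previous sample."""
    residuals, prev = [], 0.0
    for x in signal:
        q = quantize(x - prev, step)
        residuals.append(q)
        prev = prev + q
    return residuals


def max_error(signal, recon):
    return max(abs(x - y) for x, y in zip(signal, recon))


# Hypothetical slowly rising signal with a coarse quantizer step.
signal = [0.0, 0.3, 0.6, 0.9, 1.2]
step = 1.0
open_err = max_error(signal, decode(encode_open_loop(signal, step)))
closed_err = max_error(signal, decode(encode_closed_loop(signal, step)))
```

In the open-loop case every small residual is quantized to zero and the decoder never follows the ramp, so the error accumulates. In the closed-loop case the error stays within half a quantizer step. A rate allocation that accounts for drift targets this kind of accumulation.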


Enhancing noise robustness in automatic speech recognition using stabilized weighted linear prediction (SWLP)


Jouni Pohjalainen, Carlo Magi, Paavo Alku

01 Jan 2008

TL;DR: The proposed spectrum estimation method clearly outperforms the FFT, linear prediction and minimum variance distortionless response (MVDR) methods in terms of noise robustness.


Abstract: Stabilized weighted linear prediction (SWLP) is a recently developed method to compute stable all-pole models of speech by applying temporal weighting of the residual energy. In this study, SWLP is used for spectrum estimation in the first stage of the MFCC computation. The resulting acoustic feature representation is tested in a speech recognition front-end in simulated noisy conditions. When compared to other spectrum estimation methods as a part of the MFCC framework, the proposed spectrum estimation method clearly outperforms the FFT (periodogram), linear prediction and minimum variance distortionless response (MVDR) methods in terms of noise robustness.


Patent•

Speech coding method


Yamaura Tadashi

17 Apr 2008

TL;DR: In this speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period concerned is evaluated using a code or coding results of at least one of spectrum information, power information, and pitch information.


Abstract: PROBLEM TO BE SOLVED: To reproduce high-quality speech with a small amount of data in speech coding and decoding that compress a speech signal into a digital signal and decode it. SOLUTION: In a speech coding method based on code-excited linear prediction (CELP) speech coding, the noise level of the speech in the coding period concerned is evaluated using a code or coding results of at least one of spectrum information, power information, and pitch information, and various excitation codebooks 19 and 20 are used based on the evaluation results. COPYRIGHT: (C)2008,JPO&INPIT
