Showing papers on "Code-excited linear prediction" published in 2005


Patent
15 Sep 2005
TL;DR: In this article, a low frequency component waveform encoding part calculates, based on a quantized LPC received from an LPC encoding part (202), a linear prediction residual signal of a digital audio signal received from an A/D converter (112), then performs a down sampling of the calculation result to extract the low frequency components comprising bands, which are lower than a predetermined frequency, in the audio signal.
Abstract: An audio encoding apparatus capable of improving the frame cancellation error tolerance, without increasing the number of bits of a fixed code book, in a CELP type audio encoding. In this apparatus, a low frequency component waveform encoding part (210) calculates, based on a quantized LPC received from an LPC encoding part (202), a linear prediction residual signal of a digital audio signal received from an A/D converter (112), then performs a down sampling of the calculation result to extract the low frequency components comprising bands, which are lower than a predetermined frequency, in the audio signal, and then waveform encodes the extracted low frequency components to produce encoded low-frequency component information. Then, the low frequency component waveform encoding part (210) inputs this encoded low-frequency component information to a packetizing part (231), while inputting the quantized low-frequency component waveform encoded signal (sound source waveform), which has been produced by the waveform encoding, to a high frequency component encoding part (220).

61 citations
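The low-frequency path described in the abstract above (linear prediction residual, then down-sampling) can be sketched roughly as follows. The function names, the sign convention A(z) = 1 + sum_j a[j] z^-j, the two-tap averaging filter and the decimation factor of two are all assumptions made for this illustration; the patent abstract does not specify them.

```python
from typing import List


def lpc_residual(x: List[float], a: List[float]) -> List[float]:
    """Residual e[n] = x[n] + sum_j a[j] * x[n-j], with a[0] == 1 (assumed convention)."""
    res = []
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc += a[j] * x[n - j]
        res.append(acc)
    return res


def downsample_by_two(e: List[float]) -> List[float]:
    """Crude low-pass (two-tap average, an assumption) followed by keeping every second sample."""
    smoothed = [(e[n] + (e[n - 1] if n > 0 else 0.0)) / 2.0 for n in range(len(e))]
    return smoothed[::2]


# Toy signal: x[n] = 0.5 * x[n-1] + w[n], so A(z) = 1 - 0.5 z^-1 recovers w exactly.
w = [1.0, 0.0, -2.0, 4.0, 0.0, 1.0]
x: List[float] = []
for n, wn in enumerate(w):
    x.append(wn + (0.5 * x[n - 1] if n > 0 else 0.0))

residual = lpc_residual(x, [1.0, -0.5])
low_band = downsample_by_two(residual)
```

A real encoder would use a proper anti-aliasing filter and would then waveform-encode `low_band`; both steps are left out here.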


Patent
Anisse Taleb
09 Mar 2005
TL;DR: In this article, the pulse locations of the excitation signals of a first signal encoded by CELP are used to derive a limited set of candidate signals for a correlated second signal.
Abstract: Information about excitation signals of a first signal encoded by CELP is used to derive a limited set of candidate excitation signals for a correlated second signal. Preferably, pulse locations of the excitation signals of the first encoded signal are used for determining the set of candidate excitation signals. More preferably, the pulse locations of the set of candidate excitation signals are positioned in the vicinity of the pulse locations of the excitation signals of the first encoded signal. The first and second signals may be multi-channel signals of a common speech or audio signal. However, the first and second signals may also be identical, whereby the coding of the second signal can be utilized for re-encoding at a lower bit rate.

57 citations
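The idea of restricting the second signal's excitation search to positions near the first signal's pulses can be illustrated with a small sketch. The function name, the `radius` parameter and the 40-sample frame are hypothetical choices for the example, not values taken from the patent.

```python
from typing import Iterable, List


def candidate_positions(first_pulses: Iterable[int], frame_len: int, radius: int = 1) -> List[int]:
    """Positions within `radius` samples of any pulse of the first encoded signal."""
    candidates = set()
    for p in first_pulses:
        for d in range(-radius, radius + 1):
            q = p + d
            if 0 <= q < frame_len:  # stay inside the frame
                candidates.add(q)
    return sorted(candidates)


# Pulses of the first channel at samples 0, 10 and 39 of a 40-sample frame.
positions = candidate_positions([0, 10, 39], frame_len=40, radius=1)
```

The second encoder would then search only `positions` instead of all 40 sample positions, which is where the reduction in search effort comes from.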


Journal ArticleDOI
TL;DR: This work proposes a novel multiple description-coding method for concealing packet losses in transmitting low bit rate-coded speech and can adapt its number of descriptions dynamically to network-loss conditions.
Abstract: A fundamental issue in real-time interactive voice transmissions over unreliable IP networks is the loss or late arrival of packets for playback. This problem is especially serious when transmitting low bit rate-coded speech with pervasive dependencies introduced. In this case, the loss or late arrival of a single packet will lead to the loss of subsequent dependent frames. We study end-to-end loss-concealment schemes for ensuring high quality in playback. We propose a novel multiple description-coding method for concealing packet losses in transmitting low bit rate-coded speech. Based on high correlations observed in linear predictor parameters, in the form of Line Spectral Pairs (LSPs), of adjacent frames, we generate multiple descriptions in senders by interleaving LSPs, and reconstruct lost LSPs in receivers by linear interpolations. As excitation codewords have low correlations, we further enlarge the segment size for excitation generation and replicate excitation codewords in all the descriptions in order to maintain the same transmission bandwidth. Our proposed scheme can be extended easily to more than two descriptions and can adapt its number of descriptions dynamically to network-loss conditions. Experimental results on FS-1016 CELP, ITU G.723.1, and FS MELP coders show good performance of our scheme.

39 citations
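The interleave-and-interpolate step for LSPs described in the abstract above can be sketched for the two-description case. This is a minimal illustration under stated assumptions: frames are plain lists of LSP values, a lost description is passed as `None`, and a missing frame at the edge of the sequence is filled by repeating its only neighbour. Excitation replication is not modelled.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[float]  # LSP vector of one frame


def split_descriptions(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Interleave frames: even-indexed ones go to description 0, odd-indexed to 1."""
    return list(frames[0::2]), list(frames[1::2])


def reconstruct(even: Optional[List[Frame]], odd: Optional[List[Frame]],
                n_frames: int) -> List[Frame]:
    """Rebuild all frames; a description that was lost is passed as None."""
    out: List[Optional[Frame]] = [None] * n_frames
    for desc, start in ((even, 0), (odd, 1)):
        if desc is not None:
            for k, frame in enumerate(desc):
                out[start + 2 * k] = frame
    received = [f is not None for f in out]
    if not any(received):
        raise ValueError("both descriptions lost; nothing to interpolate from")
    for i in range(n_frames):
        if received[i]:
            continue
        prev = out[i - 1] if i > 0 and received[i - 1] else None
        nxt = out[i + 1] if i + 1 < n_frames and received[i + 1] else None
        if prev is not None and nxt is not None:
            # Linear interpolation between the two received neighbours.
            out[i] = [(a + b) / 2.0 for a, b in zip(prev, nxt)]
        else:
            # Edge frame: repeat the single available neighbour (assumption).
            out[i] = list(prev if prev is not None else nxt)
    return out
```

For example, with four frames and the odd description lost, frame 1 is rebuilt as the average of frames 0 and 2, and frame 3 repeats frame 2.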


Journal ArticleDOI
TL;DR: A speech watermarking scheme combined with CELP speech coding for speech authentication; the new codebook partition technique produces less distortion, and the statistical detection method guarantees that the error probability can be controlled under a prescribed level.
Abstract: This letter presents a speech watermarking scheme that is combined with CELP (Code Excited Linear Prediction) speech coding for speech authentication. The excitation codebook of CELP is partitioned into three parts and labeled '0', '1' and 'any' according to the private key. Watermark embedding process chooses the codebook whose label is the same as the watermark bit and combines it with the codebook labeled 'any' for CELP coding. A statistical method is employed to detect the watermark, and the watermark length for authentication and detection threshold are determined by false alarm probability and missed detection probability. The new codebook partition technique produces less distortion, and the statistical detection method guarantees that the error probability can be controlled under a prescribed level.

37 citations
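A toy version of the key-dependent codebook partition and bit embedding described above might look like this. The partition sizes (half 'any', a quarter each for '0' and '1'), the key-seeded shuffle and the plain squared-error search are assumptions for illustration; the letter's actual partition rule and its statistical detector are not reproduced here.

```python
import random
from typing import List, Sequence


def partition_codebook(size: int, key: int) -> List[str]:
    """Label each codebook entry '0', '1' or 'any' using a key-seeded shuffle."""
    n_any = size // 2
    n_zero = (size - n_any) // 2
    labels = ["any"] * n_any + ["0"] * n_zero + ["1"] * (size - n_any - n_zero)
    random.Random(key).shuffle(labels)
    return labels


def embed_bit(codebook: Sequence[Sequence[float]], labels: Sequence[str],
              target: Sequence[float], bit: str) -> int:
    """Search only entries labelled `bit` or 'any' for the closest match to `target`."""
    allowed = [i for i, lab in enumerate(labels) if lab in (bit, "any")]
    return min(allowed, key=lambda i: sum((c - t) ** 2 for c, t in zip(codebook[i], target)))


def count_mismatches(labels: Sequence[str], indices: Sequence[int], bits: str) -> int:
    """Number of chosen entries whose label contradicts the expected watermark bit."""
    return sum(1 for i, b in zip(indices, bits) if labels[i] not in (b, "any"))
```

A detector in this spirit would compare `count_mismatches` against a threshold chosen from the desired false alarm and missed detection probabilities. Unwatermarked speech would select contradicting entries by chance, while correctly watermarked speech gives zero mismatches.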


Journal Article
TL;DR: The paper describes the basic elements of the ALS codec and presents the latest developments in the standardization process and describes several important applications of this new lossless audio format in practice.
Abstract: MPEG-4 Audio Lossless Coding (ALS) is a new addition to the suite of MPEG-4 audio coding standards. The ALS codec is based on forward-adaptive linear prediction, which offers remarkable compression even with low predictor orders. Nevertheless, performance can be significantly improved by using higher predictor orders, more efficient quantization and encoding of the predictor coefficients, and adaptive block length switching. The paper describes the basic elements of the ALS codec with a focus on these recent improvements. It also presents the latest developments in the standardization process and describes several important applications of this new lossless audio format in practice.

34 citations
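The core property of prediction-based lossless coding, that the decoder can rebuild the samples exactly from the residual, can be shown with a small integer sketch. This is not the ALS algorithm itself: coefficient estimation, coefficient quantization, block switching and entropy coding are all omitted, and the fixed second-order predictor is an assumption chosen for the example.

```python
from typing import List, Sequence


def predict(history: Sequence[int], coeffs: Sequence[int], shift: int) -> int:
    """Integer prediction from the most recent samples; coeffs are scaled by 2**shift."""
    acc = 0
    for j, c in enumerate(coeffs, start=1):
        if len(history) >= j:
            acc += c * history[-j]
    return acc >> shift  # same rounding in encoder and decoder


def encode(samples: Sequence[int], coeffs: Sequence[int], shift: int) -> List[int]:
    """Residual = sample minus prediction from previous samples."""
    return [samples[n] - predict(samples[:n], coeffs, shift) for n in range(len(samples))]


def decode(residual: Sequence[int], coeffs: Sequence[int], shift: int) -> List[int]:
    """Invert encode() by adding the same prediction back."""
    out: List[int] = []
    for r in residual:
        out.append(r + predict(out, coeffs, shift))
    return out


# Predictor 2*x[n-1] - x[n-2], stored as integers scaled by 2**2.
samples = [0, 3, 7, 12, 18, 25, 33]
residual = encode(samples, coeffs=[8, -4], shift=2)
restored = decode(residual, coeffs=[8, -4], shift=2)
```

The residual values are much smaller than the samples, which is what makes them cheaper to entropy-code, and `restored` equals `samples` because both sides compute identical integer predictions.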


Patent
Huang Pengjun
24 Feb 2005
TL;DR: In this article, a low-bit-rate coding technique for unvoiced segments of speech is presented that avoids loss of quality compared to the conventional Code Excited Linear Prediction (CELP) method operating at a much higher bit rate.
Abstract: A low-bit-rate coding technique for unvoiced segments of speech, without loss of quality compared to the conventional Code Excited Linear Prediction (CELP) method operating at a much higher bit rate. A set of gains is derived from a residual signal after whitening the speech signal by a linear prediction filter. These gains are then quantized and applied to a randomly generated sparse excitation. The excitation is filtered, and its spectral characteristics are analyzed and compared to the spectral characteristics of the original residual signal. Based on this analysis, a filter is chosen to shape the spectral characteristics of the excitation to achieve optimal performance.

29 citations
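The gain-plus-random-excitation step in the abstract above can be sketched as follows. The sub-frame length, the number of pulses per sub-frame, the use of RMS as the gain and the seeded generator are assumptions for illustration; gain quantization and the spectral shaping filter are left out.

```python
import math
import random
from typing import List, Sequence


def subframe_gains(residual: Sequence[float], sub_len: int) -> List[float]:
    """RMS gain per sub-frame (len(residual) is assumed to be a multiple of sub_len)."""
    gains = []
    for start in range(0, len(residual), sub_len):
        seg = residual[start:start + sub_len]
        gains.append(math.sqrt(sum(v * v for v in seg) / len(seg)))
    return gains


def sparse_excitation(gains: Sequence[float], sub_len: int,
                      pulses_per_sub: int, seed: int) -> List[float]:
    """Random +/-1 pulses at random positions, scaled so each sub-frame has the given RMS."""
    rng = random.Random(seed)
    raw_rms = math.sqrt(pulses_per_sub / sub_len)  # RMS of the unscaled pulse pattern
    out: List[float] = []
    for g in gains:
        seg = [0.0] * sub_len
        for pos in rng.sample(range(sub_len), pulses_per_sub):
            seg[pos] = rng.choice((-1.0, 1.0))
        out.extend(v * g / raw_rms for v in seg)
    return out


residual = [0.5, -0.5, 0.5, -0.5, 2.0, 0.0, -2.0, 0.0]
gains = subframe_gains(residual, sub_len=4)
excitation = sparse_excitation(gains, sub_len=4, pulses_per_sub=2, seed=7)
```

Only the gains need to be transmitted in such a scheme, since the decoder can generate its own random sparse pattern; the excitation matches the residual in energy per sub-frame, not in waveform.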


Patent
Pasi Ojala
18 Jan 2005
TL;DR: In this article, the authors proposed a method for compensating transient effects in transform coding and decoding of a combined speech and audio in electronic devices by using a transform based time-frequency domain codec.
Abstract: The present invention provides a method for compensating transient effects in transform coding and decoding of a combined speech and audio in electronic devices by using a transform based time-frequency domain codec. The method can combine, e.g., a CELP (code excited linear prediction) type speech codec and a transform type audio codec. The invention describes a compensation method to handle the transient (e.g., from the CELP coding to the transform coding) in transform coding when the number of quantized transform coding coefficients is lower than in the output of the transform.

25 citations


Patent
19 Oct 2005
TL;DR: In this article, the authors proposed a method to reduce the amount of mode information associated with a prediction method for generating an in-image prediction signal in a pixel region by using an entropy coding algorithm.
Abstract: PROBLEM TO BE SOLVED: To provide a technology capable of realizing efficient coding processing or decoding processing by decreasing the amount of mode information associated with a prediction method for generating an in-image prediction signal in a pixel region. SOLUTION: The image prediction coding apparatus 10 includes: an in-image prediction signal generating method determining section 15 that determines a prediction method derived on the basis of data corresponding to an adjacent region adjacent to an object region and comprising pixel signals having already been reproduced as an R mode prediction method or an L mode prediction method; an in-image prediction signal generating section 16 for generating an in-image prediction signal on the basis of the determined R mode prediction method; a subtractor 18, a conversion section 19, a quantization section 20, and an entropy coding section 25 for coding a residual signal between the pixel signal of the object region and the generated in-image prediction signal. COPYRIGHT: (C)2007,JPO&INPIT

20 citations


Patent
Ajit V. Rao
10 Jan 2005
TL;DR: In this paper, the warp contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the model's correlation strength, and the linear shift is derived via quadratic approximation or other method.
Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.

20 citations
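One common way to realise "model the correlation strengths as points on a polynomial and maximize it" is a parabola through the best candidate and its two neighbours. The sketch below shows that quadratic refinement. It assumes uniformly spaced candidate shifts and is only one possible reading of the abstract, not the patent's specified procedure.

```python
from typing import Sequence


def refine_peak(shifts: Sequence[float], strengths: Sequence[float]) -> float:
    """Return the shift that maximizes a parabola fitted around the best candidate."""
    i = max(range(len(strengths)), key=lambda k: strengths[k])
    if i == 0 or i == len(strengths) - 1:
        return shifts[i]  # no neighbour on one side; keep the sampled maximum
    step = shifts[i + 1] - shifts[i]  # uniform spacing is assumed
    y_prev, y_mid, y_next = strengths[i - 1], strengths[i], strengths[i + 1]
    curvature = y_prev - 2.0 * y_mid + y_next
    if curvature >= 0.0:
        return shifts[i]  # flat or convex: the parabola has no maximum
    # Vertex of the parabola through the three points, in units of `step`.
    offset = 0.5 * (y_prev - y_next) / curvature
    return shifts[i] + step * offset


# Correlation strengths sampled from a curve whose true maximum lies at shift 0.3.
shifts = [-2.0, -1.0, 0.0, 1.0, 2.0]
strengths = [-(s - 0.3) ** 2 for s in shifts]
best_shift = refine_peak(shifts, strengths)
```

Evaluating only five candidates and interpolating between them is what keeps the search cheap compared with testing every possible contour.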


Patent
26 Dec 2005
TL;DR: In this paper, a scalable CELP encoding for stereo audio signals was proposed, where an adder and a multiplier obtain an average of first and second channel signals as a monophonic signal.
Abstract: A scalable encoding apparatus wherein stereo audio signals can be scalable encoded by use of a CELP encoding to improve the encoding efficiency. In the apparatus, an adder and a multiplier obtain an average of first and second channel signals as a monophonic signal. A CELP encoding part performs a CELP encoding of the monophonic signal. A first channel difference information encoding part performs an encoding of the first channel signal in conformance with the CELP encoding and obtains a difference between a resulting encoded parameter and an encoded parameter outputted from the CELP encoding part. The first channel difference information encoding part then encodes this difference and outputs the resulting encoded parameter.

18 citations
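The two arithmetic steps named in the abstract above, averaging the channels into a monophonic signal and taking the difference between a channel's encoded parameters and the mono layer's, are simple enough to sketch directly. The parameter names `lag` and `gain_index` are hypothetical stand-ins for whatever CELP parameters the encoder produces.

```python
from typing import Dict, List, Sequence


def downmix(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Monophonic signal as the sample-wise average of the two channels."""
    return [(a + b) / 2.0 for a, b in zip(first, second)]


def parameter_difference(channel_params: Dict[str, int],
                         mono_params: Dict[str, int]) -> Dict[str, int]:
    """Difference between a channel's encoded parameters and the mono layer's."""
    return {name: channel_params[name] - mono_params[name] for name in mono_params}


mono = downmix([1.0, 3.0], [3.0, 1.0])
delta = parameter_difference({"lag": 42, "gain_index": 5}, {"lag": 40, "gain_index": 5})
```

When the channels are strongly correlated the differences stay small, which is what makes encoding them cheaper than encoding the channel parameters outright.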


Journal ArticleDOI
TL;DR: An optimized trellis coded vector quantization (TCVQ) scheme for encoding the LSF parameters is developed and it is shown that the incorporated LSF TCVQ encoder performs better than the 34 bits/frame LSF scalar quantizer used originally in the FS1016 coder.

Patent
16 Jun 2005
TL;DR: In this paper, an audio encoding device capable of realizing effective encoding while using audio encoding of the CELP method in an extended layer when hierarchically encoding an audio signal was disclosed.
Abstract: There is disclosed an audio encoding device capable of realizing effective encoding while using audio encoding of the CELP method in an extended layer when hierarchically encoding an audio signal. In this device, a first encoding unit (115) subjects an input signal (S11) to audio encoding processing of the CELP method and outputs the obtained first encoded information (S12) to a parameter decoding unit (120). The parameter decoding unit (120) acquires a first quantization LSP code (L1), a first adaptive sound source lag code (A1), and the like from the first encoded information (S12), obtains a first parameter group (S13) from these codes, and outputs it to a second encoding unit (130). The second encoding unit (130) subjects the input signal (S11) to a second encoding processing by using the first parameter group (S13) and obtains second encoded information (S14). A multiplexing unit (154) multiplexes the first encoded information (S12) with the second encoded information (S14) and outputs them via a transmission path N to a decoding device (150).

Patent
Hiroyuki Ehara
29 Aug 2005
TL;DR: In this paper, a linear prediction analyzer analyzes an input digital speech signal and outputs linear predictive coefficients, which are then quantized by a linear predictive coefficient quantizer to improve frame cancellation error tolerance without increasing a number of bits of a fixed codebook in CELP type audio encoding.
Abstract: An audio encoding apparatus capable of improving a frame cancellation error tolerance without increasing a number of bits of a fixed codebook in a CELP type audio encoding. A linear prediction analyzer analyzes an input digital speech signal and outputs linear predictive coefficients. A linear predictive coefficients quantizer quantizes the linear predictive coefficients. A low-frequency-band component encoder encodes a down-sampled linear-predictive residual signal by a pulse-code-modulation encoder and generates low-frequency-band component encoded information, while a high-frequency-band component encoder encodes an error signal between a linear-predictive residual signal and an up-sampled signal of a decoded down-sampled linear-predictive residual signal by a code-excited-linear-prediction encoder and generates high-frequency-band component encoded information.

Proceedings ArticleDOI
01 Oct 2005
TL;DR: A layered CELP speech coding scheme that adapts dynamically to the characteristics of the speech encoded and the network loss conditions in real time transmissions of voice over IP, based on the ITU G.729 CS-ACELP codec operating at 8 Kbps is proposed.
Abstract: In this paper, we propose a layered CELP speech coding (LC) scheme that adapts dynamically to the characteristics of the speech encoded and the network loss conditions in real time transmissions of voice over IP. Based on the ITU G.729 CS-ACELP codec operating at 8 Kbps, we design a variable bit-rate codec that is robust to losses and delays in IP networks. To cope with bursty losses while maintaining an acceptable end-to-end delay, our scheme employs LC with redundant piggybacking of perceptually important parameters in the base layer, with a degree of redundancy adjusted according to feedback from receivers. Under various delay constraints, we study trade-offs between the additional bit rate required for redundant piggybacking and the protection of perceptually important parameters. Experimental results show that our scheme works well and has quality comparable to full replication.
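Redundant piggybacking of base-layer parameters can be pictured with a small packetization sketch. The packet layout, the field names and the fixed one-frame redundancy are assumptions for illustration; the paper adjusts the degree of redundancy from receiver feedback, which is not modelled here.

```python
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, bytes]  # hypothetical layout: {"base": ..., "enh": ...}
Packet = Dict[str, Optional[bytes]]


def packetize(frames: List[Frame]) -> List[Packet]:
    """Each packet carries its own frame plus a copy of the previous frame's base layer."""
    packets: List[Packet] = []
    for n, frame in enumerate(frames):
        packets.append({
            "base": frame["base"],
            "enh": frame["enh"],
            "prev_base": frames[n - 1]["base"] if n > 0 else None,
        })
    return packets


def recover(received: List[Optional[Packet]]) -> List[Tuple[str, Optional[bytes]]]:
    """Per frame: ('full', data), ('base', data) from the next packet, or ('lost', None)."""
    out: List[Tuple[str, Optional[bytes]]] = []
    for n, pkt in enumerate(received):
        if pkt is not None:
            out.append(("full", pkt["base"] + pkt["enh"]))
            continue
        nxt = received[n + 1] if n + 1 < len(received) else None
        if nxt is not None and nxt["prev_base"] is not None:
            out.append(("base", nxt["prev_base"]))  # degraded but decodable
        else:
            out.append(("lost", None))
    return out
```

Recovering a frame from the following packet costs one frame of extra delay and some extra bit rate, which is the trade-off the abstract refers to.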

Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is shown that a constrained search of the adaptive and innovative codebooks significantly improves the recovery time of the decoder after a lost frame, at the cost of only minor quality degradation in a clear channel.
Abstract: The adaptive codebook used in CELP-like speech coders is extremely effective on voiced signals. Unfortunately, it is also the main source of error propagation at the decoder when a frame is lost. In this paper, we study several ways of limiting the energy contribution of the adaptive codebook to the synthesized speech signal. We show that a constrained search of the adaptive and innovative codebooks significantly improves the recovery time of the decoder after a lost frame, at the cost of only minor quality degradation in a clear channel. When applied to a standard codec such as the AMR-WB, this constraint only affects the encoder, and the modified codec remains fully interoperable with the standard codec.
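One simple way to limit the energy contribution of the adaptive codebook is to cap the pitch gain so that the scaled adaptive vector never exceeds a fixed fraction of the target energy. The sketch below shows that idea only. The cap `max_energy_ratio` and the function name are assumptions, and the paper studies several constraint schemes that may differ from this one.

```python
import math
from typing import Sequence


def constrained_pitch_gain(target: Sequence[float], adaptive: Sequence[float],
                           max_energy_ratio: float = 0.8) -> float:
    """Least-squares pitch gain, clamped so that the energy of gain * adaptive
    stays at or below max_energy_ratio times the target energy."""
    cross = sum(t * y for t, y in zip(target, adaptive))
    energy_adaptive = sum(y * y for y in adaptive)
    if energy_adaptive == 0.0:
        return 0.0
    gain = max(0.0, cross / energy_adaptive)  # unconstrained optimum, non-negative
    energy_target = sum(t * t for t in target)
    gain_cap = math.sqrt(max_energy_ratio * energy_target / energy_adaptive)
    return min(gain, gain_cap)


# Perfectly periodic case: the unconstrained gain would be 1.0.
capped = constrained_pitch_gain([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
uncapped = constrained_pitch_gain([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], max_energy_ratio=1.0)
```

The part of the target that the capped adaptive vector no longer covers has to be picked up by the innovative codebook. Because the constraint acts only in the encoder's search, a standard decoder can still decode the bitstream.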

Proceedings ArticleDOI
04 Sep 2005
TL;DR: This paper proposes a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments and proposes an 8.7 kbps MFCC-based CELP coder.
Abstract: Existing standard speech coders can provide high quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that use the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), which are typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for the performance of speech recognition. In this paper, we propose a speech coder using mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To develop the proposed speech coder with a low bit rate, we first explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. It is shown that the proposed speech coder has speech quality comparable to 8 kbps G.729, and the ASR system using the proposed speech coder gives a relative word error rate reduction of 6.8% compared to the ASR system using G.729 on a large vocabulary task (AURORA4).

Journal ArticleDOI
TL;DR: A new method is presented that offers efficient computation of Linear Prediction Coefficients (LPC) via a new Recursive Least Squares (RLS) adaptive filtering algorithm that is numerically robust, fast, parallelizable and has particularly good tracking properties.

Journal ArticleDOI
TL;DR: A novel signal modification method for wide-band code-excited linear prediction (CELP) speech codecs to improve pitch prediction at low bit rates and preserves the original time scale at the end of each frame is introduced.
Abstract: This paper introduces a novel signal modification method for wide-band code-excited linear prediction (CELP) speech codecs to improve pitch prediction at low bit rates. The method is enabled only in stable voiced speech frames, and preserves the original time scale at the end of each frame. This feature helps to avoid artifacts and simplifies an encoder implementation. The signal modification includes a classification algorithm as an integral part. The classification algorithm detects the frames most suitable for signal modification and low bit rate coding, and can be employed in a rate selection module of variable bit rate (VBR) codecs. In this paper, the signal modification method is applied in an experimental VBR wide-band speech codec derived from the 3GPP adaptive multirate wideband (AMR-WB) standard (ITU-T Recommendation G.722.2). The codec fulfills the system requirements of IS-95/CDMA2000 Rate Set II, operating at source coding bit rates 12.65, 6.2, and 1.0 kb/s. The signal modification is used in the 6.2 kb/s mode dedicated for voiced speech frames. Listening test results demonstrate the good performance of the proposed method. The signal modification method is used in the Nokia/VoiceAge codec that was declared in April 2003 as the winner of the selection phase in the 3GPP2 CDMA2000 wide-band speech codec standardization.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A novel frequency-domain technique is proposed that attempts to shape the residual of a vocoder such that it falls below psychoacoustic thresholds.
Abstract: Code excited linear predictive (CELP) coding standards often fail to properly represent non-speech signals because they are inherently optimized for speech. Most modern CELP coders include provisions for the inclusion of indirect perceptual criteria to counteract this problem; however no direct psychoacoustic models are employed. In this paper, we present a pre- and postprocessor for the vocoder that makes use of the MPEG-1 psychoacoustic model 1 in order to enhance the quality of the coded audio. A novel frequency-domain technique is proposed that attempts to shape the residual of a vocoder such that it falls below psychoacoustic thresholds.

Proceedings ArticleDOI
23 May 2005
TL;DR: A frequency-domain technique involving time-varying filters to enhance the audio quality with a small number of overhead information bits is proposed and preliminary results have shown both qualitative and quantitative improvements in the sound quality.
Abstract: Low bit rate standards involving the code excited linear predictive coder (CELP) are typically optimized for telephone speech. The performance of CELP with non-speech signals, and particularly with music, degrades considerably. It is of interest these days to improve the robustness of these coders by adding pre- and post-processing stages to the core compression algorithm. In this paper, we propose a frequency-domain technique involving time-varying filters to enhance the audio quality with a small number of overhead information bits. Preliminary results have shown both qualitative and quantitative improvements in the sound quality.

Proceedings ArticleDOI
01 Dec 2005
TL;DR: This paper aims to explore algorithm hardware-migration technologies, in particular, Levinson-Durbin based linear predictive coding algorithm into FPGAs.
Abstract: Algorithms traditionally are developed on off-the-shelf digital signal processors. Recent silicon technology advances have made FPGAs become feasible as an alternative solution. This paper aims to explore algorithm hardware-migration technologies, in particular, Levinson-Durbin based linear predictive coding algorithm into FPGAs.
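For reference, the Levinson-Durbin recursion that the paper above migrates to FPGAs can be written compactly in software. The sketch below is a plain floating-point version using the convention A(z) = 1 + sum_j a[j] z^-j; the function names are chosen for this example, and it says nothing about the fixed-point or hardware structure used in the paper.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations; returns (a, prediction_error) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. all-zero signal); stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process with pole at 0.5.
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion finds a first-order predictor: `coeffs` is approximately `[1, -0.5, 0]` and the remaining prediction error is 0.75.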

Proceedings ArticleDOI
06 Jul 2005
TL;DR: A speech-adaptive layered-coding scheme for the loss concealments of real-time CELP-coded speech transmitted over IP networks that delivers good-quality speech with a level of protection similar to full replication under medium loss rates, and provides speech quality similar to the standard G.729 under very low loss rates.
Abstract: In this paper, we propose a speech-adaptive layered-coding (LC) scheme for the loss concealments of real-time CELP-coded speech transmitted over IP networks. Based on the ITU G.729 CS-ACELP codec operating at 8 Kbps, we design a loss-robust speech-adaptive codec at the same bit rate. Our scheme employs LC with redundant packetization in order to conceal losses and adapt to dynamic loss conditions characterized by the loss rate and the degree of burst, while maintaining an acceptable end-to-end delay. By protecting only the most important excitation parameters of each frame according to its speech type, our approach enables more efficient use of the bit budget. Our scheme delivers good-quality speech with a level of protection similar to full replication under medium loss rates, provides speech quality similar to the standard G.729 under very low loss rates, and outperforms both for low-to-medium loss rates.

Patent
Chanwoo Kim
18 Jul 2005
TL;DR: In this paper, a method of voice coding/decoding is proposed, in which various parameters computed during voice coding are compressed for transmission, without degradation of voice quality and transmission delay.
Abstract: The present invention provides a method of voice coding/decoding. Various parameters computed during voice coding are compressed for transmission. CELP coding of high compressibility and decoding corresponding to CELP coding is implemented without degradation of voice quality and transmission delay. An exemplary method of the present invention comprises performing voice coding, computing a value of at least one characteristic parameter via the voice coding, compressing the computed value of the at least one characteristic parameter, and transmitting the compressed data.

Journal ArticleDOI
TL;DR: An energy extrapolation-based concealment algorithm for an excitation signal of a CELP decoder that was implemented on the 3GPP AMR standard decoder, and its performance was compared with that of the standard algorithm.
Abstract: We propose an energy extrapolation-based concealment algorithm for an excitation signal of a CELP decoder. In the concealment algorithm, adaptive codebook excitation is generated to maintain continuity of excitation energy on consecutive frames, and high-frequency regions are replaced with random excitation processed through a high-pass filter. The cut-off frequency of the filter is switched according to mode information. The algorithm was implemented on the 3GPP AMR standard decoder, and its performance was compared with that of the standard algorithm. Listening test results suggested that improved subjective quality was produced by the new algorithm.

Journal ArticleDOI
TL;DR: This paper proposes a Voice Activity Detection (VAD) algorithm using Radial Basis Function (RBF) network, which achieves better performance than G.729 Annex B at any noise level.
Abstract: This paper proposes a Voice Activity Detection (VAD) algorithm using a Radial Basis Function (RBF) network. The k-means clustering and Least Mean Square (LMS) algorithms are used to update the RBF network to the underlying speech condition. The inputs to the RBF network are three parameters of a Code Excited Linear Prediction (CELP) coder, which work stably under various background noise levels. An adaptive hangover threshold is applied in the RBF-VAD to reduce errors, because the threshold value involves a trade-off in the VAD decision. The experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level.

Journal ArticleDOI
TL;DR: A scalable band-split wideband speech coding system that uses G.729 as a component and can decode speech from even part of the 16 kbit/s encoded data is proposed and the quality is greatly improved compared with the conventional method.
Abstract: This paper proposes a scalable band-split wideband speech coding system that uses G.729 as a component and can decode speech from even part of the 16 kbit/s encoded data. The configuration of the proposed scalable encoding is as follows. The 7-kHz band speech signal is split into lower and upper bands, which are encoded separately. The lower band is scalably encoded at 12 kbit/s, and the upper band is encoded by CELP composed only of the noise codebook (4 kbit/s). In the lower band, the lower-level encoder (core layer) is included in the upper-level encoder (enhancement layer) as part of the functional element, to improve the performance of the whole lower-band encoder. To further improve quality while retaining scalability to the core coder, an additional pitch prediction scheme is proposed in which the excitation component of the enhancement layer is used in the pitch component. Performance evaluation by objective measures and a subjective evaluation experiment for the proposed schemes showed that the quality is greatly improved compared with the conventional method. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(3): 65–74, 2005; Published online in Wiley InterScience. DOI 10.1002/scj.10555


Patent
24 Nov 2005
TL;DR: In this article, a code-excited linear prediction (CELP) speech coding is used to reproduce high quality speech with a small data amount in speech coding and decoding for performing compression coding of a speech signal to a digital signal.
Abstract: PROBLEM TO BE SOLVED: To reproduce a high quality speech with a small data amount in speech coding and decoding for performing compression coding of a speech signal to a digital signal. SOLUTION: In a code-excited linear prediction (CELP) speech coding, a noise level of the speech in a concerning coding period is evaluated by using a code or coding result of at least one of spectrum information, power information, and pitch information, and various excitation codebooks 19 and 20 are used in response to an evaluation result. COPYRIGHT: (C)2006,JPO&NCIPI

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A high quality wideband speech coder based on the code-excited linear prediction (CELP) algorithm with perceptually constrained variable bitrate (VBR) is proposed and compared with MPEG1 Layer III codecs.
Abstract: A high quality wideband speech coder based on the code-excited linear prediction (CELP) algorithm with perceptually constrained variable bitrate (VBR) is proposed in this paper. A VBR is achieved with the help of a reconfigurable structure of multiband multistage codebooks of excitation vectors, controlled by a psychoacoustic model based on the warped discrete Fourier transform (WDFT). A comparison with MPEG1 Layer III codecs at 16, 24 and 32 kbps is carried out.

Proceedings Article
01 Jan 2005
TL;DR: The paper describes the complete method for synthesizing the speech including the pre-processing of the text and prosody analysis in the Indian style of speaking and successfully implemented ‘PALSA’ – an Indian accented English speech synthesizer.
Abstract: This paper elucidates a practical solution to an Indian accented English text to speech synthesizing system. The paper covers the complete procedure to generate the speech signal of the text, in Indian accented voice. The technique described considers the various prosodic features that need to be incorporated into the synthesized speech to make it appear natural and in the way an Indian speaks. The paper describes the complete method for synthesizing the speech including the pre-processing of the text and prosody analysis in the Indian style of speaking. The diphones extracted from an Indian’s voice are coded using Residual Excited Linear Predictive (RELP) coding technique and the resulting formants and residual values of the diphones are used to resynthesize the final output speech for the text to be synthesized. ‘PALSA’ – an Indian accented English speech synthesizer has been successfully implemented using the mentioned technique to produce the Indian accented speech and is described in this paper.