
Showing papers on "Code-excited linear prediction" published in 1999


Patent
Allen Gersho, Vladimir Cuperman, Ajit V. Rao, Tung-Chiang Yang, Sassan Ahmadi, Fenghua Liu
23 Dec 1999
TL;DR: In this paper, the authors proposed a method for speech coding wherein the speech signal is represented by an excitation signal applied to a synthesis filter, and the speech is partitioned into frames and subframes.
Abstract: A speech coder (12) and a method for speech coding wherein the speech signal is represented by an excitation signal applied to a synthesis filter. The speech is partitioned into frames and subframes. A classifier (22) identifies which of several categories the speech frame belongs to, and a different coding method is applied to represent the excitation for each category. For some categories, one or more windows are identified for the frame where all or most of the excitation signal samples are assigned by a coding scheme. Performance is enhanced by coding the important segments of the excitation more accurately. The window locations are determined from a linear prediction residual by identifying peaks of the smoothed residual energy contour. The method adjusts the frame and subframe boundaries so that each window is located entirely within a modified subframe or frame. This eliminates the artificial restriction incurred when coding a frame or subframe in isolation, without regard for the local behavior of the speech signal across frame or subframe boundaries.
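
The window-placement step can be illustrated with a minimal sketch. The abstract only states that windows are located at peaks of the smoothed residual energy contour; the Hann smoothing kernel, the greedy peak picking, and the names `window_centers`, `smooth_len`, `max_windows` and `min_gap` below are assumptions made for illustration, not details from the patent.

```python
import numpy as np

def window_centers(residual, smooth_len=5, max_windows=2, min_gap=10):
    """Return sample indices of the strongest peaks of the smoothed residual energy."""
    energy = np.asarray(residual, dtype=float) ** 2
    kernel = np.hanning(smooth_len)          # tapered smoother (assumed choice)
    kernel /= kernel.sum()
    contour = np.convolve(energy, kernel, mode="same")
    centers = []
    # Greedy peak picking: visit samples from highest to lowest smoothed energy
    # and keep those at least min_gap samples away from earlier picks.
    for idx in np.argsort(contour)[::-1]:
        if all(abs(int(idx) - c) >= min_gap for c in centers):
            centers.append(int(idx))
        if len(centers) == max_windows:
            break
    return sorted(centers)

# Synthetic linear prediction residual with two pulses (hypothetical data).
residual = np.zeros(80)
residual[20], residual[60] = 1.0, 0.8
print(window_centers(residual))
```

In a coder along these lines, the subframe boundaries would then be shifted so that each window falls entirely inside one subframe; that adjustment is not shown here.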

83 citations


Patent
Yang Gao
24 Aug 1999
TL;DR: In this article, a multi-rate speech codec supports a number of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions, and a variety of techniques are applied, many of which involve the classification of the input signal.
Abstract: A multi-rate speech codec supports a number of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code-excited linear prediction) and other associated modeling parameters are generated for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. For each of the bit rate modes selected, a number of fixed or innovation sub-codebooks are selected for use in generating innovation vectors.

77 citations


Patent
Amitava Das
26 Feb 1999
TL;DR: In this paper, a closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder.
Abstract: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder. Transition speech (i.e., from unvoiced speech to voiced speech, or vice versa) frames are encoded with the high-rate, time-domain coding mode, which may be a CELP coding mode. Voiced speech frames are encoded with the low-rate, frequency-domain coding mode, which may be a harmonic coding mode. Phase parameters are not encoded by the frequency-domain coding mode, and are instead modeled in accordance with, e.g., a quadratic phase model. For each speech frame encoded with the frequency-domain coding mode, the initial phase value is taken to be the initial phase value of the immediately preceding speech frame encoded with the frequency-domain coding mode. If the immediately preceding speech frame was encoded with the time-domain coding mode, the initial phase value of the current speech frame is computed from the decoded speech frame information of the immediately preceding, time-domain-encoded speech frame. Each speech frame encoded with the frequency-domain coding mode may be compared with the corresponding input speech frame to obtain a performance measure. If the performance measure falls below a predefined threshold value, the input speech frame is encoded with the time-domain coding mode.
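
The closed-loop fallback described at the end of the abstract can be sketched as follows. The abstract does not say which performance measure is used; signal-to-noise ratio, the 10 dB threshold, and the function names `snr_db` and `select_mode` are assumptions chosen for illustration.

```python
import numpy as np

def snr_db(reference, reconstruction):
    """Signal-to-noise ratio in dB between an input frame and its reconstruction."""
    reference = np.asarray(reference, dtype=float)
    noise = np.sum((reference - np.asarray(reconstruction, dtype=float)) ** 2)
    signal = np.sum(reference ** 2)
    if noise == 0.0:
        return float("inf")
    if signal == 0.0:
        return float("-inf")
    return float(10.0 * np.log10(signal / noise))

def select_mode(frame, freq_domain_reconstruction, threshold_db=10.0):
    """Keep the low-rate frequency-domain mode only if it reproduces the frame well enough."""
    if snr_db(frame, freq_domain_reconstruction) >= threshold_db:
        return "frequency-domain"
    return "time-domain"   # fall back to the high-rate (e.g. CELP) mode

# Hypothetical 160-sample frame and two candidate reconstructions.
frame = np.sin(2 * np.pi * np.arange(160) / 40)
print(select_mode(frame, 0.99 * frame))
print(select_mode(frame, np.zeros(160)))
```

The point of the closed loop is that the decision is made after trial encoding, by comparing the decoded frame with the input, rather than from a classifier alone.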

66 citations


Proceedings ArticleDOI
20 Jun 1999
TL;DR: Novel solutions for pre-processing noisy speech prior to low bit rate speech coding using a new adaptive limiting algorithm for the a priori signal-to-noise ratio (SNR) estimate and a novel overlap/add scheme are presented.
Abstract: In this paper we present novel solutions for pre-processing noisy speech prior to low bit rate speech coding. We strive especially to improve the estimation of spectral parameters and to reduce the additional algorithmic delay caused by the enhancement pre-processor. While the former is achieved using a new adaptive limiting algorithm for the a priori signal-to-noise ratio (SNR) estimate, the latter makes use of a novel overlap/add scheme. Our enhancement techniques were evaluated in conjunction with the 2400 bps mixed excitation linear prediction (MELP) coder by means of formal and informal listening tests.

62 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: A combined adaptive transform codec (ATC) and code-excited linear prediction (CELP) algorithm for the compression of wideband (7 kHz) signals is described and a switching scheme between CELP and ATC mode is proposed and a frame erasure concealment technique is proposed.
Abstract: This paper describes a combined adaptive transform codec (ATC) and code-excited linear prediction (CELP) algorithm, called ATCELP, for the compression of wideband (7 kHz) signals. The CELP algorithm applies mainly to speech, whereas the ATC mode is selected for music and noise signals. We propose a switching scheme between CELP and ATC mode and describe a frame erasure concealment technique. Subjective listening tests have shown that the ATCELP codec at bit rates of 16, 24 and 32 kbit/s achieved performances close to those of the CCITT G.722 at 48, 56 and 64 kbit/s, respectively, under most operating conditions.

48 citations


Patent
15 Nov 1999
TL;DR: In this article, a random code vector reading section was replaced with an oscillator for outputting different vector streams in accordance with values of input seeds, and a seed storage section for storing a plurality of seeds.
Abstract: A random code vector reading section and a random codebook of a conventional CELP type speech coder/decoder are respectively replaced with an oscillator for outputting different vector streams in accordance with values of input seeds, and a seed storage section for storing a plurality of seeds. This makes it unnecessary to store fixed vectors as they are in a fixed codebook (ROM), thereby considerably reducing the memory capacity.
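
The memory saving can be illustrated with a small sketch in which each stored seed is expanded into a code vector on demand. The abstract does not specify the oscillator; the linear congruential generator, its constants, the vector length of 40, and the names `code_vector` and `SEED_TABLE` are assumptions for illustration only.

```python
def code_vector(seed, length=40):
    """Expand one stored seed into a pseudo-random code vector with samples in [-1, 1)."""
    state = seed & 0xFFFFFFFF
    samples = []
    for _ in range(length):
        # 32-bit linear congruential step (assumed stand-in for the oscillator).
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        samples.append(state / 2**31 - 1.0)
    return samples

# Only the seeds are stored, not the vectors themselves (hypothetical values).
SEED_TABLE = [1, 7, 42, 1999]

codebook = [code_vector(seed) for seed in SEED_TABLE]
print(len(codebook), len(codebook[0]))
```

Because the generator is deterministic, encoder and decoder regenerate identical vectors from the same seed, so storage drops from one full vector per entry to one integer per entry.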

46 citations


Patent
12 Feb 1999
TL;DR: In this article, a method and apparatus for CELP-based to CELP-based vocoder packet translation is presented, which includes a formant parameter translator and an excitation parameter translator.
Abstract: A method and apparatus for CELP-based to CELP-based vocoder packet translation. The apparatus includes a formant parameter translator and an excitation parameter translator. The formant parameter translator includes a model order converter and a time base converter. The method includes the steps of translating the formant filter coefficients of the input packet from the input CELP format to the output CELP format and translating the pitch and codebook parameters of the input speech packet from the input CELP format to the output CELP format. The step of translating the formant filter coefficients includes the steps of converting the model order of the formant filter coefficients from the model order of the input CELP format to the model order of the output CELP format and converting the time base of the resulting coefficients from the input CELP format time base to the output CELP format time base.

45 citations


Proceedings ArticleDOI
S.A. Ramprashad
20 Jun 1999
TL;DR: This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs when compared to a single-mode transform predictive coder (TPC).
Abstract: Speech and audio coding are often considered to be two separate technologies, each almost independently developing different techniques for signal compression. At low bit rates the gap in performance between the two technologies begins to be noticeable; speech coders work better on speech and audio coders perform better on music. The challenge is to merge the two technologies into a single coding paradigm which will work as well as either of the two regardless of the input signal. Presented is a multimode speech and audio coder which can adapt almost continuously between a speech and audio coding mode. This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs when compared to a single-mode transform predictive coder (TPC).

44 citations


Journal ArticleDOI
TL;DR: A two stage hybrid embedded speech/audio coding structure and algorithm is proposed which can be used to enhance the quality of an existing codec without modification of the original coding algorithm.
Abstract: A two stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage of the structure consists of a core speech coder which provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform based coder which provides a separate optional bitstream for the enhancement of the core stage output. The two stage structure can be used to enhance the quality of an existing codec without modification of the original coding algorithm. In this regard it can be considered a value added option that can be used with a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as a core coder. The maximum combined bitrate from the core and enhancement stages for the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output in the cases of music and speech with background noise. Compared to the non-embedded fixed rate standard LD-CELP G.728 at 16 kb/s, the quality of the two stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech the quality of the two stage structure at 16 kb/s is close to if not better than that of G.728 at 16 kb/s.

31 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: An adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate and half rate channels and to maintain high quality in the presence of highly varying background noise and channel conditions is developed.
Abstract: We have developed an adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used. The speech coders in each codec mode are based on the CELP algorithm operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled multi-modal speech coder. The decoders monitor the channel quality at both ends of the wireless link using the soft values for the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.
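
The source/channel split within a fixed gross rate is simple arithmetic, sketched below. The gross rates and the two source rates come from the abstract; which source rates run in which channel is not stated there, so the pairings and the name `channel_coding_kbps` are illustrative assumptions.

```python
GROSS_RATE_KBPS = {"full": 22.8, "half": 11.4}   # GSM channel rates from the abstract

def channel_coding_kbps(channel, source_kbps):
    """Bits per second (in kb/s) left for channel coding after the speech coder's share."""
    gross = GROSS_RATE_KBPS[channel]
    if source_kbps > gross:
        raise ValueError("source rate exceeds the gross channel rate")
    return round(gross - source_kbps, 2)

# Illustrative pairings of codec mode and channel.
print(channel_coding_kbps("full", 11.85))
print(channel_coding_kbps("full", 5.15))
print(channel_coding_kbps("half", 5.15))
```

A lower source rate leaves more of the gross rate for error protection, which is why the lower-rate modes suit poor channel conditions.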

30 citations


Journal ArticleDOI
TL;DR: Results show a saving of at least 2 b/frame for unvoiced spectra compared to voiced spectra to achieve the same spectral distortion performance, leading to some interesting observations on the role of the analysis-by-synthesis structure of CELP.
Abstract: Phonetic classification of speech frames allows distinctive quantization and bit allocation schemes suited to the particular class. Separate quantization of the linear predictive coding (LPC) parameters for voiced and unvoiced speech frames is shown to offer useful gains for representing the synthesis filter commonly used in code-excited linear prediction (CELP) and other coders. Subjective test results are reported that determine the required bit rate and accuracy in the two classes of voiced and unvoiced LPC spectra for CELP coding with phonetic classification. It was found, in this context, that unvoiced spectra need 9 b/frame or more whereas voiced spectra need 25 b/frame or more with the quantization schemes used. New spectral distortion criteria needed to assure transparent LPC spectral quantization for each voicing class in CELP coders are presented. Similar subjective test results for speech synthesized from the true residual signal are also presented, leading to some interesting observations on the role of the analysis-by-synthesis structure of CELP. Objective performance assessments based on the spectral distortion measure are also presented. The theoretical distortion-rate function for the spectral distortion measure is estimated for voiced and unvoiced LPC parameters and compared with experimental results obtained with unstructured vector quantization (VQ). These results show a saving of at least 2 b/frame for unvoiced spectra compared to voiced spectra to achieve the same spectral distortion performance.
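
For readers unfamiliar with the spectral distortion measure, the sketch below computes the conventional full-band root-mean-square log spectral distortion between two LPC filters. It is not the new criteria proposed in the paper; the FFT size and the names `lpc_log_spectrum` and `spectral_distortion` are assumptions for illustration.

```python
import numpy as np

def lpc_log_spectrum(a, n_fft=512):
    """Log power spectrum in dB of the all-pole filter 1/A(z), with a = [1, a1, ..., ap]."""
    A = np.fft.rfft(np.asarray(a, dtype=float), n_fft)
    return -20.0 * np.log10(np.abs(A))   # 10*log10(1/|A|^2)

def spectral_distortion(a_ref, a_quant, n_fft=512):
    """Root-mean-square difference in dB between two LPC log spectra."""
    diff = lpc_log_spectrum(a_ref, n_fft) - lpc_log_spectrum(a_quant, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical first-order filters: an unquantized one and a coarsely quantized one.
print(spectral_distortion([1.0, -0.9], [1.0, -0.9]))
print(spectral_distortion([1.0, -0.9], [1.0, -0.8]))
```

Quantizer performance is typically reported as the average of this value over many frames at a given number of bits per frame.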

Proceedings ArticleDOI
Nam Ha
15 Mar 1999
TL;DR: This paper proposes a fast search method of algebraic codebook in CELP coders that reduces the computations considerably compared with G.729 at the expense of a slight degradation of speech quality, and gives better speech quality with smaller average search space than G.729A.
Abstract: This paper proposes a fast search method of algebraic codebook in CELP coders. In the proposed method, the sequence of codebook search is reordered according to the criterion of mean squared weighted error between target vector and filtered adaptive codebook vector, and the algebraic codebook is searched until a predetermined threshold is satisfied. This method reduces the computations considerably compared with G.729 at the expense of a slight degradation of speech quality. Moreover, it gives better speech quality with smaller average search space than G.729A.

Journal ArticleDOI
TL;DR: This paper considers the use of sequence maximum a posteriori (MAP) decoding of trellis codes, which can exploit any "residual redundancy" that may exist in the channel encoded signal in the form of memory and/or a nonuniform distribution to provide enhanced performance over very noisy channels, relative to maximum likelihood (ML) decoding.
Abstract: This paper considers the use of sequence maximum a posteriori (MAP) decoding of trellis codes. A MAP receiver can exploit any "residual redundancy" that may exist in the channel encoded signal in the form of memory and/or a nonuniform distribution, thereby providing enhanced performance over very noisy channels, relative to maximum likelihood (ML) decoding. The paper begins with a first-order two-state Markov model for the channel encoder input. A variety of different systems with different source parameters, different modulation schemes, and different encoder complexities are simulated. Sequence MAP decoding is shown to substantially improve performance under very noisy channel conditions for systems with low-to-moderate redundancy, with relative gain increasing as the rate increases. As a result, coding schemes with multidimensional constellations are shown to have higher MAP gains than comparable schemes with two-dimensional (2-D) constellations. The second part of the paper considers trellis encoding of the code-excited linear predictive (CELP) speech coder's line spectral parameters (LSPs) with four-dimensional (4-D) QPSK modulation. Two source LSP models are used. One assumes only intraframe correlation of LSPs while the second one models both intraframe and interframe correlation. MAP decoding gains (over ML decoding) of as much as 4 dB are achieved. Also, a comparison between the conventionally designed codes and an I-Q QPSK scheme shows that the I-Q scheme achieves better performance even though the first (simpler) LSP model is used.

01 Sep 1999
TL;DR: At this time, it seems plausible that high-quality audio coding at approximately 2 bits/sample and with a coding delay of less than 2 ms is a realistic goal.
Abstract: In certain bidirectional and multi-directional real-time audio applications there is a need for low-delay audio coding techniques. A typical encoding/decoding delay of current wideband audio codecs is more than 40 ms, whereas the goal for low-delay coding is significantly less than that. Both technical and psychoacoustical requirements and limitations of low-delay coding in bidirectional real-time audio applications are discussed in this paper. The coding delay in codecs based on non-parametric spectral estimation, e.g., subband decomposition or MDCT, results from the buffering of signal frames before spectral analysis. Usually there are also several other sources of algorithmic delay, which have been reviewed in the case of MPEG codecs in [1]. In this paper, it is suggested that the algorithmic coding delay should be 2-5 ms. A frame length of 2 ms corresponds to 88 samples of audio at 44.1 kHz sampling rate. It is probable that sufficiently high definition spectral decomposition cannot be obtained using non-parametric techniques in this short frame. The codec introduced in the paper uses parametric spectral estimation, which is a variant of linear predictive spectral modeling. This algorithm is actually a modification of a low-delay speech coding algorithm, G.728 Low-Delay CELP [2], which is a widely used standard codec in video conferencing applications. Linear predictive analysis and auditory modeling are performed in a backward adaptive manner, which means that the analysis window lies mainly on the already transmitted part of the signal. The coefficients from the LPC analysis are used in a time-varying synthesis filter. The synthesis filter is driven by an excitation signal which consists of a sequence of excitation vectors which have been selected from a vector codebook using a simplified auditory model. A version of the G.728 codec for wideband speech at sampling rate of 32 kHz has already been proposed [3].
A major modification to the conventional algorithm in the current paper is that the linear predictive analysis and synthesis filter are frequency-warped [4, 5, 6, 7]. This means that the frequency resolution of the spectral estimation is matched with the frequency scale of hearing. This technique makes the linear predictive coding scheme applicable to perceptual wideband audio coding. At this time, it seems plausible that high-quality audio coding at approximately 2 bits/sample and with a coding delay of less than 2 ms is a realistic goal.
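
The frame-length and bit-rate figures in the abstract follow from the sampling rate, as the short sketch below shows. The function names are hypothetical, and the conversion of 2 bits/sample to kb/s at 44.1 kHz is derived here rather than quoted from the paper.

```python
def samples_per_frame(frame_ms, sample_rate_hz):
    """Whole samples that fit in a frame of the given duration."""
    return int(frame_ms * sample_rate_hz / 1000)

def bit_rate_kbps(bits_per_sample, sample_rate_hz):
    """Bit rate in kb/s for a given number of bits per sample."""
    return bits_per_sample * sample_rate_hz / 1000

print(samples_per_frame(2, 44100))   # 2 ms frame at 44.1 kHz
print(samples_per_frame(5, 44100))   # 5 ms frame at 44.1 kHz
print(bit_rate_kbps(2, 44100))       # 2 bits/sample at 44.1 kHz
```

With only 88 samples per 2 ms frame, a block transform has very coarse frequency resolution, which motivates the parametric (linear predictive) approach described above.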

Journal ArticleDOI
TL;DR: The target of the paper is to introduce an efficient and accurate framework allowing a network designer to analyze the impact of multimode VBR speech coding on the quality of service (QoS) provided by a wireless/wired ATM network.
Abstract: Multimode coders are able to exploit the different characteristics of the speech waveform and to take into account the different peculiarities of background noise, thus allowing improvements in both signal reconstruction and network-offered load. In this context the variable rate code excited linear prediction (VR-CELP) coding, that is, a multimode variable bit rate (VBR) coding based on the CELP technique, has been introduced in the literature and is currently being considered for use in various applications, especially in the third-generation UMTS cellular systems. The target of the paper is to introduce an efficient and accurate framework allowing a network designer to analyze the impact of multimode VBR speech coding on the quality of service (QoS) provided by a wireless/wired ATM network. In order to capture the coder output characteristics, we propose to model a VR-CELP voice source by using a switched batch Bernoulli process (SBBP). More specifically, three models are introduced and compared in terms of accuracy and simplicity in determining network performance. As a result of the comparison, a four-state model has been chosen as the best tradeoff. The model is then used to analytically derive the loss probability and the jitter probability density function of an ATM multiplexer loaded by a number of VR-CELP sources. Finally, the proposed paradigm has been assessed in a case study where we demonstrate that, for a given output ATM link capacity and for a number of telecommunication services involving voice transmission, VR-CELP coding performs better than traditional on-off coding.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit-rates of 1.6-2.4 kb/s, is presented and the development and testing of a high quality 4 kb/s MELP coder is described.
Abstract: A number of coding techniques have been reported to achieve near toll quality synthesized speech at bit-rates around 4 kb/s. These include variants of code excited linear prediction (CELP), sinusoidal transform coding (STC) and multi-band excitation (MBE). While CELP has been an effective technique for bit-rates above 6 kb/s, STC, MBE, waveform interpolation (WI) and mixed excitation linear prediction (MELP) models seem to be attractive at bit-rates below 3 kb/s. We present a system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit-rates of 1.6-2.4 kb/s. We have enhanced the MELP model producing significantly higher speech quality at bit-rates above 2.4 kb/s. We describe the development and testing of a high quality 4 kb/s MELP coder.

Patent
16 Mar 1999
TL;DR: An integrated circuit for processing a speech signal in accordance with a CELP standard includes a plurality of processing elements coupled to a data bus in parallel. Each processing element includes a multiplier and an accumulator. The integrated circuit further includes an auxiliary processing element which is also coupled to the data bus and has a division unit and a comparator.
Abstract: An integrated circuit for processing a speech signal in accordance with a CELP standard includes a plurality of processing elements coupled to a data bus in parallel. Each processing element includes a multiplier and an accumulator. The integrated circuit further includes an auxiliary processing element, which is also coupled to the data bus and has a division unit and a comparator. The plurality of processing elements and the auxiliary processing element are also coupled in a pipeline formation.

Proceedings ArticleDOI
20 Jun 1999
TL;DR: A wideband (7 kHz) speech coding scheme using code-excited linear prediction (CELP) with mixed time and frequency domain excitation and an improved synthesis filter is described.
Abstract: This paper describes a wideband (7 kHz) speech coding scheme using code-excited linear prediction (CELP) with mixed time and frequency domain excitation. The proposed frequency domain innovation can be used alternatively or in parallel to a time domain codebook. In addition an improved synthesis filter is used consisting of a signal dependent combination of a forward adaptive and a backward adaptive (FA/BA) structure. An experimental codec operating at 15.5 or 20.0 kbit/s is demonstrated.

Patent
Anders Uvliden, Jonas Svedberg
24 Aug 1999
TL;DR: In this article, a multi-codebook fixed bitrate CELP signal block encoder/decoder includes a codebook selector for selecting, for each signal block, a corresponding codebook identification in accordance with a deterministic selection procedure that is independent of signal type.
Abstract: A multi-codebook fixed bitrate CELP signal block encoder/decoder includes a codebook selector (22) for selecting, for each signal block, a corresponding codebook identification in accordance with a deterministic selection procedure that is independent of signal type. Also included are means for encoding/decoding each signal block by using a codebook having the selected codebook identification.

Proceedings ArticleDOI
20 Jun 1999
TL;DR: It is shown that the coding distortion induced by the phase difference between the coded residual signal and the time-variant linear prediction filter used for synthesis in the decoder may cause audible artifacts to the synthesized speech even if lossless coding of all parameters is employed.
Abstract: Several speech coding algorithms modify the time scale of the residual signal to facilitate efficient coding of pitch information. Time scaling, however, results in a phase difference between the coded residual signal and the time-variant linear prediction (LP) filter used for synthesis in the decoder. In this paper, we examine the coding distortion induced by this phase difference. Moreover, we show that it may cause audible artifacts to the synthesized speech even if lossless coding of all parameters is employed. These artifacts occur particularly at onsets when the frequency response of successive LP filters changes rapidly. A waveform interpolation coder is used to illustrate the effects of the phase mismatch.

Journal ArticleDOI
TL;DR: Subjective test results are presented demonstrating that the EVRC delivers excellent quality voice in clean speech/clear channel conditions, and that its performance exceeds that of most currently standardized speech coders for wireless applications in background noise and/or impaired channel conditions.
Abstract: The Enhanced Variable Rate Coder (EVRC), standardized by the Telecommunications Industry Association (TIA) as IS-127, is intended for use with the IS-95x Rate Set 1 air interface (CDMA). This coder operates at a maximum rate of 8.5 kb/s and an average rate of about 4.1 kb/s on conversational speech. The EVRC consists of three coding modes that are all based on the Code Excited Linear Prediction (CELP) model. Selection among the three modes is based on an estimate of the input signal state, with active speech encoded primarily at 170 bits/20 msec frame (Rate 1), background noise and silence encoded at 16 bits/frame (Rate 1/8), and some active speech and essentially all transitions between speech and silence encoded at 80 bits/frame (Rate 1/2). In order to improve performance in the presence of background noise, the EVRC employs an adaptive noise-suppression filter at the input. Subjective test results are presented demonstrating that the EVRC delivers excellent quality voice in clean speech/clear channel conditions, and that its performance exceeds that of most currently standardized speech coders for wireless applications in background noise and/or impaired channel conditions.
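
The frame sizes quoted in the abstract convert to bit rates as sketched below. The sketch assumes that all three rates use the same 20 ms frame, which the abstract states explicitly only for Rate 1; the names `BITS_PER_FRAME` and `rate_kbps` are hypothetical.

```python
FRAME_MS = 20   # frame duration given for Rate 1; assumed for the other rates
BITS_PER_FRAME = {"Rate 1": 170, "Rate 1/2": 80, "Rate 1/8": 16}

def rate_kbps(bits, frame_ms=FRAME_MS):
    """Bits per millisecond is numerically equal to kb/s."""
    return bits / frame_ms

for name, bits in BITS_PER_FRAME.items():
    print(name, rate_kbps(bits), "kb/s")
```

The average rate on conversational speech then depends on how often each mode is selected, which the abstract does not break down.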

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A method for adaptively allocating pulse position candidates using an adaptive code vector for the adaptation of CELP coders using pulse codebooks for excitations such as ACELP is described.
Abstract: CELP coders using pulse codebooks for excitations such as ACELP have the advantages of low complexity and high speech quality. At low bit rates, however, the decrease of pulse position candidates and the number of pulses degrades reconstructed speech quality. This paper describes a method for adaptively allocating pulse position candidates. In the proposed method, N efficient candidates of pulse positions are selected out of all possible positions in a subframe. The amplitude envelope of an adaptive code vector is used for selecting N efficient candidates. The larger the amplitude is, the more pulse positions are assigned. Using an adaptive code vector for the adaptation, the proposed method requires no additional bits for the adaptation. Experimental results show that the proposed method increases WSNRseg by 0.3 dB and MOS by 0.15.
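
A minimal sketch of the candidate selection is given below. The abstract states only that the amplitude envelope of the adaptive code vector guides the choice of N positions; the moving-average envelope, the top-N rule, and the names `candidate_positions` and `smooth_len` are assumptions for illustration.

```python
import numpy as np

def candidate_positions(adaptive_vec, n_candidates, smooth_len=3):
    """Pick the n_candidates positions where the adaptive code vector's envelope is largest."""
    magnitude = np.abs(np.asarray(adaptive_vec, dtype=float))
    kernel = np.ones(smooth_len) / smooth_len        # moving-average envelope (assumed)
    envelope = np.convolve(magnitude, kernel, mode="same")
    order = np.argsort(-envelope, kind="stable")     # largest envelope first
    return sorted(int(i) for i in order[:n_candidates])

# Hypothetical 40-sample adaptive code vector with two pitch pulses.
vec = np.zeros(40)
vec[10], vec[30] = 1.0, -0.5
print(candidate_positions(vec, 3))
```

Since the decoder reconstructs the same adaptive code vector, it can repeat this selection itself, which is why no side information needs to be transmitted.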

Proceedings ArticleDOI
20 Jun 1999
TL;DR: This paper deals with multi-stage vector quantization of line spectrum pair (LSP) parameters in wideband speech coders and discusses commonly used spectral distortion measures and their relation to the perceptual quality of the speech coding.
Abstract: This paper deals with multi-stage vector quantization of line spectrum pair (LSP) parameters in wideband speech coders and discusses commonly used spectral distortion measures and their relation to the perceptual quality of the speech coding.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A new analysis-by-synthesis speech coding structure is presented for high-quality speech coding in the 4 to 8 kb/s range and Subjective and objective comparisons reveal significant advantages for GPP-CELP over classical CELP.
Abstract: A new analysis-by-synthesis speech coding structure is presented for high-quality speech coding in the 4 to 8 kb/s range. CELP with generalized pitch prediction (GPP-CELP) differs from classical code-excited linear prediction (CELP) in that for voiced segments it is the speech signal that is decomposed into a component predictable with the aid of the adaptive codebook (ACB) and a nonpredictable aperiodic component, not the LPC residual. The spectrum of the aperiodic component is estimated by linear-prediction analysis. An approximation to the aperiodic component is synthesized from a stochastic codebook of sparse pulse sequences and its spectrum is shaped by the LPC synthesis filter. The ACB contains samples of the past reconstructed signal, low-passed to increase the pitch prediction gain. For voiced segments the new structure yields higher pitch prediction gain and lower linear-prediction gain than classical CELP. Subjective and objective comparisons reveal significant advantages for GPP-CELP over classical CELP.

Journal ArticleDOI
TL;DR: Comparisons with the recent ITU-T G.729 8 kbit/s standard, used in the discontinuous transmission mode, demonstrate that the proposed coder provides an average bit rate reduction of about 20% maintaining the same algorithmic delay and perceptive quality.
Abstract: This letter deals with a variable bit-rate CS-ACELP speech coder based on new algorithms that are robust in the presence of the background noise typical of wireless communications. The coder presents eight operating modes ranging from 0-8 kbit/s with an average bit-rate of about 4 kbit/s. Subjective and objective comparisons with the recent ITU-T G.729 8 kbit/s standard, used in the discontinuous transmission mode, demonstrate that the proposed coder provides an average bit rate reduction of about 20% maintaining the same algorithmic delay and perceptive quality.

Proceedings ArticleDOI
S. Heinen, M. Adrat, O. Steil, Peter Vary, Wen Xu
15 Mar 1999
TL;DR: A new 6.1 to 13.3-kb/s speech codec called variable rate code-excited linear prediction (VR-CELP) is proposed for adaptive multi-rate (AMR) transmission over mobile radio channels such as GSM or UMTS, and a new error concealment strategy is applied to enhance the transmission quality under very poor channel conditions.
Abstract: We propose a new 6.1 to 13.3-kb/s speech codec called variable rate code-excited linear prediction (VR-CELP) for adaptive multi-rate (AMR) transmission over mobile radio channels such as GSM or UMTS. The AMR concept allows operation with almost wireline speech quality for poor channel conditions and better quality for good channel conditions. This is achieved by dynamically splitting the gross bit rate of the transmission system between source and channel coding according to the current channel conditions. Thus the source coding scheme must be designed for seamless switching between rates without annoying artifacts. To enhance the transmission quality under very poor channel conditions, a new powerful error concealment strategy based on estimation theory is applied.

Proceedings Article
20 Jun 1999
TL;DR: The MPEG-4 parametric speech coding algorithm, harmonic vector excitation coding (HVXC), is described, and listening tests show that the proposed coding method at 2.0 kbps provides significantly better quality than FS1016 CELP at 4.8 kbps.
Abstract: The MPEG-4 parametric speech coding algorithm, harmonic vector excitation coding (HVXC), is described. New features of the coder include a quantizer scheme capable of generating 2.0 and 4.0 kbps scalable bit-streams, where 2.0 kbps decoding is possible using a subset of the 4.0 kbps bit-stream. Time scale modification of speech is also possible without changing pitch or phoneme, for fast and slow playback modes. Listening tests show that the proposed coding method at 2.0 kbps provides significantly better quality than FS1016 CELP at 4.8 kbps. In October 1998, the HVXC coder was adopted in the Final Draft International Standard (FDIS) of MPEG-4 standardization.
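The idea of bit-rate scalability in this abstract can be sketched in a few lines of Python. The 20 ms frame length, the assumption that the base-layer bits form a contiguous prefix of each frame, and the function names are all assumptions made for illustration. The abstract only says that 2.0 kbps decoding uses a subset of the 4.0 kbps bit-stream.

```python
FRAME_MS = 20  # assumed frame length in milliseconds


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s times milliseconds gives bits per frame."""
    return round(rate_kbps * frame_ms)


def base_layer(frame_bits, base_kbps=2.0):
    """Keep only the bits a base-rate decoder needs (assumed here to be a prefix)."""
    return frame_bits[:bits_per_frame(base_kbps)]


# A dummy 4.0 kbps frame; a 2.0 kbps decoder would read only the first part.
full_frame = [0, 1] * (bits_per_frame(4.0) // 2)
core_bits = base_layer(full_frame)
```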

Book Chapter
01 Dec 1999
TL;DR: This chapter provides a step in studying the interplay among different types of services in a DS-CDMA system by developing a generalized Erlang capacity formulation for voice and generalized data.
Abstract: Publisher Summary: The chapter provides a step in studying the interplay among different types of services in a DS-CDMA system. It develops a generalized Erlang capacity formulation for voice and generalized data. The calculations are reduced to using means and variances. Because data, unlike voice, can be delayed, the emphasis is on voice outage probability in anticipation of control strategies for data. One such data control strategy, using related analysis methodology, is studied. The analytic simplicity is achieved using an approximation. This approximation was also used, in a somewhat different way, for an Erlang-capacity-type problem for packet CDMA. The chapter illustrates the formulation with a CELP speech coder and fax. The chapter provides the general formulation and gives the source models for voice and fax.

Patent
28 Apr 1999
TL;DR: In this paper, the fixed codebook response is chosen as that portion of the pulse sequence which best matches a residual signal of the input signal, and the indexed location of that portion along the pulse sequence is designated as the fixed codebook bits which are included within the bit frame.
Abstract: A fixed codebook response is able to better characterize an input signal of a vocoder because the entries of the fixed codebook are tailored to the input signal being processed. A uniformly distributed random noise signal is stored in a transmitting vocoder. During encoding by the transmitting vocoder, the noise signal is shaped by a weighting filter and a pitch sharpening filter, which are controlled by the linear predictive coding, pitch, and pitch gain characteristics of the input signal being encoded. The shaped noise signal is passed through a thresholding filter to arrive at a pulse sequence having a given sparsity. The fixed codebook response is chosen as that portion of the pulse sequence which best matches a residual signal of the input signal. The indexed location of that portion along the pulse sequence is designated as the fixed codebook bits which are included within the bit frame. The identical random noise signal is stored in a receiving vocoder. The linear predictive coding, pitch, and pitch gain characteristics are part of the bit frame, and are again used to produce an identical pulse sequence. The fixed codebook bits of the bit frame are used to index the pulse sequence to the best matching portion, and hence the fixed codebook response for the bit frame.
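The thresholding and search steps in this abstract can be sketched in Python as follows. This is a simplified illustration, not the patented method: the weighting and pitch sharpening filters are omitted, the threshold value is arbitrary, and the matching criterion (normalized correlation with an optimal gain) is a common CELP choice assumed here. The names `center_clip` and `search_codebook` are hypothetical.

```python
import random


def center_clip(signal, threshold):
    """Zero every sample whose magnitude is below the threshold, leaving sparse pulses."""
    return [x if abs(x) >= threshold else 0.0 for x in signal]


def search_codebook(pulses, target, length):
    """Return (offset, gain) of the window of `pulses` that best matches `target`."""
    best_offset, best_gain, best_score = 0, 0.0, -1.0
    for offset in range(len(pulses) - length + 1):
        window = pulses[offset:offset + length]
        energy = sum(w * w for w in window)
        if energy == 0:
            continue  # an all-zero window cannot match anything
        corr = sum(w * t for w, t in zip(window, target))
        # Maximizing corr^2 / energy minimizes the squared error at the optimal gain.
        score = corr * corr / energy
        if score > best_score:
            best_offset, best_gain, best_score = offset, corr / energy, score
    return best_offset, best_gain


# Encoder and decoder share the same stored noise, modeled here by a fixed seed.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(256)]
pulses = center_clip(noise, threshold=0.8)
target = [0.5 * p for p in pulses[100:140]]  # stand-in for a residual subframe
offset, gain = search_codebook(pulses, target, length=40)
```

Only `offset` (plus a quantized gain) would need to be transmitted, since the receiver can regenerate the same pulse sequence from the shared noise and the side information already in the bit frame.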

Proceedings Article
15 Mar 1999
TL;DR: The HE-LPC coder has the potential of producing high quality speech at 4.8 kb/s and below; it employs a new pitch estimation and voicing technique, along with new DCT-based LPC and residual amplitude quantization techniques.
Abstract: The harmonic excitation linear predictive speech coder (HE-LPC) is a technique derived from MBE and MB-LPC types of speech coding algorithms. The HE-LPC coder has the potential of producing high quality speech at 4.8 kb/s and below. This coder employs a new pitch estimation and voicing technique. In addition, new DCT-based LPC and residual amplitude quantization techniques have been developed. The 4 kb/s HE-LPC coder with a 14th order LPC filter was found to produce much better speech quality than the various low rate speech coding standards, including the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. During a formal ITU ACR test, the 4 kb/s HE-LPC vocoder was found to produce performance equivalent to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered clean input speech conditions. The HE-LPC algorithm can also be extended to cover bit rates in the 1.2 to 8 kb/s range, depending on the application.