Showing papers on "Code-excited linear prediction" published in 1995


Book
01 Feb 1995
TL;DR: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems, including an in-depth examination of the important topic of code excited linear prediction (CELP).
Abstract: From the Publisher: A detailed account of the most recently developed digital speech coders designed specifically for use in the evolving communications systems. Discusses the variety of speech coders utilized with such new systems as MBE INMARSAT-M. Includes an in-depth examination of the important topic of code excited linear prediction (CELP).

453 citations


Journal Article
TL;DR: A new mixed excitation LPC vocoder model is presented that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech.
Abstract: Traditional pitch-excited linear predictive coding (LPC) vocoders use a fully parametric model to efficiently encode the important information in human speech. These vocoders can produce intelligible speech at low data rates (800-2400 b/s), but they often sound synthetic and generate annoying artifacts such as buzzes, thumps, and tonal noises. These problems increase dramatically if acoustic background noise is present at the speech input. This paper presents a new mixed excitation LPC vocoder model that preserves the low bit rate of a fully parametric model but adds more free parameters to the excitation signal so that the synthesizer can mimic more characteristics of natural human speech. The new model also eliminates the traditional requirement for a binary voicing decision so that the vocoder performs well even in the presence of acoustic background noise. A 2400-b/s LPC vocoder based on this model has been developed and implemented in simulations and in a real-time system. Formal subjective testing of this coder confirms that it produces natural sounding speech even in a difficult noise environment. In fact, diagnostic acceptability measure (DAM) test scores show that the performance of the 2400-b/s mixed excitation LPC vocoder is close to that of the government standard 4800-b/s CELP coder.

352 citations


Patent
TL;DR: A method for encoding a signal that includes a speech component that is classified in one of at least two modes, based, for example, on pitch stationarity, short-term level gradient or zero crossing rate, is described.
Abstract: A method for encoding a signal that includes a speech component is described. First and second linear prediction windows of a frame are analyzed to generate sets of filter coefficients. First and second pitch analysis windows of the frame are analyzed to generate pitch estimates. The frame is classified in one of at least two modes, e.g. voiced, unvoiced and noise modes, based, for example, on pitch stationarity, short-term level gradient or zero crossing rate. Then the frame is encoded using the filter coefficients and pitch estimates in a particular manner depending upon the mode determination for the frame, preferably employing CELP based encoding algorithms.

282 citations
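
One of the classification features named in this abstract, the zero crossing rate, can be sketched as follows. This is a minimal illustration only: the function name and the choice to treat a zero-valued sample as non-negative are assumptions, and the patent's actual mode decision logic and thresholds are not given in the abstract.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

Noise-like (unvoiced) frames generally yield a higher zero crossing rate than voiced frames, which is why the measure is a common input to a voiced/unvoiced/noise decision.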


Patent
Peter Kroon, Yair Shoham
07 Jun 1995
TL;DR: A CELP speech decoder that, when it fails to reliably receive at least a portion of a current frame of compressed speech information, classifies the speech to be generated as periodic or non-periodic and generates the excitation signal from the adaptive codebook or the fixed codebook accordingly.
Abstract: A CELP speech decoder includes a first portion comprising an adaptive codebook and a second portion comprising a fixed codebook. The CS-ACELP decoder generates a speech excitation signal selectively based on output signals from said first and second portions when said decoder fails to receive reliably at least a portion of a current frame of compressed speech information. The decoder does this by classifying the speech signal to be generated as periodic (voiced) or non-periodic (unvoiced) and then generating an excitation signal based on this classification. If the speech signal is classified as periodic, the excitation signal is generated based on the output signal from the first portion and not on the output signal from the second portion. If the speech signal is classified as non-periodic, the excitation signal is generated based on the output signal from said second portion and not on the output signal from said first portion.

100 citations
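
The concealment rule in this abstract (adaptive codebook only for periodic speech, fixed codebook only for non-periodic speech) can be sketched as below. The periodicity test, its threshold, and all function names are assumptions made for illustration; the abstract does not say how the classification is performed.

```python
import math


def is_periodic(past_exc, lag, threshold=0.5):
    """Hypothetical voicing test: normalized correlation of the last pitch
    cycle with the cycle before it. Requires len(past_exc) >= 2 * lag."""
    last = past_exc[-lag:]
    prev = past_exc[-2 * lag:-lag]
    num = sum(a * b for a, b in zip(last, prev))
    den = math.sqrt(sum(a * a for a in last) * sum(b * b for b in prev))
    return den > 0 and num / den > threshold


def concealed_excitation(past_exc, lag, fixed_vector, frame_len):
    """Build the excitation for an erased frame from one source only."""
    if is_periodic(past_exc, lag):
        # Adaptive-codebook contribution: repeat past excitation at the lag.
        buf = list(past_exc)
        out = []
        for _ in range(frame_len):
            sample = buf[-lag]
            out.append(sample)
            buf.append(sample)
        return out
    # Fixed-codebook contribution only (vector supplied by the caller).
    return list(fixed_vector[:frame_len])
```

A practical decoder would typically also attenuate the concealed excitation over consecutive erased frames; that step is left out here.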


Proceedings Article
E. Shlomot
29 Sep 1995
TL;DR: An apparatus and method of quantizing a sequence of input data vectors using delayed decision switched prediction and vector quantization to generate a quantized data vector.
Abstract: An apparatus and method of quantizing a sequence of input data vectors using delayed decision switched prediction and vector quantization. The method has the following steps of operation: (a) predicting a next vector element from said sequence of input data vectors to generate a set of prediction vectors; (b) subtracting the set of prediction vectors from the next vector element to generate a set of prediction error vectors; (c) multi-stage vector quantizing the set of prediction error vectors to generate a set of quantized prediction error vectors with each of the stages having at least one of the tables and local decision means to generate a final quantization error vector according to a predetermined distance measure; (d) selecting one predictor out of the set of predictors from the switched prediction step and selecting, for each of the stages, at least one entry from the set of tables of the vector quantization step according to the predetermined distance measure, generating a quantized data vector.

86 citations
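
Steps (a) to (d) of this abstract can be sketched in simplified form as follows. The sketch assumes each predictor is a single scalar gain applied to the previous quantized vector, uses squared error as the distance measure, and searches each stage greedily; the delayed decision described in the abstract (keeping several candidates per stage) is deliberately omitted. All names are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec in squared error."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def msvq(stages, vec):
    """Greedy multi-stage VQ: each stage quantizes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def switched_predictive_vq(prev_q, x, predictor_gains, stages):
    """Try every predictor, quantize its error, keep the best combination."""
    best = None
    for p, gain in enumerate(predictor_gains):
        prediction = [gain * v for v in prev_q]             # step (a)
        error = [a - b for a, b in zip(x, prediction)]      # step (b)
        indices, residual = msvq(stages, error)             # step (c)
        dist = sum(r * r for r in residual)
        if best is None or dist < best[0]:                  # step (d)
            quantized = [a - r for a, r in zip(x, residual)]
            best = (dist, p, indices, quantized)
    return best[1], best[2], best[3]
```

The function returns the selected predictor index, the per-stage codebook indices, and the quantized vector, which are the items a decoder would need to reproduce the result.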


Journal Article
TL;DR: The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, that is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue.

76 citations


Proceedings Article
01 Jan 1995
TL;DR: Though each of these representations provides equivalent information about the LPC spectral envelope, their interpolation performance is found to be different.
Abstract: In this paper, interpolation of linear predictive coding (LPC) parameters in terms of the following representations is investigated: linear prediction coefficient representation, reflection coefficient representation, log-area-ratio representation, arc-sine reflection coefficient representation, cepstral coefficient representation, line spectral frequency representation, autocorrelation coefficient representation and impulse response representation. Though each of these representations provides equivalent information about the LPC spectral envelope, their interpolation performance is found to be different. It is shown that the line spectral frequency representation results in the best interpolation performance.

62 citations
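
Interpolation in the line spectral frequency (LSF) domain, which this abstract finds to perform best, can be sketched as a per-subframe linear blend of two frames' LSF vectors. The linear weights, the number of subframes, and the function name are assumptions for illustration; the paper's exact interpolation schedule is not stated in the abstract.

```python
def interpolate_lsf(prev_lsf, curr_lsf, n_subframes=4):
    """Linearly interpolate LSF vectors from prev_lsf towards curr_lsf.

    Returns one LSF vector per subframe; the last equals curr_lsf.
    """
    result = []
    for k in range(1, n_subframes + 1):
        w = k / n_subframes
        result.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_lsf, curr_lsf)]
        )
    return result
```

A useful property of this representation is that a weighted average of two ascending LSF vectors is itself ascending, so the interpolated sets keep the ordering that corresponds to a stable synthesis filter.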


Patent
14 Aug 1995
TL;DR: In this paper, a preprocessor recognizes that a given frame has been corrupted and modifies the encoded signal so that the decoding thereof will result in improved coding system performance, based on the decoding process and on a predetermined target signal.
Abstract: A method and apparatus for improving the performance of coding systems in the presence of frame erasures or lost packets. The encoded signal is modified after transmission but prior to decoding by a decoder preprocessor. The preprocessor recognizes that a given frame has been corrupted and modifies the encoded signal so that the decoding thereof will result in improved coding system performance. Specifically, based on the decoding process and on a predetermined target signal, the encoded signal is modified so that the decoding thereof will generate an approximation to the target signal. In a first illustrative embodiment, a CELP speech coder is used and the target signal is an excitation signal comprised of all-zero excitation vectors. In this case, the portion of the corrupted excitation signal indices which identify the corresponding gain factors are set to values which represent a low gain factor. In a second illustrative embodiment, a CELP speech coder is used and the target signal comprises an extrapolation of the excitation signal represented by the encoded signal for one or more previous frames. In this case, the preprocessor encodes the extrapolated excitation signal using the best codebook matches available. In either case, the effect of corrupted frames in the reconstructed speech signal is minimized.

48 citations


Proceedings Article
09 May 1995
TL;DR: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases.
Abstract: An efficient coding scheme for linear predictive coding (LPC) residuals is proposed based on harmonic and noise representation. New features of the scheme include classified vector quantization of the spectral envelope of LPC residuals with a weighted distortion measure. The improvement in performance obtained by classifying codebooks based on a voiced/unvoiced (V/UV) decision is shown. Sequences of the short-term RMS power of the time domain waveforms are also vector quantized and transmitted for unvoiced signals. A fast synthesis algorithm for voiced signals using an FFT is also presented, which reduces the high complexity of the direct sinusoidal synthesis method with interpolated magnitudes and phases. Informal listening tests indicate that, in combination with a known LSP quantization technique, this residual coding scheme provides good communication quality at a total bit rate of less than 2.0 kbps.

41 citations


Patent
24 Mar 1995
TL;DR: A CELP system and a conversion coding system are combined to encode a sound at a high compression rate and a musical tone with high quality: the input signal (fS = 24 kHz) is converted to a low band signal (fS = 16 kHz) and encoded by a CELP coder into code C1, and the difference between the input signal and the decoded C1 signal converted back to fS = 24 kHz is encoded by a conversion coding coder into code C2.
Abstract: PURPOSE: To encode a sound at a high compression rate and to encode a musical tone with high quality by using a CELP system and a conversion coding system. CONSTITUTION: An input signal 11 of a sampling frequency fS = 24 kHz is made a low band signal of fS = 16 kHz by a converter 221, and it is encoded by a CELP coder 241, and a resultant code C1 is outputted, and the code C1 is decoded by a decoder 251, and the decoded signal is made the signal of fS = 24 kHz by a converter 26, and it is subtracted from the input signal 11, and a high band signal and a quantization error signal are coded by a conversion coding coder 242, and the code C2 is outputted. Only the code C1, or both of C1 and C2, are decoded to be used.

38 citations


Proceedings Article
09 May 1995
TL;DR: A CELP coder based on mel-cepstral analysis is proposed and shown to achieve an improvement of more than 1.8 dB in opinion equivalent Q over the conventional CELP coder.
Abstract: We propose a CELP coder based on mel-cepstral analysis. In the coder, since the transfer functions of perceptual weighting and postfiltering are defined through mel-cepstral coefficients, the effects of perceptual weighting and postfiltering should fit with the characteristics of the human auditory sensation. We use a basic CELP structure without adaptive codebook, and the subjective speech quality of the proposed coder in terms of the opinion equivalent Q is measured and compared with that of the conventional CELP coder. It is shown that the improvement of more than 1.8 dB is achieved by the proposed coder over the conventional CELP coder.

Patent
28 Mar 1995
TL;DR: A feature extraction part divides each short time frame of the input acoustic signal into sub-frames, obtains the change rate of their mean power or mean spectrum envelope, and controls a mode switchover part so that the first coding part (CELP) is selected when the change rate is at or above a prescribed value and the second coding part (TWIN) is selected otherwise.
Abstract: PURPOSE: To efficiently encode an acoustic signal and to obtain a decoded acoustic signal of high quality by encoding the acoustic signal while selecting a coding method suitable for a feature, a characteristic of an input acoustic signal. CONSTITUTION: An input acoustic signal from an input terminal 11 is supplied to either a first coding part (CELP coding part) 12 or a second coding part (TWIN coding part) 41 through a mode switchover part 65. The input acoustic signal is inputted to a feature extraction part 66 also. The feature extraction part 66 extracts its feature at every short time frame of the input acoustic signal to control the mode switchover part 65. That is, the feature extraction part 66 divides e.g. respective short time frames of the input acoustic signal into four sub-frames, and obtains mean power or mean spectrum envelope of respective divided sub-frames, and obtains the change rate of the mean power or the change rate of the mean spectrum envelope, and controls the mode switchover part 65 so as to select the coding by the first coding part 12 when the change rate is a prescribed value or above, and to select the coding by the second coding part 41 when the change rate is the prescribed value or below. COPYRIGHT: (C)1996,JPO
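
The mode selection described in this abstract can be sketched as follows, using mean power only. The definition of "change rate" as the largest power ratio between adjacent sub-frames, the threshold value, and the function names are assumptions; the abstract specifies none of them. The labels "CELP" and "TWIN" stand for the first and second coding parts.

```python
def subframe_powers(frame, n_sub=4):
    """Mean power of each of n_sub equal-length sub-frames."""
    size = len(frame) // n_sub
    return [
        sum(x * x for x in frame[i * size:(i + 1) * size]) / size
        for i in range(n_sub)
    ]


def select_coder(frame, threshold=4.0, n_sub=4, eps=1e-12):
    """Pick the coding part from the change rate of sub-frame mean power."""
    powers = subframe_powers(frame, n_sub)
    # Assumed change-rate measure: largest ratio between neighbours.
    rate = max(
        max(a, b) / (min(a, b) + eps) for a, b in zip(powers, powers[1:])
    )
    return "CELP" if rate >= threshold else "TWIN"
```

With this measure a frame containing an onset (a sharp rise in power) is routed to the CELP part, while a steady frame goes to the TWIN part.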

Proceedings Article
C.R. Watkins, Juin-Hwey Chen
09 May 1995
TL;DR: G.728 output speech quality is improved for frame erasure channels: the degradation due to 1% frame erasures ranges from just slightly noticeable (no change to G.728) to almost unnoticeable (encoder and decoder both changed), and in the latter case the output speech is still intelligible for frame erasure rates up to 10% or even 20%.
Abstract: We have improved G.728 output speech quality for frame erasure channels. Three cases are considered: (1) no change to G.728, (2) change only the G.728 decoder, and (3) change both the encoder and decoder. In case 1, we synthesize a bit-stream during erased frames so that the decoder decodes an excitation with low energy or with characteristics similar to the excitation of previous good frames. In case 2, the gain-scaled excitation and LPC coefficients are extrapolated, and vital operations of backward LPC and gain adaptations are continued. Case 3 adds spectral smoothing and increases bandwidth expansion for the LPC and gain predictors. These techniques are quite effective, as the speech quality degradation due to 1% frame erasures ranges from just slightly noticeable in case 1 to almost unnoticeable in case 3. For case 3, the output speech is still intelligible for frame erasure rates up to 10% or even 20%.

Journal Article
TL;DR: Simulation results show that higher SEGSNR and lower computation complexity can be achieved, and the pitch contour of the synthesized speech is smoother than that produced by conventional CELP coders.
Abstract: This correspondence proposes a new CELP coding method which embeds speech classification in adaptive codebook search. This approach can retain the synthesized speech quality at bit-rates below 4 kb/s. A pitch analyzer is designed to classify each frame by its periodicity, and with a finite-state machine, one of four states is determined. Then the adaptive codebook search scheme is switched according to the state. Simulation results show that higher SEGSNR and lower computation complexity can be achieved, and the pitch contour of the synthesized speech is smoother than that produced by conventional CELP coders.

Proceedings Article
09 May 1995
TL;DR: The low bit rate enhanced multiband excitation or EMBE speech coder adds several important new features including phonetic classification and a novel spectral quantization technique called variable dimension vector quantization (VDVQ) to the basic multiband excitation vocoder.
Abstract: The low bit rate enhanced multiband excitation or EMBE speech coder adds several important new features including phonetic classification and a novel spectral quantization technique called variable dimension vector quantization (VDVQ) to the basic multiband excitation vocoder. Phonetic classification allows the adaptation of spectral modeling and quantization to the local acoustic-phonetic character of the speech signal, enhancing quality and robustness. The VDVQ scheme quantizes the log-spectrum with relatively few bits while preserving perceptually important features. Both the fixed rate (2.4 kb/s) and the variable rate (1.44 kb/s average) implementations of EMBE deliver speech quality comparable to the 4.8 kb/s Federal Standard 1016 CELP coder and the 4.15 kb/s Inmarsat-M standard IMBE coder.

Patent
Goertz Udo
09 Nov 1995
TL;DR: A scheme for generating the stochastic excitation of a CELP-type speech codec is described, based on a hybrid stochastic codebook search technique that includes the use of regular pulse excitation (RPE) codebooks.
Abstract: A new scheme to generate the stochastic excitation for a CELP-type speech codec based upon a hybrid stochastic codebook search technique including use of regular pulse excitation codebooks is described. From the ideal RPE sequence the position of the first nonzero pulse and the position of the pulse with maximum amount as well as the overall sign of the RPE sequence are determined. The corresponding target vectors and pulse responses of the synthesis filter are stored in databases belonging to the positions of the maximum pulse, respectively. These databases are used to derive the stochastic codebook via the so-called LBG-algorithm. Once the codebook has become available, the position of the maximum pulse serves as pre-selection measure to limit the search for the "best" candidate vector to a "small" subset of the stochastic codebook.

Journal Article
TL;DR: The design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications is described, which has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.
Abstract: This paper describes the design of a speech coder called pitch synchronous innovation CELP (PSI-CELP) for low bit-rate mobile communications. PSI-CELP is based on CELP, but has more adaptive excitation structures. In voiced frames, instead of conventional random excitation vectors, PSI-CELP converts even the random excitation vectors to have pitch periodicity by repeating stored random vectors as well as by using an adaptive codebook. In silent, unvoiced, and transient frames, the coder stops using the adaptive codebook and switches to fixed random codebooks. The PSI-CELP coder also implements novel structures and techniques: an FIR-type perceptual weighting filter using unquantized LPC parameters, a random codebook with a conjugate structure trained to be robust against channel errors, codebook search with delayed decision, a gain quantization with sloped amplitude, and a moving average prediction coding of LSP parameters. Our speech coder is implemented by DSP chips. Its coded speech quality at 3.6 kb/s with 2.0 kb/s redundancy is comparable to that of the Japanese full-rate VSELP coder at 6.7 kb/s with 4.5 kb/s redundancy. The basic structure of this PSI-CELP coder has been chosen as the Japanese half-rate speech codec for digital cellular telecommunications.

Patent
Juin-Hwey Chen
29 Nov 1995
TL;DR: A highly efficient, low-delay pitch parameter derivation and quantization method that permits an overall delay which is a fraction of prior coding delays for equivalent speech quality at low bitrates, with typical use in low-delay, low-bitrate coders such as CELP.
Abstract: A highly efficient, low delay pitch parameter derivation and quantization permits overall delay which is a fraction of prior coding delays for equivalent speech quality at low bitrates. In distinguishing between pitch period information for voiced and non-voiced frames of input signals, non-voiced frames are assigned a non-zero "bias" value, while voiced frames have associated with them generated pitch information based on an analysis of signals in a present frame and comparison with signals relating to the pitch in a prior frame. Transitions from non-voiced to voiced input frames are efficiently accomplished using a non-uniform quantization method based on an analysis of a sequence of frames. Typical uses include low delay, low-bitrate coders such as Code Excited Linear Prediction (CELP).

Book
18 Dec 1995
TL;DR: Covers the DSPLAB DSP laboratory software, quantization (PCM and APCM), waveform coding with fixed and adaptive prediction, the pitch-excited linear predictive vocoder, analysis-by-synthesis LPC, and subband coding.
Abstract: DSPLAB: The DSP Laboratory Software. Quantization: PCM and APCM. Waveform Coding with Fixed Prediction. Pitch-excited Linear Predictive Vocoder. Waveform Coding with Adaptive Prediction. Analysis-by-Synthesis LPC. Subband Coding. Projects. Appendices. Bibliography. Index.

Patent
17 Apr 1995
TL;DR: In this paper, a method of encoding a signal containing speech is employed in a bit rate Codebook Excited Linear Predictor (CELP) communication system, which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Abstract: A method of encoding a signal containing speech is employed in a bit rate Codebook Excited Linear Predictor (CELP) communication system. The system includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.

Patent
27 Jun 1995
TL;DR: In this paper, a line spectral frequency (LSF) vector quantizer is proposed for CELP speech encoders, which employs a minimum number of bits, is of moderate complexity and incorporates built-in error detection capability.
Abstract: A line spectral frequency (LSF) vector quantizer, having particular application in digital cellular networks (DCN), is provided for code excited linear predictive (CELP) speech encoders. The LSF vector quantizer is efficient in terms of bits employed, robust and effective in terms of performance across speakers and handsets, moderate in terms of complexity, and accommodates effective and simple built-in transmission error detection schemes. The LSF vector quantizer employs a minimum number of bits, is of moderate complexity and incorporates built-in error detection capability in order to combat transmission errors. The LSF vector quantizer classifies unquantized line spectral frequencies into four categories, employing different vector quantization tables for each category. Each quantization table is optimized for particular types of vectors. For each category, three split vector codebooks are used with a simplified error measure to find three candidate split quantized vectors. The three sets of three split vectors are combined to produce as many as 27 vectors from each category. The quantizer then makes a final selection of optimal category using a more complex error measure to achieve the robust performance across speakers and handsets. Split vector quantization follows a two stage constrained search procedure that results in an ordered set of quantized line spectral frequencies that is "close" to the unquantized set with moderate complexity within each category. Effective and simple transmission error detection schemes at the receiver are made possible by the split nature of the vector quantization and the constrained search procedure. Only twenty-six bits are required to encode ten line spectral frequencies.
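
The candidate-combination step in this abstract (three split codebooks, three candidates each, up to 3 x 3 x 3 = 27 full vectors per category) can be sketched as below. The sketch assumes that combinations violating the ascending LSF ordering are discarded, which is one plausible reading of "as many as 27", and it uses plain squared error as a stand-in for the patent's "more complex error measure". All names are hypothetical.

```python
from itertools import product


def best_combination(target, candidates_per_split):
    """Combine per-split candidates and return the best ordered vector.

    candidates_per_split: one list of candidate sub-vectors per split.
    Returns (vector, error), or (None, inf) if no combination is ordered.
    """
    best, best_err = None, float("inf")
    for parts in product(*candidates_per_split):
        vec = [x for part in parts for x in part]
        # Assumed constraint: line spectral frequencies must be ascending.
        if any(b <= a for a, b in zip(vec, vec[1:])):
            continue
        err = sum((t - v) ** 2 for t, v in zip(target, vec))
        if err < best_err:
            best, best_err = vec, err
    return best, best_err
```

Running this once per category and then comparing the four winners would mirror the final category selection that the abstract describes.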

Proceedings Article
Juin-Hwey Chen
09 May 1995
TL;DR: A low-complexity CELP (LC-CELP) coder with a complexity as low as 3 MIPS, which achieves slightly higher mean opinion scores than the CCITT 32 kb/s ADPCM and exhibits good performance when tandemed with itself or transcoded with other coders.
Abstract: We present a 16 kb/s CELP coder with a complexity as low as 3 MIPS. The main thrust is to reduce the complexity as much as possible while maintaining toll-quality. This low-complexity CELP (LC-CELP) coder has the following features: (1) fast LPC quantization, (2) 3-tap pitch prediction with efficient open-loop pitch search and predictor tap quantization, (3) backward-adaptive excitation gain, and (4) a trained excitation codebook with a small vector dimension and a small codebook size. Most CELP coders require one full DSP or even two DSP chips to implement in real-time. In contrast, 3 to 6 full-duplex LC-CELP coders can fit into a single DSP chip, since each takes only around 3 MIPS to implement. This coder achieved slightly higher mean opinion scores (MOS) than the CCITT 32 kb/s ADPCM. It also exhibits good performance when tandemed with itself or transcoded with other coders.

Proceedings Article
18 Jun 1995
TL;DR: A variable-rate CELP architecture wherein cues for rate variation are derived from subband measures of spectral flatness using the entropy functional is proposed, which can achieve an average rate of below 2000 bits/sec while maintaining communications quality in the encoded speech.
Abstract: With the standardization of Qualcomm's QCELP and the deployment of various digital multiple-access networks, the implementation of variable-rate speech coding schemes has become an area of significant interest. Code-excited linear prediction (CELP) is the predominant coding methodology for communications quality speech coding below 8 kbps, and several variable-rate CELP schemes have been discussed in the literature. We propose a variable-rate CELP architecture wherein cues for rate variation are derived from subband measures of spectral flatness using the entropy functional. We also discuss a variable-rate coding scheme for multiple-tap pitch filters. Using reasonable assumptions about voice activity and instantaneous speech bandwidths, our coder can achieve an average rate of below 2000 bits/sec while maintaining communications quality in the encoded speech.
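
A spectral flatness measure built on the entropy functional, the kind of cue this abstract uses for rate variation, can be sketched as the normalized entropy of a band's power spectrum. The normalization to the range 0 to 1 and the function name are assumptions; the paper's subband layout and its mapping from flatness to bit rate are not given in the abstract.

```python
import math


def entropy_flatness(power):
    """Normalized entropy of a power spectrum: 1 for flat, 0 for one peak.

    power: non-negative power values for the bins of one subband.
    """
    total = sum(power)
    if total <= 0 or len(power) < 2:
        return 0.0
    entropy = -sum(
        (p / total) * math.log(p / total) for p in power if p > 0
    )
    return entropy / math.log(len(power))
```

A value near 1 indicates a noise-like band and a value near 0 indicates a band dominated by a single component, so the measure could plausibly drive a per-frame rate decision.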

Proceedings Article
09 May 1995
TL;DR: The code excited linear prediction (CELP) coder makes it possible to synthesize good quality speech at low bit rates, where speech quality mainly depends on spectral envelope design accuracy.
Abstract: The code excited linear prediction coder (CELP) makes it possible to synthesize good quality speech at low bit rates. In such a case, speech quality mainly depends on spectral envelope design accuracy. Different kinds of parameters belonging to the parametrical domain (linear prediction coefficients.

Patent
19 Dec 1995
TL;DR: In this paper, α-parameters are converted by an α-parameter to LSP converting circuit 13 into line spectral pair (LSP) parameters and a vector of these LSP parameters is vector-quantized by a quantizer.
Abstract: For executing the code excitation linear prediction (CELP) coding, for example, α-parameters are taken out from the input speech signal by a linear prediction coding (LPC) analysis circuit 12. The α-parameters are then converted by an α-parameter to LSP converting circuit 13 into line spectral pair (LSP) parameters and a vector of these line spectral pair (LSP) parameters is vector-quantized by a quantizer 14. The changeover switch 16 is controlled depending upon the pitch value detected by a pitch detection circuit 22 for selecting and using one of the codebook 15M for male voice and the codebook 15F for female voice for improving quantization characteristics without increasing the transmission bit rate.

Proceedings Article
09 May 1995
TL;DR: Objective and subjective quality evaluations of the recovery system applied to the LD-CELP G.728 standard and a variable rate CELP system for random and burst frame erasures are presented, indicating that the system is robust up to a frame erasure rate of 10%.
Abstract: A common aspect of speech transmission through packetised networks is the need to consider the discarded (missing) packets as a result of error detection or network overload. The missing packets and the possible mistracking that results in the speech decoder lead to significant quality degradation. We introduce a packet recovery technique for CELP based speech coders. The proposed technique extrapolates independently the excitation signal and the short-term synthesis filter. A recovery strategy based on speech classification (voiced, unvoiced, transition, silence) is discussed. The extrapolation of the short-term filter uses a least-squares fading memory polynomial filter applied to the reflection coefficients. Objective and subjective quality evaluations of the recovery system applied to the LD-CELP G.728 standard and a variable rate CELP system for random and burst frame erasures are presented. The results indicate that the system is robust up to a frame erasure rate of 10%. Very little degradation in quality was observed at erasure rates up to 3%.

Journal Article
TL;DR: The proposed algorithm may be combined with an LP-based speech coder which uses a noise-like excitation to reproduce unvoiced speech, and fast computation is made possible while high-quality speech can be obtained at a bit rate of about 3 kb/s.
Abstract: Techniques for coding voiced speech at very low bit rates are investigated and a new algorithm, designed to produce high quality speech with low complexity, is proposed. This algorithm encodes and transmits partial representative waveforms (RWs) from which the complete speech waveforms are reconstructed by using a method called forward-backward waveform prediction (FBWP). The RW is encoded at 20-30 ms intervals with a low complexity approach, taking into account the special initial conditions of short- and long-term filters. The basic idea of FBWP is essentially consistent with that of the prototype waveform interpolation (PWI) algorithm, which was reported to be capable of producing high-quality voiced speech at a bit rate of between 3.0 and 4.0 kb/s. By implementing the FBWP in the time domain, fast computation is made possible while high-quality speech can be obtained at a bit rate of about 3 kb/s. As in the PWI method, the proposed algorithm may be combined with an LP-based speech coder which uses a noise-like excitation to reproduce unvoiced speech.

Proceedings Article
09 May 1995
TL;DR: Subjective tests indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech and the MOS showed that the quality of improved CS-CELP is equivalent to that of the 32-kbit/s ADPCM for clean speech.
Abstract: A high-quality 8-kbit/s speech coder based on conjugate structure CELP (CS-CELP) is proposed that uses a trained sparse conjugate codebook. The trained sparse conjugate codebook improves speech quality for noisy speech. This codebook consists of two sub-codebooks and each sub-codebook consists of a random component and a trained component. Each component has excitation vectors consisting of a few pulses. In the random component, pulse position and amplitude are determined randomly. The trained component is determined by training. Subjective tests (differential mean opinion score, DMOS and mean opinion score, MOS) indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech. The MOS showed that the quality of improved CS-CELP is equivalent to that of the 32-kbit/s ADPCM for clean speech.

Proceedings Article
09 May 1995
TL;DR: A coder that achieves near transparent wideband speech coding by parameterising the prediction residual through the use of multiple codebooks and synthetic glottal pulses coupled with adaptive bit allocation is proposed.
Abstract: We propose a coder that achieves near transparent wideband speech coding by parameterising the prediction residual through the use of multiple codebooks and synthetic glottal pulses coupled with adaptive bit allocation. The use of synthetic glottal pulses improves the performance of the coder compared to a previous coder using a single impulse without increasing the bit rate. This multiple codebook approach results in a coder operating at 16 kb/s and 24 kb/s that provides comparable speech quality to the CCITT G.722 coder operating at 64 kb/s.

Proceedings Article
Juin-Hwey Chen
20 Sep 1995
TL;DR: Two low-complexity wideband speech coders are presented: a transform coder based on the modified discrete cosine transform (MDCT), LPC spectral fit, and pitch harmonic fit, and a Transform Predictive Coding (TPC) algorithm that uses short-term and long-term prediction to remove the redundancy in speech.
Abstract: Wideband speech coding has been gaining popularity in recent years. Due to its higher sampling rate, wideband speech inherently requires higher processing power than telephone-bandwidth speech when the same coding algorithm is used. However, today many wideband coding applications demand a complexity that is even lower than that of most state-of-the-art telephone-bandwidth coders. To meet this complexity challenge, we created two wideband coders that are fundamentally different from and simpler than popular narrowband coders. The first one is a transform coder based on the modified discrete cosine transform (MDCT), LPC spectral fit, and pitch harmonic fit. The second algorithm is called "Transform Predictive Coding", or TPC. It uses short-term and long-term prediction to remove the redundancy in speech. The prediction residual is quantized in the frequency domain based on a calculated noise masking threshold. In its current form, the TPC coder uses only open-loop quantization and therefore has a low complexity. We estimate that its complexity is only about half of that of the ITU-T 16 kb/s G.728 LD-CELP narrowband coder. The speech quality of TPC is almost transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s. Work is in progress to further improve the speech quality at 16 kb/s.