
Showing papers on "Code-excited linear prediction" published in 1994


Journal Article
01 Jun 1994
TL;DR: Current activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding, which offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques.
Abstract: Speech and audio compression has advanced rapidly in recent years spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s. Techniques have also been emerging in recent years that offer enhanced quality in the neighborhood of 2.4 kb/s over traditional vocoder methods. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio. Subband and transform coding methods combined with sophisticated perceptual coding techniques dominate in this arena with nearly transparent quality achieved at bit rates in the neighborhood of 128 kb/s per channel.

234 citations


Patent
TL;DR: A multi-mode CELP encoding and decoding method and device for digitized speech signals providing improvements over prior art codecs and coding methods by selectively utilizing backward prediction for the short-term predictor parameters and fixed codebook gain of a speech signal.
Abstract: The present invention provides a multi-mode CELP encoding and decoding method and device for digitized speech signals providing improvements over prior art codecs and coding methods by selectively utilizing backward prediction for the short-term predictor parameters and fixed codebook gain of a speech signal. In order to achieve these improvements, the present invention provides a coding method comprising the steps of classifying a segment of the digitized speech signal as one of a plurality of predetermined modes, determining a set of unquantized line spectral frequencies to represent the short term predictor parameters for that segment, and quantizing the determined set of unquantized line spectral frequencies using a mode-specific combination of scalar quantization and vector quantization, which utilizes backward prediction for modes with voiced speech signals. Furthermore, backward prediction is selectively applied to the fixed codebook gain in the modes that are free of transients so that it may be used in the fixed codebook search and fixed codebook gain quantization in those modes.

97 citations


Proceedings Article
19 Apr 1994
TL;DR: This paper describes the application of transform coded excitation (TCX) coding to encoding wideband speech and audio signals in the bit rate range of 16 kbits/s to 32 kbits/s and proposes novel quantization procedures including inter-frame prediction in the frequency domain.
Abstract: This paper describes the application of transform coded excitation (TCX) coding to encoding wideband speech and audio signals in the bit rate range of 16 kbits/s to 32 kbits/s. The approach uses a combination of time domain (linear prediction; pitch prediction) and frequency domain (transform coding; dynamic bit allocation) techniques, and utilizes a synthesis model similar to that of linear prediction coders such as CELP. However, at the encoder, the high complexity analysis-by-synthesis technique is bypassed by directly quantizing the so-called target signal in the frequency domain. The innovative excitation is derived at the decoder by inverse filtering the quantized target signal. The algorithm is intended for applications whereby a large number of bits is available for the innovative excitation. The TCX algorithm is utilized to encode wideband speech and audio signals with a 50-7000 Hz bandwidth. Novel quantization procedures including inter-frame prediction in the frequency domain are proposed to encode the target signal. The proposed algorithm achieves very high quality for speech at 16 kbits/s, and for music at 24 kbits/s.

93 citations


Journal Article
TL;DR: This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation, to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a GELP synthesizer.
Abstract: This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation. The coefficients of the polynomial model form a vector that represents the glottal excitation waveform for one pitch period. A glottal excitation codebook with 32 entries for voiced excitation is designed and trained using two sentences spoken by different speakers. The purpose for using this approach is to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a glottal excitation linear predictive (GELP) synthesizer. This implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist a simple speech synthesis procedure that uses established analysis techniques, that is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. It is conjectured that the glottal excitation codebook approach could provide a mechanism for quantitatively comparing the differences in glottal excitation codebooks for male and female speakers and for speakers with vocal disorders and for speakers with different voice types such as breathy and vocal fry voices. Conceivably, one could also convert the voice of a speaker with one voice type, e.g., breathy, to the voice of a speaker with another voice type, e.g., vocal fry, by synthesizing speech using the vocal tract LP parameters for the speaker with the breathy voice excited by the glottal excitation codebook trained for vocal fry.

64 citations


Journal Article
TL;DR: A pitch predictor exploiting the present interpolation strategy, with an update rate of 50 Hz, provides a subjective speech quality similar to a conventional pitch predictor where the parameters are updated for every pitch cycle.
Abstract: The pitch predictor contributes greatly to the efficiency of current analysis-by-synthesis speech coders by mapping the past reconstructed signal into the present. However, for good performance, its parameters must be updated often (once every 2.5-7.5 ms). A slower update rate of the pitch-predictor delay results in time misalignment between the original signal and the pitch-predictor contribution to the reconstructed signal. The authors introduce a new procedure that allows a slow update rate of the pitch-predictor parameters without this problem. In this method the original signal is modified in a closed-loop fashion such that the parameter values obtained by interpolation of open-loop estimates form the optimal encoding of the modified signal. This new paradigm is a generalization of the familiar analysis-by-synthesis principle. The generalized analysis-by-synthesis principle can be used for interpolation of both the pitch-predictor delay and gain. The authors compare, by means of a subjective test, speech signals encoded with different versions of the code-excited linear predictor (CELP) coder. The comparison shows that a pitch predictor exploiting the present interpolation strategy, with an update rate of 50 Hz, provides a subjective speech quality similar to a conventional pitch predictor where the parameters are updated for every pitch cycle.

59 citations
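
The abstract above relies on interpolating open-loop pitch estimates that are updated slowly (50 Hz). As a rough illustration of that interpolation step only, here is a minimal sketch. It assumes linear interpolation and an 8 kHz sampling rate, neither of which is stated in the abstract, and the function name `interpolate_delay` is hypothetical. The closed-loop modification of the original signal described by the authors is not shown.

```python
def interpolate_delay(prev_delay, next_delay, frame_len=160):
    """Linearly interpolate a pitch delay (in samples) across one frame.

    prev_delay and next_delay are the open-loop estimates at the frame
    boundaries. frame_len=160 corresponds to a 50 Hz update rate only
    under the assumed 8 kHz sampling rate.
    """
    span = next_delay - prev_delay
    # Sample n (0-based) gets the delay at position (n + 1) / frame_len,
    # so the last sample lands on next_delay.
    return [prev_delay + span * (n + 1) / frame_len for n in range(frame_len)]


track = interpolate_delay(40.0, 48.0)
print(len(track), track[0], track[-1])
```

A per-sample delay track like this would replace the piecewise-constant delay that a conventional coder updates every subframe.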


Patent
Juin-Hwey Chen
TL;DR: Modified perceptual weighting parameters and a novel use of postfiltering greatly improve tandeming of a number of encodings and decodings while retaining high quality reproduction.
Abstract: A code-excited linear-predictive (CELP) coder for speech or audio transmission at compressed (e.g., 16 kb/s) data rates is adapted for low-delay (e.g., less than five ms. per vector) coding by performing spectral analysis of at least a portion of a previous frame of simulated decoded speech to determine a synthesis filter of a much higher order than conventionally used for decoding synthesis and then transmitting only the index for the vector which produces the lowest internal error signal. Modified perceptual weighting parameters and a novel use of postfiltering greatly improve tandeming of a number of encodings and decodings while retaining high quality reproduction.

57 citations
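
The abstract above mentions transmitting only the index of the vector that produces the lowest error. The following is a minimal sketch of that generic analysis-by-synthesis codebook search, not the patented low-delay coder: perceptual weighting, backward adaptation and filter memory are all omitted, and the first-order filter and tiny codebook are hypothetical values chosen for illustration.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_k lpc[k-1] z^-k."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain) of the codevector with the lowest squared error."""
    best_index, best_gain, best_error = None, 0.0, float("inf")
    target_energy = sum(t * t for t in target)
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                       # optimal gain for this vector
        error = target_energy - corr * corr / energy
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain


lpc = [-0.9]                                       # hypothetical first-order A(z)
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
target = [0.5 * v for v in synthesize(codebook[2], lpc)]
index, gain = search_codebook(target, codebook, lpc)
print(index, round(gain, 3))
```

Because the target here is built from the third codevector scaled by 0.5, the search should recover that index and gain; a real coder would search against a perceptually weighted target.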


Patent
TL;DR: In this paper, a bit rate Codebook Excited Linear Predictor (CELP) communication system is proposed, which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.
Abstract: A bit rate Codebook Excited Linear Predictor (CELP) communication system which includes a transmitter that organizes a signal containing speech into frames of 40 millisecond duration, and classifies each frame as one of three modes: voiced and stationary, unvoiced or transient, and background noise.

57 citations


Patent
07 Dec 1994
TL;DR: In this paper, an adaptive-stochastic codebook search combination is used to determine when it is desirable to dispense with the adaptive LTP analysis of the target vector and instead use the bits freed up by foregoing the LTP to add another codevector obtained from a second stochastic code book to the modeling process.
Abstract: Methods and apparatus for determining codevectors in response to a speech signal including an adaptive-stochastic codebook search combination. Each stochastic codebook search is made up of BPC and SHC search components (124). The speech signal is used as the input to each of the two possible codebook searches, LTP-CB1 and CB0-CB1. The codebook target vector is computed at (120). The present invention determines when it is desirable to dispense with the adaptive LTP analysis (122) of the target vector and instead use the bits freed up by foregoing the LTP to add another codevector obtained from a second stochastic codebook to the modeling process. A first synthesized speech signal can be determined from the first and second codevectors and a second synthesized speech can be determined from the first and second codewords. The error between the synthesized and the input speech signals is computed (126), concurrently the SHC/BPC search for codebook is performed (128). The resultant vector is searched in the SHC/BPC (124) search for codebook 1 (130).

43 citations


Proceedings Article
19 Apr 1994
TL;DR: The MOS subjective test shows that 4.075 kbps M-LCELP synthetic speech quality is high, and that its quality is mostly equivalent to that for an 8 kbps North American full-rate VSELP coder.
Abstract: This paper presents the M-LCELP (multi-mode learned code excited LPC) speech coder, which has been developed for the North American half-rate digital cellular systems. M-LCELP develops the following techniques to achieve high-quality synthetic speech at 4 kbps: (1) Multimode and multi-codebook coding, (2) Pitch lag differential coding with pitch tracking, (3) A two-stage joint design regular-pulse codebook with common phase structure in voiced frames, (4) An efficient vector quantization for LSP parameters, (5) An adaptive MA type comb filter to suppress excitation signal inter-harmonic noise. The MOS subjective test shows that 4.075 kbps M-LCELP synthetic speech quality is high, and that its quality is mostly equivalent to that for an 8 kbps North American full-rate VSELP coder.

36 citations


Journal Article
TL;DR: The advantage of the nonlinear prediction capability of neural networks is exploited and applied to the design of improved predictive speech coders and resulted in a fully vector-quantized, code-excited, nonlinear predictive speech coder.
Abstract: Recent studies have shown that nonlinear predictors can achieve about 2-3 dB improvement in speech prediction over conventional linear predictors. In this paper, we exploit the advantage of the nonlinear prediction capability of neural networks and apply it to the design of improved predictive speech coders. Our studies concentrate on the following three aspects: (a) the development of short-term (formant) and long-term (pitch) nonlinear predictive vector quantizers; (b) the analysis of the output variance of the nonlinear predictive filter with respect to the input disturbance; (c) the design of nonlinear predictive speech coders. The above studies have resulted in a fully vector-quantized, code-excited, nonlinear predictive speech coder. Performance evaluations and comparisons with linear predictive speech coding are presented. These tests have shown the applicability of nonlinear prediction in speech coding and the improvement in coding performance.

36 citations


Proceedings Article
08 Jun 1994
TL;DR: An adaptive coding system that adjusts the rate allocation according to actual channel conditions and shows that the objective and the subjective speech quality of the adaptive coders are superior to those of their non-adaptive counterparts.
Abstract: Although the mobile communication channels are time-varying, most systems allocate the combined rate between the speech coder and error correction coder according to a nominal channel condition. This generally leads to a pessimistic design and consequently an inefficient utilization of the available resources, such as bandwidth and power. This paper describes an adaptive coding system that adjusts the rate allocation according to actual channel conditions. Two types of variable rate speech coders are considered: the embedded coders and the multimode coders, both based on code excited linear prediction (CELP). The variable rate channel coders, on the other hand, are based on the rate compatible punctured convolutional codes (RCPC). A channel estimator is used at the receiver to track both the short term and the long term fading condition in the channel. The estimated channel state information is then used to vary the rate allocation between the speech and the channel coder, on a frame by frame basis. This is achieved by sending an appropriate rate adjustment command through a feedback channel. Experimental results show that the objective and the subjective speech quality of the adaptive coders are superior to those of their non-adaptive counterparts. Improvements of up to 1.35 dB in SEGSNR of the speech signal and up to 0.9 in informal MOS for a combined rate of 12.8 kbit/s have been found. In addition, we found that the multimode coders perform better than their embedded counterparts.

Proceedings Article
28 Nov 1994
TL;DR: In this article, a non-square transform vector quantization (NSTVQ) was proposed to map the variable-dimension harmonic magnitude vector into a fixed-dimension vector.
Abstract: Several techniques for speech coding at rates of 4 kb/s and lower require quantization of spectral magnitudes at a set of frequencies which are harmonics of the fundamental pitch period of the talker (for example: multiband excitation coding, sinusoidal transform coding, and time-frequency interpolation). The number of harmonic magnitudes to be quantized depends on the fundamental frequency value and hence is variable, changing from frame to frame. The variable number of components to be quantized makes it difficult to use fixed-dimension vector quantization for harmonic magnitude encoding. In this paper, we introduce a quantization technique called non-square transform vector quantization (NSTVQ) which uses a fixed-dimension vector quantizer combined with a variable-size non-square transform which maps the variable-dimension harmonic magnitude vector into a fixed-dimension vector. The optimal reconstruction procedure for non-square transforms is derived and shown to be equivalent to an optimal least-square estimation procedure. The proposed technique is evaluated experimentally as part of a new coding system called spectral excitation coding (SEC). The results are compared to an existing technique which estimates the spectral shape using all-pole modeling followed by vector quantization of the LSP parameters.
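
To make the idea of a variable-size non-square transform concrete, here is a minimal sketch of mapping a variable-length vector to a fixed number of coefficients and back. It assumes a truncated orthonormal DCT-II basis, which is an illustrative choice and not necessarily the transform used in NSTVQ, and it assumes the input length is at least the fixed dimension. The vector quantizer that would sit between the two steps is omitted, and all function names are hypothetical.

```python
import math


def dct_rows(fixed_dim, var_dim):
    """First fixed_dim rows of the orthonormal DCT-II basis of size var_dim."""
    rows = []
    for k in range(fixed_dim):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / var_dim)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * var_dim))
                     for n in range(var_dim)])
    return rows


def forward(x, fixed_dim):
    """Map a variable-length vector x to fixed_dim coefficients (y = B x)."""
    basis = dct_rows(fixed_dim, len(x))
    return [sum(b * v for b, v in zip(row, x)) for row in basis]


def reconstruct(y, var_dim):
    """Least-squares reconstruction x_hat = B^T y.

    B^T is the pseudo-inverse of B here because the rows of B are
    orthonormal (this requires var_dim >= len(y)).
    """
    basis = dct_rows(len(y), var_dim)
    return [sum(basis[k][n] * y[k] for k in range(len(y)))
            for n in range(var_dim)]


magnitudes = [3.0] * 10                  # hypothetical flat harmonic magnitudes
coeffs = forward(magnitudes, 4)          # always 4 values, whatever len(x) is
restored = reconstruct(coeffs, len(magnitudes))
print(len(coeffs), [round(v, 6) for v in restored])
```

A flat spectrum lies entirely in the first basis vector, so it survives the round trip; a general vector is reconstructed as its projection onto the retained basis vectors.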

Proceedings Article
08 Jun 1994
TL;DR: The paper summarizes the standardized PSI-CELP algorithm and the techniques used to improve speech quality, to reduce computational complexity, and to reduce memory requirements.
Abstract: A pitch synchronous innovation-CELP (PSI-CELP), proposed by NTT DoCoMo in 1993, is adopted as the Japanese half-rate PDC (personal digital cellular) speech codec standard. This algorithm is based on CELP (code excited linear prediction) with a pitch synchronized excitation source. It uses 3.45 kbit/s out of 5.6 kbit/s for speech coding and the remaining 2.15 kbit/s for error protection. The paper summarizes the standardized PSI-CELP algorithm. The techniques used to improve speech quality, to reduce computational complexity, and to reduce memory requirements are mentioned. A real time operating prototype based on this algorithm is also described.

01 Jan 1994
TL;DR: This paper presents research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding, which exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques.
Abstract: Speech and audio compression has advanced rapidly in recent years spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s.

Proceedings Article
19 Apr 1994
TL;DR: This paper proposes high-quality and low bit-rate coders using a pitch synchronous innovation CELP (PSI-CELP) method or a phase adaptive PSI-CELP, which makes not only the periodicity but also the phase of random codevectors equal to those of an adaptive codevector.
Abstract: This paper proposes high-quality and low bit-rate (3.6 and 2.4 kbit/s) coders using a pitch synchronous innovation CELP (PSI-CELP) method or a phase adaptive PSI-CELP. PSI-CELP, which is used as the excitation structure of the half-rate codec for the standard of Japanese digital mobile telephony, is based on CELP but adds pitch synchronous innovation, which means that even random codevectors are adaptively converted to have pitch periodicity for voiced frames. Phase adaptive PSI-CELP makes not only the periodicity, as in PSI-CELP, but also the phase of random codevectors equal to those of an adaptive codevector. The subjective qualities of the 3.6- and 2.4-kbit/s coders exceed those of the 6.7-kbit/s VSELP coder, which is the full-rate codec for the standard of Japanese digital mobile telephony, and the 4.8-kbit/s U.S. Federal Standard 1016 CELP coder, respectively, in the error-free condition.
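
The phrase "random codevectors are adaptively converted to have pitch periodicity" can be pictured with a small sketch. The version below simply repeats the first pitch period of a codevector across the whole vector; this is an assumed mechanism for illustration and may differ from the conversion actually used in PSI-CELP. The function name is hypothetical, and the phase alignment of the phase adaptive variant is not shown.

```python
def pitch_synchronize(codevector, pitch_lag):
    """Make a codevector periodic by repeating its first pitch_lag samples."""
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    if pitch_lag >= len(codevector):
        return list(codevector)          # lag spans the whole vector: unchanged
    return [codevector[n % pitch_lag] for n in range(len(codevector))]


print(pitch_synchronize([1, 2, 3, 4, 5, 6, 7], 3))   # [1, 2, 3, 1, 2, 3, 1]
```

For unvoiced frames a coder of this kind would presumably leave the random codevector as it is, since there is no pitch periodicity to impose.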

Proceedings Article
01 May 1994
TL;DR: A variable bit rate speech coder operating below 2 kb/s is presented which combines phonetic classification and frequency domain modeling and adaptively matches a suitable coding scheme to the character of the input speech.
Abstract: A variable bit rate speech coder operating below 2 kb/s is presented which combines phonetic classification and frequency domain modeling. Each input frame is identified as one of the following classes: mixed voiced, fully voiced, unvoiced, noise, and silence. Based on the class parameter, a suitable parametric coding scheme (spectral analysis, modeling, quantization, and synthesis) is selected. The coder thereby adaptively matches a suitable coding scheme to the character of the input speech leading to variable rates ranging from 0.15 kb/s (for silence) to 2.6 kb/s (for mixed voiced). For typical conversational speech, the coder operates at an average rate of 1.4 kb/s while delivering speech quality that is subjectively preferred over Federal Standard 1016 CELP at 4.8 kb/s by a majority of listeners.

Proceedings Article
A. Goalic, J. Labat, J. Trubuil, Samir Saoudi, D. Rioualen
13 Sep 1994
TL;DR: The paper deals with the design of a digital acoustic underwater phone prototype and uses a scheme where synchronization and equalization (FSE+DFE) were jointly optimized to achieve a synthesized speech with a quality close to the telephonic one.
Abstract: The paper deals with the design of a digital acoustic underwater phone prototype. Digital techniques make it possible to achieve synthesized speech with a quality close to the telephonic one. The input speech signal is compressed down to 5.45 kbit/s using a CELP coder. The bit rate is 6 kbit/s before channel coding and expected to be about 8 kbit/s after channel coding. A QPSK modulation with differential encoding was chosen to transmit the useful signal. For the receiver the authors use a scheme where synchronization and equalization (FSE+DFE) were jointly optimized. The whole system (unidirectional link) has been implemented on single DSPs (Motorola 56001) and tested successfully in a very difficult environment (IFREMER pool).

Proceedings Article
19 Apr 1994
TL;DR: An algorithm for the coding of wideband (7 kHz) speech signals at 16 kbps using code excited linear prediction (CELP), primarily intended for use in audio-visual coding systems, or other telecommunication equipment using loudspeaker sound.
Abstract: In this paper we present an algorithm for the coding of wideband (7 kHz) speech signals at 16 kbps using code excited linear prediction (CELP), primarily intended for use in audio-visual coding systems (e.g., videotelephony) in the ISDN network, or other telecommunication equipment using loudspeaker sound. The algorithm is a full-band approach and much effort has been put into reducing the computational complexity. This has led to a real-time implementation of the present algorithm on a single TMS320C31 DSP (encoder+decoder). Through a formal subjective evaluation we have demonstrated a performance comparable to the CCITT Rec. G.722 subband coder at 64 kbps.


Proceedings Article
L. Cellario, Daniele Sereno, M. Giani, Peter Blöcher, K. Hellwig
19 Apr 1994
TL;DR: This paper focuses on the design, implementation and testing of a variable rate (VR) CELP codec aimed to be used in the testbed of one RACE-II project: CoDiT (code division testbed).
Abstract: This paper focuses on the design, implementation and testing of a variable rate (VR) CELP codec aimed to be used in the testbed of one RACE-II project: CoDiT (code division testbed). The project has been conceived to demonstrate the potentiality of CDMA for the UMTS (universal mobile telecommunications system). Because of the flexibility permitted by CDMA to easily convey the information stream over a VR physical channel, the fixed-rate constraint has been removed from the speech coding algorithm design, in order to exploit the time-varying local character of speech. One major feature of the proposed algorithm is the possibility for the average rate to be either source-controlled or network-controlled. This is particularly appealing for cellular communications in order to cope with areas or cells with a high time-varying congestion.

Proceedings Article
19 Apr 1994
TL;DR: Subjective testing indicates that the quality of CS-CELP is equivalent to that of the 32-kbit/s ADPCM under error-free conditions for IRS and non-IRS speech and it also operates in real time using fixed-point DSP chips.
Abstract: This paper presents a high-quality 8-kbit/s speech coder (conjugate structure CELP: CS-CELP) that is a candidate for standardization by the ITU-T (formerly CCITT). To achieve high quality for two types of speech (IRS and non-IRS (flat) speech) and real-time implementation, CS-CELP has been revised by two novel schemes. To handle two types of speech, the LSP parameters are quantized by multistage VQ with fourth-order interframe MA prediction. This scheme has little spectrum distortion, even if the two types of speech have many variations of the LSP parameters. The computational complexity of the implementation is reduced for adaptive and fixed-shape codebooks without degrading the speech quality. Multistage selection is adopted in the adaptive codebook; this selection uses a truncated impulse response. Improved pre-selection is proposed in the fixed-shape codebook. Subjective testing indicates that the quality of CS-CELP is equivalent to that of the 32-kbit/s ADPCM under error-free conditions for IRS and non-IRS speech. It also operates in real time using fixed-point DSP chips.

Patent
TL;DR: In this paper, a high pass filter is applied to the input signal to remove the pitch information for which the VSELP coder searches, to prevent the CELP-based coder from determining pitches for non-periodic signals.
Abstract: The perception of speech processed by a CELP based coder, such as a VSELP coder, when operating in noisy background conditions is improved by removing swirl artifacts during silence periods. This is done by removing the low frequency components of the input signal when no speech is detected. A speech activity detector distinguishes between a periodic signal, like speech, and a non-periodic signal, like noise by using most of the VSELP coder internal parameters to determine the speech or non-speech conditions. To prevent the VSELP coder from determining pitches for non-periodic signals, a high pass filter is applied to the input signal to remove the pitch information for which the VSELP coder searches.

Proceedings Article
28 Nov 1994
TL;DR: A 2.4 kb/s speech coder, based on the multiband excitation (MBE) model and DAP modeling of the speech spectra, which delivers speech quality comparable to two standard higher rate coders.
Abstract: This paper presents a high quality, low bit rate speech coder which applies an effective spectral modeling technique called discrete all-pole (DAP) modeling to efficiently represent speech spectra. The technique provides a fixed-dimension representation of the (pitch-dependent) variable dimension spectral shape vectors which arise in harmonic coders. Consequently, the spectral shapes are quantized more efficiently than with the usual linear prediction modeling, leading to better speech quality. We present a 2.4 kb/s speech coder, based on the multiband excitation (MBE) model and DAP modeling of the speech spectra, which delivers speech quality comparable to two standard higher rate coders: the 4.8 kb/s U.S. Federal Standard 1016 CELP coder and the 4.15 kb/s IMBE coder, adopted as the INMARSAT-M standard for satellite voice communications.

Proceedings Article
19 Apr 1994
TL;DR: This paper replaces the weighting filter with an auditory model which enables the search for the optimum stochastic code vector in the psychoacoustic domain and produces speech that is of considerably better quality than that obtained with a weighting filter.
Abstract: The dominant technique in present day low bit rate speech coders is based on the use of voice production models in which vocal tract filters are excited by vectors chosen from fixed and adaptive codebooks. It has long been recognized that to improve the perceptual quality of such coders it is necessary to also allow for the psychoacoustic properties of the human ear. The weighting filter traditionally used for this purpose is sub-optimal as it doesn't explicitly evaluate auditory characteristics. In this paper we replace the weighting filter with an auditory model which enables the search for the optimum stochastic code vector in the psychoacoustic domain. The algorithm, which has been termed PERCELP (for perceptually enhanced random codebook excited linear prediction), produces speech that is of considerably better quality than that obtained with a weighting filter. The computational overhead is low enough to warrant the use of this approach in new speech coders.
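
For context on what PERCELP replaces, the sketch below shows the conventional CELP perceptual weighting filter in its classic form, W(z) = A(z) / A(z/γ), where A(z/γ) is obtained by scaling the k-th predictor coefficient by γ^k. This illustrates the traditional filter only, not the auditory model of the paper; the value γ = 0.8 and the first-order coefficients are assumptions chosen for the example.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): a_k becomes a_k * gamma**k for k = 1..p."""
    return [a * gamma ** k for k, a in enumerate(lpc, start=1)]


def weighting_filter(signal, lpc, gamma=0.8):
    """Apply W(z) = A(z) / A(z/gamma), with A(z) = 1 + sum_k a_k z^-k."""
    den = bandwidth_expand(lpc, gamma)
    out = []
    for n in range(len(signal)):
        # Numerator A(z): FIR part acting on the input.
        y = signal[n] + sum(a * signal[n - k]
                            for k, a in enumerate(lpc, start=1) if n - k >= 0)
        # Denominator A(z/gamma): recursive part acting on past outputs.
        y -= sum(a * out[n - k]
                 for k, a in enumerate(den, start=1) if n - k >= 0)
        out.append(y)
    return out


impulse_response = weighting_filter([1.0, 0.0, 0.0], [-0.9], gamma=0.8)
print([round(v, 4) for v in impulse_response])
```

With γ = 1 the numerator and denominator cancel and the filter passes the signal unchanged; smaller values of γ de-emphasize the error near the formant peaks, where it is better masked by the speech.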

Proceedings Article
02 Oct 1994
TL;DR: A simple method is proposed to reduce the pitch searching time in the pitch filter almost without degradation of quality and its required computations are greatly reduced.
Abstract: The major drawback in the code excited linear prediction (CELP) type vocoders is their large computational requirements. In the present paper a simple method is proposed to reduce the pitch searching time in the pitch filter almost without degradation of quality. Based upon the observational regularity of the correlation function of speech, the searching range can be restricted to the positive side in pitch search. This is done by skipping the negative side with the width which is estimated from the previous positive envelope. In addition, the maximum number of available lags can be limited by the threshold, LT, which is set empirically to 58. So, only a limited number of lags is considered in the pitch search, fewer than half of those in the full search method. By using the proposed method in pitch search, its required computations are greatly reduced. Experimental results show 51% time reduction almost without lowering the speech quality in segmental SNR measures.
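
One loose reading of the skipping rule in this abstract is sketched below: lags are scanned in order, and when the correlation turns non-positive after a positive lobe, the search jumps ahead by roughly the width of that lobe instead of evaluating every lag in the negative region. The safety margin of two lags, the open-loop correlation measure and the test signal are all assumptions made for illustration, and the lag-count threshold LT from the abstract is not modeled.

```python
import math


def correlation(x, lag, start):
    """Correlation between x[n] and x[n - lag] for n = start .. len(x) - 1."""
    return sum(x[n] * x[n - lag] for n in range(start, len(x)))


def restricted_pitch_search(x, min_lag, max_lag):
    """Search lags in order, jumping over negative correlation lobes.

    Returns (best_lag, evaluated), where evaluated counts the lags whose
    correlation was actually computed.
    """
    best_lag, best_corr = min_lag, float("-inf")
    evaluated = 0
    run = 0                               # width of the positive lobe in progress
    lag = min_lag
    while lag <= max_lag:
        corr = correlation(x, lag, max_lag)
        evaluated += 1
        if corr > 0.0:
            run += 1
            if corr > best_corr:
                best_lag, best_corr = lag, corr
            lag += 1
        elif run:
            # Assume the negative lobe is about as wide as the positive
            # lobe just seen; keep a margin of two lags (an assumption).
            lag += max(1, run - 2)
            run = 0
        else:
            lag += 1
    return best_lag, evaluated


# Hypothetical test signal: a sinusoid with a period of 40 samples.
x = [math.sin(2 * math.pi * n / 40) for n in range(400)]
best, evaluated = restricted_pitch_search(x, 20, 160)
print(best, evaluated, "of", 160 - 20 + 1, "lags evaluated")
```

On this periodic test signal the best lag is a multiple of the true period, and fewer lags are evaluated than in an exhaustive search over the same range.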

Proceedings Article
13 Nov 1994
TL;DR: A new method for texture coding which combines 2-D linear prediction and stochastic vector quantization is presented, using an algorithm which takes into account the pixels surrounding the block being encoded.
Abstract: A new method for texture coding which combines 2-D linear prediction and stochastic vector quantization is presented in this paper. To encode a texture, a linear predictor is computed first. Next, a codebook following the prediction error model is generated and the prediction error is encoded with VQ, using an algorithm which takes into account the pixels surrounding the block being encoded. In the decoder, the error image is decoded first and then filtered as a whole, using the prediction filter. Hence, correlation between pixels is not lost from one block to another and a good reproduction quality can be achieved.

Journal Article
TL;DR: The results indicate that the difference between the quality of the reconstructed speech in an error-free channel and that in a Rayleigh fading channel is imperceptible at channel SNRs larger than 20 dB for both the RCPC and the PRS-based codecs.
Abstract: We present a combined speech and channel coding scheme for digital mobile communications. The speech coding algorithm is based on code-excited linear prediction (CELP) and it provides good communication quality speech at a rate of 4 kbps. For the channel code, both rate-compatible punctured convolutional (RCPC) codes and punctured Reed-Solomon (PRS) codes are considered. In the case of RCPC codes, soft decision decoding is considered, in addition to the simpler hard decision decoding. The modulation format chosen in our study is π/4-DQPSK with differential detection. Unequal error protection is used based on the bit error sensitivities of the different speech parameters. The performance of the combined speech and channel codec is studied under different mobile channel conditions, such as fade rates, signal-to-noise ratios, and interleaving delays. The results indicate that, with no interleaving delay and a large channel signal-to-noise ratio, PRS codes and RCPC codes with soft decision decoding have similar performance, given in terms of the segmental signal-to-noise ratio (SSNR) of the reconstructed speech. At low channel SNR, RCPC codes with soft decision decoding are significantly better than PRS codes. Informal listening tests were also conducted and the results indicate that the difference between the quality of the reconstructed speech in an error-free channel and that in a Rayleigh fading channel is imperceptible at channel SNRs larger than 20 dB for both the RCPC and the PRS-based codecs.

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A method for locally optimal variable-to-variable length source coding with distortion is introduced, and it is possible to compress the linear predictive coefficients of speech to one-third the rate of entropy-constrained vector quantization of speech, with no increase in spectral distortion.
Abstract: We introduce a method for locally optimal variable-to-variable length source coding with distortion, and apply it to coding the linear predictive coefficients of speech. The method is similar to entropy-constrained vector quantization, but it uses a dynamic programming algorithm to encode. The method automatically discovers variable-length source structure, in this case the acoustic-phonetic structure of speech. Using this structure, it is possible to compress the linear predictive coefficients of speech to one-third the rate of entropy-constrained vector quantization of speech, with no increase in spectral distortion. Auditory tests reveal that using this method, the spectral component of speech can be coded naturally and intelligibly to as low as 50 bits per second.
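For orientation, the baseline the abstract compares against, entropy-constrained vector quantization, picks the codeword that minimizes distortion plus a rate penalty. The sketch below shows only that baseline selection rule, not the dynamic programming encoder of the paper; the squared-error distortion and the names `ecvq_encode`, `code_lengths_bits`, and `lagrange_multiplier` are illustrative assumptions.

```python
def ecvq_encode(vector, codebook, code_lengths_bits, lagrange_multiplier):
    """Return the index minimizing squared error + lagrange_multiplier * codeword length."""

    def cost(index):
        distortion = sum((x - y) ** 2 for x, y in zip(vector, codebook[index]))
        return distortion + lagrange_multiplier * code_lengths_bits[index]

    return min(range(len(codebook)), key=cost)
```

With `lagrange_multiplier` set to zero the rule reduces to an ordinary nearest-neighbour search; larger values bias the encoder toward codewords with shorter codes.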

Journal ArticleDOI
TL;DR: A novel split vector quantization (SVQ) scheme for low bit rate coding of speech signals is proposed, in which the LPC parameter vector is split into small-dimension subvectors, and each subvector is sequentially quantized according to a multistage structure that resembles a segmented lattice filter.
Abstract: A novel split vector quantization (SVQ) scheme for low bit rate coding of speech signals is proposed. In this scheme, the LPC parameter vector, which is represented by Parcor coefficients, is split into small-dimension subvectors, and each subvector is sequentially quantized according to a multistage structure that resembles a segmented lattice filter. The forward and backward prediction residuals in the segmented filter are coupled across VQ stages. The quantizer in each stage operates on the principle of minimizing the forward and backward prediction error energies similar to linear predictive analysis. Simulation results show that the new split VQ scheme can achieve transparent quantization of LPC parameters at 25 b/frame.
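The splitting step can be illustrated with plain split VQ, in which each subvector is quantized independently by a nearest-neighbour search in its own codebook. This sketch deliberately omits the lattice-coupled, prediction-error-minimizing stages that distinguish the proposed scheme; the squared-error criterion and the names `split_vq_encode`, `split_vq_decode`, and `split_sizes` are assumptions.

```python
def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def split_vq_encode(vector, split_sizes, codebooks):
    """Quantize consecutive subvectors of `vector`, one codebook per subvector."""
    if sum(split_sizes) != len(vector):
        raise ValueError("split sizes must cover the whole vector")
    indices = []
    start = 0
    for size, codebook in zip(split_sizes, codebooks):
        indices.append(nearest_index(vector[start:start + size], codebook))
        start += size
    return indices


def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords back into a full parameter vector."""
    decoded = []
    for index, codebook in zip(indices, codebooks):
        decoded.extend(codebook[index])
    return decoded
```

Splitting keeps each search small: the total bit budget is the sum of the per-subvector index sizes, while the search cost grows with the sum, not the product, of the codebook sizes.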

Patent
09 Jun 1994
TL;DR: A method of encoding speech on a fixed-point processor that treats the signal as floating point while operating on each sample as fixed point, achieving precision similar to that of conventional floating point while still executing rapidly on a fixed-point processor.
Abstract: A method of encoding speech using a fixed-point processor. The method treats the signal as floating point, while operating on each sample of the signal as fixed point. The disclosed method achieves precision similar to that of conventional floating point and may be rapidly executed on a fixed-point processor.
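One common way to get this behaviour is block floating point, where a block of samples shares a single exponent and each sample is stored as an integer mantissa. Whether the patent uses exactly this arrangement is an assumption; the 16-bit mantissa width and the names `to_block_float` and `from_block_float` are likewise hypothetical.

```python
import math


def to_block_float(samples, mantissa_bits=16):
    """Return (mantissas, exponent) with sample ~= mantissa * 2**exponent.

    All samples in the block share one exponent, chosen from the block peak,
    so per-sample arithmetic can be done on integers.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0] * len(samples), 0
    limit = (1 << (mantissa_bits - 1)) - 1  # largest signed mantissa
    _, peak_exponent = math.frexp(peak)  # peak = m * 2**peak_exponent, 0.5 <= m < 1
    exponent = peak_exponent - (mantissa_bits - 1)
    mantissas = []
    for s in samples:
        scaled = round(math.ldexp(s, -exponent))
        mantissas.append(max(-limit, min(limit, scaled)))  # clamp rounding overflow
    return mantissas, exponent


def from_block_float(mantissas, exponent):
    """Rebuild floating-point samples from integer mantissas and the shared exponent."""
    return [math.ldexp(float(m), exponent) for m in mantissas]
```

The shared exponent lets the signal level vary widely from block to block, as floating point would, while every multiply and add inside a block stays an integer operation.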