Showing papers on "Code-excited linear prediction" published in 1996


Proceedings ArticleDOI
07 May 1996
TL;DR: The enhanced MELP speech coder is described, which is a candidate for the new U.S. Federal Standard at 2.4 kbits/s and has been optimized for performance in acoustic background noise and in channel errors, as well as for efficient real-time implementation.
Abstract: This paper describes our enhanced mixed excitation linear prediction (MELP) speech coder which is a candidate for the new U.S. Federal Standard at 2.4 kbits/s. The new coder is based on the MELP model, and it uses a number of enhancements as well as efficient quantization algorithms to improve performance while maintaining a low bit rate. In addition, the coder has been optimized for performance in acoustic background noise and in channel errors, as well as for efficient real-time implementation. Listening tests confirm that the enhanced 2.4 kbit/s MELP coder performs as well as the higher bit rate 4.8 kbit/s FS1016 CELP standard.

169 citations


Journal ArticleDOI
TL;DR: The objective is to design efficient coding/decoding schemes for the transmission of the CELP line spectral parameters (LSPs) over very noisy channels by quantifying the amount of "residual redundancy" inherent in the LSPs of Federal Standard 1016 CELP.
Abstract: We consider the problem of reliably transmitting CELP-encoded speech over noisy communication channels. Our objective is to design efficient coding/decoding schemes for the transmission of the CELP line spectral parameters (LSPs) over very noisy channels. We begin by quantifying the amount of "residual redundancy" inherent in the LSPs of Federal Standard 1016 CELP. This is done by modeling the LSPs as first- and second-order Markov chains. Two models for LSP generation are proposed; the first model characterizes the intraframe correlation exhibited by the LSPs, while the second model captures both intraframe and interframe correlation. By comparing the entropy rates of the models thus constructed with the CELP rates, it is shown that as many as one-third of the LSP bits in every frame of speech are redundant. We next consider methods by which this residual redundancy can be exploited by an appropriately designed channel decoder. Before transmission, the LSPs are encoded with a forward error control (FEC) code; we consider both block (Reed-Solomon) codes and convolutional codes. Soft-decision decoders that exploit the residual redundancy in the LSPs are implemented assuming additive white Gaussian noise (AWGN) and independent Rayleigh fading environments. Simulation results employing binary phase-shift keying (BPSK) indicate coding gains of 2-5 dB over soft-decision decoders that do not exploit the residual redundancy.

139 citations
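
The redundancy estimate in the entry above compares the entropy rate of a Markov model with the fixed bit rate of the quantizer. The sketch below shows that comparison for a first-order Markov chain. It is a minimal illustration only: the 4-level index, the transition matrix `toy_P`, and the function names are hypothetical and are not the LSP models or data from the paper.

```python
import math


def stationary_distribution(P, iters=200):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeated multiplication (assumes the chain is ergodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_rate(P):
    """Entropy rate in bits/symbol of a first-order Markov chain:
    H = -sum_i pi_i * sum_j P_ij * log2(P_ij)."""
    pi = stationary_distribution(P)
    n = len(P)
    return -sum(
        pi[i] * P[i][j] * math.log2(P[i][j])
        for i in range(n)
        for j in range(n)
        if P[i][j] > 0.0
    )


# Hypothetical 4-level quantizer index (2 bits at fixed rate) whose value
# tends to repeat from one symbol to the next.
toy_P = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

fixed_rate_bits = math.log2(len(toy_P))
redundancy_bits = fixed_rate_bits - entropy_rate(toy_P)
print(f"entropy rate: {entropy_rate(toy_P):.3f} bits, "
      f"redundancy: {redundancy_bits:.3f} bits per symbol")
```

The gap between the fixed rate and the entropy rate is the residual redundancy that a suitably designed channel decoder could, in principle, exploit.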


PatentDOI
TL;DR: In this article, a method of digitally compressing speech and music by use of multiple band fixed excitations stored in codebooks was proposed, along with a coupling method for interconnecting the excitation codebooks and adaptive codebooks, and for generating the composite excitation signal.
Abstract: A method of digitally compressing speech and music by use of multiple band ("multiband") fixed excitations stored in codebooks. The use of multiband fixed excitations, along with a coupling method for interconnecting the excitation codebooks and adaptive codebooks and for generating the composite excitation signal, improves the long-term and short-term prediction, and the use of voice-music classification allows the coding structure to be adapted to the statistical character of the audio signal.

90 citations


Patent
25 Oct 1996
TL;DR: In this article, an encoding apparatus is described whose CELP encoding unit has a noise codebook memory containing codebook vectors generated by clipping Gaussian noise and codebook vectors obtained by learning, using the clipped Gaussian noise vectors as initial values.
Abstract: An encoding apparatus in which an input speech signal is divided into blocks and encoded in units of blocks. The encoding apparatus includes an encoding unit for performing CELP encoding having a noise codebook memory containing codebook vectors generated by clipping Gaussian noise and codebook vectors obtained by learning using the code vectors generated by clipping the Gaussian noise as initial values. The encoding apparatus enables optimum encoding for a variety of speech configurations.

43 citations


Patent
Claude Lamblin
04 Jan 1996
TL;DR: A CELP coding method with an algebraic codebook is described in which the excitation search computes, and stores in memory, only selected components of the covariance matrix $U = H^T H$, where $H$ is a lower triangular Toeplitz matrix formed from the impulse response of a compound filter made up of synthesis filters and a perceptual weighting filter.
Abstract: The method uses the technique of CELP coding with algebraic codebook. The search for the CELP excitation includes a calculation of certain components of the covariance matrix $U = H^T \cdot H$, where $H$ denotes a lower triangular Toeplitz matrix formed on the basis of the impulse response of a compound filter made up of synthesis filters and of a perceptual weighting filter. The memory-stored components of the covariance matrix are only those of the form $U(\mathrm{pos}_{i,p}, \mathrm{pos}_{i,p})$ and those of the form $U(\mathrm{pos}_{i,p}, \mathrm{pos}_{j,q})$, with $\mathrm{pos}_{i,p}$ and $\mathrm{pos}_{j,q}$ respectively denoting position $i$ and position $j$ for the pulses $p$ and $q$ in the codes of the algebraic codebook.

42 citations
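
The covariance matrix U = H^T H named in the entry above can be illustrated with a small sketch. It builds the lower triangular Toeplitz matrix H from an impulse response and shows why only the diagonal and pairwise entries at pulse positions are needed to evaluate the energy of a two-pulse codevector. The impulse response values and the function names are hypothetical, and the sketch assumes unit-amplitude pulses of equal sign; it does not reproduce the patent's position layout or storage scheme.

```python
def toeplitz_lower(h):
    """Lower triangular Toeplitz matrix H with H[i][j] = h[i - j] for i >= j."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def covariance_matrix(h):
    """U = H^T H, so U[i][j] = sum_k H[k][i] * H[k][j]."""
    H = toeplitz_lower(h)
    n = len(h)
    return [
        [sum(H[k][i] * H[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pair_energy(U, i, j):
    """Energy of the filtered codevector with unit pulses at positions i and j,
    read directly from U: c^T U c = U[i][i] + U[j][j] + 2 * U[i][j]."""
    return U[i][i] + U[j][j] + 2.0 * U[i][j]


def filtered_energy(h, positions):
    """Reference value: build the codevector, filter it with H, sum the squares."""
    H = toeplitz_lower(h)
    n = len(h)
    c = [1.0 if k in positions else 0.0 for k in range(n)]
    y = [sum(H[r][k] * c[k] for k in range(n)) for r in range(n)]
    return sum(v * v for v in y)


h = [1.0, 0.5, 0.25]  # hypothetical impulse response, subframe length 3
U = covariance_matrix(h)
print(pair_energy(U, 0, 2), filtered_energy(h, {0, 2}))
```

Both printed values agree, which is the reason a search over sparse pulse codes only ever touches entries of U indexed by candidate pulse positions.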


Proceedings ArticleDOI
07 May 1996
TL;DR: A novel multi-pulse excitation signal quantization method is proposed, where the pulse amplitudes are vector-quantized (VQ), which remarkably enhances the performance and drastically reduces the position search complexity.
Abstract: This paper proposes a speech codec, named MP-CELP (multi-pulse-based CELP), with a 10 msec frame length, which has been developed for the GSM EFR (enhanced full-rate) codec standardization. A novel multi-pulse excitation signal quantization method is proposed, where the pulse amplitudes are vector-quantized (VQ). The combination search of the pulse position and the amplitude VQ remarkably enhances the performance. By restricting the pulse positions based on the algebraic-type structure, the search complexity and the bits are reduced. The divided pulse position search drastically reduces the position search complexity. The speech quality for MP-CELP is higher than that for G.728 LD-CELP. MP-CELP also satisfies all the speech quality requirements of the GSM EFR standardization except for the background noise condition.

36 citations


Proceedings ArticleDOI
28 Apr 1996
TL;DR: This paper presents a multi-mode variable rate speech coder based on the CELP algorithm that got the highest overall score on the tests in speech quality and average data rate and proposed this coder as a candidate for a new speech service standard of the North American CDMA digital cellular system IS-95.
Abstract: This paper presents a multi-mode variable rate speech coder based on the CELP algorithm. The coder operates at a rate of 8.5 kbps, 4 kbps or 0.8 kbps with a 20 ms frame. The coder consists of five coding modes applied to distinct speech features. One out of the five coding modes is selected for each frame by using a mode selector which comprises a neural network and a speech power variation detector. To improve the coding performance, an inter-frame predictive LSP quantizer and a coding strategy for speech onsets are utilized. In low bit-rate speech coding, decoded speech quality is severely degraded in high background noise. A noise suppressor based on the spectral subtraction algorithm is also introduced in order to reduce background noises. We proposed this coder as a candidate for a new speech service standard of the North American CDMA digital cellular system IS-95. As a result of the first evaluation conducted by the Telecommunications Industry Association, the coder got the highest overall score on the tests in speech quality and average data rate.

33 citations


Patent
Kazunori Ozawa
29 Feb 1996
TL;DR: In this paper, a quantization unit quantizes the spectral parameters of at least one subframe by switching between a plurality of quantization code books to obtain quantized spectral parameters.
Abstract: A voice coder system is capable of coding speech at low bit rates with high speech quality. Speech signals are divided into frames and further divided into subframes. A spectral parameter calculator calculates spectral parameters representing a spectral characteristic of the speech signals in at least one subframe. A quantization unit quantizes the spectral parameters of at least one subframe by switching between a plurality of quantization code books to obtain quantized spectral parameters. A mode classifier includes means for calculating a degree of pitch periodicity based on pitch prediction distortions and determines one of a plurality of modes for each frame using the degree of pitch periodicity. A weighting part weights perceptual weights to the speech signals depending on the spectral parameters obtained in the spectral parameter calculator to obtain weighted signals. An adaptive code book obtains a set of pitch parameters representing pitch periods of the speech signals in a predetermined mode by using the determined mode, the spectral parameters, the quantized spectral parameters, and the weighted signals. An excitation quantization unit searches a plurality of stages of excitation code books and gain code books by using the spectral parameters, the quantized spectral parameters, the weighted signals and the pitch parameters to obtain quantized excitation signals of the speech signals and is able to switch between a plurality of excitation code books and a plurality of gain code books based on the mode determined by the mode classifier.

33 citations


Patent
22 Aug 1996
TL;DR: In this paper, a new autocorrelation matrix based on the combination of the autocorrelation matrix of the current frame and that of a past period determined to be noise is proposed.
Abstract: For the CELP (Code Excited Linear Prediction) coding of an input audio signal, an autocorrelation matrix, a speech/noise decision signal and a vocal tract prediction coefficient are fed to an adjusting section. In response, the adjusting section computes a new autocorrelation matrix based on the combination of the autocorrelation matrix of the current frame and that of a past period determined to be a noise. The new autocorrelation matrix is fed to an LPC (Linear Prediction Coding) analyzing section. The analyzing section computes a vocal tract prediction coefficient based on the autocorrelation matrix and delivers it to a prediction gain computing section. At the same time, in response to the above new autocorrelation matrix, the analyzing section computes an optimal vocal tract prediction coefficient by correcting the vocal tract prediction coefficient. The optimal vocal tract prediction coefficient is fed to a synthesis filter.

32 citations


Proceedings ArticleDOI
07 May 1996
TL;DR: A split-band encoding scheme for 16 kbit/s wideband speech coding (50-7000 Hz), using 2 unequal subbands from 0-6 kHz and from 6-7 kHz, which was motivated by an experimental evaluation of the signal bandwidth of speech frames.
Abstract: We propose a split-band encoding scheme for 16 kbit/s wideband speech coding (50-7000 Hz), using 2 unequal subbands from 0-6 kHz and from 6-7 kHz. This approach was motivated by an experimental evaluation of the signal bandwidth of speech frames. The higher subband is simply represented by white noise with adjustment of the short term energy. For the lower subband code-excited linear prediction (CELP) is used. The analysis filter bank, which performs the unequal band splitting combined with critical subsampling of the sub-bands, is described. A bit error concealment technique and the bit allocation are also presented. In informal listening tests the speech quality was rated higher than the speech quality of the CCITT G.722 wideband codec operating at 48 kbit/s.

31 citations
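
The idea of representing the higher subband by white noise with short-term energy adjustment can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the Gaussian noise source, and the per-subframe energy value are hypothetical, and the filter bank and energy quantization from the paper are not modelled.

```python
import math
import random


def noise_with_energy(target_energy, length, rng):
    """Return `length` white-noise samples scaled so that the sum of their
    squares equals `target_energy` (the decoded short-term energy)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    energy = sum(v * v for v in noise)
    gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
    return [gain * v for v in noise]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
highband = noise_with_energy(target_energy=3.5, length=20, rng=rng)
print(sum(v * v for v in highband))
```

Only the energy value has to be transmitted for the upper band in such a scheme, which is why it costs very few bits.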


PatentDOI
TL;DR: A speech encoding method and apparatus in which an input speech signal is divided in terms of blocks or frames as encoding units and encoded in terms of the encoding units, whereby explosive and fricative consonants can be impeccably reproduced.
Abstract: A speech encoding method and apparatus in which an input speech signal is divided in terms of blocks or frames as encoding units and encoded in terms of the encoding units, whereby explosive and fricative consonants can be impeccably reproduced, while there is an attenuation of the occurrence of foreign sounds being generated at a transient portion between voiced (V) and unvoiced (UV) portions, so that the speech with high clarity devoid of “stuffed” feeling may be produced. The encoding apparatus includes a first encoding unit for finding residuals of linear predictive coding (LPC) of an input speech signal for performing harmonic coding and a second encoding unit for encoding the input speech signal by waveform coding. The first encoding unit and the second encoding unit are used for encoding a voiced (V) portion and an unvoiced (UV) portion of the input signal, respectively. Code excited linear prediction (CELP) encoding employing vector quantization by a closed loop search of an optimum vector using an analysis-by-synthesis method is used for the second encoding unit. A corresponding decoding method and apparatus is also provided.

Journal ArticleDOI
TL;DR: Subjective testing indicates that the quality of this coder is equivalent to that of 32-kb/s adaptive differential pulse code modulation (ADPCM) under error-free conditions, and testing has further demonstrated that the coding is robust against random bit errors.
Abstract: This paper describes a high-quality 8-kb/s speech coder called conjugate structure code-excited linear prediction (CS-CELP) with a 10-ms frame length. To provide a short delay and high quality under both error-free and channel error conditions, it uses three new schemes: line spectrum pair (LSP) quantization using interframe prediction, preselection in the codebook search, and gain vector quantization (VQ) with backward prediction. The LSP parameters are quantized by using multistage VQ with moving-average (MA) prediction. This scheme can operate efficiently with various frequency responses of speech. The preselection of the codebook reduces the computational complexity and improves the robustness to channel errors. The gain VQ with backward prediction can provide a high quality and robustness without transmission of input speech power information. A conjugate structure for both random codebook and gain codebook is introduced to improve the ability to handle random bit errors and to reduce codebook storage memory requirements. Subjective testing indicates that the quality of this coder is equivalent to that of 32-kb/s adaptive differential pulse code modulation (ADPCM) under error-free conditions. Testing has further demonstrated that the coder is robust against random bit errors.

Patent
Mitsuo Fujimoto
20 May 1996
TL;DR: A speech coder using a pitch synchronous innovation code excited linear prediction (PSI-CELP) speech coding system is described in this paper, which is capable of representing a portion which is not sufficiently represented by an adaptive codebook in a periodic portion of input speech and capable of improving the quality of reproduced speech.
Abstract: A speech coder using a pitch synchronous innovation code excited linear prediction (PSI-CELP) speech coding system. The speech coder is capable of representing a portion which is not sufficiently represented by an adaptive codebook in a periodic portion of input speech and capable of improving the quality of reproduced speech. The periodicity corresponds to the pitch cycle of input speech by preliminarily reproducing speech from simple impulse trains. The speech coder depending on the particular embodiment includes an adaptive code book, a fixed code book, a noise code book, and a pulse codebook. A pulse code book stores a plurality of types of codevectors corresponding to pitch waveforms of voiced sounds. At the time of coding input speech, the pulse code book is searched.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: The combination of the reconstruction method with adaptive speech coders showed virtually the same good results for forward adaptation, whereas a higher degradation is caused by backward-adaptive coders.
Abstract: A new reconstruction method for frame erasures in speech transmission is presented which is based on parameterization of the speech signal by means of linear prediction (LPC) and voicing analysis. The problem of generating partially voiced substitute speech signals is solved by performing separate voicing decisions in sub-bands. The method yields considerable improvements compared with silence substitution for frame erasure ratios of up to 10% or even 20%. The combination of the reconstruction method with adaptive speech coders showed virtually the same good results for forward adaptation, whereas a higher degradation is caused by backward-adaptive coders.

PatentDOI
TL;DR: A speech compressor utilizing Trellis Encoding and Linear Prediction (TELP), which provides improved signal generation and search technique for a code-excited linear prediction (CELP) speech encoder.
Abstract: A speech compressor utilizing Trellis Encoding and Linear Prediction (TELP). A TELP speech compressor provides improved signal generation and search technique for a code-excited linear prediction (CELP) speech encoder. TELP is a frame oriented coding that breaks the quantized speech signals into frames of prescribed length N and each frame into subframes of prescribed length L, which are processed as dependent units utilizing an analysis-by-synthesis approach. The approach is based on constructing the best mean square linear predicting filter and searching the best exciting sequence for the filter in order to produce synthesized speech. A trellis encoder is used instead of a stochastic code book. The Q-ary analysis of a given subframe and previous excitations is proposed for a fast vector search in an adaptive code book. It simplifies the implementation of digital speech compression.

Patent
Keiichi Funaki
01 Apr 1996
TL;DR: In this paper, a voice coder has an LPC (linear prediction coding) analyzer, a parameter quantizer for quantizing the LPC coefficients to output a quantized code CL, an adaptive codebook, a long-term predicting circuit for searching the codebook to determine a delay code CD and an adaptive vector, an excitation codebook and a gain codebook searching circuit for determining an optimum quantised code CS.
Abstract: A voice coder for coding a speech signal at a low bit rate with high speech quality and improved efficiency for gain quantization according to code-excited linear prediction (CELP) coding. The voice coder has an LPC (linear prediction coding) analyzer for calculating LPC coefficients, a parameter quantizer for quantizing the LPC coefficients to output a quantized code CL, an adaptive codebook, a long-term predicting circuit for searching the adaptive codebook to determine a delay code CD and an adaptive code vector, an excitation codebook, an excitation codebook searching circuit for determining an optimum quantized code CS and an excitation vector, and a gain codebook searching circuit for outputting a gain code CG by determining quantized gains representing quantized vectors of gains of the adaptive code vector and the excitation vector. The gain codebook searching circuit has a plurality of gain codebooks each for storing quantized gains corresponding to one of searching ranges divided by predetermined ranges with respect to the value of a searching parameter, and gain codebook selector for selecting one of the gain codebooks depending on the value of the searching parameter. The gain code CG is determined by using the gain codebook selected by the gain codebook selector.

Proceedings ArticleDOI
03 Oct 1996
TL;DR: The subjective performance test indicates that the quality of the proposed CELP coder is about 2 dB higher than that of the conventional one and the spectrum represented by mel-generalized cepstrum has frequency resolution similar to that of human ear.
Abstract: This paper presents a CELP speech coding system based on mel-generalized cepstral analysis. In the mel-generalized cepstral analysis, we can vary the model spectrum continuously from AR to cepstral modeling by changing the value of a parameter γ and we can choose an appropriate model spectrum. Furthermore, the spectrum represented by mel-generalized cepstrum has frequency resolution similar to that of the human ear. Since the perceptual weighting and postfiltering are carried out through the mel-generalized cepstrum, we expect the perceptual performance of the proposed coder to be improved. The subjective performance test indicates that the quality of the proposed CELP coder is about 2 dB higher than that of the conventional one.

Proceedings ArticleDOI
03 Oct 1996
TL;DR: The approach is to reduce the hoarse voice in CELP-coded speech by enhancing the pitch periodicity in the reproduction signal and also to reduce the muffled characteristics of narrowband speech by regenerating the highband components of speech spectra from the reproduction signal.
Abstract: In this paper, a method for improving the quality of narrowband CELP-coded speech is presented. The approach is to reduce the hoarse voice in CELP-coded speech by enhancing the pitch periodicity in the reproduction signal and also to reduce the muffled characteristics of narrowband speech by regenerating the highband components of speech spectra from the reproduction signal. In the proposed method, multiband excitation (MBE) analysis is performed on the reproduction speech signal from a CELP decoder and the pitch periodicity is enhanced by resynthesizing the speech signal using a harmonic synthesizer according to the MBE model. The highband magnitude spectra are regenerated by matching to lowband spectra using a trained wideband spectral codebook. Information about the voiced/unvoiced (V/UV) excitation in the highband is derived from a training procedure and then stored alongside the wideband spectral codebook so that it can be recovered by indexing to the codebook using the matched lowband index. Simulation results indicate that the quality of the wideband resynthesized speech is significantly improved over the narrowband CELP-coded speech.

Proceedings ArticleDOI
07 May 1996
TL;DR: A new algorithm for the classification of telephone-bandwidth speech is designed for efficient control of bit allocation in low bit-rate speech coders; in comparison with a classifier based on the long-term autocorrelation function, the D_yWT classifier proves to be superior.
Abstract: This paper describes a new algorithm for the classification of telephone-bandwidth speech that is designed for efficient control of bit allocation in low bit-rate speech coders. The algorithm is based on the dyadic wavelet transform (D_yWT) and classifies each unit subframe into one of the three categories background noise/unvoiced, transients/voicing onsets, periodic/voiced. A set of three parameters is derived from the D_yWT coefficients, each giving a decision score that the associated class is active. Taking the history into account, a finite-state model controlled by these parameters computes the classifier's decision. The proposed algorithm is robust to various types of background noise. In comparison with a classifier based on the long-term autocorrelation function, the D_yWT classifier proves to be superior. To evaluate its performance in CELP-type speech coders, a variety of excitation coding schemes with bit rates between 2200 and 4800 bit/s is investigated.

PatentDOI
Andrew P. Dejaco, Bi Ning
TL;DR: In this paper, the analysis window for the coder is extended beyond the length of the target speech frame, so that the two-dimensional impulse response matrix can be stored as a one-dimensional autocorrelation matrix, reducing the computational complexity and memory required for the search.
Abstract: A method for selecting a code vector in an algebraic codebook wherein the analysis window for the coder is extended beyond the length of the target speech frame. By extending the analysis window, the two-dimensional impulse response matrix can be stored as a one-dimensional autocorrelation matrix, greatly saving on the computational complexity and memory required for the search.
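
The saving described in the entry above follows from a general property of convolution matrices, which the sketch below checks numerically. When the window is long enough that no shifted copy of the impulse response is truncated, each correlation entry depends only on the distance between the two positions, so one autocorrelation vector replaces the full matrix. The impulse response, frame length, and function names are hypothetical; this illustrates the identity only and is not the patent's procedure.

```python
def autocorrelation(h, n_lags):
    """r[k] = sum_n h[n] * h[n + k]; zero once k reaches len(h)."""
    return [
        sum(h[n] * h[n + k] for n in range(len(h) - k))
        for k in range(n_lags)
    ]


def extended_window_entry(h, frame_len, i, j):
    """Correlation of the responses to pulses at positions i and j when the
    analysis window spans frame_len + len(h) - 1 samples (no truncation)."""
    rows = frame_len + len(h) - 1

    def shifted(k, pos):
        d = k - pos
        return h[d] if 0 <= d < len(h) else 0.0

    return sum(shifted(k, i) * shifted(k, j) for k in range(rows))


h = [1.0, 0.5, 0.25]  # hypothetical impulse response
frame_len = 5
r = autocorrelation(h, frame_len)

# Every matrix entry equals the autocorrelation at lag |i - j|.
matches = all(
    abs(extended_window_entry(h, frame_len, i, j) - r[abs(i - j)]) < 1e-12
    for i in range(frame_len)
    for j in range(frame_len)
)
print(matches)
```

Storage drops from frame_len squared entries to frame_len entries in this toy setting.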

Proceedings ArticleDOI
07 May 1996
TL;DR: The linked split-vector quantizer (LSVQ) where the lower and the upper codebook are selected according to the preselected middle codevector, using the ordering property of LSFs, links three codebooks for the efficient use of the codebook space.
Abstract: In speech coding, several vector quantization (VQ) methods for the LPC (linear predictive coding) parameters have been developed. Because LPC parameters are too dynamic to quantize directly, the LSFs (line spectrum frequencies) are used instead. In this study, we propose the linked split-vector quantizer (LSVQ) where the lower and the upper codebook are selected according to the preselected middle codevector. Using the ordering property of LSFs, LSVQ links three codebooks for the efficient use of the codebook space. Compared with the conventional split-vector quantizer (SVQ), LSVQ increases the usage of codebook space by 10.84%, and shows lower spectral distortion at 23 bits/frame than the SVQ at 24 bits/frame.

Patent
24 Jun 1996
TL;DR: In this paper, the authors proposed a method to receive a speech signal, perform a recognition weighting process on it, synthesize a synthetic speech signal and calculate an autocorrelation of the synthesized speech signal whose delay is a predetermined value, to divide the square of the former by the latter, to calculate a pitch lag and a pitch filter coefficient by calculating only the part of a positive peak with skipping over the negative peak.
Abstract: The present invention relates to the method to receive a speech signal, to perform a recognition weighting process on it, to synthesize a synthetic speech signal, to calculate an autocorrelation of the synthetic speech signal whose delay is a predetermined value and an autocorrelation whose delay is 0, to divide the square of the former by the latter, to calculate a pitch lag and a pitch filter coefficient by calculating only the part of a positive peak while skipping over the part of a negative peak by using the results from the dividing operation, and to calculate and output the pitch lag and the pitch filter coefficient by repeating the above process. Thus, real-time implementation of a CELP vocoder can be achieved.

Proceedings ArticleDOI
21 Oct 1996
TL;DR: To find the optimum pitch lag in a CELP vocoder with reduced computation, a new pitch search algorithm is proposed, based on the observation that the sign of the abbreviated correlation function agrees with that of the original correlation function.
Abstract: To find the optimum pitch lag in a CELP vocoder with reduced computation requirement, a new pitch search algorithm is proposed. This algorithm is based on the observation that the sign of the abbreviated correlation function agrees with that of the original correlation function. The abbreviated correlation function makes it possible to confine the candidates for the optimum pitch to the positively correlated lags, which reduces the computation load considerably. However, since the optimum pitch can be found without omission, the degradation of segmental SNR (SEGSNR) does not occur with the proposed algorithm. Experimental results show that the proposed algorithm can achieve a 40% reduction in computation time compared to the conventional full search method.
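
The idea of restricting the pitch search to positively correlated lags can be sketched as follows. The score is the squared correlation divided by the energy of the delayed signal, and lags with non-positive correlation are skipped before the energy is computed. Note the assumption: the abstract does not define the abbreviated correlation function, so this sketch uses the sign of the full correlation as a stand-in, which shows the candidate pruning but not the computational saving of the proposed algorithm. The function name and signal are hypothetical.

```python
def search_pitch_lag(signal, frame_start, frame_len, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] maximizing corr^2 / energy over
    positively correlated lags, or None if no lag has positive correlation.
    Requires frame_start >= max_lag so that delayed samples exist."""
    frame = range(frame_start, frame_start + frame_len)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(signal[n] * signal[n - lag] for n in frame)
        if corr <= 0.0:
            continue  # negatively correlated lag: cannot be the optimum
        energy = sum(signal[n - lag] ** 2 for n in frame)
        if energy == 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test signal: a pulse train with a period of 8 samples.
pulse_train = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
print(search_pitch_lag(pulse_train, frame_start=48, frame_len=16,
                       min_lag=4, max_lag=15))
```

Because the pitch gain of the best lag is proportional to the correlation, a lag with negative correlation would yield a negative gain and can be discarded without affecting the result.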

Journal ArticleDOI
TL;DR: An educational software tool on speech coding is presented that is used in a senior-level DSP class at Arizona State University, USA, to expose undergraduate students to speech coding and present speech analysis/synthesis as an application paradigm for many DSP fundamental concepts.
Abstract: An educational software tool on speech coding is presented. Portions of this program are used in a senior-level DSP (digital signal processing) class at Arizona State University, USA, to expose undergraduate students to speech coding and present speech analysis/synthesis as an application paradigm for many DSP fundamental concepts. The simulation software provides an interactive environment that allows users to investigate and understand speech coding algorithms for a variety of input speech records. Time- and frequency-domain representations of input and reconstructed speech can be graphically displayed and played back on a PC equipped with a standard 16-bit sound card. The program has been developed for use in the MATLAB environment and includes implementations of the FS-1015 LPC-10e, the FS-1016 CELP, the ETSI GSM, the IS-54 VSELP, the G.721 ADPCM, and the G.728 LD-CELP speech coding algorithms, integrated under a common graphical interface.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: Simulation results indicate that the quality of the wideband resynthesized speech is significantly improved over the narrowband CELP-coded speech.
Abstract: A method for improving the quality of narrowband CELP-coded speech is presented. The approach is to reduce the hoarse quality of CELP-coded speech by enhancing the pitch periodicity in the reproduction signal and also to reduce the muffled characteristics of narrowband speech by regenerating the highband components of speech spectra from the reproduction signal. In the proposed method, multiband excitation (MBE) analysis is performed on the reproduction speech signal from a CELP decoder and the pitch periodicity is enhanced by re-synthesizing the speech signal using a harmonic synthesizer according to the MBE model. The highband magnitude spectra are regenerated by matching to lowband spectra using a trained wideband spectral codebook. Information about the voiced/unvoiced (V/UV) excitation in the highband is derived from a training procedure and then stored alongside the wideband spectral codebook so that it can be recovered by indexing to the codebook using the matched lowband index. Simulation results indicate that the quality of the wideband resynthesized speech is significantly improved over the narrowband CELP-coded speech.

Patent
25 Oct 1996
TL;DR: In this article, the authors proposed a speech encoding method and apparatus in which an input speech signal is divided into blocks or frames as encoding units and encoded in terms of the encoding units, in which explosive and fricative consonants can be impeccably reproduced, while there is no risk of foreign sound being generated at a transient portion between voiced (V) and unvoiced (UV) portions.
Abstract: A speech encoding method and apparatus in which an input speech signal is divided in terms of blocks or frames as encoding units and encoded in terms of the encoding units, in which explosive and fricative consonants can be impeccably reproduced, while there is no risk of foreign sound being generated at a transient portion between voiced (V) and unvoiced (UV) portions, so that the speech with high clarity devoid of "stuffed" feeling may be produced. The encoding apparatus includes a first encoding unit 110 for finding residuals of linear predictive coding (LPC) of an input speech signal for performing harmonic coding and a second encoding unit 120 for encoding the input speech signal by waveform coding. The first encoding unit 110 and the second encoding unit 120 are used for encoding a voiced (V) portion and an unvoiced (UV) portion of the input signal, respectively. The constitution of a code excited linear prediction (CELP) encoding employing vector quantization by a closed loop search of an optimum vector using an analysis-by-synthesis method is used for the second encoding unit 120.

Journal ArticleDOI
T.T. Le1, J.S. Mason1
01 Jun 1996
TL;DR: Direct comparisons of MLPs and linear filters show that with CELP degradation the SNR improvement achieved by the MLP is measurably better than with an equivalent linear structure, but when the degradation is additive noise the two structures perform equally well.
Abstract: A multilayer perceptron (MLP) is applied as a time domain nonlinear filter to two classes of degraded speech, namely Gaussian white noise and nonlinear system degradation introduced by a low bit-rate CELP coder. The goal of the study is to examine the influence of the inherent nonlinearity within the MLP, and this is achieved by varying the levels of nonlinearity within the structure. Direct comparisons of MLPs and linear filters show that with CELP degradation the SNR improvement achieved by the MLP is measurably better than with an equivalent linear structure (3 dB versus 1.5 dB), but when the degradation is additive noise the two structures perform equally well. The study highlights the importance of scaling to achieve optimum performance, and of matching the enhancer to the degradation.
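
The SNR improvement figures quoted in this abstract can be read as the difference between the SNR of the enhanced signal and the SNR of the degraded signal, both measured against the clean reference. The following is a minimal sketch of that arithmetic only; the function names and the toy sample values are assumptions for illustration and are not taken from the paper.

```python
import math

def snr_db(clean, processed):
    """SNR in dB of `processed` relative to the `clean` reference signal."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10(signal_energy / error_energy)

def snr_improvement_db(clean, degraded, enhanced):
    """Gain in dB obtained by the enhancer (hypothetical helper name)."""
    return snr_db(clean, enhanced) - snr_db(clean, degraded)

# Toy data (assumed): the enhancer halves the error amplitude,
# which quarters the error energy.
clean = [1.0, -1.0, 1.0, -1.0]
degraded = [1.2, -0.8, 1.2, -0.8]
enhanced = [1.1, -0.9, 1.1, -0.9]
improvement = snr_improvement_db(clean, degraded, enhanced)
```

Halving the error amplitude corresponds to a gain of about 6 dB, so the 3 dB and 1.5 dB figures above correspond to reducing the error energy by factors of roughly 2 and 1.4, respectively.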

Proceedings ArticleDOI
16 Sep 1996
TL;DR: This paper investigates a way to predict the coding quality from the image content using a neural network, so that the choice of the best algorithm can be based on the predicted coding qualities and does not require running all coding algorithms.
Abstract: In image coding, the choice of a good image coding algorithm is very dependent on the image content. Based on this fact, dynamic coding algorithms have been designed. They try to find an optimal coding scheme for each image segment. They rely on an exhaustive search of the best coding algorithm. Evaluation of all algorithms is computationally very intensive and strongly limits the number of considered algorithms for a given application. Therefore, current standards rely on a single coding algorithm. This paper investigates a way to predict the coding quality from the image content. This prediction is based on a neural network. The coding quality is computed from image region features. Those features are easy and fast to compute, and are common to the whole set of considered coding algorithms. Therefore, the choice of the best algorithm can be based on those predicted coding qualities, and does not require the computation of all coding algorithms. The system is also fast enough to be used for dynamic bitrate allocation, and a simple algorithm to do this is proposed.
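
The selection step described in this abstract, choosing a coder from predicted qualities instead of running every coder, can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names, the coder names `coder_a` and `coder_b`, and the hand-written scoring functions are all hypothetical stand-ins for the trained neural network, which the abstract does not specify.

```python
def select_coder(features, predictors):
    """Pick the coder with the highest predicted quality for one image region.

    `features` is a dict of cheap region features; `predictors` maps a coder
    name to a function that returns a predicted quality score.
    """
    scores = {name: predict(features) for name, predict in predictors.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical quality predictors standing in for the trained network.
predictors = {
    "coder_a": lambda f: 30.0 + 5.0 * f["smoothness"] - 4.0 * f["edge_density"],
    "coder_b": lambda f: 28.0 + 6.0 * f["edge_density"],
}

smooth_region = {"smoothness": 0.9, "edge_density": 0.1}
busy_region = {"smoothness": 0.1, "edge_density": 0.9}

choice_smooth, _ = select_coder(smooth_region, predictors)
choice_busy, _ = select_coder(busy_region, predictors)
```

The point of the design is that the features are computed once per region and shared by all predictors, so adding another candidate coder costs one extra prediction, not one extra encoding pass.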

Proceedings ArticleDOI
03 Oct 1996
TL;DR: The paper presents a phoneme/diphone based speech synthesis system for the (Brazilian) Portuguese language and details the process of building the phoneme library and the interpolation techniques used.
Abstract: The paper presents a phoneme/diphone based speech synthesis system for the (Brazilian) Portuguese language. The basic idea of this system is the construction of a library of phonetic units, and processing of those basic units to build an utterance. The system is complemented by a text to phoneme translator described previously. The phoneme's representation in the library is based on a linear prediction model; the filter which models the vocal tract is represented by line spectrum pairs, and the excitation by code excited linear prediction (CELP) parameters. The paper is organized as follows. After a brief introduction, CELP coding is briefly presented, along with the points relevant to its application in speech synthesis. The main contribution of the paper is detailing the process of building the phoneme library and the interpolation techniques used.

Patent
19 Sep 1996
TL;DR: An improved pitch searching time reducing method for a CELP vocoder using a Line Spectral Pair (LSP) frequency which is capable of significantly reducing the pitch search time by separating the speech signal using a first formant frequency of the line spectral pair of the digital type personal communication system is presented in this paper.
Abstract: An improved pitch searching time reducing method for a CELP vocoder using a Line Spectral Pair (LSP) frequency which is capable of significantly reducing the pitch search time by separating the speech signal using a first formant frequency of the line spectral pair of the digital type personal communication system, which includes the steps of computing a decimation interval of a pitch search interval using an LSP frequency of a first formant computed by a formant filter so as to compute a preparatory pitch of a given speech; determining a preparatory pitch to be used when searching a pitch by detecting a peak and a valley within each decimation interval; and computing a preparatory pitch by adapting a first formant frequency of an LSP computed by a formant filter with a decimation rate and performing a pitch search with respect to the obtained preparatory pitch.
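
The general idea of reducing pitch search time by first finding a preparatory pitch on a decimated lag grid and then searching only near it can be sketched as below. This is a simplified illustration, not the patented method: the decimation step is a fixed assumed constant instead of being derived from the first-formant LSP frequency, and the preparatory pitch is picked by maximizing autocorrelation instead of by peak and valley detection. All names and parameter values are hypothetical.

```python
import math

def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def coarse_to_fine_pitch(x, min_lag=20, max_lag=147, step=4):
    """Two-stage pitch lag search (assumed fixed decimation `step`)."""
    # Stage 1: preparatory pitch from every `step`-th lag only.
    coarse = max(range(min_lag, max_lag + 1, step),
                 key=lambda lag: autocorr(x, lag))
    # Stage 2: full-resolution search restricted to the neighbourhood
    # of the preparatory pitch.
    lo = max(min_lag, coarse - step + 1)
    hi = min(max_lag, coarse + step - 1)
    return max(range(lo, hi + 1), key=lambda lag: autocorr(x, lag))

# Toy signal (assumed): a sinusoid with a period of 50 samples.
signal = [math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
pitch_lag = coarse_to_fine_pitch(signal, min_lag=20, max_lag=80, step=4)
```

With these toy settings the search evaluates the autocorrelation at 16 coarse lags plus a handful of neighbouring lags, compared with 61 lags for an exhaustive search over the same range.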