Author

Forrest Feng-Tzer Tzeng

Bio: Forrest Feng-Tzer Tzeng is an academic researcher. The author has contributed to research on the topics of distortion and speech coding. The author has an h-index of 2 and has co-authored 2 publications that have received 220 citations.

Papers
PatentDOI
TL;DR: A linear predictive speech codec is proposed that includes a spectrum synthesizer, which generates reconstructed speech in response to excitation signals, and a distortion analyzer, which compares the reconstructed speech with the original speech and provides a distortion analysis signal in response to that comparison.
Abstract: A linear predictive speech codec arrangement including: a spectrum synthesizer for providing reconstructed speech generation in response to excitation signals; a distortion analyzer for comparing the reconstructed speech with an original speech, and providing a distortion analysis signal in response to such comparison; and an excitation model circuit for providing excitation signals to the spectrum synthesizer, with the excitation model circuit receiving and utilizing the distortion analysis signal in an analysis-by-synthesis operation, for determining ones of excitation signals which provide an optimal reconstructed speech. The excitation model circuit can include: a voiced excitation generator and a Gaussian noise generator, both of which should optimally provide a plurality of available excitation signal models. The voiced excitation generator and Gaussian noise generator can be in the form of a codebook of a plurality of possible pulse trains and Gaussian sequences, respectively, or alternatively, the voiced excitation generator can be in the form of a first order pitch synthesizer. The optimal excitation signal and/or the pitch value and the pitch filter coefficient are determined using an analysis-by-synthesis technique.

110 citations
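
The analysis-by-synthesis operation described in the abstract above can be illustrated with a minimal sketch. It is not the patented arrangement: it assumes a plain all-pole synthesis filter, an unweighted squared-error distortion measure, and a small codebook of candidate excitation sequences. The names `synthesize` and `search_codebook` are hypothetical and chosen only for illustration.

```python
def synthesize(excitation, lpc_coeffs):
    """All-pole synthesis filter (assumed form):
    s[n] = e[n] + sum_k lpc_coeffs[k] * s[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc_coeffs):
    """Try every candidate excitation, synthesize speech from it, and keep
    the one whose gain-scaled output is closest to the target speech.

    Returns (index, gain, squared_error), or None if no candidate has
    non-zero synthesized energy.
    """
    best = None
    for idx, code in enumerate(codebook):
        y = synthesize(code, lpc_coeffs)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero output cannot be scaled to match anything
        # Least-squares optimal gain for this candidate.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best
```

In this sketch the distortion analyzer is the squared-error computation and the excitation model is the codebook loop; a real codec would typically use a perceptually weighted error and a more efficient search.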

PatentDOI
TL;DR: A speech codec operating at low data rates uses an iterative method to jointly optimize pitch and gain parameter sets. A 26-bit spectrum filter coding scheme may be used, and the number of bits allocated to the pitch and excitation signals depends on whether the signals are significant or insignificant.
Abstract: A speech codec operating at low data rates uses an iterative method to jointly optimize pitch and gain parameter sets. A 26-bit spectrum filter coding scheme may be used, involving successive subtractions and quantizations. The codec may preferably use a decomposed multipulse excitation model, wherein the multipulse vectors used as the excitation signal are decomposed into position and amplitude codewords. Multipulse vectors are coded by comparing each vector to a reference multipulse vector and quantizing the resulting difference vector. An expanded multipulse excitation codebook and associated fast search method, optionally with a dynamically-weighted distortion measure, allow selection of the best excitation vector without memory or computational overload. In a dynamic bit allocation technique, the number of bits allocated to the pitch and excitation signals depends on whether the signals are "significant" or "insignificant". Silence/speech detection is based on an average signal energy over an interval and a minimum average energy over a predetermined number of intervals. An adaptive post-filter and automatic gain control schemes are also provided. Interpolation is used for spectrum filter smoothing, and an algorithm is provided for ensuring stability of the spectrum filter. Specially designed scalar quantizers are provided for the pitch gain and excitation gain.

110 citations
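
The silence/speech detection mentioned in the abstract above compares an average signal energy over an interval with a minimum average energy over a number of intervals. The abstract does not give the decision rule, so the following is only a sketch under assumptions: the minimum over recent intervals is treated as a noise-floor estimate, and a frame is called speech when its energy exceeds that floor by a fixed ratio. The class name `EnergyDetector` and the parameters `history` and `ratio` are hypothetical.

```python
from collections import deque


class EnergyDetector:
    """Energy-based silence/speech detector (illustrative sketch only)."""

    def __init__(self, history=8, ratio=4.0):
        # Average energies of the most recent `history` intervals (assumed size).
        self.recent = deque(maxlen=history)
        # Assumed threshold: speech if energy > ratio * minimum recent energy.
        self.ratio = ratio

    def is_speech(self, frame):
        # Average signal energy over the current interval.
        energy = sum(x * x for x in frame) / len(frame)
        self.recent.append(energy)
        # Minimum average energy over the tracked intervals.
        floor = min(self.recent)
        return energy > self.ratio * floor
```

Because the current interval is included in the minimum, the first interval is never classified as speech in this sketch; a practical detector would likely add hangover logic and an absolute threshold.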


Cited by
PatentDOI
TL;DR: A method for variable rate coding of frames of digitized speech samples is proposed, comprising the steps of determining a level of speech activity for a frame, selecting an encoding rate from a set of rates based on the determined level of speech activity, and coding the frame according to a predetermined coding format for the selected rate, wherein each rate has a corresponding different coding format.
Abstract: A method of speech signal compression, by variable rate coding of frames of digitized speech samples, comprising the steps of: determining a level of speech activity for a frame of digitized speech samples; selecting an encoding rate from a set of rates based upon said determined level of speech activity within said frame; coding said frame according to a predetermined coding format for said selected rate wherein each rate has a corresponding different coding format; providing for said frame a corresponding output data packet at said selected rate.

552 citations
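
The rate-selection steps in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented method: speech activity is approximated by average frame energy, and the rate labels, energy thresholds, and per-rate bit budgets are all hypothetical placeholders that do not come from the source.

```python
# Hypothetical rate labels, ordered from lowest to highest activity.
RATES = ("eighth", "quarter", "half", "full")

# Hypothetical bit budgets per frame; placeholders, not values from the patent.
BITS_PER_FRAME = {"eighth": 16, "quarter": 40, "half": 80, "full": 160}


def select_rate(frame, thresholds=(1e-4, 1e-3, 1e-2)):
    """Pick an encoding rate from the frame's average energy.

    `thresholds` are assumed energy boundaries between adjacent rates.
    """
    energy = sum(x * x for x in frame) / len(frame)
    # Count how many thresholds the energy exceeds to index into RATES.
    level = sum(energy > t for t in thresholds)
    return RATES[level]


def make_packet(frame):
    """Describe the output packet for a frame: each rate has its own format,
    represented here only by a different bit budget."""
    rate = select_rate(frame)
    return {"rate": rate, "bits": BITS_PER_FRAME[rate]}
```

For example, under these assumed thresholds an all-zero frame maps to the lowest rate and a full-scale frame maps to the highest.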

Patent
06 Mar 1995
TL;DR: A communication system is described comprising at least one mobile handheld telephone handset adapted to communicate with a telephone network handling system. The handset carries out a first processing step of a speech recognition process, and the network handling system carries out the remote second processing step.
Abstract: A communication system, comprising at least one mobile handheld telephone handset adapted to communicate with a telephone network handling system. The handset comprises means to produce first signals dependent thereupon, means to produce a voice transmission signal and means to transmit the voice transmission signal. The handset also comprises first processing means to carry out a speech recognition process to produce initial feature analysis parameter coefficients data dependent thereupon. The first processing step preserves predetermined information. The handset further comprises means to produce a data transmission. The telephone network handling system comprises means to receive the voice signal, means to forward the voice signal, and means to receive and process the data transmission signal. The telephone handling system also comprises means to carry out the remote second processing step in a speech recognition process on the regenerated data.

224 citations

PatentDOI
TL;DR: A high-quality wideband speech signal (8 kHz, for example) is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz), which is LPC-analyzed to obtain spectrum information parameters.
Abstract: A wideband speech signal (8 kHz, for example) of high quality is reconstructed from a narrowband speech signal (300 Hz to 3.4 kHz). The input narrowband speech signal is LPC-analyzed to obtain spectrum information parameters, and the parameters are vector-quantized using a narrowband speech signal codebook. For each code number of the narrowband speech signal codebook, the wideband speech waveform corresponding to the codevector concerned is extracted by one pitch for voiced speech and by one frame for unvoiced speech and prestored in a representative waveform codebook. Representative waveform segments corresponding to the respective output codevector numbers of the quantizer are extracted from the representative waveform codebook. Voiced speech is synthesized by pitch-synchronous overlapping of the extracted representative waveform segments and unvoiced speech is synthesized by randomly using waveforms of one frame length. By this, a wideband speech signal is produced. Then, frequency components below 300 Hz and above 3.4 kHz are extracted from the wideband speech signal and are added to an up-sampled version of the input narrowband speech signal to thereby reconstruct the wideband speech signal.

219 citations

Patent
28 Nov 2002
TL;DR: A detection mechanism on the encoder side is used to assess which parts of the spectrum will not be correctly reproduced by the high frequency reconstruction (HFR) method in the decoder.
Abstract: The present invention proposes a new method and a new apparatus for enhancement of audio source coding systems utilising high frequency reconstruction (HFR). It utilises a detection mechanism on the encoder side to assess what parts of the spectrum will not be correctly reproduced by the HFR method in the decoder. Information on this is efficiently coded and sent to the decoder, where it is combined with the output of the HFR unit.

120 citations