Author

Yang Gao

Other affiliations: Conexant
Bio: Yang Gao is an academic researcher from Mindspeed Technologies. The author has contributed to research in the topics of speech coding and voice activity detection. The author has an h-index of 11 and has co-authored 19 publications receiving 580 citations. Previous affiliations of Yang Gao include Conexant.

Papers
Patent
Yang Gao¹, Adil Benyassine², Jes Thyssen², Eyal Shlomot², Huan-Yu Su²
15 Sep 2000
TL;DR: A speech compression system that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech is disclosed; it optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate against the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full- and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

119 citations
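The abstract above describes choosing, frame by frame, among full-, half-, quarter- and eighth-rate codecs, with an extra type classification for the full- and half-rate codecs. The sketch below is only an illustration of that idea, not the patented method: the 20 ms frame length, the per-frame bit budgets (171/80/40/16 bits), the `Frame` fields and the decision rules are all assumptions chosen to show how rate selection trades average bit rate against quality.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FRAME_MS = 20  # assumed frame length in milliseconds

# Assumed bits per 20 ms frame for each codec (illustrative only):
# 171, 80, 40 and 16 bits give 8.55, 4.0, 2.0 and 0.8 kbit/s.
BITS_PER_FRAME = {"full": 171, "half": 80, "quarter": 40, "eighth": 16}


@dataclass
class Frame:
    active: bool  # voice activity detector decision
    voiced: bool  # hypothetical classifier: periodic (voiced) speech
    onset: bool   # hypothetical classifier: start of a speech segment


def select_rate(frame: Frame) -> str:
    """Toy rate selection: spend bits where they matter most perceptually."""
    if not frame.active:
        return "eighth"   # background noise needs only a coarse description
    if frame.onset:
        return "full"     # transitions are hard to predict from the past
    if frame.voiced:
        return "half"     # steady voiced speech is largely predictable
    return "quarter"      # unvoiced speech: noise-like excitation


def select_codec(frame: Frame) -> Tuple[str, Optional[int]]:
    """Return (rate, type); the type applies only to full- and half-rate codecs."""
    rate = select_rate(frame)
    if rate in ("full", "half"):
        return rate, 1 if frame.voiced else 0  # hypothetical type 0/1 split
    return rate, None


def average_bit_rate_kbps(frames: List[Frame]) -> float:
    """Average bit rate over a sequence of frames."""
    total_bits = sum(BITS_PER_FRAME[select_rate(f)] for f in frames)
    return total_bits / (len(frames) * FRAME_MS)  # bits per ms == kbit/s
```

For example, an onset frame, two steady voiced frames, one unvoiced frame and one silent frame cost 171 + 80 + 80 + 40 + 16 = 387 bits over 100 ms, an average of 3.87 kbit/s, well below the 8.55 kbit/s that full rate alone would need under these assumed budgets.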

Proceedings Article
15 Apr 2007
TL;DR: This paper describes the scalable coder G.729.1, recently standardized by ITU-T for wideband telephony and voice over IP (VoIP) applications, which can operate at 12 different bit rates from 32 down to 8 kbit/s with wideband quality starting at 14 kbit/s.
Abstract: This paper describes the scalable coder G.729.1, which has recently been standardized by ITU-T for wideband telephony and voice over IP (VoIP) applications. G.729.1 can operate at 12 different bit rates from 32 down to 8 kbit/s with wideband quality starting at 14 kbit/s. This coder is a bitstream-interoperable extension of ITU-T G.729 based on three embedded stages: narrowband cascaded CELP coding at 8 and 12 kbit/s, time-domain bandwidth extension (TDBWE) at 14 kbit/s, and split-band MDCT coding with spherical vector quantization (VQ) and pre-echo reduction from 16 to 32 kbit/s. Side information (signal class, phase, and energy) is transmitted at 12, 14 and 16 kbit/s to improve the resilience and recovery of the decoder in case of frame erasures. The quality, delay, and complexity of G.729.1 are summarized based on ITU-T results.

108 citations
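Because the G.729.1 bitstream is embedded, a lower-rate stream can be obtained by dropping the upper layers of each frame. The sketch below lists the 12 rates from the abstract (8, 12, 14, then 16 to 32 kbit/s in 2 kbit/s steps) and truncates a frame to a target rate. It assumes 20 ms frames and that layers are packed in order of increasing rate; the function names are hypothetical and this is not the ITU-T reference code.

```python
FRAME_MS = 20  # G.729.1 frame length in milliseconds

# The 12 bit rates from the abstract, in kbit/s: 8, 12, 14, then 16 to 32 in steps of 2.
RATES_KBPS = [8, 12, 14] + list(range(16, 33, 2))


def top_stage(rate_kbps: int) -> str:
    """Name the embedded coding stage that contributes the highest layer."""
    if rate_kbps not in RATES_KBPS:
        raise ValueError(f"unsupported rate: {rate_kbps} kbit/s")
    if rate_kbps <= 12:
        return "narrowband cascaded CELP"
    if rate_kbps == 14:
        return "TDBWE"  # first wideband rate
    return "split-band MDCT"


def truncate_frame(frame: bytes, rate_kbps: int) -> bytes:
    """Keep only the leading bits needed for rate_kbps.

    Assumes the layers are packed in order of increasing rate, so dropping
    the tail of a frame removes the upper layers only.
    """
    if rate_kbps not in RATES_KBPS:
        raise ValueError(f"unsupported rate: {rate_kbps} kbit/s")
    n_bits = rate_kbps * FRAME_MS  # kbit/s x ms = bits
    n_bytes = n_bits // 8          # every supported rate gives a whole byte count
    if len(frame) < n_bytes:
        raise ValueError("frame is shorter than the requested rate")
    return frame[:n_bytes]
```

Under these assumptions a 32 kbit/s frame is 80 bytes, and a gateway could cut it to 20 bytes for the 8 kbit/s core without re-encoding.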

Patent
Yang Gao¹, Adil Benyassine¹, Huan-Yu Su¹, Eyal Shlomot¹, Jes Thyssen¹
15 Sep 2000
TL;DR: A speech compression system that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech is disclosed; it optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate against the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full- and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

81 citations

Patent
Yang Gao¹, Adil Benyassine¹, Jes Thyssen¹, Eyal Shlomot¹, Huan-Yu Su¹
08 Apr 2003
TL;DR: A speech compression system that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech is disclosed; it optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate against the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full- and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

64 citations

Patent
Yang Gao¹, Adil Benyassine¹, Jes Thyssen¹, Eyal Shlomot¹, Huan-Yu Su¹
15 Sep 2000
TL;DR: A speech compression system that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech is disclosed; it optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate against the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full- and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

38 citations


Cited by
Patent
30 Jun 2007
TL;DR: Video decoding innovations for multithreaded and graphics processing unit (GPU) implementations are described; for multithreaded decoding, a decoder uses innovations in layered data structures, picture extent discovery, a picture command queue, and/or task scheduling.
Abstract: Video decoding innovations for multithreading implementations and graphics processor unit (“GPU”) implementations are described. For example, for multithreaded decoding, a decoder uses innovations in the areas of layered data structures, picture extent discovery, a picture command queue, and/or task scheduling for multithreading. Or, for a GPU implementation, a decoder uses innovations in the areas of inverse transforms, inverse quantization, fractional interpolation, intra prediction using waves, loop filtering using waves, memory usage and/or performance-adaptive loop filtering. Innovations are also described in the areas of error handling and recovery, determination of neighbor availability for operations such as context modeling and intra prediction, CABAC decoding, computation of collocated information for direct mode macroblocks in B slices, reduction of memory consumption, implementation of trick play modes, and picture dropping for quality adjustment.

159 citations
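The abstract mentions intra prediction and loop filtering "using waves" for GPU decoding. One common way to expose such parallelism, assumed here rather than taken from the patent, is a wavefront schedule: if macroblock (x, y) needs its left, top-left, top and top-right neighbours (as in H.264 intra prediction), then all macroblocks with the same index x + 2y are mutually independent and can be decoded together. The function names below are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) macroblock position


def intra_dependencies(x: int, y: int) -> List[Coord]:
    """Neighbours assumed to be needed for intra prediction of (x, y)."""
    return [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]


def intra_waves(mb_width: int, mb_height: int) -> List[List[Coord]]:
    """Group macroblocks into waves whose members can be decoded in parallel.

    With wave index w = x + 2*y, the left and top-right neighbours lie in
    wave w - 1, the top neighbour in w - 2 and the top-left in w - 3, so all
    dependencies are finished before wave w starts.
    """
    waves = defaultdict(list)
    for y in range(mb_height):
        for x in range(mb_width):
            waves[x + 2 * y].append((x, y))
    return [waves[w] for w in sorted(waves)]
```

For a 4 x 3 macroblock picture this yields 8 waves; a GPU implementation would process one wave per step, with one thread (or thread group) per macroblock in the wave.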

Journal Article
TL;DR: A statistical model for speech enhancement that takes into account the time-correlation between successive speech spectral components is proposed and it is shown that a special case of the causal estimator degenerates to a "decision-directed" estimator with a time-varying frequency-dependent weighting factor.
Abstract: In this paper, we propose a statistical model for speech enhancement that takes into account the time-correlation between successive speech spectral components. It retains the simplicity associated with the Gaussian statistical model, and enables the extension of existing algorithms to noncausal estimation. The sequence of speech spectral variances is a random process, which is generally correlated with the sequence of speech spectral magnitudes. Causal and noncausal estimators for the a priori SNR are derived in agreement with the model assumptions and the estimation of the speech spectral components. We show that a special case of the causal estimator degenerates to a "decision-directed" estimator with a time-varying frequency-dependent weighting factor. Experimental results demonstrate the improved performance of the proposed algorithms.

154 citations
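The abstract relates its causal estimator to the "decision-directed" a priori SNR estimator. For reference, the sketch below implements the classic decision-directed form (Ephraim and Malah) with a fixed weighting factor alpha, xi(k,l) = alpha * |A(k,l-1)|^2 / lambda_d(k) + (1 - alpha) * max(gamma(k,l) - 1, 0), not the paper's time-varying, frequency-dependent weighting. Assumptions: a known stationary noise variance per bin, the Wiener gain xi / (1 + xi) as the amplitude estimator, and typical defaults alpha = 0.98 and a -25 dB floor.

```python
from typing import List


def decision_directed_snr(
    noisy_power: List[List[float]],  # |Y(k, l)|^2 for frame l, bin k
    noise_power: List[float],        # noise variance lambda_d(k), assumed stationary
    alpha: float = 0.98,             # fixed weighting factor (assumed default)
    xi_min: float = 10 ** (-25 / 10),  # -25 dB floor on the a priori SNR (assumed)
) -> List[List[float]]:
    """Classic decision-directed a priori SNR estimate for each frame and bin."""
    n_bins = len(noise_power)
    prev_clean_power = [0.0] * n_bins  # |A_hat(k, l-1)|^2, zero before the first frame
    xi_all = []
    for frame in noisy_power:
        xi_frame = []
        for k in range(n_bins):
            gamma = frame[k] / noise_power[k]  # a posteriori SNR
            xi = alpha * prev_clean_power[k] / noise_power[k] + (1 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)
            gain = xi / (1.0 + xi)  # Wiener gain used as a simple amplitude estimator
            prev_clean_power[k] = gain ** 2 * frame[k]  # feeds the next frame's estimate
            xi_frame.append(xi)
        xi_all.append(xi_frame)
    return xi_all
```

The recursion shows why alpha matters: with alpha close to 1 the estimate relies mostly on the previous frame's enhanced amplitude, which smooths musical noise but reacts slowly to speech onsets, the trade-off that a time-varying weighting factor addresses.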

Patent
11 Oct 2005
TL;DR: A method and an apparatus for spectral envelope encoding are presented; the method is applicable to both natural audio coding and speech coding systems and is especially suited for coders using SBR [WO 98/57436] or other high-frequency reconstruction methods.
Abstract: The present invention provides a new method and an apparatus for spectral envelope encoding. The invention teaches how to perform and signal compactly a time/frequency mapping of the envelope representation, and further, encode the spectral envelope data efficiently using adaptive time/frequency directional coding. The method is applicable to both natural audio coding and speech coding systems and is especially suited for coders using SBR [WO 98/57436] or other high frequency reconstruction methods.

142 citations
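"Adaptive time/frequency directional coding" can be illustrated by delta-coding each quantized envelope either along frequency (difference to the band below) or along time (difference to the same band in the previous envelope) and signalling whichever is cheaper. In this sketch the cost function (sum of absolute deltas, a stand-in for entropy-coded bits), the 1-bit direction flag and equal band counts across envelopes are all assumptions; it does not reproduce the patent's time/frequency mapping.

```python
from typing import List, Optional, Tuple


def delta_freq(env: List[int]) -> List[int]:
    """First band sent as is, each later band as the difference to the band below."""
    return [env[0]] + [env[i] - env[i - 1] for i in range(1, len(env))]


def delta_time(env: List[int], prev_env: List[int]) -> List[int]:
    """Each band as the difference to the same band of the previous envelope."""
    return [c - p for c, p in zip(env, prev_env)]


def cost(deltas: List[int]) -> int:
    """Hypothetical bit-cost proxy: small deltas are assumed cheap to entropy-code."""
    return sum(abs(d) for d in deltas)


def encode_envelope(env: List[int], prev_env: Optional[List[int]] = None) -> Tuple[int, List[int]]:
    """Return (flag, deltas); flag 0 = frequency direction, 1 = time direction."""
    df = delta_freq(env)
    if prev_env is None:
        return 0, df  # no previous envelope: only frequency coding is possible
    dt = delta_time(env, prev_env)
    return (1, dt) if cost(dt) < cost(df) else (0, df)


def decode_envelope(flag: int, deltas: List[int], prev_env: Optional[List[int]] = None) -> List[int]:
    """Invert encode_envelope."""
    if flag == 1:
        return [d + p for d, p in zip(deltas, prev_env)]
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out
```

For a stationary signal, consecutive envelopes are similar, so time-direction deltas are near zero; at transients the frequency direction usually wins, which is why choosing per envelope pays off.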

Patent
31 Jul 2007
Abstract: Applications of dim-and-burst techniques to coding of wideband speech signals are described. Reconstruction of a highband portion of a frame of a wideband speech signal using information from a previous frame is also described.

128 citations
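In dim-and-burst signalling, some speech frames carry fewer speech bits (dim) or none at all (blank-and-burst) so that signalling data can be sent in their place. The sketch below illustrates only the second sentence of the abstract: when a frame's highband parameters are missing, they are rebuilt from the previous frame. The `HighbandParams` fields, the attenuation factor of 0.7 and muting when no earlier frame exists are assumptions for illustration, not details from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HighbandParams:
    envelope: List[float]  # hypothetical per-subband spectral envelope
    gain: float            # hypothetical overall highband gain


def reconstruct_highband(
    frames: List[Optional[HighbandParams]], attenuation: float = 0.7
) -> List[HighbandParams]:
    """Fill in frames whose highband parameters were not transmitted.

    A frame is None when its highband bits were dropped (e.g. a dimmed frame).
    Missing frames reuse the previous frame's envelope with the gain scaled
    by `attenuation` (an assumed value), so repeated losses fade out.
    """
    out: List[HighbandParams] = []
    previous: Optional[HighbandParams] = None
    for params in frames:
        if params is None:
            if previous is None:
                params = HighbandParams(envelope=[], gain=0.0)  # nothing to reuse: mute
            else:
                params = HighbandParams(list(previous.envelope), previous.gain * attenuation)
        out.append(params)
        previous = params
    return out
```

Attenuating repeated copies is a design choice that avoids sustaining a stale highband indefinitely when several consecutive frames are dimmed.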

Patent
29 Jun 2007
TL;DR: An audio decoder combines decoding components that implement base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding, together with a bitstream syntax scheme that lets each component extract the parameters for its respective decoding technique.
Abstract: An audio decoder provides a combination of decoding components including components implementing base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding techniques. The audio decoder decodes a compressed bitstream structured by a bitstream syntax scheme to permit the various decoding components to extract the appropriate parameters for their respective decoding technique.

116 citations
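The abstract describes a decoder whose bitstream syntax lets several decoding components each find their own parameters. A minimal way to picture that is a tagged-chunk format routed through a dispatch table. Everything here is hypothetical: the tag values, the 1-byte tag plus 1-byte length layout and the component callables are assumptions, not the patent's actual syntax.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical component tags; the actual bitstream syntax is not described here.
BASE_BAND, SPECTRAL_PEAK, FREQ_EXT, CHANNEL_EXT = range(4)


def parse_chunks(frame: bytes) -> List[Tuple[int, bytes]]:
    """Split a frame into (tag, payload) chunks.

    Assumed layout per chunk: 1-byte tag, 1-byte payload length, payload.
    """
    chunks = []
    pos = 0
    while pos < len(frame):
        if pos + 2 > len(frame):
            raise ValueError("truncated chunk header")
        tag, length = frame[pos], frame[pos + 1]
        payload = frame[pos + 2 : pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated chunk payload")
        chunks.append((tag, payload))
        pos += 2 + length
    return chunks


def decode_frame(frame: bytes, components: Dict[int, Callable[[bytes], object]]) -> List[object]:
    """Route each chunk to the decoding component registered for its tag.

    An unregistered tag raises KeyError.
    """
    return [components[tag](payload) for tag, payload in parse_chunks(frame)]
```

With this layout, a frame that uses only base band and channel extension decoding simply omits the other chunks, so each component sees only the parameters it needs.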