Showing papers on "Adaptive Multi-Rate audio codec" published in 2000


Proceedings ArticleDOI
28 May 2000
TL;DR: This paper gives an overview of the HILN tools, presents the recent advances in signal modelling and parameter coding, and concludes with an evaluation of the subjective audio quality.
Abstract: The MPEG-4 Audio Standard combines tools for efficient and flexible coding of audio. For very low bitrate applications, tools based on a parametric signal representation are utilised. The parametric speech coding tools (HVXC) are already available in Version 1 of MPEG-4. The main focus of this paper is on the parametric audio coding tools "Harmonic and Individual Lines plus Noise" (HILN) which are included in Version 2 of MPEG-4. As already indicated by their name, the HILN tools are based on the decomposition of the audio signal into components which are described by appropriate source models and represented by model parameters. This paper gives an overview of the HILN tools, presents the recent advances in signal modelling and parameter coding, and concludes with an evaluation of the subjective audio quality.

121 citations


Patent
Yang Gao1, Adil Benyassine2, Jes Thyssen2, Eyal Shlomot2, Huan-Yu Su2 
15 Sep 2000
TL;DR: A speech compression system is disclosed that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech, and that optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

119 citations


Patent
Yang Gao1, Adil Benyassine1, Huan-Yu Su1, Eyal Shlomot1, Jes Thyssen1 
15 Sep 2000
TL;DR: A speech compression system is disclosed that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech, and that optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

81 citations


Proceedings ArticleDOI
17 Sep 2000
TL;DR: An algorithm to generate wideband speech from narrow band speech using as low as 500 bit/s of side information is presented, which has enhanced quality compared to narrowband speech.
Abstract: Wireless telephone speech is usually limited to the 300-3400 Hz band, which reduces its quality. There is thus a growing demand for wideband speech systems that transmit from 50 Hz to 8000 Hz. This paper presents an algorithm to generate wideband speech from narrowband speech using as low as 500 bit/s of side information. The 50-300 Hz band is predicted from the narrowband signal. A source-excitation model is used for the 3400-8000 Hz band, where the excitation is extrapolated at the receiver, and the spectral envelope is transmitted. Though some artifacts are present, the resulting wideband speech has enhanced quality compared to narrowband speech.

77 citations


Journal ArticleDOI
TL;DR: The natural coding part within MPEG-4 audio describes traditional type speech and high-quality audio coding algorithms and their combination to enable new functionalities like scalability across the boundaries of coding algorithms.
Abstract: MPEG-4 audio represents a new kind of audio coding standard. Unlike its predecessors, MPEG-1 and MPEG-2 high-quality audio coding, and unlike the speech coding standards which have been completed by the ITU-T, it describes not a single or small set of highly efficient compression schemes but a complete toolbox to do everything from low bit-rate speech coding to high-quality audio coding or music synthesis. The natural coding part within MPEG-4 audio describes traditional type speech and high-quality audio coding algorithms and their combination to enable new functionalities like scalability (hierarchical coding) across the boundaries of coding algorithms. This paper gives an overview of the basic algorithms and how they can be combined.

62 citations


Proceedings ArticleDOI
17 Sep 2000
TL;DR: A speech/music discrimination procedure for multi-mode wideband coding that is suitable for combined speech and audio coding and shows improved performance when compared to single-mode encoding is described.
Abstract: We propose in this paper a general solution for combined speech and audio coding. Particularly, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected, and kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance when compared to single-mode encoding.

56 citations
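
The decision-update rule in the abstract above (change the speech/music label only when a low-energy frame is detected, otherwise keep it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `hold_decisions`, the energy threshold, and the per-frame raw labels are all hypothetical, and the raw labels stand in for the paper's classifier based on second-order statistics, which is not reproduced here.

```python
from typing import List, Sequence


def hold_decisions(
    energies: Sequence[float],
    raw_labels: Sequence[str],
    low_energy_threshold: float,
    initial: str = "speech",
) -> List[str]:
    """Return per-frame labels that may only change on low-energy frames.

    energies             -- frame energies (arbitrary units, hypothetical)
    raw_labels           -- per-frame classifier output ("speech" or "music")
    low_energy_threshold -- assumed threshold below which a frame is "low energy"
    initial              -- assumed label in force before the first update
    """
    if len(energies) != len(raw_labels):
        raise ValueError("energies and raw_labels must have the same length")

    held: List[str] = []
    current = initial
    for energy, label in zip(energies, raw_labels):
        # Only a low-energy frame is allowed to update the mode decision;
        # on every other frame the previous decision is kept unchanged.
        if energy < low_energy_threshold:
            current = label
        held.append(current)
    return held


# Made-up example data: the raw classifier flips often, the held decision does not.
energies = [0.9, 0.8, 0.05, 0.7, 0.9, 0.02, 0.6]
raw = ["speech", "music", "music", "speech", "music", "speech", "music"]
held = hold_decisions(energies, raw, low_energy_threshold=0.1)
print(held)
```

Restricting updates to low-energy frames means that any switch between coding modes happens where the signal is weakest, which is presumably where a mode change is least audible.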


01 Jan 2000
TL;DR: It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec parameters it is possible to reduce the effect of coding on recognition, and weighted acoustic modeling is introduced as an alternative to the method based on average distortion information.
Abstract: The growth of cellular telephony combined with recent advances in speech recognition technology results in sizeable potential opportunities for mobile speech recognition applications. Classic robustness techniques that have been previously proposed for speech recognition yield limited improvements of the degradation introduced by idiosyncrasies of the mobile networks. These sources of degradation include distortion introduced by the speech codec as well as artifacts arising from channel errors and discontinuous transmission. In this thesis we focus on characterizing the distortion introduced to the speech signal by the speech codec and we propose methods for reducing the detrimental effect of coding on recognition accuracy. The initial focus of this thesis is on the full rate GSM codec (FR-GSM). We propose a method to generate recognition features directly from codec parameters. It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec parameters it is possible to reduce the effect of coding on recognition. The later parts of this work are related to weighted acoustic modeling for robust speech recognition. The motivation for this approach is based on the observation that not all phones in a GSM-coded corpus are distorted to the same extent due to coding. We first establish a set of phonetic distortion classes through an analysis of the distribution of the log spectral distortion introduced to each phone by the codec. These classes are then employed to estimate an optimal weighted combination of acoustic models according to the average distortion encountered by the class. A relative reduction of almost 70% of the degradation introduced by the GSM codec was achieved using this method. The technique of weighted acoustic modeling based on instantaneous distortion is introduced as an alternative to the method based on average distortion information. 
When the extent of cepstral distortion introduced by coding is known, weighted acoustic modeling provides a reduction of about 50% in the word error rate introduced by concurrent GSM and CELP. We propose two methods to estimate the instantaneous distortion information: one based on recoding sensitivity and another based on long-term predictability. Due to the nonlinear relation between the time and the log-spectral domain, the proposed estimates of the instantaneous distortion do not perform as well as algorithms based on knowledge of cepstral distortion. However, we show that employing the proposed instantaneous distortion information estimates can help obtain the best recognition results established in the baseline conditions employing only 50% of the baseline Gaussian density computations.

47 citations


01 Jan 2000
TL;DR: This thesis demonstrates that wideband speech can be communicated at or near the bit rate of a narrowband speech coder and examines in detail each component of the wideband enhancement scheme: highband excitation synthesis, highband envelope estimation, and narrowband-highband envelope continuity.
Abstract: Most existing telephone networks transmit narrowband coded speech which has been bandlimited to 4 kHz. Compared with normal speech, this speech has a muffled quality and reduced intelligibility, which is particularly noticeable in sounds such as /s/, /f/ and /sh/. Speech which has been bandlimited to 8 kHz is often coded for this reason, but this requires an increase in the bit rate. Wideband enhancement is a scheme that adds a synthesized highband signal to narrowband speech to produce a higher quality wideband speech signal. The synthesized highband signal is based entirely on information contained in the narrowband speech, and is thus achieved at zero increase in the bit rate from a coding perspective. Wideband enhancement can function as a post-processor to any narrowband telephone receiver, or alternatively it can be combined with any narrowband speech coder to produce a very low bit rate wideband speech coder. Applications include higher quality mobile, teleconferencing, and internet telephony. This thesis examines in detail each component of the wideband enhancement scheme: highband excitation synthesis, highband envelope estimation, and narrowband-highband envelope continuity. Objective and subjective test measures are formulated to assess existing and new methods for all components, and the likely limitations to the performance of wideband enhancement are also investigated. A new method for highband excitation synthesis is proposed that uses a combination of sinusoidal transform coding-based excitation and random excitation. Several new techniques for highband spectral envelope estimation are also developed. The performance of these techniques is shown to be approaching the limit likely to be achieved. Subjective tests demonstrate that wideband speech synthesized using these techniques has higher quality than the input narrowband speech.
Finally, a new paradigm for very low bit rate wideband speech coding is presented in which the quality of the wideband enhancement scheme is improved further by allocating a very small bitstream for highband envelope and gain coding. Thus, this thesis demonstrates that wideband speech can be communicated at or near the bit rate of a narrowband speech coder.

39 citations


Patent
Yang Gao1, Adil Benyassine1, Jes Thyssen1, Eyal Shlomot1, Huan-Yu Su1 
15 Sep 2000
TL;DR: A speech compression system is disclosed that encodes a speech signal into a bitstream for subsequent decoding into synthesized speech, and that optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech.
Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.

38 citations


Journal ArticleDOI
TL;DR: A robust codec for the transmission of very low bit-rate video over channels with a variety of errors, including random and bursty bit errors and packet loss, which is efficient in its use of bits and has good error resilience properties both objectively and subjectively over a wide range of conditions.
Abstract: We describe a robust codec for the transmission of very low bit-rate video over channels with a variety of errors, including random and bursty bit errors and packet loss. The codec exploits adaptivity to give good performance with a low overhead. By only protecting macroblocks which would otherwise be poorly concealed by the decoder the codec allows adaptive selection of the parts of video to protect. For protection, it uses multiple description codes which indirectly provide frequency-based adaptivity by protecting the more significant DCT coefficients. Simulations show significant improvements in the performance of the codec when compared to codecs which use intra macroblock updating (raster scan and random) at the same overhead. The codec is efficient in its use of bits and has good error resilience properties both objectively and subjectively over a wide range of conditions. Further, transcoding of the received bit stream to the standard H.263 syntax is relatively easy.

35 citations


Patent
Yang Gao1
15 Sep 2000
TL;DR: A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding; it encodes a speech signal into a bitstream for subsequent decoding to generate synthesized speech.
Abstract: A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. A criterion value is calculated for each subcodebook to minimize an error signal in a minimization loop as part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance overall quality of the synthesized speech at a limited average bit rate.

Proceedings Article
Kari Jarvinen1
01 Sep 2000
TL;DR: The AMR codec provides the next step of speech quality improvement in GSM after the introduction of the Enhanced Full-Rate (EFR) codec in 1996, and was adopted as the mandatory speech codec for the third generation WCDMA system.
Abstract: The European Telecommunications Standards Institute (ETSI) initiated a standardisation program in October 1997 to develop an Adaptive Multi-Rate (AMR) codec for GSM. After two competitive selection phases, ETSI chose in October 1998 a codec developed in collaboration between Ericsson, Nokia, and Siemens. The codec standard was finalised, characterised, and formally approved in ETSI during early 1999. The AMR codec provides the next step of speech quality improvement in GSM after the introduction of the Enhanced Full-Rate (EFR) codec in 1996. AMR offers substantial improvement in error robustness by adapting speech and channel coding depending on channel conditions. By switching to operate in the GSM half-rate channel during good channel conditions, AMR also provides a channel capacity gain over the EFR codec. In April 1999, the Third Generation Partnership Project (3GPP) adopted the AMR codec as the mandatory speech codec for the third generation WCDMA system.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: It is shown that Hi-BIN offers a low bit-rate representation of the higher band and is backwards compatible with existing narrowband speech coding systems.
Abstract: In this paper, an encoding technique called Hi-BIN (High Band Injection), which can be combined with any narrowband coder to achieve good quality wideband speech, is described. The principle behind this technique is to model frequencies above 4 kHz by noise with an appropriate spectral shape. This simple way of injecting synthetic noise in the higher frequencies gives surprisingly good quality when compared to very widely used computationally intensive waveform coding techniques such as CELP. We show that Hi-BIN offers a low bit-rate representation of the higher band and is backwards compatible with existing narrowband speech coding systems.

Patent
Sami Kekki1
11 Apr 2000
TL;DR: The invention proposes a method for conveying an Adaptive Multi-Rate (AMR) codec stream between a first codec and a second codec over a network, comprising the steps of transmitting AMR frames via a Real-Time Transport Protocol (RTP) stream, and transmitting an AMR control signal via a Real-Time Transport Control Protocol (RTCP) stream.
Abstract: The invention proposes a method for conveying an Adaptive Multi-Rate (AMR) codec stream between a first codec and a second codec over a network, comprising the steps of transmitting AMR frames via a Real-Time Transport Protocol (RTP) stream, and transmitting an AMR control signal via a Real-Time Transport Control Protocol (RTCP) stream. Thus, easy control of the AMR codec stream via UMTS or IP networks is possible.

Proceedings ArticleDOI
21 Aug 2000
TL;DR: An overview on the current status of parametric audio coding developments is given and advantages and challenges of this approach are demonstrated, which indicate possible directions of further improvements.
Abstract: For very low bit rate audio coding applications in mobile communications or on the Internet, parametric audio coding has evolved as a technique complementing the more traditional approaches. These are transform codecs originally designed for achieving CD-like quality on one hand, and specialized speech codecs on the other hand. Both of these techniques usually represent the audio signal waveform in a way such that the decoder output signal gives an approximation of the encoder input signal, while taking into account perceptual criteria. Compared to this approach, in parametric audio coding the models of the signal source and of human perception are extended. The source model is now based on the assumption that the audio signal is the sum of "components," each of which can be approximated by a relatively simple signal model with a small number of parameters. The perception model is based on the assumption that the sound of the decoder output signal should be as similar as possible to that of the encoder input signal. Therefore, the approximation of waveforms is no longer necessary. This approach can lead to a very efficient representation. However, a suitable set of models for signal components, a good decomposition, and a good parameter estimation are all vital for achieving maximum audio quality. We give an overview on the current status of parametric audio coding developments and demonstrate advantages and challenges of this approach. Finally, we indicate possible directions of further improvements.

Proceedings ArticleDOI
28 May 2000
TL;DR: This paper presents a scalable audio format, called "multi-layer scalable LPC audio format", that addresses functionalities similar to those of MPEG-4 and meets the most important requirements of transmission and storage, such as channel error robustness, cell loss robustness, low delay, and playback control.
Abstract: This paper presents a scalable audio format, called "multi-layer scalable LPC audio format", that addresses functionalities similar to those of MPEG-4. The format offers different levels of data rate and audio quality, and meets the most important requirements of transmission and storage, such as channel error robustness, cell loss robustness, low delay, and playback control. It operates in four modes. The first mode is based on a modified version of the LD-CELP algorithm, in which every 6 samples are represented by a single byte. In order to improve the signal-to-noise ratio (SNR), additional enhancement layers are embedded in the bit stream to allow higher quality at higher bit rates. The resulting bit rates are integer multiples of 10.67 kbps. The other three modes use QMF splitting into two, four and eight subbands. These modes allow efficient representation of wideband audio and speech signals, and offer extension layers of 5.33 and 2.66 kbps. A simple and efficient header structure is embedded in the bitstream to allow decoding even under channel error conditions and even when the bitstream has been down-scaled somewhere during transmission without the decoder being notified. Comparisons are made with MPEG and ITU standards.
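
The 10.67 kbps figure in the abstract above follows from coding 6 samples in one byte, provided the input is sampled at 8 kHz. That sampling rate is an assumption here; the abstract does not state it. The short check below works through the arithmetic with exact fractions. The observation that the quoted extension-layer rates coincide with one half and one quarter of the base rate is likewise only an observation, not something the abstract spells out.

```python
from fractions import Fraction

ASSUMED_SAMPLE_RATE_HZ = 8000  # assumption: not stated in the abstract
SAMPLES_PER_BYTE = 6           # from the abstract: 6 samples -> one byte
BITS_PER_BYTE = 8


def layer_rate_bps(sample_rate_hz: int, samples_per_byte: int = SAMPLES_PER_BYTE) -> Fraction:
    """Exact bit rate of one layer that spends one byte per group of samples."""
    return Fraction(sample_rate_hz, samples_per_byte) * BITS_PER_BYTE


def kbps(rate_bps: Fraction) -> float:
    """Convert an exact bit/s value to kbit/s, rounded to two decimals."""
    return round(float(rate_bps) / 1000, 2)


base = layer_rate_bps(ASSUMED_SAMPLE_RATE_HZ)

# Base layer plus enhancement layers gives integer multiples of the base rate.
multiples = [kbps(base * k) for k in (1, 2, 3)]

# Half and quarter of the base rate, for comparison with the quoted
# extension layers of 5.33 and 2.66 kbps (2.66 is 2.666... truncated).
half, quarter = kbps(base / 2), kbps(base / 4)

print(multiples, half, quarter)
```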

Proceedings ArticleDOI
Juin-Hwey Chen1
05 Jun 2000
TL;DR: This paper presents a high-fidelity speech and audio codec operating at a sampling rate of 32 kHz and a bit rate of 64 kbit/s, designed primarily for real-time speech communication systems with high port densities.
Abstract: This paper presents a high-fidelity speech and audio codec operating at a sampling rate of 32 kHz and a bit rate of 64 kbit/s. Designed primarily for real-time speech communication systems with high port densities, this MDCT-based transform codec has a very low coding delay (8 ms frame size) and low codec complexity (less than 10 MIPS on a 16-bit fixed-point DSP). The codec achieves essentially transparent quality for speech, and very close to transparent quality for music. A novel frame erasure concealment algorithm makes this codec robust to frame erasures for both speech and music. Another novel feature allows the decoder to decode the bit stream directly into a 16 kHz or 8 kHz sampled signal, without the need to decode a 32 kHz signal first and then down-sample it to the target sampling rate. Other novel features include some speed-memory trade-off techniques to reduce the computational complexity.
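
The figures quoted in the abstract above (32 kHz sampling, 64 kbit/s, 8 ms frames) fix the per-frame budget of the codec. The sketch below only works out that arithmetic. The 16-bit PCM reference used for the compression ratio is an assumption, and the helper name `frame_budget` is hypothetical.

```python
SAMPLE_RATE_HZ = 32_000   # from the abstract
BIT_RATE_BPS = 64_000     # from the abstract
FRAME_MS = 8              # from the abstract
ASSUMED_PCM_BITS = 16     # assumption: uncompressed reference is 16-bit PCM


def frame_budget(sample_rate_hz: int, bit_rate_bps: int, frame_ms: int) -> tuple:
    """Return (samples per frame, coded bits per frame)."""
    samples = sample_rate_hz * frame_ms // 1000
    bits = bit_rate_bps * frame_ms // 1000
    return samples, bits


samples_per_frame, bits_per_frame = frame_budget(SAMPLE_RATE_HZ, BIT_RATE_BPS, FRAME_MS)
bits_per_sample = bits_per_frame / samples_per_frame
compression_ratio = SAMPLE_RATE_HZ * ASSUMED_PCM_BITS / BIT_RATE_BPS

# Output samples per frame when the decoder targets a lower sampling rate.
reduced_rate_samples = {rate: rate * FRAME_MS // 1000 for rate in (16_000, 8_000)}

print(samples_per_frame, bits_per_frame, bits_per_sample, compression_ratio)
print(reduced_rate_samples)
```

Under these numbers each 8 ms frame carries 512 bits for 256 input samples, that is, 2 bits per sample.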

Proceedings ArticleDOI
28 May 2000
TL;DR: A scalable MPEG-4 video codec architecture is proposed to achieve low power consumption and high cost-effectiveness for IMT-2000 multimedia applications.
Abstract: A scalable MPEG-4 video codec architecture is proposed to achieve low power consumption and high cost-effectiveness for IMT-2000 multimedia applications. The MPEG-4 video codec consists of a 16-bit multimedia-extended RISC processor and dedicated hardware accelerators, which bring about both low power consumption and programmability. The proposed architecture is extended and applied for the development of two MPEG-4 LSIs. One is an MPEG-4 video codec LSI, which performs MPEG-4 video encoding and decoding at 15 frames per second with quarter common intermediate format. The other is an MPEG-4 audiovisual LSI, which contains three 16-bit RISC processors and a 16-Mbit embedded DRAM and executes the major functions of 3GPP 3G-324M video telephony for IMT-2000 applications. By introducing the optimization of the embedded DRAM configuration, a clock gating technique, and low power motion estimation, the MPEG-4 audiovisual LSI consumes only 240 mW when it activates the MPEG-4 video SP@L1 codec, the AMR speech codec, and the H.223 annex B multiplex at a 60 MHz clock rate.

Journal ArticleDOI
TL;DR: The proposed joint adaptation of source- codec, channel-codec, and modulation regime results in attractive, robust, high-quality audio systems, capable of conveying near-unimpaired wide-band audio signals over fading dispersive channels for signal-to-noise ratios (SNR) in excess of about 5 dB.
Abstract: Turbo-coded burst-by-burst adaptive orthogonal frequency division multiplex (AOFDM) wide-band speech transceivers are proposed. A constant throughput adaptive OFDM transceiver was designed and benchmarked against a time-variant rate scheme. The proposed joint adaptation of source-codec, channel-codec, and modulation regime results in attractive, robust, high-quality audio systems, capable of conveying near-unimpaired wide-band audio signals over fading dispersive channels for signal-to-noise ratios (SNR) in excess of about 5 dB.

Proceedings ArticleDOI
15 May 2000
TL;DR: A variable channel coding scheme for adaptive multirate (AMR) speech transmission over mobile radio channels is proposed; although developed for GSM, its basic concept can be adapted to other digital radio systems like UMTS.
Abstract: A variable channel coding scheme for adaptive multirate (AMR) speech transmission over mobile radio channels is proposed. Although it was developed for GSM, the basic concept of variable channel coding can be adapted to other digital radio systems like UMTS. The AMR concept allows almost wire-line speech quality even for poor channel conditions by dynamically splitting the gross bit rate between source (speech) and channel coding according to the channel quality.
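
The split of a fixed gross bit rate between speech and channel coding, described in the abstract above, can be illustrated with a small sketch. The 22.8 kbit/s gross rate of the GSM full-rate traffic channel and the eight AMR speech-coding rates come from the GSM/AMR specifications, not from this abstract. The carrier-to-interference thresholds and the function names are purely hypothetical, and the paper's actual channel coding scheme is not reproduced.

```python
GROSS_FULL_RATE_BPS = 22_800  # GSM full-rate traffic channel gross bit rate

# AMR narrowband speech-coding rates in bit/s.
AMR_MODES_BPS = (4_750, 5_150, 5_900, 6_700, 7_400, 7_950, 10_200, 12_200)

# Hypothetical switching thresholds: (minimum C/I in dB, speech rate in bit/s),
# ordered from best to worst channel. Real thresholds are network-configured.
HYPOTHETICAL_THRESHOLDS = ((13.0, 12_200), (10.0, 10_200), (7.0, 7_400), (4.0, 5_900))
FALLBACK_MODE_BPS = 4_750


def select_mode(ci_db: float) -> int:
    """Pick the highest speech rate whose (assumed) C/I threshold is met."""
    for min_ci_db, mode_bps in HYPOTHETICAL_THRESHOLDS:
        if ci_db >= min_ci_db:
            return mode_bps
    return FALLBACK_MODE_BPS


def split_gross_rate(mode_bps: int, gross_bps: int = GROSS_FULL_RATE_BPS) -> tuple:
    """Return (speech bits/s, bits/s left over for channel coding)."""
    if mode_bps not in AMR_MODES_BPS:
        raise ValueError(f"unknown AMR mode: {mode_bps}")
    return mode_bps, gross_bps - mode_bps


good_channel = split_gross_rate(select_mode(15.0))
poor_channel = split_gross_rate(select_mode(2.0))
print(good_channel, poor_channel)
```

On a good channel most of the gross rate goes to speech bits. On a poor channel the lowest speech rate leaves the largest share for error protection.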

Journal ArticleDOI
TL;DR: This paper implements a high-speed speech codec that can concurrently process 20 voice channels with a single TMS320C6201 chip in an IP telephony gateway, and analyzes the performance of the ITU-T G.729 codec on the TMS320C6201.
Abstract: ITU-T G.729 is the speech codec primarily recommended by the H.323 standard. This paper describes how to implement the G.729 codec in an IP telephony gateway, and examines in depth the programming techniques for the TMS320C6201 DSP and the code optimization methods that reduce the speech processing delay of the G.729 codec. By adopting these optimization methods and programming techniques, we have implemented a high-speed speech codec that can concurrently process 20 voice channels with a single TMS320C6201 chip in an IP telephony gateway. Finally, the paper analyzes the performance results of the ITU-T G.729 codec on the TMS320C6201.

Proceedings ArticleDOI
17 Sep 2000
TL;DR: An adaptive multi-rate wideband (AMR-WB) speech codec proposed for the GSM system and also for the evolving third generation (3G) mobile speech services achieves an enhanced performance for background noise while maintaining its clean speech quality.
Abstract: This paper describes an adaptive multi-rate wideband (AMR-WB) speech codec proposed for the GSM system and also for the evolving third generation (3G) mobile speech services. The coder is a multi-rate SB-CELP (subband-code excited linear prediction) with five modes operating at bit rates from 24 kbit/s down to 9.1 kbit/s. Our basic approach consists of an unequal band-splitting of the input signal into two subbands (SB). A variable rate, multi-mode ACELP coder is applied to the lower subband (0-6 kHz). The various bit rates are integrated in a common structure where the scalability is realized by exchanging the fixed excitation codebooks while leaving all other codec parameters invariant. For the GSM related modes (9.1-17.8 kbit/s), the upper subband (6-7 kHz) is coded using a very low bit rate representation based on bandwidth expansion techniques. In the case of the 3G application (24 kbit/s), the upper band is coded using a 4 kbit/s ADPCM coding scheme. In addition, the analysis-by-synthesis (AbS) coder of the lower band employs a novel closed-loop gain re-quantization technique controlled by the character of the speech signal. Thereby the codec achieves an enhanced performance for background noise while maintaining its clean speech quality.

Proceedings ArticleDOI
21 Aug 2000
TL;DR: A high-speed speech codec which can process concurrently 18 voice channels with a single TMS320C6201 chip in the IP telephony gateway is implemented and the performance of the resulting ITU-T G.723.1 speech codec is summarized.
Abstract: This paper describes how to implement the G.723.1 recommendation in the IP telephony gateway and studies in detail the programming of the TMS320C6201 DSP and optimization methods for reducing the speech processing delay of the G.723.1 codec. As a result of adopting these optimization and programming methods, we have implemented a high-speed speech codec which can process concurrently 18 voice channels with a single TMS320C6201 chip in the IP telephony gateway. Finally, the paper summarizes the performance of the resulting ITU-T G.723.1 speech codec.

Proceedings ArticleDOI
01 Jun 2000
TL;DR: An efficient representation of the stochastic codebook component using a pulse density of one pulse per 2 ms and signed magnitudes specified by 2 bits per pulse-pair is introduced and speech quality comparable to 8 kb/s G.729 is achieved.
Abstract: An important step toward achieving a high-quality 4 kb/s speech codec is reducing the coding-rate of the stochastic codebook component to near 2 kb/s. The increased reconstruction error in the residual that such low-rate quantization implies motivates the search for techniques that reduce the perceptibility of the errors in the reconstructed signal. Pitch-synchronous estimation of the linear-prediction filter and pitch-synchronous updating of the adaptive codebook reduce the coefficient-estimation error and increase the relative contribution of the adaptive codebook component to the synthesized signal, thereby reducing audible noise. However, pitch synchronous analysis normally results in a variable-rate coder. To obtain a fixed-rate representation, we introduce an efficient representation of the stochastic codebook component using a pulse density of one pulse per 2 ms and signed magnitudes specified by 2 bits per pulse-pair. The resulting reconstructions are evaluated for CELP coders corresponding to classical and generalized-pitch-predictor designs. In both cases speech quality comparable to 8 kb/s G.729 is achieved.
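
The bit budget implied by this pulse representation can be checked with a little arithmetic. The sketch below assumes that the 2 bits per pulse-pair cover only the signed magnitudes, as the abstract states; pulse positions are not described there and are left out, so the figure is a partial budget rather than the paper's actual bit allocation. The function and constant names are illustrative only.

```python
from fractions import Fraction

PULSE_SPACING_MS = 2      # one pulse per 2 ms (from the abstract)
BITS_PER_PULSE_PAIR = 2   # signed magnitudes, 2 bits per pulse-pair (from the abstract)


def magnitude_rate_bps(pulse_spacing_ms=PULSE_SPACING_MS,
                       bits_per_pair=BITS_PER_PULSE_PAIR):
    """Bits per second spent on pulse magnitudes only (positions excluded)."""
    pulses_per_second = Fraction(1000, pulse_spacing_ms)
    pairs_per_second = pulses_per_second / 2
    return pairs_per_second * bits_per_pair


rate = magnitude_rate_bps()
print(f"{float(rate):.0f} bit/s for pulse magnitudes")
```

Under these assumptions the magnitudes account for 500 bit/s, a fraction of the near 2 kb/s that the abstract targets for the stochastic codebook component.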

Proceedings ArticleDOI
05 Jun 2000
TL;DR: An embedded sinusoidal transform codec that is scalable not only in bit-rate, but also in sampling rate, and scales up to 9.6 kbit/s for 16 kHz sampled wideband speech.
Abstract: This paper describes an embedded sinusoidal transform codec that is scalable not only in bit-rate, but also in sampling rate. In a representative implementation, the system produces an embedded bit-stream at 3.2 and 6.4 kbit/s for telephone-bandwidth speech, and scales up to 9.6 kbit/s for 16 kHz sampled wideband speech. The 3.2 kbit/s codec is a sinusoidal transform codec with synthetic phases. The 6.4 kbit/s codec adds resolution to the spectral envelope and transmits measured phases of the eight lowest harmonics. The 9.6 kbit/s codec adds information in the 4 to 8 kHz band to provide higher quality wideband speech.
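
The idea of an embedded bit-stream, where a lower-rate stream is a prefix of the higher-rate one, can be illustrated with a small sketch. The three cumulative rates come from the abstract; the 20 ms frame length, the function names, and the byte-level truncation are assumptions made only for illustration and are not taken from the paper.

```python
FRAME_MS = 20  # hypothetical frame length; the abstract does not give one

# Cumulative rates from the abstract (kbit/s) and what each layer adds.
LAYERS = [
    (3.2, "sinusoidal transform codec with synthetic phases"),
    (6.4, "finer spectral envelope, measured phases of the 8 lowest harmonics"),
    (9.6, "4 to 8 kHz band for wideband speech"),
]


def layer_bits(frame_ms=FRAME_MS):
    """Bits contributed by each layer within one frame."""
    bits, previous_rate = [], 0.0
    for rate_kbps, _ in LAYERS:
        # kbit/s multiplied by ms gives bits.
        bits.append(round((rate_kbps - previous_rate) * frame_ms))
        previous_rate = rate_kbps
    return bits


def truncate_frame(frame, target_kbps, frame_ms=FRAME_MS):
    """Keep only the leading layers whose cumulative rate fits the target."""
    keep_bits = 0
    for (rate_kbps, _), nbits in zip(LAYERS, layer_bits(frame_ms)):
        if rate_kbps <= target_kbps:
            keep_bits += nbits
    return frame[: keep_bits // 8]


full_frame = bytes(range(24))                 # 192 bits = all three layers
narrowband = truncate_frame(full_frame, 6.4)  # drops the wideband layer
print(layer_bits(), len(narrowband))
```

Because each layer only appends bits, a network node in this sketch can lower the rate by discarding the tail of a frame without re-encoding.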

Patent
16 Aug 2000
TL;DR: A multipoint control unit (MCU) is provided which allows for dynamic codec selection. According to one embodiment, the MCU (104) causes endpoints (102, 106) to renegotiate their codec selections if a most-commonly available codec is not being used, upon entry of new parties to a teleconference.
Abstract: A multipoint control unit (104) is provided which allows for dynamic codec selection. According to one embodiment, the MCU (104) causes endpoints (102, 106) to renegotiate their codec selections if a most-commonly available codec is not being used, upon entry of new parties to a teleconference. Alternatively, the codec renegotiation may be performed each time a user speaks, to optimize for maximum transmission quality or for minimizing transcoding.
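
A minimal sketch of the selection step described above, assuming the MCU knows each endpoint's supported codec list. The function names, the endpoint labels, the codec names in the example, and the alphabetical tie-break are all hypothetical; the patent abstract does not specify them.

```python
from collections import Counter


def most_common_codec(endpoint_codecs):
    """Return the codec supported by the most endpoints (ties: alphabetical)."""
    counts = Counter(
        codec for codecs in endpoint_codecs.values() for codec in set(codecs)
    )
    if not counts:
        return None
    best = max(counts.values())
    return min(codec for codec, n in counts.items() if n == best)


def endpoints_to_renegotiate(endpoint_codecs, in_use):
    """Endpoints that support the target codec but are not currently using it."""
    target = most_common_codec(endpoint_codecs)
    pending = sorted(
        ep for ep, codecs in endpoint_codecs.items()
        if target in codecs and in_use.get(ep) != target
    )
    return target, pending


# Example: a new party joins an existing two-party conference.
supported = {
    "ep102": ["G.711", "G.723.1"],
    "ep106": ["G.711", "G.729"],
    "ep_new": ["G.711"],
}
current = {"ep102": "G.723.1", "ep106": "G.711"}
print(endpoints_to_renegotiate(supported, current))
```

Running the same check each time a party joins, or each time the active speaker changes, corresponds to the two triggers the abstract mentions.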

Proceedings ArticleDOI
17 Sep 2000
TL;DR: Results indicate that the proposed scheme can be used to reproduce speech at average bit rates from 2.3 to 3.4 kbps (i.e., in a two-way communication scenario) with very high quality and intelligibility.
Abstract: A significant improvement in the efficiency of excitation coding with CELP at low bit rates is achieved by a new paradigm for encoding the fixed excitation. In the proposed scheme, the non-zero fixed-codebook excitation elements are substantially localized in a set of windows, with positions adaptive to the pitch peaks. Highly efficient coding is thus achieved by allocating most of the available excitation bits to capture the essential excitation events. The paradigm is validated by computer simulation of a variable-rate speech codec. The performance of the codec is evaluated by informal subjective tests and compared with TIA standard variable rate speech codecs. The results indicate that the proposed scheme can be used to reproduce speech at average bit rates from 2.3 to 3.4 kbps (i.e., in a two-way communication scenario) with very high quality and intelligibility.

Journal ArticleDOI
TL;DR: Simulations with various speech data show that the proposed EVRC codebook search method yields voice quality equivalent to that of the standard method with only 23% of the codebook search load.
Abstract: An efficient enhanced variable rate codec (EVRC) codebook search method based on a two-stage search is proposed. At the first stage, a coarse codevector is selected by a fast sequential search, and at the second stage, the pulse replacement procedure is run to enhance the performance of the selected codevector. Simulations with various speech data show that the proposed method yields voice quality equivalent to that of the standard method with only 23% of the codebook search load.
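
The two-stage idea can be sketched on a toy algebraic codebook. This is not the EVRC standard search or the authors' exact procedure: it assumes a generic ACELP-style criterion, one pulse per track, pulse signs taken from the sign of the target correlation, and made-up example data. All function names are illustrative.

```python
def correlation_matrix(h):
    """phi[i][j] = sum over k of h[k-i] * h[k-j], for impulse response h."""
    n = len(h)
    return [[sum(h[k - i] * h[k - j] for k in range(max(i, j), n))
             for j in range(n)] for i in range(n)]


def criterion(positions, d, phi):
    """ACELP-style measure (d.c)^2 / (c' phi c) for unit pulses at positions."""
    signs = [1.0 if d[p] >= 0 else -1.0 for p in positions]
    corr = sum(s * d[p] for s, p in zip(signs, positions))
    energy = sum(si * sj * phi[pi][pj]
                 for si, pi in zip(signs, positions)
                 for sj, pj in zip(signs, positions))
    return corr * corr / energy if energy > 0 else 0.0


def coarse_search(tracks, d, phi):
    """Stage 1: pick one pulse per track in sequence, given earlier picks."""
    chosen = []
    for track in tracks:
        best = max(track, key=lambda p: criterion(chosen + [p], d, phi))
        chosen.append(best)
    return chosen


def pulse_replacement(positions, tracks, d, phi):
    """Stage 2: move single pulses within their track while the measure improves."""
    positions = list(positions)
    best = criterion(positions, d, phi)
    improved = True
    while improved:
        improved = False
        for i, track in enumerate(tracks):
            for p in track:
                trial = positions[:i] + [p] + positions[i + 1:]
                q = criterion(trial, d, phi)
                if q > best:
                    positions, best, improved = trial, q, True
    return positions, best


# Toy data (hypothetical): impulse response h, target correlation d, two tracks.
h = [1.0, 0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0]
d = [0.2, -1.1, 0.7, 0.4, -0.3, 0.9, -0.8, 0.1]
tracks = [[0, 2, 4, 6], [1, 3, 5, 7]]
phi = correlation_matrix(h)

coarse = coarse_search(tracks, d, phi)
refined, score = pulse_replacement(coarse, tracks, d, phi)
print(coarse, refined, score)
```

The refinement can only keep or raise the criterion reached by the coarse stage, and it evaluates far fewer candidates than an exhaustive search over all track combinations, which is the trade-off the abstract reports.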

Proceedings ArticleDOI
01 Jan 2000
TL;DR: An architecture design that performs concurrent VLC codec processes with constant symbol rate is presented and is suitable for those applications that require codec processes simultaneously, such as videoconferencing, and high throughput systems, such as HDTV.
Abstract: In this paper, the algorithm of a VLC codec system with a new group-based approach is presented. Based on the proposed codeword grouping and symbol memory mapping, the group-searching scheme and codec processes are completed by applying numerical properties and arithmetic operations to codewords and symbol addresses. The memory requirement of the encoder is reduced by a novel symbol-converting scheme. Therefore, a programmable coding table and symbol representation can be achieved. Based on MPEG-like systems, an architecture design that performs concurrent VLC codec processes with constant symbol rate is presented. Simulation results show that 100 Msps can be achieved with a 100 MHz clock for both the encoding and decoding procedures. As a result, it is suitable for those applications that require codec processes simultaneously, such as videoconferencing, and high throughput systems, such as HDTV.