Showing papers on "Adaptive Multi-Rate audio codec published in 1999"


Patent
24 Aug 1999
TL;DR: In this paper, a method of encoding an input speech signal using a multi-rate encoder having a plurality of encoding rates is disclosed, where a high-pass filter and then a perceptual weighting filter are applied to such signal to generate a first target signal.
Abstract: A method of encoding an input speech signal using a multi-rate encoder having a plurality of encoding rates is disclosed. A high-pass filter and then a perceptual weighting filter are applied to such signal to generate a first target signal. An adaptive codebook vector is identified from an adaptive codebook using the first target signal by filtering the vector to generate a filtered adaptive codebook vector. An adaptive codebook gain for the adaptive codebook vector is calculated and an error signal minimized. The adaptive codebook gain is adaptively reduced based on one encoding rate from the plurality of encoding rates to generate a reduced adaptive codebook gain. A second target signal based at least on the first target signal and the reduced adaptive codebook gain is generated. The input speech signal is converted into an encoded speech based on the second target signal.

111 citations
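
The gain step in the abstract above can be illustrated with a minimal sketch. It assumes the usual CELP least-squares gain and a second target formed by subtracting the scaled, filtered adaptive codebook vector; the abstract states neither formula explicitly. The per-rate reduction factors in GAIN_REDUCTION and all function names are hypothetical.

```python
import numpy as np

# Hypothetical per-rate reduction factors, chosen only for illustration.
GAIN_REDUCTION = {4.75: 0.8, 12.2: 1.0}


def adaptive_codebook_gain(target, filtered_acb):
    """Least-squares gain g that minimizes ||target - g * filtered_acb||^2."""
    energy = float(np.dot(filtered_acb, filtered_acb))
    if energy == 0.0:
        return 0.0
    return float(np.dot(target, filtered_acb)) / energy


def second_target(target, filtered_acb, rate_kbps):
    """Return (second target, reduced gain) for the given encoding rate.

    Assumption: the second target is the first target minus the
    contribution of the adaptive codebook scaled by the reduced gain.
    """
    gain = adaptive_codebook_gain(target, filtered_acb)
    reduced_gain = GAIN_REDUCTION[rate_kbps] * gain
    return target - reduced_gain * filtered_acb, reduced_gain
```

With a reduction factor below 1, part of the adaptive codebook contribution stays in the second target, so the later codebook search has more of the signal left to model.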


Proceedings ArticleDOI
20 Jun 1999
TL;DR: The adaptive multi-rate (AMR) speech coder currently under standardization for GSM systems as part of the AMR speech service is described, which provides seamless switching on 20 ms frame boundaries and the quality when used on GSM channels is significantly higher than for existing services.
Abstract: In this paper, we describe the adaptive multi-rate (AMR) speech coder currently under standardization for GSM systems as part of the AMR speech service. The coder is a multi-rate ACELP coder with 8 modes operating at bit-rates from 12.2 kbit/s down to 4.75 kbit/s. The coder modes are integrated in a common structure where the bit-rate scalability is realized mainly by altering the quantization schemes for the different parameters. The coder provides seamless switching on 20 ms frame boundaries. The quality when used on GSM channels is significantly higher than for existing services.

85 citations
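
As a worked illustration of the figures above, the sketch below converts each mode's bit rate into speech bits per 20 ms frame. The abstract gives only the endpoints (12.2 and 4.75 kbit/s); the six intermediate rates listed here are those of the published AMR narrowband standard and are an addition to the abstract. The function name is hypothetical.

```python
# Eight AMR narrowband mode rates in kbit/s. Only 12.2 and 4.75 appear in
# the abstract; the intermediate values come from the published standard.
AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75]
FRAME_MS = 20  # mode switching happens on 20 ms frame boundaries


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Speech bits per frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * frame_ms)


if __name__ == "__main__":
    for rate in AMR_MODES_KBPS:
        print(f"{rate:5.2f} kbit/s: {bits_per_frame(rate)} bits per frame")
```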


Proceedings ArticleDOI
20 Jun 1999
TL;DR: A hybrid ACELP/TCX algorithm for coding speech and music signals at 16, 24, and 32 kbit/s is presented, which switches between algebraic code excited linear prediction (ACELP) and transform coded excitation (TCX) modes on a 20-ms frame basis.
Abstract: A hybrid ACELP/TCX algorithm for coding speech and music signals at 16, 24, and 32 kbit/s is presented. The algorithm switches between algebraic code excited linear prediction (ACELP) and transform coded excitation (TCX) modes on a 20-ms frame basis. Applying TCX on 20 ms frames improved the quality for music signals. Special care was taken to alleviate the switching artifacts between the two modes resulting in a transparent switching process. Subjective test results showed that for speech signals, the performance at 16, 24, and 32 kbit/s, is equivalent to G.722 at 48, 56, and 64 kbit/s, respectively. For music signals, the quality at 24 kbit/s was found equivalent to G.722 at 56 kbit/s. However, at 16 kbit/s, the quality for music was slightly lower than G.722 at 48 kbit/s.

76 citations


Proceedings ArticleDOI
A.J. Accardi, R.V. Cox
15 Mar 1999
TL;DR: A generalization of Ephraim and Malah's MMSE-LSA speech enhancement design is suggested that allows other estimators to be used within the same framework; with a modified version of Ephraim and Van Trees's (see IEEE Trans. Speech and Audio Proc., vol.3, p.251-66, 1995) spectral domain constrained signal subspace estimator, the system obtains greater flexibility and similar performance.
Abstract: Ephraim and Malah's (1984, 1985) MMSE-LSA speech enhancement algorithm, while robust and effective, is difficult to tune and adjust for the tradeoff between noise reduction and distortion. We suggest a means of generalizing this design, which allows for other estimators besides the MMSE-LSA to be used within the same supporting framework. When a modified version of Ephraim and Van Trees's (see IEEE Trans. Speech and Audio Proc., vol.3, p.251-66, 1995) spectral domain constrained signal subspace estimator is used in this manner, we obtain a system with greater flexibility and similar performance. We also explore the possibility of using different speech enhancement techniques as pre-processors for different parameter extraction modules of the IS-641 speech coder (a 7.4 kbit/s ACELP codec). We show that such a strategy can increase the quality of the coded speech and lead to a system that is more robust to differing noise types.

70 citations


Proceedings ArticleDOI
16 May 1999
TL;DR: Various approaches for link adaptation with respect to varying radio channel conditions are described and the method of inband signaling that is standardized is discussed and motivated.
Abstract: The European Telecommunications Standards Institute (ETSI) has just defined an adaptive multi rate (AMR) speech codec standard for the GSM system with a multitude of source and channel coding rates. The standard aims to provide robust high quality speech together with the flexibility to deliver radio network capacity enhancements by means of low bit-rate operation. The codec rates are dynamically selected with respect to the rapidly changing radio conditions and to local capacity requirements. This paper describes various approaches for link adaptation with respect to varying radio channel conditions and puts a focus on the solution in the AMR standard. Moreover the method of inband signaling that is standardized is discussed and motivated.

59 citations
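
Link adaptation of the kind described above can be sketched as a threshold-based mode selector. This is only an illustration: the threshold values, the hysteresis margin, the quality measure in dB and the function name are all hypothetical, and the abstract does not state which of the "various approaches" the standard adopts.

```python
def select_mode(current_mode, quality_db, thresholds, hysteresis_db=2.0):
    """Pick a codec mode index from a channel quality estimate.

    Modes are indexed 0 (most channel protection, lowest source rate) up to
    len(thresholds) (least protection, highest source rate). thresholds[i]
    is the quality below which mode i + 1 is abandoned; moving up requires
    exceeding it by hysteresis_db, which avoids rapid toggling.
    thresholds must be sorted in increasing order.
    """
    mode = current_mode
    # Step up while the channel is clearly good enough for the next mode.
    while mode < len(thresholds) and quality_db > thresholds[mode] + hysteresis_db:
        mode += 1
    # Step down while the channel is too poor for the current mode.
    while mode > 0 and quality_db < thresholds[mode - 1]:
        mode -= 1
    return mode
```

For example, with thresholds [4.0, 7.0, 10.0] a quality of 7.5 dB keeps a link in mode 2 if it is already there, but is not enough to promote a link from mode 1, because promotion requires more than 9.0 dB.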


Journal Article
TL;DR: An MPEG-2 AAC-derived codec which was optimized for very low delay and accepted as the baseline of development for low-delay coding in MPEG-4 version 2 audio is described.
Abstract: Perceptual audio coding is known to deliver high sound quality even at low bit rates for a broad range of audio signals. However, the total delay of the encoder/decoder chain is usually considerably higher than acceptable for two-way communication applications, such as teleconferencing. This paper discusses the primary sources of algorithmic delay in a perceptual audio codec and describes an MPEG-2 AAC-derived codec which was optimized for very low delay and accepted as the baseline of development for low-delay coding in MPEG-4 version 2 audio.

55 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: A combined adaptive transform codec (ATC) and code-excited linear prediction (CELP) algorithm, called ATCELP, for the compression of wideband (7 kHz) signals is described, together with a switching scheme between CELP and ATC modes and a frame erasure concealment technique.
Abstract: This paper describes a combined adaptive transform codec (ATC) and code-excited linear prediction (CELP) algorithm, called ATCELP, for the compression of wideband (7 kHz) signals. The CELP algorithm applies mainly to speech, whereas the ATC mode is selected for music and noise signals. We propose a switching scheme between CELP and ATC mode and describe a frame erasure concealment technique. Subjective listening tests have shown that the ATCELP codec at bit rates of 16, 24 and 32 kbit/s achieved performances close to those of the CCITT G.722 at 48, 56 and 64 kbit/s, respectively, at most operating conditions.

48 citations


Proceedings ArticleDOI
S.A. Ramprashad
20 Jun 1999
TL;DR: This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs when compared to a single-mode transform predictive coder (TPC).
Abstract: Speech and audio coding are often considered to be two separate technologies, each almost independently developing different techniques for signal compression. At low bit rates the gap in performance between the two technologies begins to be noticeable; speech coders work better on speech and audio coders perform better on music. The challenge is to merge the two technologies into a single coding paradigm which will work as well as either of the two regardless of the input signal. Presented is a multimode speech and audio coder which can adapt almost continuously between a speech and audio coding mode. This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs when compared to a single-mode transform predictive coder (TPC).

44 citations


Proceedings ArticleDOI
17 Oct 1999
TL;DR: A brief tutorial overview of parametric audio coding is given and the parametric coder currently developed in the MPEG-4 audio standardisation is described.
Abstract: Parametric modelling provides an efficient representation of general audio signals and is utilised in very low bit rate audio coding. It is based on the decomposition of an audio signal into components which are described by appropriate source models and represented by model parameters. Perception models are utilised in signal decomposition and model parameter coding. This paper gives a brief tutorial overview of parametric audio coding and describes the parametric coder currently developed in the MPEG-4 audio standardisation. Recent advances as well as novel approaches in this field are presented.

43 citations


Patent
19 Aug 1999
TL;DR: In this article, a multipoint control unit (MCU) is provided which allows for dynamic codec selection, and the endpoints can renegotiate their codec selections if a most common available codec is not being used, upon entry of new parties to a teleconference.
Abstract: A multipoint control unit (104) is provided which allows for dynamic codec selection. According to one embodiment, the MCU (104) causes endpoints (102, 106) to renegotiate their codec selections if a most-commonly available codec is not being used, upon entry of new parties to a teleconference. Alternatively, the codec renegotiation may be performed each time a user speaks, to optimize for maximum transmission quality or for minimizing transcoding.

31 citations


Journal ArticleDOI
TL;DR: A two stage hybrid embedded speech/audio coding structure and algorithm is proposed which can be used to enhance the quality of an existing codec without modification of the original coding algorithm.
Abstract: A two stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage of the structure consists of a core speech coder which provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform based coder which provides a separate optional bitstream for the enhancement of the core stage output. The two stage structure can be used to enhance the quality of an existing codec without modification of the original coding algorithm. In this regard it can be considered a value added option that can be used with a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality. Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as a core coder. The maximum combined bitrate from the core and enhancement stages for the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output in the cases of music and speech with background noise. Compared to the non-embedded fixed rate standard LD-CELP G.728 at 16 kb/s, the quality of the two stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech the quality of the two stage structure at 16 kb/s is close to if not better than that of G.728 at 16 kb/s.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A scalable codec is constructed from transform coding and basic scalable encoder and decoder modules; subjective quality evaluation tests showed that its sound quality is better than that of an MPEG-2 layer 3 codec at 8, 16, and 24 kbit/s when the scalable codec is constructed of 8-kbit/s basic modules.
Abstract: A scalable codec has been constructed by using transform coding and the basic modules for scalable encoder and decoder. It allows users to choose a variety of scalable configurations in the frequency domain. The basic module is a quantizer that can quantize MDCT (modified DCT) coefficients transformed from a variety of frequency regions. This module mainly works at bit rates of more than 8 kbit/s. We can also change the target frequency regions of the basic module's input-output signals in each transform frame; i.e., we can change the scalable structure according to the nature of the input signals. In the scalable codec described here, the input-output signals are monaural and the sampling frequency is 24 kHz. The total bit rate of this scalable codec is more than 8 kbit/s. Subjective quality evaluation tests, mainly for musical sound sources, showed that its sound quality is better than that of an MPEG-2 layer 3 codec at 8, 16, and 24 kbit/s when our scalable codec is constructed of 8-kbit/s basic modules. In combination with AAC (advanced audio coding), our scalable codec will be chosen as an international standard in ISO/IEC-MPEG-4/Audio.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: An adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate and half rate channels and to maintain high quality in the presence of highly varying background noise and channel conditions is developed.
Abstract: We have developed an adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used. The speech coders in each codec mode are based on the CELP algorithm operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled multi-modal speech coder. The decoders monitor the channel quality at both ends of the wireless link using the soft values for the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.
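
The source/channel split mentioned above is simple arithmetic: whatever part of the gross channel rate is not spent on speech bits is available for channel coding. The sketch below uses the gross rates quoted in the abstract; the function names are hypothetical, and the abstract does not say which source rates are paired with which channel.

```python
# Gross channel rates in kbit/s, as quoted in the abstract.
GROSS_KBPS = {"full": 22.8, "half": 11.4}


def channel_coding_kbps(channel, source_kbps):
    """Bit rate left for channel coding once the source rate is chosen."""
    gross = GROSS_KBPS[channel]
    if source_kbps > gross:
        raise ValueError("source rate exceeds the gross channel rate")
    return gross - source_kbps


def source_fraction(channel, source_kbps):
    """Share of the gross rate that carries speech bits."""
    return source_kbps / GROSS_KBPS[channel]
```

For instance, running the 11.85 kb/s coder on the full rate channel leaves about 10.95 kb/s for error protection, while the 5.15 kb/s coder leaves about 17.65 kb/s.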

Proceedings ArticleDOI
15 Mar 1999
TL;DR: This work extends previous research on a new approach to automatic speech recognition (ASR) in the GSM environment and concludes that the proposed approach is much more effective in coping with the coding distortion and transmission errors.
Abstract: We have extended our previous research on a new approach to automatic speech recognition (ASR) in the GSM environment. Instead of recognizing from the decoded speech signal, our system works from the digital speech representation used by the GSM encoder. We have compared the performance of a conventional system and the one we propose on a speaker independent, isolated-digit ASR task. For the half and full-rate GSM codecs, from our results, we conclude that the proposed approach is much more effective in coping with the coding distortion and transmission errors. Furthermore, in clean speech conditions, our approach does not impoverish the recognition performance, even recognizing from GSM digital speech, in comparison with a conventional system working on unencoded speech.

Proceedings ArticleDOI
A. Vahatalo, I. Johansson
20 Jun 1999
TL;DR: The VAD for controlling DTX of the GSM AMR (adaptive multi-rate) speech codec is described, which is based on spectral estimation and periodicity detection and incorporates novel methods to estimate background noise and to detect periodic components based on open-loop pitch gain.
Abstract: This paper describes the VAD (voice activity detection) for controlling DTX (discontinuous transmission) of the GSM AMR (adaptive multi-rate) speech codec. The algorithm is based on spectral estimation and periodicity detection. The VAD contains a 9-band IIR filter bank, which divides input signals into frequency bands. The signal level at each band is calculated. Background noise is estimated in each sub-band. The VAD decision is computed by comparing input signal level and background noise estimate. The algorithm incorporates novel methods to estimate background noise and to detect periodic components based on open-loop pitch gain. A new method is also derived to detect correlated complex signals like music.
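
The level-versus-noise comparison described above can be sketched as follows. This is a simplified illustration, not the standardized algorithm: the 9-band filter bank is assumed to have already produced the per-band levels, and the squared-ratio measure, the threshold, the smoothing constant alpha and the function names are all hypothetical. Periodicity and music detection are omitted.

```python
def vad_decision(levels, noise, threshold):
    """Return True (speech) if the mean squared level-to-noise ratio
    across the sub-bands exceeds the threshold."""
    ratios = [(lvl / max(nse, 1e-9)) ** 2 for lvl, nse in zip(levels, noise)]
    return sum(ratios) / len(ratios) > threshold


def update_noise(levels, noise, alpha=0.9):
    """First-order recursive update of the per-band background noise
    estimate, intended for frames classified as non-speech."""
    return [alpha * nse + (1.0 - alpha) * lvl for lvl, nse in zip(levels, noise)]


def process_frame(levels, noise, threshold=4.0):
    """Classify one frame and adapt the noise estimate only during noise."""
    speech = vad_decision(levels, noise, threshold)
    if not speech:
        noise = update_noise(levels, noise)
    return speech, noise
```

Freezing the noise estimate during speech keeps loud talkers from inflating the background estimate, which is one common design choice among several.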

Patent
18 Jun 1999
TL;DR: In this article, a method and apparatus for controlling the transition of a bypass capable codec between operative modes, based on a certain characteristic of the audio data signal processed by the codec, is presented.
Abstract: The invention relates to a method and apparatus for controlling the transition of a bypass capable codec between operative modes, based on a certain characteristic of the audio data signal processed by the codec. The apparatus relies on a control signal to determine when the codec will switch from one mode to another. This control signal reflects a characteristic of the audio data signal received at the apparatus, such as the type of speech activity or the format of the audio data signal. When in the active (non-bypass) mode, the apparatus relies on an additional control signal to switch to the inactive (bypass) mode. This additional control signal is received from a control unit at a remote codec that indicates that the remote codec is also bypass capable, hence the decoder at the first codec and the encoder at the remote codec can switch to the inactive mode to pass between them the compressed data frames.

Patent
Oestreich Stefan
05 Feb 1999
TL;DR: In this paper, a speech coder/decoder can select a broadband and a narrowband speech coding method; for a connection to a mobile station, transmission possibilities are monitored and, given limited transmission possibilities, there is a changeover from broadband to narrowband speech coding methods.
Abstract: A speech coder/decoder can select a broadband and a narrowband speech coding method. For a connection to a mobile station, a monitoring of transmission possibilities is performed, and, given limited transmission possibilities, there is a changeover from broadband to narrowband speech coding methods. The received narrowband speech information is expanded to a greater bandwidth at the receive side. The subjective speech impression is improved by the bridging of this changeover effect. This guarantees an improved speech quality to the listener, particularly with the introduction of adaptive multirate coding.

Patent
08 Dec 1999
TL;DR: In this article, the authors present a data CODEC system for computer consisting of a system control software, a multichannel audio/speech and multimedia data signal processor, and a multi-channel audio and speech and multimedia input-output unit.
Abstract: The present invention relates to a data CODEC system for computer. The data CODEC system for computer comprises a system control software, a multichannel audio/speech and multimedia data signal processor, and a multichannel audio/speech and multimedia data input-output unit. The system control software communicates multichannel audio/speech and multimedia data with the multichannel audio/speech and multimedia data signal processor according to control of various application programs. The multichannel audio/speech and multimedia data signal processor processes multichannel audio/speech and multimedia data. The multichannel audio/speech and multimedia data input-output means inputs/outputs multichannel audio/speech and multimedia data from/to an external system.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: This paper presents the essential framework and the unique advantages of a multimode VBR codec and suggests algorithms for the different modes.
Abstract: The speech signal consists of a time-varying ensemble of different types of segments with distinct characteristics, which require different degrees of coding resolution in order to retain an overall high voice quality. A fixed-rate coder can capture such time-varying characteristics only if it operates at a high enough bit rate. At a low bit rate, a fixed-rate coder will not be able to capture all of these various segments well and will fail to render high voice quality. A multimode variable bit rate (VBR) coder uses an arsenal of modes, operating at different bit rates. These modes are designed to represent these different speech segments optimally with the right amount of coding resolution. Thus, a multimode VBR codec adapts the coding mechanism to the input speech and delivers high quality at low (average) rates. This paper presents the essential framework and the unique advantages of a multimode VBR codec and suggests algorithms for the different modes.
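
The "low (average) rate" of a multimode VBR coder is the usage-weighted mean of its mode rates. The sketch below shows that calculation; the mode rates and usage fractions in the example are hypothetical and are not taken from the paper.

```python
def average_rate_kbps(mode_rates_kbps, mode_fractions):
    """Average bit rate when mode i runs at mode_rates_kbps[i] for a
    fraction mode_fractions[i] of the frames (fractions must sum to 1)."""
    if len(mode_rates_kbps) != len(mode_fractions):
        raise ValueError("one usage fraction is needed per mode")
    if abs(sum(mode_fractions) - 1.0) > 1e-9:
        raise ValueError("usage fractions must sum to 1")
    return sum(r * f for r, f in zip(mode_rates_kbps, mode_fractions))


if __name__ == "__main__":
    # Hypothetical example: voiced, unvoiced and silence modes.
    print(average_rate_kbps([8.0, 4.0, 1.0], [0.5, 0.25, 0.25]))
```

In this example the coder peaks at 8 kbit/s on the most demanding segments yet averages 5.25 kbit/s.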

01 Sep 1999
TL;DR: It is proved that the MPEG-4 Structured Audio tool can be used to mimic the behavior of any other kind of decoder and that structured-audio coding is a universally minimal coding technique.
Abstract: The MPEG-4 Structured Audio standard was created to enable high-quality, very-low-bitrate transmission of synthetic sound. However, structured-audio techniques also are suitable for flexible natural audio coding. This paper introduces the concept of generalized audio coding, in which the Structured Audio decoder is used to emulate the behavior of other audio decoders. We prove that the MPEG-4 Structured Audio tool can be used to mimic the behavior of any other kind of decoder and that structured-audio coding is a universally minimal coding technique. We provide examples of simple natural audio coders that use the SA toolset, and characterize the overhead that arises in the transcoding process. Generalized audio coding removes marketplace barriers to the use of special-purpose or signal-adaptive coding formats, and thus promotes greater overall efficiency in the world of audio coding.

Proceedings ArticleDOI
20 Jun 1999
TL;DR: A wideband (7 kHz) speech coding scheme using code-excited linear prediction (CELP) with mixed time and frequency domain excitation with improved synthesis filter is described.
Abstract: This paper describes a wideband (7 kHz) speech coding scheme using code-excited linear prediction (CELP) with mixed time and frequency domain excitation. The proposed frequency domain innovation can be used alternatively or in parallel to a time domain codebook. In addition an improved synthesis filter is used consisting of a signal dependent combination of a forward adaptive and a backward adaptive (FA/BA) structure. An experimental codec operating at 15.5 or 20.0 kbit/s is demonstrated.


Proceedings ArticleDOI
20 Jun 1999
TL;DR: This work proposes the usage of what it calls parameter individual block codes (PIBC) for the most important codec parameters, which allows joint speech codec parameter and PIBC decoding taking advantage of the error concealing properties of soft-bit speech decoding.
Abstract: In digital mobile speech transmission usually the most important (class 1a) bits provided by the speech coding scheme are protected by a CRC for error detection. As a consequence all parameters spanned by the class 1a bits have to be marked at the receiver either as reliable or as unreliable. In contrast to this somewhat coarse approach we propose the usage of what we call parameter individual block codes (PIBC) for the most important codec parameters. This allows joint speech codec parameter and PIBC decoding taking advantage of the error concealing properties of soft-bit speech decoding.


Journal ArticleDOI
TL;DR: Comparisons with the recent ITU-T G.729 8 kbit/s standard, used in the discontinuous transmission mode, demonstrate that the proposed coder provides an average bit rate reduction of about 20% maintaining the same algorithmic delay and perceptive quality.
Abstract: This letter deals with a variable bit-rate CS-ACELP speech coder based on new algorithms that are robust in the presence of the background noise typical of wireless communications. The coder presents eight operating modes ranging from 0-8 kbit/s with an average bit-rate of about 4 kbit/s. Subjective and objective comparisons with the recent ITU-T G.729 8 kbit/s standard, used in the discontinuous transmission mode, demonstrate that the proposed coder provides an average bit rate reduction of about 20% maintaining the same algorithmic delay and perceptive quality.

Proceedings ArticleDOI
S. Heinen, M. Adratm, O. Steil, Peter Vary, Wen Xu
15 Mar 1999
TL;DR: A new 6.1 to 13.3-kb/s speech codec is proposed called variable rate code-excited linear prediction (VR-CELP) for adaptive multi-rate (AMR) transmission over mobile radio channels such as GSM or UMTS to enhance the transmission quality under very poor channel conditions.
Abstract: We propose a new 6.1 to 13.3-kb/s speech codec called variable rate code-excited linear prediction (VR-CELP) for adaptive multi-rate (AMR) transmission over mobile radio channels such as GSM or UMTS. The AMR concept allows operation with almost wireline speech quality for poor channel conditions and better quality for good channel conditions. This is achieved by dynamically splitting the gross bit rate of the transmission system between source and channel coding according to the current channel conditions. Thus the source coding scheme must be designed for seamless switching between rates without annoying artifacts. To enhance the transmission quality under very poor channel conditions, a new powerful error concealment strategy based on estimation theory is applied.

Proceedings ArticleDOI
15 Mar 1999
TL;DR: The HE-LPC coder has the potential of producing high quality speech at 4.8 kb/s and below and employs a new pitch estimation and voicing technique, and new DCT based LPC and residual amplitude quantization techniques have been developed.
Abstract: The harmonic excitation linear predictive speech coder (HE-LPC) is a technique derived from MBE and MB-LPC type of speech coding algorithms. The HE-LPC coder has the potential of producing high quality speech at 4.8 kb/s and below. This coder employs a new pitch estimation and voicing technique. In addition, new DCT based LPC and residual amplitude quantization techniques have been developed. The 4 kb/s HE-LPC coder with a 14th order LPC filter was found to produce much better speech quality than the various low rate speech coding standards, including the 3.6 kb/s INMARSAT Mini-M AMBE vocoder. During formal ITU ACR tests, the 4 kb/s HE-LPC vocoder was found to produce performance equivalent to 32 kb/s ADPCM and G.729 for both flat and modified IRS filtered clean input speech conditions. The HE-LPC algorithm can also be extended to cover bit rates in the 1.2 to 8 kb/s range depending on the application.

Journal ArticleDOI
TL;DR: This paper gives a brief overview on the complete audio part of the MPEG-4 standard and more detailed information on its parts related to speech coding.
Abstract: While previous MPEG Audio standards mainly were focused on the representation of audio signals close to or equal to CD quality, the new MPEG-4 Audio standard extends the range of applicability towards significantly lower bit rates. Furthermore it offers extended functionalities for the representation of natural and even synthetic audio signals in an object oriented fashion. This paper gives a brief overview on the complete audio part of the MPEG-4 standard and more detailed information on its parts related to speech coding.

Journal ArticleDOI
20 Oct 1999
TL;DR: A PC-based real-time software MPEG-4 video codec with a fast adaptive motion vector search is presented and this technique suppresses load fluctuation in the ME and contributes to the stable real-time work of the software codec.
Abstract: A PC-based real-time software MPEG-4 video codec with a fast adaptive motion vector search is presented. In a fast adaptive motion estimation (ME) technique, the search order is dynamically changed in accordance with the motion of objects. This technique suppresses load fluctuation in the ME and contributes to the stable real-time work of the codec. MMX instructions are used to increase the codec speed. On a portable PC, the software video codec supports satisfactory mobile visual communication at 64 kbps and 128 kbps, for example, at QCIF 15 fps. The codec on a 450 MHz Pentium II processor can encode and decode 30 CIF frames in real-time.

Proceedings ArticleDOI
16 May 1999
TL;DR: This DSP has the capability of processing these algorithms in real-time and has excellent flexibility, so that it can, for instance, perform video codec at 15 CIF frames/sec or video/speech (G.723.1) codec at 30 QCIF frames/sec, making it possible to realize low-cost systems.
Abstract: We have developed a programmable DSP for MPEG4, H.263, H.261 and wavelet based sub-band codec algorithms. This DSP has the capability of processing these algorithms in real-time and has excellent flexibility, so that it can, for instance, perform video codec at 15 CIF frames/sec or video/speech (G.723.1) codec at 30 QCIF frames/sec. This chip includes a video pre/post-processing engine and needs only one 16 Mbit SDRAM as an external memory to perform the above algorithms, making it possible to realize low-cost systems. This chip is fabricated using 0.25 µm CMOS technology and contains 7.7 M transistors on a 9.41 mm × 9.22 mm die.