Showing papers on "Adaptive Multi-Rate audio codec" published in 2002


Book
01 Jan 2002
TL;DR: This book covers audio coding methods (quantization, time-to-frequency mapping with the PQMF and the MDCT, psychoacoustic models, bit allocation, and quality measurement) and audio coding standards (MPEG-1, MPEG-2, MPEG-2 AAC, Dolby AC-3, and MPEG-4 Audio).
Abstract: Foreword. Preface. I: Audio Coding Methods. 2. Quantization. 3. Representation of Audio Signals. 4. Time to Frequency Mapping Part I: The PQMF. 5. Time to Frequency Mapping Part II: The MDCT. 6. Introduction to Psychoacoustics. 7. Psychoacoustic Models for Audio Coding. 8. Bit Allocation Strategies. 9. Building a Perceptual Audio Decoder. 10. Quality Measurement of Perceptual Audio Codecs. II: Audio Coding Standards. 11. MPEG-1 Audio. 12. MPEG-2 Audio. 13. MPEG-2 AAC. 14. Dolby AC-3. 15. MPEG-4 Audio. Index.

367 citations


Journal ArticleDOI
TL;DR: This paper describes the adaptive multirate wideband (AMR-WB) speech codec, which was selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union-Telecommunication Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communication speech quality that also substantially exceeds (narrowband) wireline quality. The paper details AMR-WB standardization history, algorithmic description including novel techniques for efficient ACELP wideband speech coding and subjective quality performance of the codec.

312 citations


01 Jan 2002
TL;DR: The adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services is described.
Abstract: This paper describes the adaptive multirate wideband (AMR-WB) speech codec selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000 and the corresponding specifications were approved in March 2001. The AMR-WB codec was also selected by the International Telecommunication Union-Telecommunication Sector (ITU-T) in July 2001 in the standardization activity for wideband speech coding around 16 kb/s and was approved in January 2002 as Recommendation G.722.2. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 50 Hz to 7 kHz and gives superior speech quality and voice naturalness compared to existing second- and third-generation mobile communication systems. The wideband speech service provided by the AMR-WB codec will give mobile communication speech quality that also substantially exceeds (narrowband) wireline quality. The paper details AMR-WB standardization history, algorithmic description including novel techniques for efficient ACELP wideband speech coding and subjective quality performance of the codec.

282 citations


Book ChapterDOI
25 Jun 2002
TL;DR: This chapter brings together some of the concepts discussed earlier and examines the issues faced by designers of video CODECs and systems that interface to video CODECs, including interfacing, performance, resource usage and design time.
Abstract: In this chapter we bring together some of the concepts discussed earlier and examine the issues faced by designers of video CODECs and systems that interface to video CODECs. Key issues include interfacing (the format of the input and output data, controlling the operation of the CODEC), performance (frame rate, compression, quality), resource usage (computational resources, chip area) and design time. This last issue is important because of the fast pace of change in the market for multimedia communication systems. A short time-to-market is critical for video coding applications and we discuss methods of streamlining the design flow. We present design strategies for two types of video CODEC, a software implementation (suitable for a general-purpose processor) and a hardware implementation (for FPGA or ASIC).

202 citations


PatentDOI
TL;DR: In this article, a transform coding method for music signals was proposed, which is suitable for use in a hybrid codec, whereby a common linear predictive (LP) synthesis filter was employed for both speech and music signals.
Abstract: The present invention provides a transform coding method efficient for music signals that is suitable for use in a hybrid codec, whereby a common Linear Predictive (LP) synthesis filter is employed for both speech and music signals. The LP synthesis filter switches between a speech excitation generator and a transform excitation generator, in accordance with the coding of a speech or music signal, respectively. For coding speech signals, the conventional CELP technique may be used, while a novel asymmetrical overlap-add transform technique is applied for coding music signals. In performing the common LP synthesis filtering, interpolation of the LP coefficients is conducted for signals in overlap-add operation regions. The invention enables smooth transitions when the decoder switches between speech and music decoding modes.

126 citations



Proceedings ArticleDOI
06 Oct 2002
TL;DR: This paper presents mean opinion scores for the iLBC codec and shows by means of signal examples how the nature of degradation in a predictive codec based on frame-independent long-term prediction differs from that of traditional CELP codecs.
Abstract: In this paper, we discuss the internet low bit rate codec (iLBC) with an emphasis on the frame-independent long-term prediction. The frame-independent long-term prediction is a method to exploit pitch-lag correlations in the encoding of speech without suffering multiple-frame speech degradation in connection with transmission loss. We present mean opinion scores for the iLBC codec and show by means of signal examples how the nature of degradation in a predictive codec based on frame-independent long-term prediction differs from that of traditional CELP codecs.

92 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: An MPEG-4 video codec core based on a scene-adaptive motion estimation algorithm is integrated into a 5.296 × 5.296 mm² die using 0.18 μm quad-metal technology.
Abstract: An MPEG-4 video codec core based on a scene-adaptive motion estimation algorithm is integrated into a 5.296 × 5.296 mm² die using 0.18 μm quad-metal technology. The power dissipation during codec operation of the device is 131 mW for QCIF format at 15 frames/s at 13.5 MHz using a 1.5 V supply.

47 citations


Proceedings ArticleDOI
06 Oct 2002
TL;DR: FEC methods like partial redundancy, selective redundancy for the most sensitive frames and parameter interpolation in conjunction with AMR codec mode adaptation are proposed, which secure the speech quality when using AMR for VoIP without increasing the bandwidth substantially.
Abstract: An example of a bandwidth efficient adaptive multi rate (AMR) system for Voice over IP (VoIP) is presented. In VoIP, packet losses cause degradation of the synthesized speech. The distortions may propagate over several consecutive frames, since predictors in the codec exploit inter-frame correlations to gain coding efficiency. To reduce the effects of packet loss, forward error correction (FEC) that adds redundant information to voice packets can be used. However, while FEC can reduce the effects of packet loss, it will increase the amount of bandwidth used by the voice stream, which is not desirable. In this paper we propose FEC methods like partial redundancy, selective redundancy for the most sensitive frames and parameter interpolation in conjunction with AMR codec mode adaptation, which secure the speech quality when using AMR for VoIP without increasing the bandwidth substantially.

43 citations
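
The selective-redundancy idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' scheme: the packet layout, the `packetize` and `receive` names, and the rule "attach a copy of the previous frame only if that frame is flagged sensitive" are assumptions made for illustration, and how frames get flagged is left out entirely.

```python
def packetize(frames, sensitive):
    """Build one packet per frame (hypothetical layout).

    Packet n carries frame n and, only if frame n-1 is flagged sensitive,
    a redundant copy of frame n-1. Unflagged frames cost no extra bandwidth.
    """
    packets = []
    for n, frame in enumerate(frames):
        redundant = frames[n - 1] if n > 0 and sensitive[n - 1] else None
        packets.append({"seq": n, "primary": frame, "redundant": redundant})
    return packets


def receive(packets, lost):
    """Rebuild the frame sequence; None marks a frame that stays lost."""
    out = [None] * len(packets)
    for p in packets:
        if p["seq"] in lost:
            continue  # this packet never arrived
        out[p["seq"]] = p["primary"]
        prev = p["seq"] - 1
        # Fill the previous slot from the redundant copy if it is missing.
        if p["redundant"] is not None and prev >= 0 and out[prev] is None:
            out[prev] = p["redundant"]
    return out
```

In this sketch a lost packet is recovered only when the frame it carried was flagged, which is the trade the abstract describes: protection is spent where a loss would hurt most, so the added bandwidth stays small.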


Patent
Jose Costa Requena1, Esa Jalonen1
21 May 2002
TL;DR: In this article, a method for communicating codec related information between a first mobile communication device (UE1) and a second mobile communication device (UE2) via a network was proposed.
Abstract: The invention relates to a method for communicating codec related information between a first mobile communication device (UE1) and a second mobile communication device (UE2) via a network. The codec related information relates to operational modes of an adaptive multi-rate (AMR) codec. The information is transmitted by the first mobile communication device (UE1) to the second mobile communication device (UE2) in a form of a binary mask in an SDP (Session Description Protocol) body (33) of a SIP (Session Initiation Protocol) INVITE message (31), a value of 1 indicating the operational modes which are supported by the first mobile communication device (UE1). In the event that the network does not support at least one of the operational modes that the codec related information indicates as being supported the network modifies the codec related information accordingly. Fig. 4a

37 citations
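
The binary-mask signalling described in the patent abstract above can be sketched in Python as follows. The bit order (bit i stands for mode i, lowest rate first), the function names, and the 8-character text rendering are assumptions for illustration; the abstract does not give the actual SDP encoding. The eight rates are the standard narrowband AMR source rates.

```python
# Narrowband AMR source rates in kbit/s, lowest to highest (mode index 0..7).
AMR_RATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def modes_to_mask(modes):
    """Pack supported mode indices into an integer mask (bit i = mode i)."""
    mask = 0
    for m in modes:
        if not 0 <= m < len(AMR_RATES_KBPS):
            raise ValueError(f"unknown AMR mode index: {m}")
        mask |= 1 << m
    return mask


def mask_to_modes(mask):
    """List the mode indices whose bit is set to 1."""
    return [i for i in range(len(AMR_RATES_KBPS)) if mask & (1 << i)]


def network_filter(offered_mask, network_mask):
    """Clear the bits of any offered mode that the network does not support."""
    return offered_mask & network_mask


def mask_to_bits(mask):
    """Render the mask as an 8-character binary string (hypothetical format)."""
    return format(mask, "08b")
```

A bitwise AND is enough to model the network's modification step: any mode the sender marks with 1 but the network lacks ends up as 0 in the mask that reaches the second device.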


Proceedings ArticleDOI
04 Aug 2002
TL;DR: In this article, a non-linear enhancement technique called audio-visual codebook dependent cepstral normalization (AVCDCN) was proposed for both audio-only and audio-visual speech recognition.
Abstract: We introduce a non-linear enhancement technique called audio-visual codebook dependent cepstral normalization (AVCDCN) and we consider its use with both audio-only and audio-visual speech recognition. AVCDCN is inspired from CDCN, an audio-only enhancement technique that approximates the nonlinear effect of noise on speech with a piecewise constant function. Our experiments show that the use of visual information in AVCDCN allows significant performance gains over CDCN.

Proceedings ArticleDOI
Jari Mäkinen1, Pasi Ojala1, Janne Vainio1
06 Oct 2002
TL;DR: The presented concept introduces up to 50% reduction in average bit rate without any degradation in speech quality to increase the system capacity in conversational services as well as storage size in messaging type of applications.
Abstract: This paper presents a source based rate adaptation concept for AMR wideband speech codec. The source based rate adaptation algorithm selects the multi rate codec mode based on the input speech characteristics and coding parameters to minimise the average bit rate. The presented concept introduces up to 50% reduction in average bit rate without any degradation in speech quality. The benefit of source based adaptation is in increasing the system capacity in conversational services as well as storage size in messaging type of applications.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: An efficient codec architecture for context-based adaptive arithmetic coding is proposed, which exhibits low cost, low latency, and high throughput rate and can be programmed for supporting multiple standards such as JPEG, JPEG2000, JBIG, and JBIG2 standards.
Abstract: For next generation image compression standard, context-based arithmetic coding is adopted for improving the compression rate. An efficient and high throughput codec design is strongly required for handling high-resolution images. We propose an efficient codec architecture for context-based adaptive arithmetic coding, which exhibits low cost, low latency, and high throughput rate. In addition, it can be programmed for supporting multiple standards such as JPEG, JPEG2000, JBIG, and JBIG2 standards. It exploits three-pipeline stages architecture. Based on parallel leading zeros detection and bit-stuffing handling, symbols can be encoded and decoded within one cycle. Therefore, the throughput rate can be increased as high as the codec operating clock rate. For 0.35 μm 1P4M CMOS technology, both the encoding and decoding rate can run up to 185 M symbol/sec. The AC codec only costs 12 K gate count and 860 μm × 860 μm layout area. These performances can meet high-resolution real time application requirements.

Proceedings ArticleDOI
13 May 2002
TL;DR: Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 18 kbit/s produces superior quality to ITU-T 24 kbit/s G.722.1 with the shorter algorithmic delay.
Abstract: In this paper, we present a new bandwidth-scalable coder for wide band speech and audio signals. The proposed coder splits 8 kHz signal bandwidth into two narrow bands, and different coding schemes are applied to each band. The lower-band speech is coded with ITU-T G.729 Annex E, and the higher-band signal is compressed using a new algorithm based on the gammatone filter band with an invertible auditory model. Due to the split-band architecture and completely independent coding schemes for each band, the output speech of the decoder can be selected to be a narrowband or wideband coding according to the channel conditions. Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 18 kbit/s produces superior quality to ITU-T 24 kbit/s G.722.1 with the shorter algorithmic delay.

Patent
09 Jul 2002
TL;DR: In this article, a technique to change the codec or MAC (message authentication code) size in a packet security unit for PacketCable communications during real-time voice transmissions is described.
Abstract: A technique is described to change the codec or MAC (message authentication code) size in a packet security unit for PacketCable communications during real-time voice transmissions. An algorithm that provides fast RC4 key advancing to prevent MIPS (millions of instruction cycles per second) overflow is used to perform codec or MAC size changes. The invention is performed without changing the keying material, where the sender and receiver must continue the RC4 encryption process from its state prior to the codec or MAC size change. A sender needs to preserve continuity of the timestamp across a codec change, since the timestamp reflects real time. Changing the codec or MAC size is likely to change the frame parameters. To preserve continuity of the RC4 state and the timestamp across the codec/MAC size change, the sender TX and receiver RX generate a new frame number. The new frame number is applied to the first frame generated by the new codec or MAC size.

05 Jul 2002
TL;DR: The support is so constructed as to be rockable in such a manner that the heel seat portion, the shank portion, and the forepart portion of the upper margin all lie in horizontal planes during their movement past the roughing tool.
Abstract: A roughing machine having a support for supporting bottom-up a shoe assembly formed of a last having an insole located on its bottom and an upper mounted thereon with the margin of the upper secured to the bottom of the insole and a roughing tool located above the shoe assembly and yieldably urged downwardly against the shoe assembly so as to rough the upper margin during movement of the support to move the upper margin past the roughing tool. The shoe assembly bottom has its heel seat portion, its shank portion and its forepart portion lying in different planes. The support is so constructed as to be rockable in such a manner that the heel seat portion, the shank portion, and the forepart portion of the upper margin all lie in horizontal planes during their movement past the roughing tool.


Journal ArticleDOI
10 Dec 2002
TL;DR: This paper describes a quarter video graphic array/common intermediate format (QVGA/CIF) resolution MPEG-4 video codec based on a low-power, general-purpose digital signal processor (DSP) (NEC μPD77210, 160 MHz, 80 mW, 1.5 V).
Abstract: This paper describes a QVGA/CIF resolution MPEG-4 video codec based on a low-power, general-purpose DSP (NEC μPD77210, 160 MHz, 80 mW, 1.5 V). To enhance video codec performance, the codec employs fast algorithms, including, in motion estimation, a successive similarity detection algorithm (SSDA; a fast block matching) whose decision timing for termination of block matching is optimized. Further, the use of a software DMA queue reduces the wasteful DSP wait cycles that can result from massive access to external frame memories. The resulting codec executes QVGA × 15 fps codec, or CIF × 15 fps encoding at 384 kbps, in real time, performance levels sufficient for next-generation wireless videotelephony.
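
The early-termination idea behind SSDA block matching, mentioned in the abstract above, can be sketched in Python. This is a generic illustration, not the paper's optimized version: checking the running sum once per row, the full-search loop, and all function names are assumptions, and the paper's tuned decision timing is not reproduced.

```python
def sad_with_early_exit(block, candidate, best_so_far):
    """Sum absolute differences row by row; give up once the running sum
    can no longer beat best_so_far. Returns None for a rejected candidate."""
    total = 0
    for row_b, row_c in zip(block, candidate):
        total += sum(abs(a - b) for a, b in zip(row_b, row_c))
        if total >= best_so_far:
            return None  # terminated early, remaining rows are skipped
    return total


def extract(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def ssda_search(block, ref_frame, top, left, search_range):
    """Return (best_sad, (dy, dx)) over a square window around (top, left)."""
    size = len(block)
    best = (float("inf"), (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0:
                continue  # candidate starts outside the frame
            if y + size > len(ref_frame) or x + size > len(ref_frame[0]):
                continue  # candidate ends outside the frame
            sad = sad_with_early_exit(block, extract(ref_frame, y, x, size), best[0])
            if sad is not None:
                best = (sad, (dy, dx))
    return best
```

Once a good match has been found, most later candidates are rejected after their first row or two, which is where the saving in arithmetic comes from.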

Journal ArticleDOI
TL;DR: The concept of perceptual entropy is extended to subband coding, resulting in the subband perceptual entropy (SPE), which gives a short time estimate of the lowest possible bit rate for transparent subband audio coding.
Abstract: A novel audio coding algorithm is proposed where the warped-linear prediction (WLP) technique is employed to construct a perceptual pre- and post-filter for subband audio coding. A modified signal-to-mask ratio (SMR) calculation is given for subband coding of the WLP residuals of audio signals. The concept of perceptual entropy (PE) is extended to subband coding, resulting in the subband perceptual entropy (SPE), which gives a short time estimate of the lowest possible bit rate for transparent subband audio coding. Two WLP models with frequency responses approximating the spectral shape of the masking threshold are investigated and it is found that the residual signals of both models contain less SPE compared with that of the original audio signals. Subjective tests show that the proposed audio codec operating at 56 kbps has a perceptual quality comparable to MPEG-1 audio Layer II operating at 64 kbps.

Proceedings ArticleDOI
06 Oct 2002
TL;DR: The history and performance of the adaptive multi-rate wideband (AMR-WB) speech codec recently selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services is given.
Abstract: This paper gives the history and performance of the adaptive multi-rate wideband (AMR-WB) speech codec recently selected by the Third Generation Partnership Project (3GPP) for GSM and the third generation mobile communication WCDMA system for providing wideband speech services. The AMR-WB speech codec algorithm was selected in December 2000, and the corresponding specifications were approved in March 2001. In July 2001, the AMR-WB codec was also selected by ITU-T in the standardization activity for wideband speech coding around 16 kbit/s. The adoption of AMR-WB by ITU-T is of significant importance since for the first time the same codec is adopted for wireless as well as wireline services. AMR-WB uses an extended audio bandwidth from 3.4 kHz to 7 kHz and gives superior speech quality and voice naturalness compared to 2nd and 3rd generation mobile communication systems.

Book ChapterDOI
29 Apr 2002

Proceedings Article
01 Sep 2002
TL;DR: This paper proposes several methods to optimize speech enhancement techniques in order to improve the estimation of the CELP speech codec parameters (short-term and long-term parameters).
Abstract: In general, low rate speech coding systems do not have their own mechanism to reduce background noise from the speech signal. This is due to the complexity of the speech signal and limitations in the scope of many speech coding systems. As a consequence, most speech enhancement systems to date have attempted to process the speech waveform directly and independently from the speech coding system, before the encoding of the speech signals. In this paper, we propose several methods to optimize speech enhancement techniques in order to improve the estimation of the CELP speech codec parameters (short-term and long-term parameters). Experimental results for two speech enhancement techniques are presented in conjunction with the ETSI AMR speech codec.

Proceedings ArticleDOI
03 Nov 2002
TL;DR: This paper proposes an adaptive multi-stage Levinson-Durbin algorithm, which is more numerically robust than the conventional Levinson-Durbin algorithm for input signals with high spectral dynamics such as speech or audio signals and can be used in practical linear prediction coding systems for better coding performance.
Abstract: This paper proposes an adaptive multi-stage Levinson-Durbin algorithm, which is more numerically robust than the conventional Levinson-Durbin algorithm for input signals with high spectral dynamics such as speech or audio signals. At the same time, the proposed algorithm preserves the computational efficiency of the Levinson-Durbin algorithm. It can be therefore suitable to be used in practical linear prediction coding systems as a replacement of the Levinson-Durbin algorithm for better coding performance.
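
For reference, the conventional Levinson-Durbin recursion that the abstract above takes as its baseline can be sketched in Python. This is the textbook algorithm only; the adaptive multi-stage variant proposed in the paper is not reproduced here, and the sign convention (prediction error filter with a[0] = 1) is an assumption of this sketch.

```python
def levinson_durbin(r, order):
    """Solve the linear prediction normal equations by the conventional
    Levinson-Durbin recursion.

    r     : autocorrelation values r[0] .. r[order]
    order : prediction order
    Returns (a, err): a is the error filter [1, a1, ..., a_order] and err
    is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error is not positive")
        # Correlation of the current predictor with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a, err
```

The recursion needs on the order of `order` squared operations, the efficiency the abstract says the proposed algorithm preserves. The division by `err` is the step where precision tends to suffer when the error energy becomes very small.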

Proceedings ArticleDOI
03 Sep 2002
TL;DR: A novel algorithm has been developed that robustly classifies clean speech segments as voiced, unvoiced, or silence; non-real-time coding allows iterative refinement of the classification, taking the cepstral distance into account.
Abstract: For use in offline speech processing systems, a novel algorithm has been developed that robustly classifies clean speech segments as voiced, unvoiced, or silence. This decision is needed, e.g., in source-controlled speech coders, which treat voiced, unvoiced, and silent segments differently to increase the coding efficiency. The classifier is based on a combination of several features extracted from the speech signal in the time domain. Besides the energy, a novel measure representing the unsteadiness of the speech signal is proposed. Non-real-time coding allows iterative refinement of the classification, taking the cepstral distance into account. The new classification scheme has been tested with the popular AMR codec.

Proceedings ArticleDOI
13 May 2002
TL;DR: Techniques and approaches commonly used to realize systems where the codec must be implemented entirely in assembly are described; they are applicable to any other speech codec.
Abstract: A lot of effort has been spent over the last few years in the development of digital speech coding methods and their subsequent standardization. Algorithms have evolved which provide good quality speech at sub-8 kbps bit rates although at a much higher computational expense. DSP processors have also improved with time providing specific signal processing functionalities aiding in easier codec implementations along with lower power consumption at higher clock speeds. Software development tools and compilers have also improved although they still do not work well in high volume, low cost systems. The cost of development tools may also be prohibitive for nonvendors and at times high level code conversion tools may not be present at all. This paper describes techniques and approaches commonly used to realize such systems where the codec must be implemented entirely in assembly. The specific codec implemented was International Telecommunication Union (ITU-T) G.729 Annex B. The techniques described in this paper are applicable to any other speech codec.

Proceedings ArticleDOI
01 Oct 2002
TL;DR: The FB-RSC method of speech coding provides improved speech quality under packet-loss conditions and significantly improves the concealment of packet loss.
Abstract: The FB-RSC method of speech coding provides improved speech quality under packet-loss conditions. Each packet has a maximum of three codes, a main code, a forward sub-code, and a backward sub-code. The main code represents the current frame. The forward and backward sub-codes represent the next and previous frames. The necessity of the sub-codecs is determined by a sub-codec selector, which considers the SNR of the original and synthesized signals in this determination. A relatively low-compression and high-quality form of coding is used in the main codec, while coding with greater compression is used in the sub-codecs. We examined the quality of the proposed method for random loss of individual packets and pairs of consecutive packets. The result shows that our method significantly improves the concealment of packet loss.
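
The packet structure described in the abstract above (a main code plus forward and backward sub-codes) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: every packet always carries both sub-codes, so the SNR-based sub-codec selector is omitted, codes are opaque strings, and the names `build_packets` and `decode` are hypothetical.

```python
def build_packets(main_codes, sub_codes):
    """Packet n carries the main code of frame n, the sub-code of frame n+1
    (forward) and the sub-code of frame n-1 (backward)."""
    last = len(main_codes) - 1
    packets = []
    for n in range(len(main_codes)):
        packets.append({
            "main": main_codes[n],
            "forward": sub_codes[n + 1] if n < last else None,
            "backward": sub_codes[n - 1] if n > 0 else None,
        })
    return packets


def decode(packets, lost):
    """Choose a code for every frame; None means nothing usable arrived."""
    out = []
    for n in range(len(packets)):
        prev_ok = n - 1 >= 0 and (n - 1) not in lost
        next_ok = n + 1 < len(packets) and (n + 1) not in lost
        if n not in lost:
            out.append(packets[n]["main"])       # normal case
        elif prev_ok and packets[n - 1]["forward"] is not None:
            out.append(packets[n - 1]["forward"])  # covered by previous packet
        elif next_ok and packets[n + 1]["backward"] is not None:
            out.append(packets[n + 1]["backward"])  # covered by next packet
        else:
            out.append(None)
    return out
```

With both neighbours carrying a sub-code, a pair of consecutive lost packets is still covered: the first lost frame comes from the packet before the gap and the second from the packet after it. Three losses in a row leave the middle frame uncovered.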

Proceedings ArticleDOI
06 Oct 2002
TL;DR: Some techniques that improve the performance of the CELP model including efficient perceptual weighting and pitch codebook, and decoding techniques such as gain smoothing, periodicity enhancement, and high frequency generation are discussed.
Abstract: This paper describes efficient encoding and decoding methods for achieving high quality wideband speech at low rates. CELP-based technologies have been successful in achieving high quality speech at low bit rates, however, new efficient techniques are needed to maintain high speech quality when the model is applied to wideband signals. This article discusses some techniques that improve the performance of the CELP model including efficient perceptual weighting and pitch codebook, and decoding techniques such as gain smoothing, periodicity enhancement, and high frequency generation. These techniques have been incorporated in the AMR-WB (adaptive multi-rate wideband) codec selected by 3GPP and ITU-T, where high quality wideband speech can be maintained at bit rates around 13 kbit/s.

Proceedings ArticleDOI
M. Karlsson1, Magnus Almgren1, Stefan Bruhn1, K. Larsson, M. Sundelin 
10 Dec 2002
TL;DR: Evaluating different ways of allocating AMR rates to users on the downlink in a WCDMA system finds that the quality and capacity trade-off can be efficiently balanced and high system performance can be achieved for a wide range of offered loads.
Abstract: The adaptive multi-rate (AMR) speech codec is the mandatory speech codec for WCDMA systems. The codec supports eight different source rates ranging from 12.2 kbit/s down to 4.75 kbit/s. This paper evaluates different ways of allocating AMR rates to users on the downlink in a WCDMA system. A novel system performance concept is introduced based on a user satisfaction metric that tries to combine the user experience from different speech quality levels as well as events like blocking and dropping. System simulations show that the AMR codec introduces a significant trade-off between capacity and quality for the speech service. By allocating AMR modes based on the system load, the quality and capacity trade-off can be efficiently balanced and high system performance can be achieved for a wide range of offered loads.
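
Load-based mode allocation as described in the abstract above can be sketched in Python. The eight rates are the standard AMR source rates; the linear mapping from a relative load in [0, 1] to a mode, and the `select_amr_rate` name, are purely illustrative assumptions and not the allocation rule evaluated in the paper.

```python
# AMR source rates in kbit/s, highest quality first.
AMR_RATES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]


def select_amr_rate(load):
    """Map a relative system load (0.0 = idle, 1.0 = full) to a source rate.

    Low load yields the highest rate; rising load steps down through the
    modes, trading speech quality for capacity. The equal-width load bands
    are a placeholder, not a tuned policy.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must lie in [0.0, 1.0]")
    index = min(int(load * len(AMR_RATES_KBPS)), len(AMR_RATES_KBPS) - 1)
    return AMR_RATES_KBPS[index]
```

A real allocator would presumably derive the thresholds from the user-satisfaction metric the abstract mentions and add hysteresis so that users near a band edge do not flip between modes.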

Journal ArticleDOI
TL;DR: Experimental results demonstrate the good performance of the proposed algorithm, which provides high coding efficiency with a reduced complexity.
Abstract: A new algorithm for achieving flexible tiling of the time axis for audio coding purposes is presented. It is based on the calculus of the distances among a predetermined number of time-frequency pairs. From the computed distances, a clustering process determines the final subdivision of each audio frame. Experimental results demonstrate the good performance of the proposed algorithm, which provides high coding efficiency with a reduced complexity.

Patent
Yoshihisa Harada1
04 Feb 2002
TL;DR: In this article, a digital line transmission unit can carry out switching between speech codecs during the same call to achieve balance between making effective use of a line and a high sound quality without bringing about a feeling of discomfort in a user by the switching.
Abstract: A digital line transmission unit can carry out switching between speech codecs during the same call to achieve balance between making effective use of a line and a high sound quality without bringing about a feeling of discomfort in a user by the switching. It includes in an encoder a first speech codec 7 with a high sound quality and a high bit rate, a second speech codec 8 with a reasonable sound quality but a low bit rate. It carries out switching between these speech codecs in response to the control information an operation monitoring controller 4 obtains by making a decision as to the traffic volume of the bearer line 111. The switching between the speech codecs is made during a speech pause a speech burst detector 31 in a signal detector 3 detects in an input speech signal.