
Showing papers on "Adaptive Multi-Rate audio codec" published in 2006


01 Jan 2006
TL;DR: The paper proposes the use of synthetic speech coding algorithms (vocoders) to provide redundancy, since the algorithms produce a very low bit-rate stream, which only adds a small overhead to a packet.
Abstract: This paper describes current problems found with audio applications over the MBONE (Multicast Backbone), and investigates possible solutions to the most common one: packet loss. The principles of packet speech systems are discussed, along with how their structure allows the use of redundancy to design viable solutions to the problem. The paper proposes the use of synthetic speech coding algorithms (vocoders) to provide redundancy, since the algorithms produce a very low bit-rate stream, which only adds a small overhead to a packet. Preliminary experiments show that normal speech repaired with synthetic quality speech is intelligible, even at very high loss rates.

91 citations
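The redundancy idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each packet piggybacks a low-bit-rate copy of the previous frame, and the names `Packet`, `packetize`, `reconstruct`, and the `vocoded(...)` string stand-in are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int
    primary: str              # stand-in for the normal-quality encoded frame
    redundant: Optional[str]  # stand-in for a vocoder copy of the previous frame


def packetize(frames: List[str]) -> List[Packet]:
    """Attach a low-bit-rate copy of frame n-1 to the packet carrying frame n."""
    packets = []
    for seq, frame in enumerate(frames):
        backup = f"vocoded({frames[seq - 1]})" if seq > 0 else None
        packets.append(Packet(seq, frame, backup))
    return packets


def reconstruct(received: Dict[int, Packet], total: int) -> List[Optional[str]]:
    """Play the primary frame when present, else the copy from the next packet."""
    output: List[Optional[str]] = []
    for seq in range(total):
        if seq in received:
            output.append(received[seq].primary)
        elif seq + 1 in received and received[seq + 1].redundant is not None:
            output.append(received[seq + 1].redundant)  # synthetic-quality repair
        else:
            output.append(None)  # unrecoverable gap
    return output
```

With four frames and packets 1 and 3 lost, frame 1 is repaired from the copy carried in packet 2, while frame 3 stays missing because no later packet arrived.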


Patent
Xiaoqin Sun, Tian Wang, Hosam A. Khalil, Kazuhito Koishida, Wei-ge Chen
05 Apr 2006
TL;DR: In this article, techniques and tools for processing reconstructed audio signals are described, where a reconstructed audio signal is filtered in the time domain using filter coefficients that are calculated, at least in part, in the frequency domain.
Abstract: Techniques and tools are described for processing reconstructed audio signals. For example, a reconstructed audio signal is filtered in the time domain using filter coefficients that are calculated, at least in part, in the frequency domain. As another example, producing a set of filter coefficients for filtering a reconstructed audio signal includes clipping one or more peaks of a set of coefficient values. As yet another example, for a sub-band codec, in a frequency region near an intersection between two sub-bands, a reconstructed composite signal is enhanced.

69 citations


Journal ArticleDOI
R. Salami, R. Lefebvre, A. Lakaniemi, K. Kontola, S. Bruhn, A. Taleb
TL;DR: The architecture, performance, and application scenarios of the AMR-WB+ (extended AMR-WB) audio codec are presented, which provides high quality at exceptionally low rates, and consistent quality over all audio types.
Abstract: This article presents the architecture, performance, and application scenarios of the AMR-WB+ (extended AMR-WB) audio codec, which provides high quality at exceptionally low rates, and consistent quality over all audio types. This codec was recently selected by 3GPP and DVB to support low-bit-rate audio and audiovisual applications on mobile networks.

61 citations


Journal ArticleDOI
TL;DR: The development of the MP3 coding standard and its essential components as contributed by the members of the MPEG Audio Group are described.
Abstract: In 1988 the International Organization for Standardization (ISO) established the Moving Picture Experts Group (MPEG) to develop a digital coding standard for video and audio signals in order to enable interactive video and audio signals on digital storage media. The MPEG Audio Group started with members from 14 research institutions in order to develop a digital audio coding standard guided by a chairman. As a result, the MPEG-1 Layer I, Layer II and Layer III coding standards were developed and proposed in 1992 for coding of stereo audio signals at 2 times 192 kbit/s, 2 times 128 kbit/s and 2 times 64 kbit/s. Later, the abbreviation "mp3" or "MP3" was introduced to replace the long name of the successful MPEG-1 Layer III coding standard. This paper describes the development of the MP3 coding standard and its essential components as contributed by the members of the MPEG Audio Group.

47 citations



Proceedings ArticleDOI
01 Oct 2006
TL;DR: This work modified the buffering structure of H.264 and implemented several referencing modes in order to effectively search for disparity/motion without increasing computational complexity, and shows that for closely located cameras, this codec outperforms simulcast H.264 coding.
Abstract: H.264 is the current state-of-the-art monoscopic video codec, providing almost twice the coding efficiency at the same quality compared with previous codecs. With the increasing interest in 3D TV, multi-view video sequences that are provided by multiple cameras capturing three-dimensional objects and/or scenes are more widely used. Compressing multi-view sequences independently with H.264 (simulcast) is not efficient since the redundancy between the closer cameras is not exploited. In order to reduce these redundancies, we propose a multi-view video codec based on H.264 using disparity estimation/compensation as well as motion estimation/compensation. In order to effectively search for disparity/motion without increasing computational complexity, we modified the buffering structure of H.264 and implemented several referencing modes. Our results show that for closely located cameras, our codec outperforms simulcast H.264 coding. For sparsely located cameras, our method can still improve coding gain depending on the video characteristics.

38 citations


Patent
30 Nov 2006
TL;DR: In this article, an audio codec in a baseband processor may be utilized for mixing audio signals received at a plurality of data sampling rates, and an interpolation coefficient may be generated based on a base value associated with the specified output sampling rate.
Abstract: An audio codec in a baseband processor may be utilized for mixing audio signals received at a plurality of data sampling rates. The mixed audio signals may be up sampled to a very large sampling rate, and then down sampled to a specified sampling rate that is compatible with a Bluetooth-enabled device by utilizing an interpolator in the audio codec. The down-sampled signals may be communicated to Bluetooth-enabled devices, such as Bluetooth headsets, or Bluetooth-enabled devices with a USB interface. The interpolator may be a linear interpolator for which the audio codec may enable generation of triggering and/or coefficient signals based on the specified output sampling rate. An interpolation coefficient may be generated based on a base value associated with the specified output sampling rate. The audio codec may enable selecting the specified output sampling rate from a plurality of rates.

32 citations
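The rate conversion described in the abstract above relies on a linear interpolator. The sketch below shows linear-interpolation resampling in general terms only; it is not the patent's circuit, it omits the anti-aliasing filter a real downsampler needs, and the function name `resample_linear` is hypothetical. The variable `frac` plays the role of an interpolation coefficient derived from the chosen output rate.

```python
from typing import List


def resample_linear(samples: List[float], rate_in: int, rate_out: int) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if len(samples) < 2:
        return list(samples)
    step = rate_in / rate_out                       # input samples per output sample
    n_out = int((len(samples) - 1) / step) + 1      # outputs that fit inside the input
    out = []
    for k in range(n_out):
        pos = k * step                              # fractional position in the input
        i = int(pos)
        frac = pos - i                              # interpolation coefficient in [0, 1)
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

For example, doubling the rate of a ramp inserts the midpoints between samples, and halving the rate keeps every second sample.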


Journal ArticleDOI
TL;DR: An alternative approach for remote speech recognition which combines the advantages of NSR and DSR is explored and it is shown that an NSR solution can approach DSR through a reconstruction technique along with an adapted noise reduction technique originally proposed for acoustic noise.
Abstract: Network-based speech recognition (NSR) and distributed speech recognition (DSR) have been proposed as solutions to translate speech recognition technologies to mobile environments. NSR is the most straightforward solution since it does not require any modification in the mobile phone; however, DSR offers higher robustness against codec compression and transmission channel degradation. This paper explores an alternative approach for remote speech recognition which combines the advantages of NSR and DSR. In this scheme, a standard speech codec is used for speech transmission but the recognition is performed from the received codec parameters. In particular, we focus on the effect of transmission channel errors, which can cause a more severe performance reduction on speech recognition than codec distortion. First, we show that an NSR solution can approach DSR through a reconstruction technique along with an adapted noise reduction technique originally proposed for acoustic noise. Then, these results are improved by working with recognition features directly extracted from the codec bitstream by means of parameter transcoding. The modifications to current networks required to access the bitstream are described. Upgrading the network with the tandem free operation (TFO) protocol is an attractive solution. This upgrade not only offers an overall improvement in end-to-end speech quality, but would also allow recognition performance similar to, and in poor channel conditions even higher than, that obtained by DSR when parameter transcoding and the proposed mitigation techniques are applied.

31 citations


Journal ArticleDOI
I. Varga, R.D. De Iacovo, P. Usai
TL;DR: 3GPP and ITU-T have standardized a multi-rate codec for wideband speech conversational applications; ITU-T approved the same wideband coding algorithm as Recommendation G.722.2.
Abstract: 3GPP and ITU-T have standardized a multi-rate codec for wideband speech conversational applications. Following a competitive selection process, the adaptive multirate wideband (AMR-WB) specifications were approved in March 2001 as part of 3GPP release 5. The ITU-T Study Group 16 approved the same wideband coding algorithm as Recommendation G.722.2 and its Annexes.

28 citations


Journal ArticleDOI
TL;DR: The adaptive multirate wideband (AMR-WB) speech codec is the service enabler for improved user experience and represents the state-of-the-art in speech quality as well as robustness in error prone radio channels.
Abstract: Wideband speech is the major differentiation and attraction of third-generation network services in both the circuit and packet switched domain. Increased audio bandwidth introduces a significant leap in perceived quality of service compared to currently utilized narrowband telephony in second-generation mobile communications and the PSTN. The adaptive multirate wideband (AMR-WB) speech codec is the service enabler for improved user experience. It is an established 3GPP and ITU-T wideband speech codec standard and represents the state-of-the-art in speech quality as well as robustness in error prone radio channels. It is also the first codec algorithm standardized for wideband speech for mobile communications.

28 citations


Journal ArticleDOI
Sassan Ahmadi, M. Jelinek
TL;DR: The VMR-WB codec is interoperable with AMR-WB at certain bit rates, thus eliminating quality degradation and additional delay due to transcoding, and enabling a smooth transition from legacy narrowband voice services.
Abstract: This article is an overview of the architecture and operation of the VMR-WB, a source- and network-controlled variable-rate multimode codec designed for robust processing of wideband speech. To enable a smooth transition from legacy narrowband voice services, VMR-WB is also capable of processing conventional telephone-bandwidth speech. The VMR-WB codec is interoperable with AMR-WB at certain bit rates, thus eliminating quality degradation and additional delay due to transcoding.

Proceedings ArticleDOI
14 May 2006
TL;DR: ITU-T test results showed that this coder passed all the requirements of the G.729EV qualification phase.
Abstract: This paper describes an 8–32 kbit/s scalable speech and audio coder submitted as a candidate for the ITU-T G.729-based Embedded Variable bitrate (G.729EV) standardization. The coder is built upon a 3-stage coding structure consisting of: narrowband cascade CELP coding at 8 and 12 kbit/s, bandwidth extension based on wideband linear-predictive coding (WB-LPC) at 14 kbit/s, and MDCT coding in a WB-LPC weighted signal domain from 14 to 32 kbit/s. ITU-T test results showed that this coder passed all the requirements of the G.729EV qualification phase.
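The 3-stage structure in the abstract above implies that the set of active coding stages grows with the total bit rate. A rough sketch of that mapping follows, using only the rate boundaries stated in the abstract; the function name `active_stages` and the stage labels are hypothetical, and the coder's actual layer granularity above 14 kbit/s is not modelled.

```python
from typing import List


def active_stages(bitrate_kbps: float) -> List[str]:
    """Return the coding stages that contribute at a given total bit rate."""
    if not 8 <= bitrate_kbps <= 32:
        raise ValueError("the coder is described for 8-32 kbit/s")
    stages = ["narrowband CELP core (8 kbit/s)"]
    if bitrate_kbps >= 12:
        stages.append("cascade CELP enhancement (12 kbit/s)")
    if bitrate_kbps >= 14:
        stages.append("WB-LPC bandwidth extension (14 kbit/s)")
    if bitrate_kbps > 14:
        stages.append("MDCT coding (above 14 kbit/s)")
    return stages
```

At 8 kbit/s only the narrowband core is active, while at 32 kbit/s all four entries are returned.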


Journal ArticleDOI
TL;DR: A lattice-based scheme for the single-frame and the double-frame quantization of the speech line spectral frequency parameters and the issue of the robustness to channel errors is investigated.
Abstract: A lattice-based scheme for the single-frame and the double-frame quantization of the speech line spectral frequency parameters is proposed. The lattice structure provides a low-complexity vector quantization framework, which is implemented using a trellis structure. In the single-frame scheme, the intraframe dependencies are exploited using a linear predictor. In the double-frame scheme, the parameters of two consecutive frames are jointly quantized and hence the interframe dependencies are also exploited. A switched scheme is also considered in which lattice-based double-frame and single-frame quantization are both performed for every two frames and the one which results in a lower distortion is chosen. Comparisons to the Split-VQ, the Multi-Stage VQ, the Trellis Coded Quantization, the interframe Block-Based Trellis Quantizer, and the interframe scheme used in IS-641 EFRC and the GSM AMR codec are provided. These results demonstrate the effectiveness of the proposed lattice-based quantization schemes, while maintaining a very low complexity. Finally, the issue of the robustness to channel errors is investigated.

Patent
Andreas Witzel, Dirk Kampmann
02 Mar 2006
TL;DR: Several methods for codec handling are proposed, in particular methods for providing a supported codec list of a Call Control Server, in which a node receives information on whether a terminal supports a wideband codec in call setup signaling from the subscriber's terminal.
Abstract: The invention proposes several methods for codec handling. Specifically, methods involving providing a supported codec list of a Call Control Server are described. A node receives information on whether a terminal supports a wideband codec, wherein the information is received in call setup signaling from the terminal of the subscriber. Furthermore, configuration information is retrieved on whether a Radio Access Node supports the wideband codec. Additionally, information is retrieved on whether a media gateway supports the wideband codec, wherein the information is either provided by the operator or retrieved from the media gateway (MGW1, MGW2, MGWx). The information is analyzed and in response to the analysis a supported codec list is provided. Furthermore, alternative embodiments and devices adapted for the methods are disclosed.
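One plausible reading of the analysis step above is an intersection of the codecs supported by the terminal, the Radio Access Node, and the media gateway. The sketch below assumes exactly that; the patent abstract does not state the rule, and the function name `supported_codec_list` and the `preference` ordering are hypothetical.

```python
from typing import Iterable, List


def supported_codec_list(
    terminal: Iterable[str],
    radio_access_node: Iterable[str],
    media_gateway: Iterable[str],
    preference: List[str],
) -> List[str]:
    """Keep only codecs every element supports, ordered by operator preference."""
    # Assumption: a codec is usable only if all three elements support it.
    common = set(terminal) & set(radio_access_node) & set(media_gateway)
    return [codec for codec in preference if codec in common]
```

If the media gateway lacks the wideband codec, the resulting list falls back to the narrowband entry even when the terminal and radio access node both support wideband.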

Patent
14 Dec 2006
TL;DR: In this paper, an optimal compressor/decompressor (codec) module is intelligently selected for use when transmitting audio from a mobile communication device to a recipient, based on the type of the audio data or the characteristics of the recipient.
Abstract: An optimal compressor/decompressor (codec) module is intelligently selected for use when transmitting audio from a mobile communication device to a recipient. The codec can be selected based on the type of the audio data or the characteristics of the recipient. The codec can also be selected based on whether the audio data is to be transmitted to the recipient in real time or recorded and transmitted asynchronously. Audio data that is to be transmitted to the recipient is encoded or compressed using the selected codec and then sent to the recipient. Selection of the codec in this manner permits the compression to be optimized in response to specific circumstances associated with the communication of the audio data between the sender device and the recipient. The codec can be selected during the communication in response to a tone or other data provided by the recipient.

Proceedings ArticleDOI
Minjie Xie, D. Lindbergh, P. Chu
14 May 2006
TL;DR: The low-complexity 14 kHz audio coding algorithm which has been recently standardized by ITU-T as Recommendation G.722.1C features very high audio quality and extremely low computational complexity compared to other state-of-the-art audio coding algorithms.
Abstract: This paper describes the low-complexity 14 kHz audio coding algorithm which has been recently standardized by ITU-T as Recommendation G.722.1 Annex C ("G.722.1C"). The algorithm is an extension to ITU-T Recommendation G.722.1 and doubles the G.722.1 algorithm to permit 14 kHz audio bandwidth using a 32 kHz audio sample rate, at 24, 32, and 48 kbit/s. The G.722.1C codec features very high audio quality and extremely low computational complexity compared to other state-of-the-art audio coding algorithms. This codec is suitable for use in video conferencing, teleconferencing, and Internet streaming applications. Subjective test results from the characterization phase of G.722.1C are also presented in the paper.

01 Jan 2006
TL;DR: This document specifies a real-time transport protocol (RTP) payload format to be used for Extended AMR Wideband (AMR-WB+) encoded audio signals.
Abstract: This document specifies a real-time transport protocol (RTP) payload format to be used for Extended AMR Wideband (AMR-WB+) encoded audio signals. The AMR-WB+ codec is an audio extension of the AMR-WB codec providing additional frame types designed to give higher quality of music and speech than the original frame types. A media type registration is included for AMR-WB+.

Proceedings ArticleDOI
Juin-Hwey Chen
14 May 2006
TL;DR: This paper presents several novel codec structures for noise feedback coding (NFC) incorporating both long-term and short-term noise spectral shaping, as well as long-term and short-term prediction, and generalizes scalar-quantization-based NFC to vector-quantization-based NFC.
Abstract: This paper presents several novel codec structures for noise feedback coding (NFC) incorporating both long-term and short-term noise spectral shaping, as well as long-term and short-term prediction. In addition, the paper generalizes the conventional scalar-quantization-based NFC to vector-quantization-based NFC, and it lays the foundation for the associated efficient VQ codebook search and closed-loop VQ codebook design. BroadVoice®16, a PacketCable 1.5 mandatory narrowband speech codec standardized by CableLabs® for Voice over Cable in North America, is based on one of these novel NFC codec structures.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This contribution gives a comprehensive overview of the proposed codec, describes the implemented algorithms, and states a detailed characterization as well as results of the official G.729EV qualification tests.
Abstract: We present an embedded and hierarchical 8-32 kbit/s speech and audio coding algorithm that has been successfully submitted to the ITU-T as a candidate [1] for ITU-T Rec. G.729.1 [2] (ex G.729EV). The submitting consortium consisted of Siemens AG, Matsushita Electric Industrial Co., Ltd., and Mindspeed Technologies, Inc. This contribution gives a comprehensive overview of the proposed codec, describes the implemented algorithms, and states a detailed characterization as well as results of the official G.729EV qualification tests.

Proceedings ArticleDOI
14 May 2006
TL;DR: A technique to extend narrowband (NB) speech communication systems, using e.g. the GSM enhanced full rate (EFR) codec, with wideband (WB, 50-7000 Hz) capability, by applying split vector quantization in a transformed domain.
Abstract: We present a technique to extend narrowband (NB) speech communication systems, using e.g. the GSM enhanced full rate (EFR) codec [1], with wideband (WB, 50–7000 Hz) capability. The limited acoustic bandwidth of narrowband speech coding is extended using a fairly coarse description of the missing high frequency band (3.4–7 kHz) in terms of temporal and spectral envelopes. The high-band parameters are quantized, transmitted and then used at the receiver side to regenerate the high frequency components. The parameter encoding is done by applying split vector quantization in a transformed domain. This quantization scheme can be scaled to match any given target bit rate. Several example configurations have been implemented and tested in MUSHRA-style listening tests.

Journal ArticleDOI
H.T. How, T. H. Liew, E.L. Kuan, Lie-Liang Yang, Lajos Hanzo
TL;DR: A burst-by-burst (BbB) adaptive speech transceiver is proposed, which can drop its source coding rate and speech quality under transceiver control in order to invoke a more error resilient modem mode under less favorable channel conditions.
Abstract: A burst-by-burst (BbB) adaptive speech transceiver is proposed, which can drop its source coding rate and speech quality under transceiver control in order to invoke a more error resilient modem mode under less favorable channel conditions. The adaptive multirate (AMR) speech codec is operated at bit rates of 4.75 and 10.2 kb/s and combined with source sensitivity-matched redundant residue number system (RRNS) based channel codes. BbB adaptive joint detection aided code division multiple access is used for supporting the dual rate speech codec. Both the objective and subjective speech quality assessments favored the proposed BbB adaptive transceiver.
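The burst-by-burst adaptation above can be illustrated with a simple per-burst rate decision. This is a sketch under stated assumptions rather than the paper's control logic: the two rates come from the abstract, but the SNR-based rule, the 10 dB threshold, and the name `select_rates` are hypothetical.

```python
from typing import List

LOW_RATE_KBPS = 4.75   # AMR mode named in the abstract (robust channel conditions fallback)
HIGH_RATE_KBPS = 10.2  # AMR mode named in the abstract (used on good channels)


def select_rates(burst_snr_db: List[float], threshold_db: float = 10.0) -> List[float]:
    """Pick the speech coding rate for each burst from a channel quality estimate."""
    # Assumption: a single SNR threshold separates the two modes.
    return [
        HIGH_RATE_KBPS if snr >= threshold_db else LOW_RATE_KBPS
        for snr in burst_snr_db
    ]
```

A burst estimated at 4 dB would use the 4.75 kb/s mode, leaving more of the burst's capacity for the more error resilient transmission mode.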

Proceedings ArticleDOI
14 May 2006
TL;DR: This work presents a new PLC method for G.722 and proposes an efficient approach to sending the side information to resynchronize the encoder and decoder that greatly improves the robustness of the G.722 codec to packet losses.
Abstract: Since the G.722 wideband speech codec offers higher quality and naturalness than G.711, is low in complexity, has low delay, and tandems well with other codecs, it is an attractive codec for voice over IP and voice over wireless LANs. However, packet losses in G.722 not only require good concealment of the lost frame, but a lost frame results in a mismatch of the encoder/decoder states for the next correctly received frame following the lost frame. Although proprietary schemes exist, the G.722 codec has no standardized packet loss concealment (PLC) method. We present a new PLC method for G.722 and propose an efficient approach to sending the side information to resynchronize the encoder and decoder that greatly improves the robustness of the G.722 codec to packet losses.

Proceedings Article
01 Sep 2006
TL;DR: A new wideband audio coding concept is presented that provides good audio quality at bit rates below 3 bits per sample with an algorithmic delay of less than 10 ms and outperforms ITU-T G.722 at the same bit rate of 48 kbit/sec and a sample rate of 16 kHz.
Abstract: In this contribution a new wideband audio coding concept is presented that provides good audio quality at bit rates below 3 bits per sample with an algorithmic delay of less than 10 ms. The new concept is based on the principle of Linear Predictive Coding (LPC) in an analysis-by-synthesis framework, as known from speech coding. A spherical codebook is used for quantization at bit rates which are higher in comparison to low bit rate speech coding for improved performance for audio signals. For superior audio quality, noise shaping is employed to mask the coding noise. In order to reduce the computational complexity of the encoder, the analysis-by-synthesis framework has been adapted for the spherical codebook to enable a very efficient excitation vector search procedure. The codec principle can be adapted to a large variety of application scenarios. In terms of audio quality, the new codec outperforms ITU-T G.722 [4] at the same bit rate of 48 kbit/sec and a sample rate of 16 kHz.
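The figures quoted in the abstract above can be cross-checked with a short calculation: 48 kbit/s at a 16 kHz sample rate corresponds to 3 bits per sample, and a 10 ms delay spans 160 samples at that rate. The helper names below are hypothetical.

```python
def bits_per_sample(bitrate_bps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits spent on each audio sample."""
    return bitrate_bps / sample_rate_hz


def delay_in_samples(delay_ms: float, sample_rate_hz: float) -> float:
    """Convert an algorithmic delay in milliseconds to a sample count."""
    return delay_ms * sample_rate_hz / 1000


# Comparison point from the abstract: G.722 at 48 kbit/s and 16 kHz.
print(bits_per_sample(48_000, 16_000))   # 3.0
print(delay_in_samples(10, 16_000))      # 160.0
```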

Patent
31 Oct 2006
TL;DR: In this paper, an audio data packet format for transmitting an MPEG-4 HE-AAC frame via a voice channel of a mobile communication network is presented.
Abstract: Disclosed is an audio data packet format for transmitting an MPEG-4 HE-AAC frame via a voice channel of a mobile communication network, a method for decoding the audio data packet format, a method for correcting a codec setup error by identifying a codec used to encode sound source data inserted into a data field of voice slot data, based on the sequence number of the voice slot data, and correcting the codec setup error when a codec set up in a mobile communication terminal is different from the codec used to encode the sound source data, and a mobile communication terminal adapted to correct a codec setup error.

01 Oct 2006
TL;DR: This document specifies a Real-time Transport Protocol (RTP) payload format to be used for the International Telecommunication Union (ITU-T) G.729.1 audio codec.
Abstract: This document specifies a Real-time Transport Protocol (RTP) payload format to be used for the International Telecommunication Union (ITU-T) G.729.1 audio codec. A media type registration is included for this payload format. [STANDARDS-TRACK]

Patent
He Ouyang, Binghui Wu, Yi Zhou, Lin Luo, Kai Wan 
18 Jul 2006
TL;DR: In this article, an implementation of an audio codec with low computational complexity, small memory footprint and high coding efficiency is presented, which can be used in handheld devices, SoC or ASIC products and embedded systems.
Abstract: This invention discloses an implementation of an audio codec, which has low computational complexity, small memory footprint and high coding efficiency. It can be used in handheld devices, SoC or ASIC products and embedded systems. At the encoder side: first, apply time-to-frequency transform to audio signals, obtaining un-quantized spectrum data; second, based on the un-quantized spectrum data and target bit count, calculate the corresponding information of optimal scale factor, frequency band group, code table index and quantized spectrum by iteration; third, calculate and format bit-stream; fourth, output formatted bit-stream. At the decoder side: parse the formatted bit-stream, apply decoding and inverse quantization to the spectrum of each frame, reconstruct temporal audio data by frequency-to-time transform, and reconstruct the time-domain signals of each channel.

Proceedings ArticleDOI
M. De Meuleneire, Herve Taddei, O. de Zelicourt, Dominique Pastor, P. Jax
14 May 2006
TL;DR: Listening tests show the relevance of the proposed scheme when compared to a pure wavelet packet decomposition, and suggest that the proposed codec is equivalent to ITU-T G.722 at 48 kbit/s for speech signals.
Abstract: This paper presents a scalable wideband speech codec working at bitrates ranging from 8 to 32 kbit/s. The core layer is ITU-T G.729 at 8 kbit/s. A first enhancement layer is a bandwidth extension algorithm requiring 2 kbit/s to widen the G.729 narrowband output speech. The difference between the wideband original and reconstructed signal is transformed in the time-frequency domain by a full wavelet decomposition. The resulting coefficients are quantized by an embedded quantizer at 22 kbit/s. Listening tests show the relevance of such a scheme when compared to a pure wavelet packet decomposition. In addition, listening tests suggest that the proposed codec is equivalent to ITU-T G.722 at 48 kbit/s for speech signals.

Proceedings ArticleDOI
09 Jul 2006
TL;DR: Initial results in determining the recognition accuracy that can be achieved with five widely used speech coding standards are presented and show that performance does not strictly depend on coding rate or codec speech quality.
Abstract: Compressed-domain automatic speaker recognition is based on the analysis of the compressed parameters of speech coders. The objective is to perform low-complexity on-line speaker recognition for VoIP in the compressed domain, without the need to decode or resynthesize the speech bitstream. In this paper, we present initial results in determining the recognition accuracy that can be achieved with five widely used speech coding standards. Experiments with a database of 14 speakers obtain a recognition ratio close to 100% after the analysis of 30 seconds of active speech for most of the considered speech coders and rates. In particular, the results show that performance does not strictly depend on coding rate or codec speech quality.

01 Jan 2006
TL;DR: Novel audio coding technique designed to be utilized at medium bit-rates, using relatively long temporal segments of audio signal in critical-band-sized sub-bands to provide broadcast radio-like quality audio.
Abstract: We describe a novel audio coding technique designed to be utilized at medium bit-rates. Unlike classical state-of-the-art audio coders that are based on short-term spectra, our approach uses relatively long temporal segments of the audio signal in critical-band-sized sub-bands. We apply an auto-regressive model to approximate Hilbert envelopes in frequency sub-bands. Residual signals (Hilbert carriers) are demodulated and thresholding functions are applied in the spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing an audio coder to provide broadcast radio-like quality audio at around 10-20 kbps. Objective quality measures indicate comparable performance with the 3GPP-AMR speech codec standard for both speech and non-speech signals.