
Showing papers on "Adaptive Multi-Rate audio codec published in 2009"


Proceedings ArticleDOI
19 Apr 2009
TL;DR: This paper presents a unified speech and audio codec that exhibits consistently high quality for speech, music, and mixed audio content; it forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.
Abstract: Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec, which efficiently combines techniques from both worlds. This results in a codec that exhibits consistently high quality for speech, music and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC(v2) and AMR-WB+. This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.

108 citations


Journal ArticleDOI
TL;DR: An algorithm and a hardware architecture for a new type of embedded compression (EC) codec engine with multiple modes are presented; the proposed four-tree pipelining scheme reduces latency by 83% and buffer size by 67% between transform and entropy coding.
Abstract: In a typical portable multimedia system, external access, which is usually dominated by block-based video content, induces more than half of total system power. Embedded compression (EC) effectively reduces the external access caused by video content by reducing the data size. In this paper, an algorithm and a hardware architecture for a new type of EC codec engine with multiple modes are presented. Lossless mode, and lossy modes with rate control and quality control, are all supported by a single algorithm. The proposed four-tree pipelining scheme reduces latency by 83% and buffer size by 67% between transform and entropy coding. The proposed EC codec engine saves 62%, 66%, and 77% of external access in lossless, half-size, and quarter-size modes, respectively, and can be used under various system power conditions. In a TSMC 0.18 µm 1P6M CMOS logic process, the proposed EC codec engine can encode or decode CIF video at 30 frames per second and achieves a power saving of more than 109 mW, while the EC codec engine itself consumes only 2 mW.

65 citations


Patent
14 Aug 2009
TL;DR: In this article, various techniques for adaptively encoding and compressing audio broadcast data to create a digital representation of the broadcast for storage on an electronic device are provided, and some embodiments may also provide for an adjustable compression bit-rate based at least partially upon the quality parameter of the audio broadcast.
Abstract: Various techniques for adaptively encoding and compressing audio broadcast data to create a digital representation of the broadcast for storage on an electronic device are provided. In one embodiment, the audio broadcast data may be encoded and stored onto the electronic device using a particular codec and/or compression rate, the selection of which may be based upon one or more characteristics of the audio broadcast data signal, such as a genre parameter or a quality parameter. Particularly, the audio broadcast data may be encoded using either a music codec or a speech codec depending upon the genre parameter. Further, some embodiments may also provide for an adjustable compression bit-rate based at least partially upon the quality parameter of the audio broadcast.

45 citations


Patent
23 Apr 2009
TL;DR: In this paper, a method is described that establishes a full-duplex audio connection over an asynchronous Bluetooth link by exchanging the supported service classes and codecs between an audio terminal and a wireless audio device, negotiating a service class and codec common to both, and establishing the asynchronous audio connection using them.
Abstract: A method to establish a full-duplex audio connection over an asynchronous Bluetooth link between an audio terminal and a wireless audio device exchanges supported service classes and codecs between the audio terminal and the wireless audio device, negotiates a service class and a codec that are common to the audio terminal and the wireless audio device, and establishes an asynchronous audio connection between the audio terminal and the wireless audio device using the common service class and the codec. The audio connection established can depend on the software application desiring the audio connection plus the available service classes and codecs at the audio terminal and wireless audio device. For non-internet protocol (non-IP) audio applications, an ACL using AVDTP may be selected; for IP audio applications, an ACL using BNEP may be selected. Both AVDTP and BNEP can use codecs that support wide bandwidth audio.

41 citations


Journal ArticleDOI
TL;DR: This article is an overview of the standardization, architecture, and performance of the new ITU-T Recommendation G.718, an embedded variable bit rate codec providing a scalable solution for compression of 8 and 16 kHz sampled speech and audio signals at rates between 8 kb/s and 32 kb/s.
Abstract: This article is an overview of the standardization, architecture, and performance of the new ITU-T Recommendation G.718. G.718 is an embedded variable bit rate codec providing a scalable solution for compression of 8 and 16 kHz sampled speech and audio signals at rates between 8 kb/s and 32 kb/s. It comprises five layers, where higher-layer bitstreams can be discarded without affecting the decoding of the lower layers. The codec also has an optional core layer interoperable with ITU-T G.722.2 (3GPP AMR-WB) at 12.65 kb/s. G.718 was designed to provide high speech quality at low bit rates and to be robust to significant rates of frame erasures or packet losses. It also targets good quality for generic audio at higher rates.

36 citations
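
The embedded-layer idea described above (higher-layer bitstreams can be discarded without affecting lower-layer decoding) can be illustrated with a minimal sketch; the layer sizes and packing below are made up for illustration and are not the real G.718 frame format.

```python
# Illustrative sketch of an embedded (layered) bitstream: higher layers can be
# discarded in the network and the remaining core still decodes.
# Layer boundaries below are made up for illustration; they are NOT the real
# G.718 layer sizes.

def pack_embedded_frame(layers):
    """Concatenate per-layer payloads (bytes) in order Layer 1..Layer N."""
    return b"".join(layers)

def truncate_to_layers(frame, layer_sizes, keep):
    """Keep only the first `keep` layers of an embedded frame."""
    n_bytes = sum(layer_sizes[:keep])
    return frame[:n_bytes]

# Hypothetical per-frame layer payloads (bytes per frame).
layers = [b"\x00" * 20, b"\x01" * 10, b"\x02" * 10, b"\x03" * 20, b"\x04" * 20]
sizes = [len(layer) for layer in layers]

frame = pack_embedded_frame(layers)
core_only = truncate_to_layers(frame, sizes, keep=1)   # lowest bit rate
mid_rate = truncate_to_layers(frame, sizes, keep=3)    # intermediate rate
print(len(frame), len(mid_rate), len(core_only))       # 80 40 20
```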


Proceedings ArticleDOI
04 Dec 2009
TL;DR: A new low-complexity full-band (20 kHz) audio coding algorithm which has been recently standardized by ITU-T as Recommendation G.719 is described, which features very high audio quality and low computational complexity and is suitable for use in applications such as videoconferencing, teleconferences, and streaming audio over the Internet.
Abstract: This paper describes a new low-complexity full-band (20 kHz) audio coding algorithm which has been recently standardized by ITU-T as Recommendation G.719. The algorithm is designed to provide 20 Hz – 20 kHz audio bandwidth using a 48 kHz sample rate, operating at 32 – 128 kbps. This codec features very high audio quality and low computational complexity and is suitable for use in applications such as videoconferencing, teleconferencing, and streaming audio over the Internet. Subjective test results from the Optimization/Characterization phase of G.719 are also presented in the paper.

34 citations
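
As a quick sanity check on the figures above, a back-of-the-envelope bit budget for a 48 kHz transform codec, assuming a 20 ms frame (the frame length is an assumption, not stated in the abstract):

```python
# Bit budget per frame for a 48 kHz transform codec at the G.719 rate range,
# assuming a 20 ms frame (960 samples); the frame length is an assumption here.
sample_rate = 48_000
frame_ms = 20
samples_per_frame = sample_rate * frame_ms // 1000   # 960

for bitrate in (32_000, 64_000, 128_000):
    bits_per_frame = bitrate * frame_ms // 1000
    print(f"{bitrate // 1000:>3} kbps -> {bits_per_frame:>4} bits/frame "
          f"({bits_per_frame / samples_per_frame:.2f} bits/sample)")
```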



Proceedings ArticleDOI
19 Apr 2009
TL;DR: The speech and audio codec that has been submitted to ITU-T by Huawei and ETRI as a candidate for the upcoming super-wideband and stereo extensions of Rec. G.729.1 and G.718 is described, and the listening tests conducted during the ITU-T qualification phase are summarized.
Abstract: This paper describes the speech and audio codec that has been submitted to ITU-T by Huawei and ETRI as a candidate for the upcoming super-wideband and stereo extensions of Rec. G.729.1 and G.718. The core codec in the current implementation is G.729.1 and the encoded frequency range is increased from 7 kHz to 14 kHz. Therefore, the maximum bit rate is raised from 32 kbit/s to 64 kbit/s by adding five bitstream layers. A comprehensive overview of the codec is presented with a focus on the mono coding components. The results of the listening tests that have been conducted during the ITU-T qualification phase are summarized. The proposed codec passes all quality requirements for mono input signals.

28 citations


Proceedings Article
01 Aug 2009
TL;DR: To enhance the coding efficiency of AMR-WB+ while maintaining its high flexibility, the original DFT was replaced by the state-of-the-art MDCT transform, and the vector quantization by the combination of a scalar quantizer and an evolved context-adaptive arithmetic coder.
Abstract: Coding audio material at low bit rates with a consistent quality over a wide range of signals is a current and challenging problem. The high-granularity switched speech and audio coder AMR-WB+ performs especially well for speech and mixed content by promptly adapting its coding scheme to the signal. However, the high adaptation rate comes at the price of limited performance for non-speech signals. The aim of this paper is to enhance the coding efficiency of AMR-WB+ while maintaining its high flexibility. For this purpose, the original DFT was replaced by the state-of-the-art MDCT transform, and the vector quantization by the combination of a scalar quantizer and an evolved context-adaptive arithmetic coder. The improvements were measured by both objective and subjective evaluations.

27 citations


Proceedings ArticleDOI
24 Aug 2009
TL;DR: In this article, the authors propose an audio codec based on the modified discrete cosine transform (MDCT) with very short frames that uses gain-shape quantization to preserve the spectral envelope.
Abstract: We propose an audio codec that addresses the low-delay requirements of some applications such as network music performance. The codec is based on the modified discrete cosine transform (MDCT) with very short frames and uses gain-shape quantization to preserve the spectral envelope. The short frame sizes required for low delay typically hinder the performance of transform codecs. However, at 96 kbit/s and with only 4 ms algorithmic delay, the proposed codec out-performs the ULD codec operating at the same rate. The total complexity of the codec is small, at only 17 WMOPS for real-time operation at 48 kHz.

26 citations
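
A minimal sketch of gain-shape quantization over spectral bands, the technique named in the abstract: each band is coded as an energy (gain) plus a unit-norm shape, which is what preserves the spectral envelope. The band layout and the crude quantizers below are placeholders, not the codec's actual design.

```python
import numpy as np

def gain_shape_quantize(mdct_coeffs, band_edges, gain_step_db=1.5):
    """Split an MDCT frame into bands; code each band as (gain, unit-norm shape).
    The uniform log-gain quantizer and the crude 'round and renormalize' shape
    quantizer are placeholders for illustration only."""
    out = np.zeros_like(mdct_coeffs)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = mdct_coeffs[lo:hi]
        gain = np.sqrt(np.sum(band ** 2)) + 1e-12
        # quantize the gain on a log (dB) grid -- this preserves the spectral envelope
        gain_db = 20 * np.log10(gain)
        q_gain = 10 ** (np.round(gain_db / gain_step_db) * gain_step_db / 20)
        # quantize the unit-norm shape very coarsely, then renormalize
        shape = band / gain
        q_shape = np.round(shape * 4) / 4
        norm = np.sqrt(np.sum(q_shape ** 2)) + 1e-12
        out[lo:hi] = q_gain * q_shape / norm
    return out

frame = np.random.randn(240)             # e.g. one short MDCT frame
bands = [0, 8, 16, 32, 64, 120, 240]     # illustrative band edges
reconstructed = gain_shape_quantize(frame, bands)
```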


Journal ArticleDOI
TL;DR: The codec requirements and design constraints are presented, the standardization process is described, and the codec performance and its initial deployment are reported on.
Abstract: In March 2008 the ITU-T approved a new wideband speech codec called ITU-T G.711.1. This Recommendation extends G.711, the most widely deployed speech codec, to 7 kHz audio bandwidth and is optimized for voice over IP applications. The most important feature of this codec is that the G.711.1 bitstream can be transcoded into a G.711 bitstream by simple truncation. G.711.1 operates at 64, 80, and 96 kb/s, and is designed to achieve very short delay and low complexity. ITU-T evaluation results show that the codec fulfils all the requirements defined in the terms of reference. This article presents the codec requirements and design constraints, describes how standardization was conducted, and reports on the codec performance and its initial deployment.
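
The headline property above, transcoding to plain G.711 by simple truncation, can be sketched in a few lines. The 5 ms frame and the 40/10/10-byte layer split used here are assumptions about the frame format rather than figures taken from the abstract.

```python
# Sketch of "transcoding by truncation": a G.711.1 frame is assumed here to be
# a 40-byte G.711-compatible core followed by two 10-byte enhancement layers
# over a 5 ms frame (these sizes are assumptions, not taken from the abstract).

CORE_BYTES = 40              # assumed 64 kb/s core layer per 5 ms frame
LAYER_BYTES = (40, 10, 10)   # assumed layer split for the 64/80/96 kb/s rates

def g711_from_g7111(frame: bytes) -> bytes:
    """Drop the enhancement layers; the remainder is decodable as plain G.711."""
    return frame[:CORE_BYTES]

frame_96k = bytes(sum(LAYER_BYTES))       # one hypothetical 96 kb/s frame
legacy_payload = g711_from_g7111(frame_96k)
assert len(legacy_payload) * 8 / 0.005 == 64_000   # 64 kb/s core remains
```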

Journal ArticleDOI
TL;DR: The article presents the standardization goals and process, an overview of the coding algorithm, and the codec performance in various conditions, which makes the coder especially suitable for high-quality speech communication.
Abstract: G.729.1 is a scalable codec for narrowband and wideband conversational applications standardized by ITU-T Study Group 16. The motivation for the standardization work was to meet the new challenges of VoIP in terms of quality of service and efficiency in networks, in particular regarding the strategic rollout of wideband service. G.729.1 was designed to allow smooth transition from narrowband (300-3400 Hz) PSTN to high-quality wideband (50-7000 Hz) telephony by preserving backward compatibility with the widely deployed G.729 codec. The scalable structure allows gradual quality increase with bit rate. A low-delay mode makes the coder especially suitable for high-quality speech communication. The article presents the standardization goals and process, an overview of the coding algorithm, and the codec performance in various conditions.

Patent
13 Oct 2009
TL;DR: In this paper, a high-pass filter is used to prevent damage to personal computer speakers and other components; a tuning voltage is digitized into a tuning code used by a digital high-pass filter, and multiplexers can be used to ensure that only the audio path leading to the speakers is filtered.
Abstract: An integrated audio codec includes a high-pass filter to prevent damage to personal computer speakers and other components. The audio codec may be compliant with HD audio standards and can operate with generic software drivers. Tuning of the high-pass filter is provided through an external pin-out, where either an external capacitor or external resistors provide the ability to tune the high-pass filter. In one implementation, a tuning voltage is digitized into a tuning code used by a digital high-pass filter. In addition, multiplexers can be used to ensure that only the audio path leading to the speakers is filtered.
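
A minimal sketch of a tunable digital high-pass filter in the spirit of the patent; the abstract does not give the filter order or the tuning-code mapping, so a first-order Butterworth design (via scipy) is used as a stand-in.

```python
import numpy as np
from scipy.signal import butter, lfilter

def tunable_highpass(x, cutoff_hz, fs=48_000, order=1):
    """Remove DC and sub-sonic content before the speaker path.
    The patent's filter order and tuning mechanism aren't specified in the
    abstract; a first-order Butterworth design is used here as a stand-in."""
    b, a = butter(order, cutoff_hz, btype="highpass", fs=fs)
    return lfilter(b, a, x)

# Example: suppress content below ~20 Hz on a test signal.
fs = 48_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 1000 * t)  # 5 Hz rumble + 1 kHz tone
y = tunable_highpass(x, cutoff_hz=20, fs=fs)
```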

Proceedings ArticleDOI
06 Sep 2009
TL;DR: The highband information is embedded into the pitch delay data of the AMR codec using an extended quantization-based method that achieves increased embedding capacity and higher perceived sound quality than the previous steganographic method.
Abstract: This paper proposes a bandwidth extension (BWE) method for the AMR narrow-band speech codec using steganography, which is called steganographic BWE herein. The highband information is embedded into the pitch delay data of the AMR codec using an extended quantization-based method that achieves increased embedding capacity and higher perceived sound quality than the previous steganographic method. The target bit-rate mode is below 7 kbps, the level below which the previous steganographic BWE method did not maintain adequate sound quality. The sound quality of the steganographic BWE speech signals decoded from the embedded bitstream is comparable to that of the wide-band speech signals of the AMR-WB codec at a bit rate of less than 6.7 kbps, with only a slight degradation in quality relative to speech signals decoded from the same bitstream by the legacy AMR decoder.
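
The "quantization-based" embedding mentioned above is in the family of quantization index modulation (QIM). A generic parity-based QIM sketch on an integer parameter (standing in for a pitch-delay codeword; not the paper's extended, higher-capacity method) looks like this:

```python
def qim_embed(value: int, bit: int) -> int:
    """Parity-based quantization index modulation: nudge the parameter to the
    nearest value whose parity equals the payload bit. This is generic QIM,
    not the paper's extended higher-capacity scheme."""
    return value if value % 2 == bit else value + 1

def qim_extract(value: int) -> int:
    """Recover the hidden bit from the received parameter."""
    return value % 2

pitch_delay = 57                        # hypothetical pitch-delay codeword
stego = qim_embed(pitch_delay, bit=0)   # -> 58; the parity now carries the bit
assert qim_extract(stego) == 0
assert abs(stego - pitch_delay) <= 1    # embedding distortion is at most one step
```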

Journal Article
TL;DR: The basic elements of the ALS codec are described with a focus on prediction, entropy coding, and related tools and the most important applications of this standardized lossless audio format are pointed out.
Abstract: The MPEG-4 Audio Lossless Coding (ALS) standard belongs to the family of MPEG-4 audio coding standards. In contrast to lossy codecs such as AAC, which merely strive to preserve the subjective audio quality, lossless coding preserves every single bit of the original audio data. The ALS core codec is based on forward-adaptive linear prediction, which combines remarkable compression with low complexity. Additional features include long-term prediction, multichannel coding, and compression of floating-point audio material. This paper describes the basic elements of the ALS codec with a focus on prediction, entropy coding, and related tools, and points out the most important applications of this standardized lossless audio format.
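
A minimal sketch of the forward-adaptive linear prediction at the heart of ALS: fit a predictor on each block and losslessly code the (much smaller) residual. The least-squares fit below is a generic stand-in; ALS itself quantizes and transmits parcor coefficients and adds the other tools mentioned above.

```python
import numpy as np

def lpc_residual(block, order=8):
    """Forward-adaptive linear prediction for one block: fit the predictor on
    the block itself, then return the prediction residual. (Generic
    least-squares fit; ALS's quantized parcor machinery is omitted.)"""
    x = np.asarray(block, dtype=float)
    # regression matrix of past samples for each predicted sample
    rows = [x[i - order:i][::-1] for i in range(order, len(x))]
    A = np.vstack(rows)
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coeffs
    return coeffs, residual

rng = np.random.default_rng(0)
# a strongly correlated test signal compresses well after prediction
sig = np.cumsum(rng.normal(size=4096))
coeffs, res = lpc_residual(sig)
print(np.var(sig), np.var(res))   # residual variance is much smaller
```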

Patent
22 Jan 2009
TL;DR: In this paper, the authors present a system for bandwidth management and codec negotiation, which includes a configuration storage module having supported codecs storage, codec lists and preferred site settings storage, and a call manager having an extension module, a trunk module, location service engine, a codec manager, a bandwidth manager, and media manager.
Abstract: A system for bandwidth management and codec negotiation, according to one embodiment of the present invention, comprises: a configuration storage module having supported codecs storage, codec lists, and preferred site settings storage; and a call manager having an extension module, a trunk module, a location service engine, a codec manager, a bandwidth manager, and a media manager. The codec manager and the bandwidth manager are used for negotiating a codec for a call between two endpoints. The present invention also includes a number of methods, including a method for negotiating a codec for a call, a method for managing bandwidth for a call, a method for adding a description of a new codec supported by an endpoint, a method for adding an identifier of a supported codec to a codec list, and a method for editing site codec settings.
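
A minimal sketch of the negotiation idea in the abstract: choose the first locally preferred codec that the remote endpoint also supports and that fits the available bandwidth. The codec names, bit rates, and selection rule below are hypothetical.

```python
# Hypothetical codec descriptions: name -> required bandwidth in kb/s.
SUPPORTED_CODECS = {"G.711": 64, "G.729": 8, "G.722": 64, "AMR-WB": 24}

def negotiate_codec(local_prefs, remote_caps, available_kbps):
    """Return the first locally preferred codec that the remote endpoint also
    supports and that fits the available bandwidth, else None."""
    for codec in local_prefs:
        if codec in remote_caps and SUPPORTED_CODECS[codec] <= available_kbps:
            return codec
    return None

chosen = negotiate_codec(
    local_prefs=["AMR-WB", "G.722", "G.711", "G.729"],
    remote_caps={"G.711", "G.729", "AMR-WB"},
    available_kbps=32,
)
print(chosen)   # AMR-WB (fits in 32 kb/s and is the highest local preference)
```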

Proceedings ArticleDOI
19 Dec 2009
TL;DR: There is some loss in identification and verification performance when audio is coded without a change of sample rate, and a large loss when the sample rate changes during the audio coding process.
Abstract: We investigate the effect of audio coding on speaker identification and verification when training and testing conditions are matched and mismatched. Experiments use popular audio coding algorithms (Windows Media Audio 9.1, Advanced Audio Coding, MPEG Audio Layer III) and a speaker identification and verification system based on Gaussian mixture models. There is some loss in identification and verification performance when audio is coded without a change of sample rate, and a large loss when the sample rate changes during the audio coding process.

Patent
Robert W. Zopf, Laurent Pilati
06 Nov 2009
TL;DR: Packet loss concealment systems and methods are described that may be used in conjunction with a Bluetooth® Low-Complexity Subband Coding (LC-SBC) codec or other sub-band codecs, including but not limited to an MPEG-1 Audio Layer 3 (MP3) codec, an AAC codec, and a Dolby AC-3 codec.
Abstract: Packet loss concealment systems and methods are described that may be used in conjunction with a Bluetooth® Low-Complexity Sub-band Coding (LC-SBC) codec or other sub-band codecs, including but not limited to an MPEG-1 Audio Layer 3 (MP3) codec, an Advanced Audio Coding (AAC) codec, and a Dolby AC-3 codec.
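
A minimal sketch of one common packet-loss-concealment baseline for framed audio codecs, repeating the last good frame with a decaying gain; this is a generic illustration, not the concealment method claimed in the patent.

```python
import numpy as np

class RepeatAndFadePLC:
    """Conceal lost frames by repeating the last good decoded frame with an
    exponentially decaying gain (generic baseline, not the patented method)."""
    def __init__(self, fade_per_frame=0.7):
        self.last_good = None
        self.gain = 1.0
        self.fade = fade_per_frame

    def good_frame(self, pcm):
        self.last_good = np.array(pcm, dtype=float)
        self.gain = 1.0
        return self.last_good

    def lost_frame(self, frame_len):
        if self.last_good is None:
            return np.zeros(frame_len)   # nothing to repeat yet
        self.gain *= self.fade           # mute gradually on long loss bursts
        return self.gain * self.last_good[:frame_len]

plc = RepeatAndFadePLC()
plc.good_frame(np.ones(120))
print(plc.lost_frame(120)[0], plc.lost_frame(120)[0])   # 0.7, then ~0.49
```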

Journal ArticleDOI
TL;DR: This paper describes computationally efficient implementations for the ITU-T G.729 speech codec with focus on the adaptive codebook search, more specifically in the open-loop stage, which first estimates the pitch period of the speech frame being coded.
Abstract: This paper describes computationally efficient implementations for the ITU-T G.729 speech codec. Focus is given to the adaptive codebook search, more specifically the open-loop stage, which first estimates the pitch period of the speech frame being coded. Different strategies are discussed to achieve an excellent compromise between computational complexity and signal quality. The result is an accelerated procedure for the G.729 and G.729A versions, reducing encoding time by about 12% and 9%, respectively, while sustaining the original signal quality, as verified by objective and subjective measurements.
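
A minimal sketch of the open-loop pitch search that the paper accelerates: pick the lag maximizing a normalized correlation over the allowed pitch range (roughly 20–143 samples at 8 kHz for G.729). This is the exhaustive textbook search, not the accelerated procedure itself.

```python
import numpy as np

def open_loop_pitch(frame, min_lag=20, max_lag=143):
    """Exhaustive open-loop pitch search over a G.729-like lag range.
    Returns the lag maximizing the normalized correlation; the paper's
    contribution is to prune this search, which is not reproduced here."""
    x = np.asarray(frame, dtype=float)
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]
        score = np.dot(a, b) / (np.sqrt(np.dot(b, b)) + 1e-12)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

fs = 8000
t = np.arange(2 * 80 + 143) / fs            # enough samples for the longest lag
frame = np.sin(2 * np.pi * 100 * t)         # 100 Hz "voiced" test tone
print(open_loop_pitch(frame))               # 80 samples = 8000 Hz / 100 Hz
```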

Journal Article
TL;DR: In this paper, the authors report on two listening tests to determine the speech quality of different wideband (WB) speech codecs and evaluate the hypothesis that secondary test factors lead to this rank-order effect.
Abstract: We report on two listening tests to determine the speech quality of different wideband (WB) speech codecs. In the first test, we have studied various network conditions, including WB–WB and WB–narrowband (WB–NB) tandeming, packet loss, and background noise. In addition to other findings, this test showed some codec quality rank-order changes when compared to the literature. To evaluate the hypothesis that secondary test factors lead to this rank-order effect, we conducted another speech quality listening test: here, we simulated different source material recording conditions (room acoustics and microphone positions), processed the material with different WB speech coders, and presented the resulting files monotically in one test and diotically in another. The paper discusses why and how these factors impact speech quality.

Proceedings ArticleDOI
04 Dec 2009
TL;DR: An allpass-based IIR filter-bank, whose design and implementation are presented in this contribution, achieves a significantly lower signal delay than the traditional FIR QMF-bank solution without compromising speech and audio quality.
Abstract: A new speech and audio codec has been submitted recently to ITU-T by a consortium of Huawei and ETRI as a candidate proposal for the super-wideband and stereo extensions of ITU-T Rec. G.729.1 and G.718. This hierarchical codec with bit rates from 8–64 kbit/s relies on subband splitting by means of a quadrature-mirror filter-bank (QMF-bank). For this, an allpass-based QMF-bank is used, whose design and implementation are presented in this contribution. This IIR filter-bank achieves a significantly lower signal delay than the traditional FIR QMF-bank solution without compromising speech and audio quality.
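
A minimal sketch of a two-band allpass-based (IIR) QMF analysis of the kind described: the decimated polyphase components feed two first-order allpass sections whose sum and difference give the low and high bands. The allpass coefficients are illustrative placeholders, not the filter-bank designed in the paper.

```python
import numpy as np

def allpass1(x, a):
    """First-order allpass section A(z) = (a + z^-1) / (1 + a z^-1)."""
    y = np.zeros_like(x, dtype=float)
    x_prev = y_prev = 0.0
    for m, xm in enumerate(x):
        y[m] = a * xm + x_prev - a * y_prev
        x_prev, y_prev = xm, y[m]
    return y

def allpass_qmf_analysis(x, a0=0.15, a1=0.59):
    """Two-band polyphase allpass QMF analysis, H(z) = 0.5*[A0(z^2) +/- z^-1*A1(z^2)]:
    even samples feed A0, the one-sample-delayed odd samples feed A1, and the
    branch sum / difference give the decimated low / high bands. The coefficients
    here are illustrative placeholders, not the designed filter bank of the paper."""
    x = np.asarray(x, dtype=float)
    even = x[0::2]
    odd_delayed = np.concatenate(([0.0], x[1::2]))[: len(even)]
    b0 = allpass1(even, a0)
    b1 = allpass1(odd_delayed, a1)
    return 0.5 * (b0 + b1), 0.5 * (b0 - b1)

# A 1 kHz tone at fs = 32 kHz should land almost entirely in the low band.
fs = 32_000
t = np.arange(2048) / fs
low, high = allpass_qmf_analysis(np.sin(2 * np.pi * 1000 * t))
print(np.std(low), np.std(high))   # low-band level >> high-band level
```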

01 Jan 2009
TL;DR: A new approach to distributing the functionality of a speech receiver between codec and application leads to easier implementations of high-quality VoIP applications, and a combination of the PEAQ basic and advanced values, after third-order linear regression, best matches the subjective MUSHRA results.
Abstract: The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithmic delay. To make SBC suitable for the Internet, we extend it with a time and packet loss concealment (PLC) algorithm based on ITU-T G.711 Appendix I. The design is novel in the interface between codec and speech receiver: we developed a new approach to distributing the functionality of a speech receiver between codec and application, which leads to easier implementations of high-quality VoIP applications. We conducted subjective and objective listening tests of the audio quality of SBC and PLC in order to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These test results are then compared with the results of multiple objective assessment algorithms (ITU-T P.862 PESQ, ITU-R BS.1387-1 PEAQ, Creusere's algorithm). We found that a combination of the PEAQ basic and advanced values, after third-order linear regression, best matches the subjective MUSHRA results; the regression has a coefficient of determination of R=0.907. By comparison, our individual human ratings show a correlation of about R=0.9 with our averaged human rating results. Using the combination of both PEAQ algorithms, we calculate hundreds of thousands of objective audio quality ratings, varying the audio content and the algorithmic parameters of SBC and PLC. The results show which sets of parameter values are best suited to a bandwidth- and delay-constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters compared to the default parameter sets given in the standard. Finally, we present preliminary objective test results comparing the audio codecs SBC, CELT, APT-X, and ULD for speech and audio transmission. They all allow mono and stereo transmission of music at ultra-low coding delays (<10 ms), which is especially useful for distributed ensemble performances over the Internet.
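
A minimal sketch of the third-order regression step described above, mapping combined PEAQ scores to MUSHRA scores. How the basic and advanced PEAQ values were combined is not specified here, so a plain average is assumed, and the data are synthetic placeholders.

```python
import numpy as np

# Synthetic placeholder data: PEAQ basic ODG, PEAQ advanced ODG, MUSHRA score.
rng = np.random.default_rng(1)
peaq_basic = rng.uniform(-4, 0, size=40)
peaq_adv = peaq_basic + rng.normal(scale=0.3, size=40)
mushra = 100 + 20 * peaq_basic + rng.normal(scale=5, size=40)   # fake ground truth

# Combine the two PEAQ values (a plain average is an assumption here),
# then fit a third-order polynomial to the subjective scores.
combined = 0.5 * (peaq_basic + peaq_adv)
coeffs = np.polyfit(combined, mushra, deg=3)
predicted = np.polyval(coeffs, combined)

# Coefficient of determination of the fit (the study reports the fit quality).
ss_res = np.sum((mushra - predicted) ** 2)
ss_tot = np.sum((mushra - np.mean(mushra)) ** 2)
print("R^2 =", 1 - ss_res / ss_tot)
```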

Patent
09 Sep 2009
TL;DR: In this article, an analog-to-digital converter (ADC) converts an analog signal into a digital signal by sampling the analog input signal at a codec platform sampling frequency, and an encoder generates a bit stream by compressing the digital signal provided by the sampling frequency converter.
Abstract: A codec platform apparatus which can perform encoding or decoding regardless of the sampling frequency supported by the codec platform is provided. The codec platform apparatus includes an analog-to-digital converter (ADC) converting an analog input signal into a digital signal by sampling the analog input signal at a codec platform sampling frequency; a sampling frequency converter converting the digital signal provided by the ADC into a digital signal having a codec sampling frequency; and an encoder generating a bit stream by compressing the digital signal provided by the sampling frequency converter. Since there is no need to adopt a new codec platform even when the existing codec platform does not support the sampling frequency of a new codec, there is no need to port the new codec to a new platform. Therefore, it is possible to improve user satisfaction.
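
A minimal sketch of the sampling-frequency-conversion step the apparatus relies on, using a polyphase resampler to bring the platform rate to the codec rate before encoding; the 44.1 kHz to 48 kHz pair is only an example, and scipy's resampler stands in for whatever converter the patent envisions.

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def to_codec_rate(pcm, platform_fs=44_100, codec_fs=48_000):
    """Convert from the platform sampling rate to the codec sampling rate
    (polyphase resampling; the rates here are illustrative examples)."""
    g = gcd(codec_fs, platform_fs)
    return resample_poly(pcm, up=codec_fs // g, down=platform_fs // g)

platform_fs, codec_fs = 44_100, 48_000
t = np.arange(platform_fs) / platform_fs      # 1 s of audio at the platform rate
x = np.sin(2 * np.pi * 440 * t)
y = to_codec_rate(x, platform_fs, codec_fs)
print(len(x), len(y))                         # 44100 -> 48000 samples
```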

Patent
13 Nov 2009
TL;DR: In this article, a method for playing an audio file using an audio codec engine system in a gaming environment is described, where the audio codec does not decode the entire audio file at once and decoding only a one second maximum of an audio sample at a time.
Abstract: Various embodiments disclosed herein are directed to a method for playing an audio file using an audio codec engine system in a gaming environment. The method comprises: reading an audio file into a memory buffer only once; receiving a request from a game application to play the audio file; inputting the memory buffer into the audio codec engine to obtain a decompressed audio sample in response to the request; and decoding only a one-second maximum of an audio sample at a time, wherein the audio codec does not decode the entire audio file at once, and wherein an audio file's samples are decoded only when the audio file is requested for active playback by the game application.

Journal Article
TL;DR: A video codec based on the wavelet transform is proposed, whose performance is therefore superior to that of block-transform-based codecs; it is ideally suited for sequences with the smooth and gentle motion typical of video conferencing.
Abstract: Video coding schemes for low bit rates are of high importance, and traditional coding schemes that use block transforms suffer from blocking artefacts. Here we propose a video codec based on the wavelet transform, so the performance of the coder is superior to that of other block-transform-based codecs. The wavelet coefficients are coded using the computationally simpler no-list SPIHT (NLS), whose performance is similar to that of set partitioning in hierarchical trees (SPIHT). Motion estimation is done using the recently proposed kite-cross-diamond search algorithm, which is the fastest among the block-matching algorithms. The codec is ideally suited for sequences with the smooth and gentle motion typical of video conferencing. Simulation results are provided to evaluate the performance of the codec at various bit rates. The codec is scalable in terms of bandwidth requirement, which means only one compressed bit stream is produced for different bit rates. The use of NLS makes the codec scalable since it has the embedded coding property. However, for resolution scalability, different compressed files are required.

Proceedings ArticleDOI
05 Jul 2009
TL;DR: It is concluded that the JPEG-type Audio Huffman coding achieves the best results although it is not possible to truncate the bit stream, in this case, to easily match the bit rate to the fixed channel capacity.
Abstract: This paper reports on the results of four re-encoding schemes applied to perceptually quantized wavelet packet transform (WPT) coefficients of audio and high-quality speech. These schemes comprise: (1) embedded zero-tree wavelet (EZW), (2) set partitioning in hierarchical trees (SPIHT), (3) JPEG-based entropy/run-length Huffman, and (4) JPEG-type audio Huffman coding algorithms. Since EZW and SPIHT are designed for image compression, some new modifications have been implemented in these schemes to better match them to audio signals. The performances of these four re-encoders are compared in terms of average output bit rate and computation time for the same codec. It is concluded that the JPEG-type audio Huffman coding achieves the best results, although in this case it is not possible to truncate the bit stream to easily match the bit rate to a fixed channel capacity.
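
A minimal sketch of textbook Huffman code construction, the entropy-coding family behind schemes (3) and (4) above (generic symbol coding, not the JPEG-type audio tables evaluated in the paper):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a symbol sequence.
    Generic textbook construction, not the JPEG-type audio tables of the paper."""
    freq = Counter(symbols)
    heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])
        next_id += 1
    return heap[0][2]

# Quantized coefficients cluster around zero, so short codes go to small values.
coeffs = [0, 0, 0, 1, -1, 0, 2, 0, 0, -1, 0, 3, 0, 0, 1, 0]
code = huffman_code(coeffs)
bits = "".join(code[c] for c in coeffs)
print(code, len(bits), "bits for", len(coeffs), "coefficients")
```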

Book ChapterDOI
Yujie Gao
01 Jan 2009

Proceedings Article
01 Aug 2009
TL;DR: An attack-restoration method based on correcting the temporal envelope of the decoded signal, using a small set of coefficients transmitted through an auxiliary channel, which achieves an efficient restoration of the attacks and a significant improvement of the audio quality.
Abstract: At reduced bit rates, audio compression affects transient parts of signals, which results in pre-echo and loss of attack character. We propose in this paper an attack-restoration method based on the correction of the temporal envelope of the decoded signal, using a small set of coefficients transmitted through an auxiliary channel. The proposed approach is evaluated for single and multiple coding-decoding, using objective perceptual measures. The experimental results for MP3 and AAC coding exhibit an efficient restoration of the attacks and a significant improvement of the audio quality.
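
A minimal sketch of temporal-envelope correction in the spirit of the paper: measure a coarse frame-wise envelope of the original at the encoder, send it as side information, and rescale the decoded signal so its envelope matches. The hop size and RMS envelope are assumptions, not the paper's parametrization.

```python
import numpy as np

def temporal_envelope(x, hop=128):
    """Coarse temporal envelope: RMS per hop-sized segment (an assumption;
    the paper's envelope parametrization may differ)."""
    n = len(x) // hop
    seg = x[: n * hop].reshape(n, hop)
    return np.sqrt(np.mean(seg ** 2, axis=1)) + 1e-12

def restore_attacks(decoded, original_envelope, hop=128):
    """Rescale each segment of the decoded signal so its envelope matches the
    envelope measured on the original (sent over an auxiliary channel)."""
    out = decoded.copy()
    dec_env = temporal_envelope(decoded, hop)
    n = min(len(dec_env), len(original_envelope))
    for i in range(n):
        gain = original_envelope[i] / dec_env[i]
        out[i * hop:(i + 1) * hop] *= gain
    return out

# Toy example: a sharp attack that coding has smeared into a pre-echo-like ramp.
original = np.concatenate([np.zeros(512), np.ones(512)])
decoded = np.concatenate([np.linspace(0, 1, 512), np.ones(512)])
restored = restore_attacks(decoded, temporal_envelope(original))
```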

Proceedings ArticleDOI
19 Jan 2009
TL;DR: To address the pressing issues of satisfying numerous video codec standards and supporting "full-high-definition" (full-HD) video on different consumer devices, a multi-standard CODEC IP based on a heterogeneous multiprocessor architecture was developed.
Abstract: To solve recent pressing issues regarding satisfying numerous video codec standards and supporting "full-high-definition" (full-HD) (i.e., 1920 pixels by 1080 lines) video on different consumer devices, a multi-standard CODEC IP based on a heterogeneous multiprocessor architecture was developed. To achieve satisfactory performance with low power consumption, operation-specific processors were designed for two types of processing: stream processing and pixel processing. The CODEC makes effective use of several dedicated circuits for functions that are unsuitable for those processors. To design the CODEC, we developed a C-language model to check that the architecture worked correctly. The model was also used as a reference for verifying the RTL model. The CODEC can process full-HD video formatted in H.264, MPEG-2, MPEG-4, and VC-1 at an operating frequency of 162 MHz.

Proceedings ArticleDOI
25 May 2009
TL;DR: This paper reviews some important technologies for speech and audio coding, contrasting time-domain and frequency-domain processing, which have contributed to the standardization activities in ITU-T and ISO/IEC MPEG.
Abstract: This paper reviews some important technologies for speech and audio coding, contrasting time domain and frequency domain processing. These technologies have contributed to the standardization activities in ITU-T and ISO/IEC MPEG, respectively. It is interesting that the standardization targets for speech and audio have started to converge recently. The MPEG USAC (Unified Speech and Audio Coding) scheme, which is being actively developed, may make use of most of the important technologies for speech and audio that have been developed so far.