
Showing papers on "Enhanced Variable Rate Codec" published in 2009


Proceedings ArticleDOI
19 Apr 2009
TL;DR: This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding, which results in a codec that exhibits consistently high quality for speech, music and mixed audio content.
Abstract: Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec, which efficiently combines techniques from both worlds. This results in a codec that exhibits consistently high quality for speech, music and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC(v2) and AMR-WB+. This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.

108 citations


Journal ArticleDOI
TL;DR: This work presents guidelines for the CODEC design of the "forbidden pattern free crosstalk avoidance code" (FPF-CAC), and shows that mathematically, a mapping scheme exists based on the representation of numbers in the Fibonacci numeral system.
Abstract: Interconnect delay has become a limiting factor for circuit performance in deep sub-micrometer designs. As the crosstalk in an on-chip bus is highly dependent on the data patterns transmitted on the bus, different crosstalk avoidance coding schemes have been proposed to boost the bus speed and/or reduce the overall energy consumption. Despite the availability of the codes, no systematic mapping of data words to codewords has been proposed for CODEC design. This is mainly due to the nonlinear nature of the crosstalk avoidance codes (CAC). The lack of practical CODEC construction schemes has hampered the use of such codes in practical designs. This work presents guidelines for the CODEC design of the "forbidden pattern free crosstalk avoidance code" (FPF-CAC). We analyze the properties of the FPF-CAC and show that mathematically, a mapping scheme exists based on the representation of numbers in the Fibonacci numeral system. Our first proposed CODEC design offers a near-optimal area overhead performance. An improved version of the CODEC is then presented, which achieves theoretical optimal performance. We also investigate the implementation details of the CODECs, including design complexity and the speed. Optimization schemes are provided to reduce the size of the CODEC and improve its speed.

107 citations
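
The Fibonacci numeral system mentioned in the abstract above can be illustrated with a minimal Python sketch. It shows only the number representation (greedy Zeckendorf digits over the weights 1, 2, 3, 5, 8, ...). It is not the authors' FPF-CAC mapping, and the digit strings it produces should not be assumed to be forbidden-pattern free. The function names are hypothetical.

```python
def fibonacci_weights(n_bits):
    """Return n_bits Fibonacci weights 1, 2, 3, 5, 8, ... (least significant first)."""
    weights = [1, 2]
    while len(weights) < n_bits:
        weights.append(weights[-1] + weights[-2])
    return weights[:n_bits]


def to_fibonacci(value, n_bits):
    """Greedy encoding of a non-negative integer; digits are most significant first."""
    digits = []
    for weight in reversed(fibonacci_weights(n_bits)):
        if weight <= value:
            digits.append(1)
            value -= weight
        else:
            digits.append(0)
    if value != 0:
        raise ValueError("value does not fit in the given number of digits")
    return digits


def from_fibonacci(digits):
    """Decode digits (most significant first) back to an integer."""
    weights = fibonacci_weights(len(digits))
    return sum(d * w for d, w in zip(digits, reversed(weights)))


if __name__ == "__main__":
    for value in range(13):
        print(value, to_fibonacci(value, 5))
```

Unlike binary, the representation is not unique in general, which is one reason a systematic mapping has to choose a canonical form; the greedy choice used here is only one such convention.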


Proceedings ArticleDOI
19 Apr 2009
TL;DR: This paper describes the speech and audio codec that has been submitted to ITU-T by Huawei and ETRI as a candidate for the upcoming super-wideband and stereo extensions of Rec. G.729.1 and G.718.
Abstract: This paper describes the speech and audio codec that has been submitted to ITU-T by Huawei and ETRI as a candidate for the upcoming super-wideband and stereo extensions of Rec. G.729.1 and G.718. The core codec in the current implementation is G.729.1 and the encoded frequency range is increased from 7 kHz to 14 kHz. Therefore, the maximum bit rate is raised from 32 kbit/s to 64 kbit/s by adding five bitstream layers. A comprehensive overview of the codec is presented with a focus on the mono coding components. The results of the listening tests that have been conducted during the ITU-T qualification phase are summarized. The proposed codec passes all quality requirements for mono input signals.

28 citations


Proceedings ArticleDOI
24 Aug 2009
TL;DR: The authors propose a low-delay audio codec that is based on the modified discrete cosine transform (MDCT) with very short frames and uses gain-shape quantization to preserve the spectral envelope.
Abstract: We propose an audio codec that addresses the low-delay requirements of some applications such as network music performance. The codec is based on the modified discrete cosine transform (MDCT) with very short frames and uses gain-shape quantization to preserve the spectral envelope. The short frame sizes required for low delay typically hinder the performance of transform codecs. However, at 96 kbit/s and with only 4 ms algorithmic delay, the proposed codec out-performs the ULD codec operating at the same rate. The total complexity of the codec is small, at only 17 WMOPS for real-time operation at 48 kHz.

26 citations
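
The gain-shape idea named in the abstract above can be sketched as follows: each band of transform coefficients is split into a scalar gain (its norm) and a unit-norm shape, so the band energy is carried by the gain alone. This is a generic illustration, not the proposed codec; the log-domain gain quantizer and its 1.5 dB step are assumptions made for the example.

```python
import math


def gain_shape_split(band):
    """Split a band of coefficients into (gain, unit-norm shape)."""
    gain = math.sqrt(sum(x * x for x in band))
    if gain == 0.0:
        return 0.0, [0.0] * len(band)
    return gain, [x / gain for x in band]


def quantize_gain(gain, step_db=1.5):
    """Uniformly quantize a positive gain in the dB domain (step is an assumption)."""
    return round(20.0 * math.log10(gain) / step_db)


def dequantize_gain(index, step_db=1.5):
    """Inverse of quantize_gain."""
    return 10.0 ** (index * step_db / 20.0)


def reconstruct(gain, shape):
    """Recombine a (possibly quantized) gain with a unit-norm shape."""
    return [gain * s for s in shape]


if __name__ == "__main__":
    gain, shape = gain_shape_split([3.0, 4.0])
    gain_hat = dequantize_gain(quantize_gain(gain))
    print(gain, shape, reconstruct(gain_hat, shape))
```

Because the shape always has unit norm, the energy of the reconstructed band depends only on the decoded gain, which is why this split tends to preserve the spectral envelope even when the shape is coded coarsely.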


Journal ArticleDOI
TL;DR: The codec requirements and design constraints are presented, the standardization process is described, and the codec performance and initial deployment are reported.
Abstract: In March 2008 the ITU-T approved a new wideband speech codec called ITU-T G.711.1. This Recommendation extends G.711, the most widely deployed speech codec, to 7 kHz audio bandwidth and is optimized for voice over IP applications. The most important feature of this codec is that the G.711.1 bitstream can be transcoded into a G.711 bitstream by simple truncation. G.711.1 operates at 64, 80, and 96 kb/s, and is designed to achieve very short delay and low complexity. ITU-T evaluation results show that the codec fulfils all the requirements defined in the terms of reference. This article presents the codec requirements and design constraints, describes how standardization was conducted, and reports on the codec performance and its initial deployment.

21 citations


Journal ArticleDOI
TL;DR: The article presents the standardization goals and process of G.729.1, an overview of the coding algorithm, and the codec performance in various conditions; a low-delay mode makes the coder especially suitable for high-quality speech communication.
Abstract: G.729.1 is a scalable codec for narrowband and wideband conversational applications standardized by ITU-T Study Group 16. The motivation for the standardization work was to meet the new challenges of VoIP in terms of quality of service and efficiency in networks, in particular regarding the strategic rollout of wideband service. G.729.1 was designed to allow smooth transition from narrowband (300-3400 Hz) PSTN to high-quality wideband (50-7000 Hz) telephony by preserving backward compatibility with the widely deployed G.729 codec. The scalable structure allows gradual quality increase with bit rate. A low-delay mode makes the coder especially suitable for high-quality speech communication. The article presents the standardization goals and process, an overview of the coding algorithm, and the codec performance in various conditions.

15 citations


Patent
Robert W. Zopf1, Laurent Pilati1
06 Nov 2009
TL;DR: Packet loss concealment systems and methods are described in this paper that may be used in conjunction with a Bluetooth® Low-Complexity Subband Coding (LC-SBC) codec or other sub-band codecs, including but not limited to an MPEG-1 Audio Layer 3 (MP3) codec, an AAC codec, and a Dolby AC-3 codec.
Abstract: Packet loss concealment systems and methods are described that may be used in conjunction with a Bluetooth® Low-Complexity Sub-band Coding (LC-SBC) codec or other sub-band codecs, including but not limited to an MPEG-1 Audio Layer 3 (MP3) codec, an Advanced Audio Coding (AAC) codec, and a Dolby AC-3 codec.

8 citations


Patent
09 Sep 2009
TL;DR: A codec platform apparatus is described in which an analog-to-digital converter (ADC) samples the analog input signal at the codec platform sampling frequency, a sampling frequency converter converts the result to the codec sampling frequency, and an encoder compresses the converted signal into a bit stream.
Abstract: A codec platform apparatus which can perform encoding or decoding regardless of a sampling frequency supported by a codec platform is provided. The codec platform apparatus includes an analog-to-digital converter (ADC) converting an analog input signal into a digital signal by sampling the analog input signal at a codec platform sampling frequency; a sampling frequency converter converting the digital signal provided by the ADC into a digital signal having a codec sampling frequency; and an encoder generating a bit stream by compressing the digital signal provided by the sampling frequency converter. Since there is no need to adopt a new codec platform even when an existing codec platform does not support the sampling frequency of a new codec, there is no need to implant the new codec. Therefore, it is possible to improve user satisfaction.

6 citations


Proceedings Article
01 Jan 2009
TL;DR: A simple modification to the EVRC rate determination algorithm (EVRC RDA) is developed to exploit the noise-canceling property of differential microphone array to improve its performance in highly dynamic noise environment.
Abstract: Differential microphone array is known to have low sensitivity to distant sound sources. Such characteristics may be advantageous in voice activity detection where it can be assumed that the target speaker is close and background noise sources are distant. In this paper we develop a simple modification to the EVRC rate determination algorithm (EVRC RDA) to exploit the noise-canceling property of differential microphone array to improve its performance in highly dynamic noise environment. Comprehensive computer simulations show that the modified algorithm outperforms the original EVRC RDA in all tested noise conditions.

6 citations


Journal ArticleDOI
TL;DR: Phoneme scores and subjective ratings were significantly higher for the individualized-amplification setting than for the standard setting in both quiet and noise, and there were no significant differences among the cellular phone encoding strategies for any measure.
Abstract: Purpose: To compare multichannel amplification within a cellular phone system to a standard cellular phone response. Research Design: Three cellular phone speech-encoding strategies were evaluated: a narrow-band (3.5 kHz upper cutoff) enhanced variable-rate coder (EVRC), a narrow-band selectable mode vocoder (SMV), and a wide-band SMV (7.5 kHz cutoff). Because the SMV encoding strategies are not yet available on phones, the processing was simulated using a computer. Individualized-amplification settings were created using NAL-NL1 (National Acoustic Laboratories - Non-linear 1) targets. Overall gain was set at preferred listening levels for both the individualized-amplification setting and the standard cellular phone setting for each of the three encoders. Phoneme-recognition scores and subjective ratings (listening effort, overall quality) were obtained in quiet and in noise. Stimuli were played from loudspeakers in one room, picked up by a microphone connected to a (transmitting) computer, and sent over the Internet to a receiving computer in an adjacent room, where the signal was amplified and delivered monaurally. Study Sample: Fourteen participants with hearing loss. Results: Phoneme scores and subjective ratings were significantly higher for the individualized-amplification setting than for the standard setting in both quiet and noise. There were no significant differences among the cellular phone encoding strategies for any measure.

5 citations


Patent
09 Sep 2009
TL;DR: A control method for a bandwidth scalable codec supporting different frequency bands is presented; it detects bandwidth switching between frames, compares the interval of the switching section with a preset threshold, and controls the frequency band of the input signal accordingly to lessen sound deterioration caused by sudden bandwidth changes.
Abstract: The present invention relates to a bandwidth scalable codec and a control method thereof. The control method of the bandwidth scalable codec supporting different frequency bands according to the present invention includes checking whether bandwidth switching has occurred by comparing bandwidths of signals corresponding to respective frames of an input signal, computing an interval of a bandwidth switching section and comparing the interval with a preset threshold value, when bandwidth switching has occurred, and controlling a frequency band of the input signal on the basis of the threshold value comparison result and increase or decrease of bandwidth in the bandwidth switching section. The bandwidth scalable codec lessens sound deterioration caused by a sudden bandwidth change, to thereby improve the overall quality of voice communication.

Proceedings ArticleDOI
01 Dec 2009
TL;DR: A multi-layer embedded speech and audio coding algorithm based on bit-plane coding and Scalar Quantized Vector Huffman Coding (SQVH) is proposed in this paper and performs well compared with the reference codec.
Abstract: A multi-layer embedded speech and audio coding algorithm based on bit-plane coding and Scalar Quantized Vector Huffman Coding (SQVH) is proposed in this paper. In this codec, a signal sampled at 32 kHz can be coded at scalable bit rates. The core codec is International Telecommunication Union Telecommunication Standardization Sector (ITU-T) G.729.1, which can process signals with 7 kHz bandwidth. In addition, five extra bit rates are added and the bandwidth is extended to 14 kHz. The additional bit rates are 36, 40, 48, 56, and 64 kb/s. Some new methods used in the additional layers are proposed in this paper. The objective and subjective listening tests show that this codec performs well compared with the reference codec.

Proceedings ArticleDOI
11 Sep 2009
TL;DR: The encoding scheme described in this paper for the CS-ACELP speech codec provides on-the-fly security for the encoded data, which avoids the use of complex encryption and decryption algorithms for secured speech transmission.
Abstract: Data security is one of the major concerns for digital data as well as voice transmission through non-privatized networks. Hence, a significant amount of research work is needed to provide reliable and secured data/voice transmission. The CS-ACELP algorithm based G.729 codec is standardized as a voice codec by ITU-T for multimedia and Voice over Internet Protocol (VoIP) applications. The encoding scheme described in this paper for the CS-ACELP speech codec provides on-the-fly (while processing the speech frame) security for the encoded data, which avoids the use of complex encryption and decryption algorithms for secured speech transmission. This method is based on periodically changing the track sequence of the fixed codebook index for every 100 ms of speech sampled at a rate of 8000 samples per second, and on adopting a new technique to encode the fixed codebook index. This paper presents an easy and efficient way of providing data security for the G.729 codec. The experimental results show that similar speech quality can be obtained without additional cost or transmission bandwidth.
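
As a rough illustration of the timing in the abstract above: G.729 uses 10 ms frames, so 100 ms at 8000 samples per second is 800 samples, or 10 frames. The sketch below changes a permutation of the four fixed-codebook pulse tracks once per 100 ms period and applies it to a list of four per-track values. The keyed permutation, the `shared_key` parameter, and all function names are hypothetical; this is not the scheme from the paper, and Python's `random` module is not cryptographically secure.

```python
import random

SAMPLE_RATE_HZ = 8000
FRAME_MS = 10            # G.729 frame length
REKEY_MS = 100           # interval quoted in the abstract
FRAMES_PER_PERIOD = REKEY_MS // FRAME_MS
SAMPLES_PER_PERIOD = SAMPLE_RATE_HZ * REKEY_MS // 1000
NUM_TRACKS = 4           # pulse tracks in the G.729 algebraic codebook


def track_order(shared_key, frame_index):
    """Return the track permutation for the 100 ms period containing frame_index."""
    period = frame_index // FRAMES_PER_PERIOD
    rng = random.Random(shared_key * 1_000_003 + period)  # illustrative only
    order = list(range(NUM_TRACKS))
    rng.shuffle(order)
    return order


def scramble(per_track_values, order):
    """Reorder per-track values according to the permutation."""
    return [per_track_values[i] for i in order]


def unscramble(scrambled, order):
    """Undo scramble() given the same permutation."""
    restored = [None] * len(order)
    for position, track in enumerate(order):
        restored[track] = scrambled[position]
    return restored


if __name__ == "__main__":
    order = track_order(shared_key=42, frame_index=17)
    print(SAMPLES_PER_PERIOD, FRAMES_PER_PERIOD, order)
```

Both ends derive the same permutation from the shared key and the frame counter, so no extra bits are transmitted, which matches the abstract's claim of no added bandwidth in spirit only.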

Proceedings ArticleDOI
14 Dec 2009
TL;DR: The simulation results show that the use of error concealment improves the parameter SNRs of these parameters, especially for Pitch delay, Delta delay and Fixed Codebook (FCB) gain.
Abstract: In digital mobile communication systems, speech coding is very important to increase the bandwidth efficiency. Usually, speech coding algorithms determine speech parameters, which are highly sensitive to transmission errors. In this paper, we use the residual redundancy remaining after using the Enhanced Variable Rate Codec (EVRC) algorithm for error concealment. Average residual redundancies of the quantized parameters are exploited in the error concealment process as a priori knowledge of the source. The simulation results show that the use of error concealment improves the parameter SNRs of these parameters, especially for Pitch delay, Delta delay and Fixed Codebook (FCB) gain. The results also show that the more redundancy exists in the encoded parameter, the more improvement we could obtain by using the error concealment scheme.
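
A parameter SNR of the kind mentioned above is commonly computed as the ratio of the energy of the transmitted parameter track to the energy of the error at the receiver, in dB. The sketch below uses that common definition; the paper's exact definition is not given in the abstract, so this formula is an assumption.

```python
import math


def parameter_snr_db(sent, received):
    """SNR in dB between a sent parameter sequence and its received/concealed version."""
    signal_energy = sum(s * s for s in sent)
    noise_energy = sum((s - r) ** 2 for s, r in zip(sent, received))
    if noise_energy == 0.0:
        return math.inf  # error-free reception
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    print(parameter_snr_db([3.0, 4.0], [3.0, 3.0]))
```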

Proceedings ArticleDOI
30 Oct 2009
TL;DR: A new video codec that combines compressed sensing theory with some critical ideas of traditional video codecs is proposed; it is simpler than traditional codecs while obtaining a high compression ratio and good quality of reconstructed video.
Abstract: In this paper, we propose a new video codec which combines a new compressed sensing theory and some critical ideas of traditional video codecs. Samples needed in the new codec are fewer than in traditional codecs, and the number of samples can be changed according to the encoding mode. The encoder works as a projector, which projects a high dimensional signal to a low dimensional domain, while the decoder works for the solution of underdetermined equations. The decoder is more complex than the encoder, and the process of decoding is no longer the inverse process of encoding. Compared with traditional codecs, the new codec is simpler, but high compression ratio and good quality of reconstructed video are obtained.
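
The asymmetry described above (a cheap projecting encoder, a costlier decoder that solves underdetermined equations) can be seen in a toy Python sketch. It recovers a signal with exactly one nonzero sample out of 8 from only 2 measurements by searching over candidate positions. The sensing matrix and the exhaustive 1-sparse decoder are assumptions chosen to keep the example small; practical compressed sensing decoders use other solvers, and nothing here reflects the proposed video codec.

```python
N = 8
# Two measurement rows for eight samples: an underdetermined system.
SENSING_MATRIX = [[1.0] * N, [float(k) for k in range(N)]]


def encode(signal, sensing_matrix=SENSING_MATRIX):
    """Encoder: project the signal onto the measurement rows."""
    return [sum(a * x for a, x in zip(row, signal)) for row in sensing_matrix]


def decode_one_sparse(measurements, sensing_matrix=SENSING_MATRIX):
    """Decoder: find the single-sample signal that best explains the measurements."""
    n = len(sensing_matrix[0])
    best = None
    for k in range(n):
        column = [row[k] for row in sensing_matrix]
        energy = sum(c * c for c in column)
        # Least-squares amplitude if the nonzero sample were at position k.
        amplitude = sum(c * y for c, y in zip(column, measurements)) / energy
        residual = sum((y - amplitude * c) ** 2 for c, y in zip(column, measurements))
        if best is None or residual < best[0]:
            best = (residual, k, amplitude)
    _, index, amplitude = best
    signal = [0.0] * n
    signal[index] = amplitude
    return signal


if __name__ == "__main__":
    x = [0.0] * N
    x[5] = 3.5
    print(decode_one_sparse(encode(x)))
```

Recovery is possible here only because the signal is known to be sparse; without that prior, two equations cannot determine eight unknowns.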

Proceedings ArticleDOI
18 Nov 2009
TL;DR: A coarse-grained functional partitioning method to balance the load of the encoder among the cores with small overhead of synchronization and a fast voice activity detection method to reduce the computational burdens are proposed.
Abstract: This paper proposes an efficient implementation scheme of the speech codec based on the embedded symmetric multiprocessor data signal processing platform. The base codec uses the G.729 algorithm. To satisfy the requirements of real-time applications for efficiently processing both audio and video simultaneously on multiple channels, this paper presents some implementation techniques to optimize the speech coder. To exploit the characteristics of the embedded symmetric multiprocessor platform, we propose a coarse-grained functional partitioning method to balance the load of the encoder among the cores with a small synchronization overhead. In order to reduce the computational burden, we propose a fast voice activity detection method. The experimental results show that the proposed scheme greatly improves the performance of the speech codec in terms of computational complexity.

Proceedings Article
01 Jan 2009
TL;DR: The overall performance of the automatic detection of pathologies is degraded by less than 5%, and such degradation is not due to the codec itself, but to the bandwidth limitation needed at its input.
Abstract: Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for non-invasive detection of laryngeal pathologies. Bearing in mind the extension of these automatic methods to remote diagnosis scenarios, this paper analyzes the performance of a pathology detector based on Mel Frequency Cepstral Coefficients when the speech signal has undergone the distortion of a speech codec such as the GSM FR codec, which is used in one of today's most widespread communications networks. It is shown that the overall performance of the automatic detection of pathologies is degraded by less than 5%, and that such degradation is not due to the codec itself, but to the bandwidth limitation needed at its input. These results indicate that the GSM system can be more adequate for implementing remote voice assessment than the analogue telephone channel. Index Terms: speech analysis, speech coding, voice function assessment

Proceedings ArticleDOI
31 Dec 2009
TL;DR: A candidate codec for ITU-T Embedded Variable Bit Rates (EV-VBR) standardization designed by Speech and Audio Signal Processing Laboratory (SASPL) of Beijing University of Technology is improved and the second sub-frame's spectral parameters are quantized additionally to improve the overall quantization precision.
Abstract: A candidate codec for ITU-T Embedded Variable Bit Rates (EV-VBR) standardization designed by Speech and Audio Signal Processing Laboratory (SASPL) of Beijing University of Technology is improved in this paper. In the improved codec, the second sub-frame's spectral parameters are quantized additionally to improve the overall quantization precision. Depth-first tree search algorithm replaces the full search and focus search in searching algebraic codebook. Transformation coding (TCX) process is redesigned and applied into the higher three layers which reduces the algorithm complexity dramatically. The objective mean opinion score (MOS) test results show that the improved codec achieves comparable speech quality as G.718 recommendation codec in most test items and has a lower algorithm delay.

Proceedings ArticleDOI
09 Jul 2009
TL;DR: A method for implementing a G.729 codec filter on mediastreamer2 is presented, and the experimental results show that the implementation of the G.729 codec on mediastreamer2 is successful on the whole.
Abstract: Currently, the mediastreamer2 contains internal support for G.711u, G.711a, speex and gsm. However, it does not contain internal support for the popular G.729, which is known as the best audio codec ever. In this paper, we present the method of how to implement a G.729 codec filter on mediastreamer2. Then, we evaluate the performance of the G.729 filter in our platform. The experimental results show that our implementation of G.729 codec on mediastreamer2 is successful on the whole.

Journal Article
TL;DR: The Multi-band Excitation (MBE) codec is a good choice among low-rate speech coding algorithms and can synthesize better speech than traditional codecs at rates of 2.4 to 4.8 kb/s.
Abstract: Communication resources, especially frequency resources, become more and more valuable as the information society develops. Speech compression coding technologies, which reduce the bit rate in transmission and the amount of data in storage, have developed rapidly over many years and have become increasingly important. The Multi-band Excitation (MBE) codec is a good choice among low-rate speech coding algorithms. It can synthesize better speech than traditional codecs at rates of 2.4 to 4.8 kb/s. Its synthetic speech sounds natural and tolerates more noise. The MBE codec is taken as the main object of research. Detailed research has been done on both speech analysis and synthesis. In addition, a VAD algorithm is applied to the MBE codec in order to reduce the bit rate. Finally, based on simulations, an improved algorithm for the MBE codec is put forward.

01 Oct 2009
TL;DR: The RTP payload format of UEMCLIP, an enhanced speech codec of ITU-T G.711, has a scalable structure with an embedded u-law bitstream, also known as PCMU, thus providing a handy transcoding operation between narrowband and wideband speech.
Abstract: This document describes the RTP payload format of UEMCLIP, an enhanced speech codec of ITU-T G.711. The bitstream has a scalable structure with an embedded u-law bitstream, also known as PCMU, thus providing a handy transcoding operation between narrowband and wideband speech.
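
For readers unfamiliar with the u-law (PCMU) companding that the embedded core bitstream relies on, the continuous u-law curve with mu = 255 can be sketched as below. This is the textbook continuous formula only; G.711 itself specifies an 8-bit segmented approximation, and nothing here describes the UEMCLIP payload format.

```python
import math

MU = 255.0  # companding constant used by u-law


def mu_law_compress(x):
    """Map a sample in [-1, 1] through the continuous u-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress for y in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    for sample in (-0.5, 0.0, 0.01, 1.0):
        print(sample, mu_law_compress(sample))
```

The curve allocates finer resolution to small amplitudes, which is why a uniform quantizer applied after compression behaves like a logarithmic quantizer on the original signal.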

Proceedings ArticleDOI
David Gains1
TL;DR: Testing shows that the Iris-C codec is competitive with the Dirac low delay syntax codec which is typically regarded as the state-of-the-art low latency, lossless video compressor.
Abstract: Iris-C is an image codec designed for streaming video applications that demand low bit rate, low latency, lossless image compression. To achieve compression and low latency the codec features the discrete wavelet transform, Exp-Golomb coding, and online processes that construct dynamic models of the input video. Like H.264 and Dirac, the Iris-C codec accepts input video from both the YUV and YCOCG colour spaces, but the system can also operate on Bayer RAW data read directly from an image sensor. Testing shows that the Iris-C codec is competitive with the Dirac low delay syntax codec which is typically regarded as the state-of-the-art low latency, lossless video compressor.

Proceedings ArticleDOI
24 Nov 2009
TL;DR: This work addresses the effectiveness of video compression, the analysis of MPEG-4 block matching algorithms, and CODEC development; the reported results (high compression ratio, low computational complexity, good resolution) are described as encouraging.
Abstract: Due to the rapid growth in data and the tremendous advancement in digital video technology, it has become necessary to minimize the amount of data to be transmitted. Various compression standards are widely used to store and transmit data efficiently; MPEG-1, MPEG-2 and MPEG-4, for example, address audio, video and speech data effectively. MPEG-4 comprises the features of both earlier standards as well as object slicing. In this work we address the effectiveness of video compression, the analysis of MPEG-4 block matching algorithms, and the CODEC development. The results for MPEG-4 and the CODEC, which achieve a high compression ratio, low computational complexity and good resolution, are encouraging.

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This paper enhances the G.722.2 codec by removing further redundancies in the multiple frames encapsulated in piggybacking and by having only one set of LP coefficients for all the subframes encapsulated.
Abstract: In this paper, we present the design of a new piggybacking algorithm for VoIP implemented using the G.722.2 codec. In piggybacking, multiple speech frames that include those transmitted in the past are encapsulated in a single packet. Because redundant copies of each frame are transmitted to the receiver, the receiver can recover those lost frames when one or more packets are lost or arrive late in their transmission. In this paper, we have enhanced the G.722.2 codec by removing further redundancies in the multiple frames encapsulated in piggybacking and by having only one set of LP coefficients for all the subframes encapsulated. We create multiple versions of the codec, each using a different frame size. Our new codec can encode the multiple frames with little degradation in PESQ, while having substantial bit savings. Its performance is evaluated against the original method of piggybacking over random losses, as well as that using packet traces collected in the PlanetLab.
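
The basic piggybacking mechanism described above (before the paper's redundancy-removal enhancements) can be sketched in a few lines of Python: each packet carries the current frame plus copies of the most recent past frames, and the receiver fills gaps from later packets. The function names and the `redundancy` parameter are hypothetical, and frames are plain strings here rather than G.722.2 payloads.

```python
def packetize(frames, redundancy=2):
    """Packet n carries frame n plus up to `redundancy` preceding frames."""
    packets = []
    for n in range(len(frames)):
        first = max(0, n - redundancy)
        packets.append({i: frames[i] for i in range(first, n + 1)})
    return packets


def receive(packets, lost):
    """Collect every frame that appears in at least one packet that arrived."""
    recovered = {}
    for seq, packet in enumerate(packets):
        if seq in lost:
            continue
        for index, frame in packet.items():
            recovered.setdefault(index, frame)
    return recovered


if __name__ == "__main__":
    frames = [f"f{i}" for i in range(6)]
    packets = packetize(frames, redundancy=2)
    print(sorted(receive(packets, lost={2, 3})))     # burst of 2 is fully recovered
    print(sorted(receive(packets, lost={2, 3, 4})))  # burst of 3 loses frame 2
```

With `redundancy=2`, any loss burst of up to two consecutive packets is recoverable once the next packet arrives, at the cost of roughly tripling the payload; that overhead is what the paper's shared-LP-coefficient design aims to reduce.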