Showing papers on "Adaptive Multi-Rate audio codec published in 2011"


Journal ArticleDOI
TL;DR: A new method for the bandwidth extension of telephone speech adds frequency components to the 4-8 kHz band, using only the information in the narrowband speech, to improve speech quality and intelligibility.
Abstract: The limited audio bandwidth used in narrowband telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4-8 kHz using only the information in the narrowband speech. A neural network is used to estimate the mel spectrum in the extension band in short time frames based on features calculated from the narrowband speech. A wideband excitation signal is generated by spectral folding from the narrowband linear prediction residual and a filter bank is utilized to divide the excitation into four sub-bands that cover the extension band. These sub-bands are weighted such that the estimated mel spectrum is realized. Bandwidth-extended speech is obtained by summing the weighted sub-bands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with narrowband telephone speech and with a previously published bandwidth extension method.

82 citations
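
The spectral folding step mentioned in the abstract above can be illustrated with a short sketch. It assumes the narrowband signal is sampled at 8 kHz and that folding is done by inserting a zero after every sample, which doubles the rate to 16 kHz and mirrors the 0-4 kHz spectrum into 4-8 kHz. The function names and the 1 kHz test tone are illustrative assumptions; the paper's neural network, filter bank, and sub-band weighting are not reproduced here.

```python
import cmath
import math

def spectral_fold(narrowband):
    """Double the sampling rate by inserting a zero after every sample.

    Assumed folding method: the zero insertion creates a mirror image
    of the original spectrum in the upper half of the new band.
    """
    wideband = [0.0] * (2 * len(narrowband))
    wideband[::2] = narrowband
    return wideband

def dft_magnitudes(signal):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) evaluation)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

FS_NARROW = 8000  # assumed narrowband sampling rate in Hz

# Illustrative input: a 1 kHz tone, 80 samples (exactly 10 periods).
tone = [math.sin(2 * math.pi * 1000 * i / FS_NARROW) for i in range(80)]
folded = spectral_fold(tone)

mags = dft_magnitudes(folded)
bin_hz = 2 * FS_NARROW / len(folded)  # frequency spacing of the DFT bins
# The two strongest bins: the original tone and its mirror image.
peak_bins = sorted(sorted(range(len(mags)), key=mags.__getitem__)[-2:])
peak_freqs = [b * bin_hz for b in peak_bins]
print(peak_freqs)
```

In this sketch the tone at 1 kHz reappears mirrored around 4 kHz, which is the raw material that a method like the one described would then shape with sub-band weights.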


29 Sep 2011
TL;DR: The MPEG-4 High-Efficiency Advanced Audio Coding (AAC) standard is a single technology capable of compressing speech, speech mixed with music, or music signals with quality that is always at least as good as the best of two state-of-the-art reference codecs, one optimized for speech and mixed content (AMR-WB+) and the other optimized for music and general audio (HE-AACv2).
Abstract: The MPEG Audio Subgroup has a rich history of accomplishments in creating music coding technology. At higher bit rates, MPEG technology can represent arbitrary sounds, including the human voice, with excellent quality. MPEG-1 and MPEG-2 Audio coders use perceptually shaped quantization noise as the primary tool for achieving compression. The MPEG-4 High-Efficiency Advanced Audio Coding (AAC) standard is a single technology capable of compressing speech, speech mixed with music, or music signals with quality that is always at least as good as the best of two state-of-the-art reference codecs, one optimized for speech and mixed content (AMR-WB+) and the other optimized for music and general audio (HE-AACv2). This article provides an overview of the USAC architecture and summarizes the performance relative to the best state-of-the-art speech and audio codecs.

27 citations


01 Jun 2011
TL;DR: A method for the concealment of frame erasures in packet switched speech communication systems such as Voice over IP that is particularly tailored to the case of wireless transmission is proposed.
Abstract: We propose a method for the concealment of frame erasures in packet switched speech communication systems such as Voice over IP that is particularly tailored to the case of wireless transmission. At the receiver, the concealment algorithm is assisted by specific side information that is communicated via a steganographic channel within the bitstream of the employed speech codec. An exemplary implementation for the AMR codec is described and the obtained results are discussed.

25 citations


Proceedings ArticleDOI
29 Nov 2011
TL;DR: We present a new method for audio codec identification that does not require decoding of coded audio data, and utilizes randomness and chaotic characteristics of coded audio to build statistical models that represent the encoding process associated with different codecs.
Abstract: We present a new method for audio codec identification that does not require decoding of coded audio data. The method utilizes randomness and chaotic characteristics of coded audio to build statistical models that represent the encoding process associated with different codecs. The method is simple, as it does not assume knowledge of the encoding structure of a codec. It is also fast, since it operates on a block of data, which is as small as a few kilobytes, selected randomly from the coded audio. Tests are performed to evaluate the effectiveness of the technique in identification of the codec used in encoding on both singly coded and transcoded audio samples.

23 citations


Patent
Tomokazu Ishikawa, Takeshi Norimatsu, Haishan Zhong, Kok Seng Chong, Huan Zhou
14 Jun 2011
TL;DR: In this paper, a hybrid audio decoder and a new hybrid audio encoder having block switching for speech signals and audio signals are proposed, which provides a combination of a low delay filter bank like AAC-ELD and a CELP coding method.
Abstract: Provided are a new hybrid audio decoder and a new hybrid audio encoder having block switching for speech signals and audio signals. Currently, very low bitrate audio coding methods for speech and audio signal are proposed. These audio coding methods cause very long delay. Generally, in coding an audio signal, algorithm delay tends to be long to achieve higher frequency resolution. In coding a speech signal, the delay needs to be reduced because the speech signal is used for telecommunication. To balance fine coding quality for these two kinds of input signals with very low bitrate, this invention provides a combination of a low delay filter bank like AAC-ELD and a CELP coding method.

14 citations


Proceedings Article
18 Jul 2011
TL;DR: The realized experiments have proved that although MP3 format is not optimal for speech compression it does not distort speech significantly especially for high or moderate bit rates and high quality of source data.
Abstract: This paper presents the study of speech recognition accuracy with respect to different levels of MP3 compression. Special attention is focused on the processing of speech signals with different quality, i.e. with different level of background noise and channel distortion. The work was motivated by possible usage of ASR for offline automatic transcription of audio recordings collected by standard wide-spread MP3 devices. The realized experiments have proved that although MP3 format is not optimal for speech compression it does not distort speech significantly especially for high or moderate bit rates and high quality of source data. The accuracy of connected digits ASR decreased consequently very slowly up to the bit rate 24 kbps. For the best case of PLP parameterization in close-talk channel just 3% decrease of recognition accuracy was observed while the size of the compressed file was approximately 10% of the original size. All results were slightly worse under presence of additive background noise and channel distortion in a signal but achieved accuracy was also acceptable in this case especially for PLP features.

13 citations


Journal ArticleDOI
TL;DR: The aim of this paper is to improve the G.711 standard, which is widely used, especially in the public switched telephone network (PSTN), and two solutions are proposed, which fulfill all the requirements for that new standard and can be implemented in its low-frequency part.
Abstract: The aim of this paper is to improve the G.711 standard, which is widely used, especially in the public switched telephone network (PSTN). Two solutions are proposed. The first solution uses only lossless coder, achieving a bit-rate decrease of 0.82 bits/sample, compared to the G.711 codec. The second solution uses forward adaptation and a lossless coder, further decreasing the bit-rate (by 1.25 bits/sample) and achieving higher average signal-to-quantization noise ratio (SQNR) in comparison with the G.711 codec. Also, the second solution is more robust than the G.711 codec, which means that it has near constant SQNR for a wide range of input signal power. That is very important for signals whose input power varies with time, such as speech and video signals. Our solutions are compatible with the G.711 codec, they have little additional complexity and delay and therefore can be applied in real-time systems, such as PSTN or VoIP. They can also be used in many other systems, such as WiMax and OFDM, as a replacement or improvement of the G.711 codec. Standardization process of the G.711.1 standard (which is a wide-band extension of the G.711 standard) is largely present. Our solutions fulfill all the requirements for that new standard; therefore they can be implemented in its low-frequency part.

13 citations
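
The per-sample savings quoted in the abstract above can be converted into bit rates with simple arithmetic. The sketch below relies on the standard G.711 parameters (8 kHz sampling, 8 bits per sample, 64 kbit/s); the resulting kbit/s figures are derived here for illustration and are not values reported by the paper.

```python
SAMPLE_RATE_HZ = 8000        # G.711 sampling rate
G711_BITS_PER_SAMPLE = 8.0   # G.711 codeword length

def bit_rate_kbps(bits_per_sample):
    """Average bit rate in kbit/s for a given average codeword length."""
    return bits_per_sample * SAMPLE_RATE_HZ / 1000

g711 = bit_rate_kbps(G711_BITS_PER_SAMPLE)
# First solution: lossless coder only, 0.82 bits/sample lower than G.711.
solution_1 = bit_rate_kbps(G711_BITS_PER_SAMPLE - 0.82)
# Second solution: forward adaptation plus lossless coder, 1.25 bits/sample lower.
solution_2 = bit_rate_kbps(G711_BITS_PER_SAMPLE - 1.25)

for name, rate in [("G.711", g711), ("solution 1", solution_1), ("solution 2", solution_2)]:
    print(f"{name}: {rate:.2f} kbit/s")
```

Under these assumptions the two solutions correspond to average rates of roughly 57.4 and 54 kbit/s, compared with 64 kbit/s for plain G.711.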


Journal ArticleDOI
TL;DR: A novel video codec has new coding tools such as an intra prediction with offset, integer sine transform, and enhanced block-based adaptive loop filter that are used adaptively in the processing of intra prediction, transform, and loop filtering.
Abstract: We present a novel video codec for supporting entertainment-quality video. It has new coding tools such as an intra prediction with offset, integer sine transform, and enhanced block-based adaptive loop filter. These tools are used adaptively in the processing of intra prediction, transform, and loop filtering. In our experiments, the proposed codec achieved an average reduction of 13.35% in BD-rate relative to H.264/AVC for 720p sequences.

11 citations


Journal ArticleDOI
31 Aug 2011-Sensors
TL;DR: It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs.
Abstract: An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs.

10 citations
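
The decision flow described in the abstract above (estimate loss and quality, decide whether redundancy is needed, then split the bitrate between current and redundant speech data) might look roughly like the following sketch. Only the list of AMR-NB mode bit rates is standard; the thresholds, the total budget, the function name, and the selection policy are hypothetical placeholders, since the abstract does not specify the paper's actual optimization.

```python
# Standard AMR-NB codec mode bit rates in kbit/s.
AMR_NB_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

def choose_bitrates(plr, psq_mos, budget_kbps=12.2,
                    plr_threshold=0.05, mos_threshold=3.0):
    """Return (csd_kbps, rsd_kbps); rsd_kbps is None when redundancy is off.

    plr      estimated packet loss rate (0..1)
    psq_mos  estimated perceived speech quality on a MOS-like scale
    All thresholds and the budget are hypothetical defaults.
    """
    fits = [m for m in AMR_NB_MODES_KBPS if m <= budget_kbps]
    if not fits:
        raise ValueError("budget is below the lowest AMR-NB mode")

    need_redundancy = plr >= plr_threshold or psq_mos < mos_threshold
    if not need_redundancy:
        # No redundant data: spend the whole budget on the current frame.
        return max(fits), None

    # Assumed policy: redundant data never gets a higher mode than current
    # data, and the pair using most of the budget wins (ties favor CSD).
    pairs = [(c, r) for c in fits for r in fits
             if r <= c and c + r <= budget_kbps]
    if not pairs:
        return max(fits), None
    return max(pairs, key=lambda p: (p[0] + p[1], p[0]))

print(choose_bitrates(plr=0.01, psq_mos=4.0))
print(choose_bitrates(plr=0.10, psq_mos=3.5))
```

The point of the sketch is the shape of the control loop: the receiver-side estimates drive a binary redundancy decision, and the codec modes are then chosen so that current plus redundant data stay within a fixed budget.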


Proceedings ArticleDOI
08 Apr 2011
TL;DR: The implementation of CELP CODEC and its analytical evaluation of performance in terms of bit rate, coding delay and Quality of speech are discussed.
Abstract: Factors serving as constraints in today's wireless communication system include bandwidth and power. In wireless systems that require the transmission of speech, these goals are addressed by developing efficient methods of reducing the amount of information required to transmit and receive quality speech. For this reason, speech coding has been, and remains, the topic of aggressive research. This paper discusses the implementation of CELP CODEC and its analytical evaluation of performance in terms of bit rate, coding delay and Quality of speech. The CELP coder is one of the best methods for producing high quality speech at bit rates between 4.8 and 9.6 Kbps.

9 citations


Proceedings ArticleDOI
06 Jul 2011
TL;DR: The experimental results show that the presented Wyner-Ziv video codec incorporating the proposed technique yields significant and systematic compression gains of up to 23.22% with respect to the state-of-the-art DISCOVER codec.
Abstract: In contrast to traditional predictive coding, Wyner-Ziv video coding enables low-cost encoding architectures, in which the computationally expensive tasks for performing motion estimation are shifted to the decoder-side. In Wyner-Ziv video coding, side-information generation is a key aspect profoundly affecting the compression capacity of the system. This paper presents a novel technique which enables side-information refinement after DC coefficient band decoding in a transform-domain Wyner-Ziv video codec. The proposed side-information refinement approach performs overlapped block motion estimation and compensation, utilizing multi-hypothesis pixel-based prediction. The experimental results show that the presented Wyner-Ziv video codec incorporating the proposed technique yields significant and systematic compression gains of up to 23.22% with respect to the state-of-the-art DISCOVER codec.

Proceedings ArticleDOI
TL;DR: Simulation results are presented for a variety of bit rates, resolutions and coding configurations to demonstrate the high compression efficiency achieved by the proposed video codec at moderate level of encoding and decoding complexity.
Abstract: This paper describes the video coding technology proposal submitted by Qualcomm Inc. in response to a joint call for proposal (CfP) issued by ITU-T SG16 Q.6 (VCEG) and ISO/IEC JTC1/SC29/WG11 (MPEG) in January 2010. The proposed video codec follows a hybrid coding approach based on temporal prediction, followed by transform, quantization, and entropy coding of the residual. Some of its key features are extended block sizes (up to 64x64), recursive integer transforms, single pass switched interpolation filters with offsets (single pass SIFO), mode dependent directional transform (MDDT) for intra-coding, luma and chroma high precision filtering, geometry motion partitioning, and adaptive motion vector resolution. It also incorporates internal bit-depth increase (IBDI), and modified quadtree based adaptive loop filtering (QALF). Simulation results are presented for a variety of bit rates, resolutions and coding configurations to demonstrate the high compression efficiency achieved by the proposed video codec at moderate level of encoding and decoding complexity. For random access hierarchical B configuration (HierB), the proposed video codec achieves an average BD-rate reduction of 30.88% compared to the H.264/AVC alpha anchor. For low delay hierarchical P (HierP) configuration, the proposed video codec achieves an average BD-rate reduction of 32.96% and 48.57%, compared to the H.264/AVC beta and gamma anchors, respectively.

Proceedings ArticleDOI
29 Mar 2011
TL;DR: The results show that the perceptual evaluation of speech quality (PESQ) of the MFCC-based codec matches the state-of-the-art MELPe codec at 600 bps and exceeds the CELP codec at 2000-4000 bps coding rates.
Abstract: In this paper, we propose a low bit-rate speech codec based on a hybrid scalar/vector quantization of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of explicit phase information. By evaluating the contribution toward speech quality that individual MFCCs make and applying appropriate quantization, our results show that the perceptual evaluation of speech quality (PESQ) of the MFCC-based codec matches the state-of-the-art MELPe codec at 600 bps and exceeds the CELP codec at 2000-4000 bps coding rates. The main advantage of the proposed codec is in distributed speech recognition (DSR) since speech features based on MFCCs can be directly obtained from code words, thus eliminating additional decode and feature extract stages.

Patent
24 Mar 2011
TL;DR: An integrated echo canceller and speech codec for voice-over-internet protocol (VoIP) is presented, in which the speech codec, comprising a decoder and an encoder, is integrated with the echo canceller.
Abstract: A method includes operating an integrated echo canceller and speech codec for voice-over internet protocol. An apparatus includes an echo canceller and a speech codec, wherein the speech codec includes a decoder and an encoder, and wherein the echo canceller and the speech codec are integrated for voice-over-internet protocol.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: The simulation results show that when all the improvement schemes are combined, the performance is improved at all the bit rates compared to the previous results despite the fact that the Huffman table structure is significantly simplified.
Abstract: The internet Low Bit-rate Codec (iLBC) inherently possesses high robustness to packet loss which is one of the essential properties of Voice over Internet Protocol (IP) applications. Another important feature is the rate flexibility, which allows the speech codec to adapt its bit rate to constantly changing network condition. Previously, the multi-rate operation of the iLBC was enabled by utilizing the Discrete Cosine Transform (DCT) and entropy coding. In this paper, various approaches to improve performance are presented. The simulation results show that when all the improvement schemes are combined, the performance is improved at all the bit rates compared to the previous results despite the fact that the Huffman table structure is significantly simplified.

Patent
21 Dec 2011
TL;DR: A method for measuring voice quality in a wireless communication network includes measuring an MOS of a signal using a narrowband voice codec and an MOS of a signal using a wideband voice codec in a cable loopback environment.
Abstract: A method for measuring voice quality in a wireless communication network includes measuring an MOS of a signal using a narrowband voice codec and an MOS of a signal using a wideband voice codec in a cable loopback environment, calculating a wideband voice codec correction coefficient using the measured MOS, measuring an MOS of a signal using the narrowband voice codec and an MOS of a signal using the wideband voice codec in a terminal connection environment; and outputting a value obtained by adding the wideband voice codec correction coefficient to the measured MOS in the terminal connection environment.

Proceedings ArticleDOI
24 May 2011
TL;DR: This paper provides a scheme where Speex audio codec was used in a wireless mobile communication network for the transmission of voice and it is stated that Speex presents itself as a better alternative to traditional AMR technology in the wireless field.
Abstract: This paper provides a scheme where Speex audio codec was used in a wireless mobile communication network for the transmission of voice. The data gathered is visualised and compared with traditional AMR audio codec. The document concludes by stating that Speex presents itself as a better alternative to traditional AMR technology in the wireless field.

Proceedings ArticleDOI
03 Mar 2011
TL;DR: A novel error concealment method for AMR-WB codec is presented, using excitation search constraint at the encoder and a linear prediction of the pitch and gains of the adaptive codebook and the innovative codebook at the decoder to enhance speech quality after a frame erasure.
Abstract: This paper presents a novel error concealment method for AMR-WB codec, using excitation search constraint at the encoder and a linear prediction of the pitch and gains of the adaptive codebook and the innovative codebook at the decoder to enhance speech quality after a frame erasure. The experimental results demonstrate that the proposed method achieves performance improvement over the existing methods.


Proceedings ArticleDOI
18 Nov 2011
TL;DR: This work considers audio coding with a pre- and post-filtered predictive structure that was recently proven to be asymptotically optimal in the rate-distortion sense, and shows that this audio coding is efficient in achieving the state-of-the-art performance.
Abstract: A natural approach to audio coding is to use a rate-distortion optimal design combined with a perceptual model. While this approach is common in transform coding, existing predictive-coding based audio coders are generally not optimal and they benefit from heuristically motivated post-filtering. As delay requirements often force the use of predictive coding, we consider audio coding with a pre- and post-filtered predictive structure that was recently proven to be asymptotically optimal in the rate-distortion sense [1]. We show that this audio coding is efficient in achieving the state-of-the-art performance. We also show that the pre-filter plays a relatively minor role. This leads to an analytic approach for optimizing the post-filter and the predictor at each rate, eliminating the need for manual re-tuning whenever a different rate is called for. In a subjective test, the theoretically optimized post-filter provided a better performance than a conventional post-filter.

Proceedings ArticleDOI
22 May 2011
TL;DR: An enhanced long term predictor (eLTP) that effectively utilizes periodic redundancies of inter- and intra- time frames is proposed, and experimental results confirm its superiority over the reference codec.
Abstract: Unified Speech and Audio Coding (USAC) is an emerging MPEG audio standard striving for efficiently representing both speech and music signals even in very low bitrate ranges. The reference codec takes an approach of unifying two state-of-the-art speech and audio coding structures in a single platform. This paper proposes an enhanced long term predictor (eLTP) that effectively utilizes periodic redundancies of inter- and intra- time frames. Experimental results with various types of input signals confirm the superiority of the proposed algorithm compared to the reference codec.

Proceedings ArticleDOI
Xing Fan, Michael L. Seltzer, Jasha Droppo, Henrique S. Malvar, Alex Acero
22 May 2011
TL;DR: A new transform speech codec jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features; good quality speech is obtained for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
Abstract: We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband speech acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation conducted on an in-car speech recognition task shows that at 16 kbps our new system typically has essentially no impact on word error rate compared to uncompressed audio, whereas the standard transform codec produces up to a 20% increase in word error rate. In addition, good quality speech is obtained for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.


Patent
21 Feb 2011
TL;DR: A method and apparatus for transmitting video content compressed by a codec to a second device is provided, where the codec selection request frame includes an identifier of at least one codec to be used to compress the video content and requests approval of the use of the at least one codec.
Abstract: A method and apparatus of transmitting video content compressed by a codec to a second device is provided. The method includes: transmitting a codec selection request frame to the second device, the codec selection request frame includes an identifier of at least one codec to be used to compress the video content and requests approval of the use of the at least one codec; receiving a codec selection response frame from the second device, the codec selection response frame includes approval information indicating whether the use of the at least one codec is approved; and transmitting video content compressed by the at least one codec to the second device based on the codec selection response frame.
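
The request/response exchange in the abstract above can be sketched as a small negotiation routine. The class names, field names, and codec identifiers below are hypothetical and chosen only for illustration; the patent abstract does not define a concrete frame layout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodecSelectionRequest:
    """Hypothetical request frame: identifiers of the proposed codecs."""
    codec_ids: tuple

@dataclass(frozen=True)
class CodecSelectionResponse:
    """Hypothetical response frame: approval flag per proposed codec."""
    approved: dict

class SecondDevice:
    """Stand-in for the receiving device; approves codecs it supports."""

    def __init__(self, supported):
        self.supported = set(supported)

    def handle_request(self, request):
        return CodecSelectionResponse(
            {codec: codec in self.supported for codec in request.codec_ids}
        )

def negotiate(preferred_codecs, second_device):
    """Send a request, read the response, and pick the first approved codec.

    Returns None if the second device approved none of the proposals.
    """
    request = CodecSelectionRequest(tuple(preferred_codecs))
    response = second_device.handle_request(request)
    for codec_id in request.codec_ids:
        if response.approved.get(codec_id):
            return codec_id
    return None

# Illustrative codec identifiers only.
receiver = SecondDevice({"H.264"})
chosen = negotiate(["H.265", "H.264"], receiver)
print(chosen)
```

Only after a codec has been approved in this way would the first device compress and transmit the video content with it.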


Proceedings ArticleDOI
26 Jul 2011
TL;DR: Two FEC schemes are proposed which take the advantage of the codec's structure characteristics and do not introduce extra delay to enhance the robustness of packet loss recovery for AVS Mobile speech and audio (AVS-M) codec.
Abstract: In this paper, we utilize sender-based Forward Error Correction (FEC) techniques to enhance the robustness of packet loss recovery for AVS Mobile speech and audio (AVS-M) codec. Two FEC schemes are proposed which take the advantage of the codec's structure characteristics and do not introduce extra delay. The objective and subjective listening tests results show that the two methods achieve higher reconstructed quality than the codec's original frame erasure scheme in the case of packet loss.

Patent
11 Apr 2011
TL;DR: A low power audio codec with an audio buffer is described, where the audio codec is configured to provide the stored decoded audio data to an audio device while decoded audio data is not being received from the processor.
Abstract: Systems, methods, and other embodiments associated with a low power audio codec are described. According to one embodiment, an apparatus includes an audio codec having an audio buffer configured to store decoded audio data received from a processor. The audio codec is configured to provide the stored decoded audio data to an audio device while decoded audio data is not being received from the processor. According to another embodiment, a method includes receiving a request for decoded audio data from an audio codec with an audio buffer, entering a RUN mode and providing decoded audio data stored in processor memory to the audio codec for storage in the audio buffer. After receiving a buffer full signal from the audio codec the method includes entering a low power mode while the audio codec provides an audio signal to an audio device.
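
The alternation between RUN mode and low power mode in the abstract above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the codec requests data only when its buffer is empty, the buffer capacity is counted in frames, and the class and function names are invented for illustration.

```python
from collections import deque

class BufferedAudioCodec:
    """Hypothetical audio codec with a fixed-size buffer of decoded frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def is_full(self):
        return len(self.buffer) >= self.capacity

    def needs_data(self):
        # Assumption: a request is raised only once the buffer is empty.
        return not self.buffer

    def store(self, frame):
        self.buffer.append(frame)

    def play_one(self):
        return self.buffer.popleft()

def simulate(decoded_frames, capacity):
    """Return (frames played, sequence of processor modes)."""
    codec = BufferedAudioCodec(capacity)
    pending = deque(decoded_frames)  # decoded audio held in processor memory
    played, modes = [], []
    while pending or codec.buffer:
        if codec.needs_data() and pending:
            # Processor wakes up and refills the codec buffer.
            modes.append("RUN")
            while pending and not codec.is_full():
                codec.store(pending.popleft())
        # Buffer full (or input exhausted): processor sleeps while the
        # codec drains its buffer to the audio device.
        modes.append("LOW_POWER")
        while codec.buffer:
            played.append(codec.play_one())
    return played, modes

played, modes = simulate(range(5), capacity=2)
print(played, modes)
```

A larger buffer means fewer RUN periods for the same amount of audio, which is the power-saving idea the abstract describes.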


Journal ArticleDOI
TL;DR: The results show that audio codec G.722 with MP4V-ES generates good video quality over VoIP using a wireless local area network, whereas audio codec G.726 16 with H.261 generates low rate video and voice quality performance.
Abstract: This study evaluates video codec performance over VoIP using a campus wireless network. Today, the deployment of VoIP occurs in various platforms, including VoIP over LAN, VoIP over WAN and VoIP over VPN. Therefore, this study defines which video codec provides good video quality over VoIP transmission. The soft phone is used as a medium for communication between two parties. A network management system is used to evaluate and capture the video quality performance over VoIP. The quality of video codec is based on MOS, jitter, delay and packet loss. The experimental scope is limited to G.722 with MP4V-ES, G.726 16 with H.261 and G.726 24 with H.264. The results show that audio codec G.722 with MP4V-ES generates good video quality over VoIP using a wireless local area network, whereas audio codec G.726 16 with H.261 generates low rate video and voice quality performance. Therefore, selecting an appropriate combination of video and audio codecs increases video quality over VoIP transmission.