Showing papers on "Adaptive Multi-Rate audio codec published in 2012"

01 Sep 2012

TL;DR: This document describes the Opus codec, designed for interactive speech and audio transmission over the Internet.

212 citations

Journal Article•DOI•

Steganography Integration Into a Low-Bit Rate Speech Codec

Yongfeng Huang¹, Chenghao Liu², Shanyu Tang, Sen Bai²•Institutions (2)

Tsinghua University¹, Chongqing Communication Institute²

01 Dec 2012-IEEE Transactions on Information Forensics and Security

TL;DR: A new algorithm is proposed for steganography in low bit-rate VoIP audio streams by integrating information hiding into the process of speech encoding, thus maintaining synchronization between information hiding and speech encoding.

Abstract: Low bit-rate speech codecs are widely used in audio communications such as VoIP and mobile communications, so steganography in low bit-rate audio streams would have broad practical applications. In this paper, the authors propose a new algorithm for steganography in low bit-rate VoIP audio streams that integrates information hiding into the process of speech encoding. The proposed algorithm performs data embedding while pitch period prediction is conducted during low bit-rate speech encoding, thus maintaining synchronization between information hiding and speech encoding. The steganography algorithm not only achieves high speech quality and resists detection by steganalysis, but is also compatible with a standard low bit-rate speech codec, and data embedding and extraction cause no further delay. Testing shows that, with the proposed algorithm, the data embedding rate of the secret message can reach 4 bits/frame (133.3 bits/second).
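
As a quick check on the reported rate: 4 bits/frame equals 133.3 bits/second only if the codec produces about 33.3 frames per second, that is, a 30 ms frame. The 30 ms frame duration in this sketch is inferred from those two figures and is not stated in the abstract; the function name is hypothetical.

```python
def embedding_rate_bps(bits_per_frame: float, frame_ms: float) -> float:
    """Covert bit rate in bits/second for a given per-frame payload."""
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


# Assumption: 30 ms frames (inferred from 4 bits/frame <-> 133.3 bits/s).
rate = embedding_rate_bps(bits_per_frame=4, frame_ms=30.0)
print(f"{rate:.1f} bits/second")  # 133.3 bits/second
```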

109 citations

Journal Article•

MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types

26 Apr 2012-Journal of The Audio Engineering Society

TL;DR: All aspects of this standardization effort are outlined, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.

Abstract: In early 2012 the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard. The new codec brings together the previously separated worlds of general audio coding and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.

88 citations

Proceedings Article•DOI•

Video codec identification

Paolo Bestagini¹, Ahmed A. Allam¹, Simone Milani¹, Marco Tagliasacchi¹, Stefano Tubaro¹•Institutions (1)

Polytechnic University of Milan¹

25 Mar 2012

TL;DR: This paper considers a processing chain of two coding steps and proposes a method that aims at identifying the type of codec used in the first step, by analyzing its coding-based footprints, based on the fact that lossy coding is an almost idempotent operation.

Abstract: Video content is routinely acquired and distributed in digital format. Therefore, it is customary to have the content encoded multiple times. In this paper we consider a processing chain of two coding steps and we propose a method that aims at identifying the type of codec used in the first step, by analyzing its coding-based footprints. The method relies on the fact that lossy coding is an almost idempotent operation, i.e., re-encoding the reconstructed sequence with the same codec and coding parameters produces a sequence that is highly correlated with the input one. As a consequence, it is possible to analyze this sort of correlation to identify the first codec provided that the second codec does not introduce severe quality degradation. The proposed solution finds several applications in the field of multi-media forensics, e.g. to identify the device that generated the original video stream or detect collages of different sequences.

59 citations

Journal Article•DOI•

A new scheme for covert communication via 3G encoded speech

Haibo Miao¹, Liusheng Huang¹, Zhili Chen¹, Wei Yang¹, Ammar Al-Hawbani¹•Institutions (1)

University of Science and Technology of China¹

01 Nov 2012-Computers & Electrical Engineering

TL;DR: An adaptive suboptimal pulse combination constrained (ASOPCC) method is presented to embed data in the compressed speech signal of the AMR-WB codec; it takes advantage of the "redundancy" created by the non-exhaustive search of the algebraic codebook to encode secret information.

46 citations

Journal Article•DOI•

Objective assessment of the WebP image coding algorithm

Giaime Ginesu¹, Maurizio Pintus¹, Daniele D. Giusto¹•Institutions (1)

University of Cagliari¹

01 Sep 2012-Signal Processing-image Communication

TL;DR: An objective evaluation of WebP is provided, by comparing it with the JPEG family algorithms, and it appears that the performance of the proposed codec is in line with that of the alternative methods, without achieving any major improvement and lacking several features.

Abstract: Research on multimedia data coding is allowing for better performance in terms of compression ratio, coding features, and robustness against transmission errors. While rate-distortion performance is improving at a slower pace than it did up to a decade ago, remarkable advances are being made by adding complex features, such as fast adaptive transforms, lossy-to-lossless coding, compressed-domain processing, etc. One of the latest codecs expected to improve on the state of the art is the WebP algorithm released by Google. In this paper we provide an objective evaluation of WebP by comparing it with the JPEG family of algorithms. The results indicate that the performance of the proposed codec is in line with that of the alternative methods, without achieving any major improvement and lacking several features.

42 citations

Patent•

Frame erasure concealment for a multi-rate speech and audio codec

Greer Steven Craig¹, Ho-Sang Sung¹•Institutions (1)

Samsung¹

11 Apr 2012

TL;DR: In this article, an audio coding terminal is presented, which includes a coding mode setting unit to set an operation mode, from plural operation modes, for input audio coding by a codec, configured to code the input audio based on the set operation mode.

Abstract: An audio coding terminal and method is provided. The terminal includes a coding mode setting unit to set an operation mode, from plural operation modes, for input audio coding by a codec, configured to code the input audio based on the set operation mode such that when the set operation mode is a high frame erasure rate (FER) mode the codec codes a current frame of the input audio according to a select frame erasure concealment (FEC) mode of one or more FEC modes. Upon the setting of the operation mode to be the High FER mode, the one FEC mode is selected, from the one or more FEC modes predetermined for the High FER mode, to control the codec by incorporating of redundancy within a coding of the input audio or as separate redundancy information separate from the coded input audio according to the selected one FEC mode.

35 citations

Journal Article•DOI•

Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Laura E. Boucheron¹, P.L. De Leon¹, Steven Sandoval¹•Institutions (1)

New Mexico State University¹

01 Feb 2012-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862).

Abstract: In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution toward speech quality that individual MFCCs make and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, thus eliminating additional decode and feature extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word accuracy rates than the CELP and MELPe codecs.

27 citations

Proceedings Article•DOI•

Highly accurate non-intrusive speech forensics for codec identifications from observed decoded signals

Frank Jenner¹, Andres Kwasinski¹•Institutions (1)

Rochester Institute of Technology¹

25 Mar 2012

TL;DR: A novel scheme for non-intrusive identification of speech codecs based upon comparing a profile of a set of noise spectra and a time-domain histogram from the decoded speech to those from the candidate codecs.

Abstract: The ability to detect a particular speech codec from only the decoded audio has several useful applications in forensics and in improving system performance. This paper presents a novel scheme for non-intrusive identification of speech codecs. The identification approach is based upon comparing a profile of a set of noise spectra and a time-domain histogram from the decoded speech to those from the candidate codecs. The presented results show very high accuracy in identifying contemporary speech codecs from a diverse set of types and encoding rates. The presented codec identification scheme has a very low misidentification rate, including in the high coding rate regime, where it improves on previous works by achieving perfect identification. This performance is achieved while reducing the duration of the speech analysis window from 2 minutes to only 4 seconds.

25 citations

Patent•

Voice service solutions for flexible bandwidth systems

Yuheng Huang¹, Bongyong Song¹, Edwin Chongwoo Park¹, Samir S. Soliman¹, Soumya Das¹, Olunfunmilola O. Awoniyi¹•Institutions (1)

Qualcomm¹

07 Nov 2012

TL;DR: In this article, the authors describe tools and techniques for providing services, such as voice services, within flexible bandwidth systems, including scaling spreading factors (with rate matching tuning in some embodiments), multi-code transmission, code rate increases, AMR codec rate adjustments, and higher order modulation.

Abstract: Methods, systems, and devices are disclosed for providing services, such as voice services, within flexible bandwidth systems. In general, the scaling of one or more aspects of a flexible bandwidth system may be compensated for through altering one or more aspects within a code domain. The tools and techniques may include scaling spreading factors (with rate matching tuning in some embodiments), multi-code transmission, code rate increases, AMR codec rate adjustments, and/or higher order modulation. Subframe decoding approaches for the reception scheme may also be utilized. These tools and techniques can be flexibly implemented on the mobile device and/or base station side. Some embodiments may also minimize the latency introduced by the transmission and/or reception process. Flexible bandwidth systems may utilize portions of spectrum that may be too big or too small to fit a normal bandwidth waveform.

20 citations

Patent•

Codec selection for wireless communication

Siddharth S. Oroskar¹•Institutions (1)

Sprint Corporation¹

29 Aug 2012

TL;DR: In this article, a radio access network (RAN) device may receive a call setup request from a wireless communication device (WCD) seeking to set up a call, and it may be determined that the WCD is capable of conducting the call using either a first codec or a second codec.

Abstract: A radio access network (RAN) device may receive a call setup request from a wireless communication device (WCD) seeking to set up a call. In response to receiving the call setup request, it may be determined that (i) the WCD is capable of conducting the call using either a first codec or a second codec, (ii) the first codec is capable of using a sampling rate of over 8,000 Hz, and (iii) the second codec uses sampling rates of 8,000 Hz or less. Based at least on this determination, the WCD may be instructed to set up the call using the second codec.
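
The determination in the abstract can be sketched as a small decision function. This is only an illustration of conditions (i) to (iii). The abstract says the instruction is based "at least" on this determination, so a real system may weigh other inputs. The `Codec` type, the codec names, and the choice to return `None` when the conditions do not hold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NARROWBAND_LIMIT_HZ = 8000  # threshold named in the abstract


@dataclass(frozen=True)
class Codec:
    name: str                  # hypothetical label, not from the source
    max_sampling_rate_hz: int  # highest sampling rate the codec can use


def select_codec(supported: Sequence[Codec]) -> Optional[Codec]:
    """Return the codec to instruct, or None if the described case does not apply."""
    wideband = [c for c in supported if c.max_sampling_rate_hz > NARROWBAND_LIMIT_HZ]
    narrowband = [c for c in supported if c.max_sampling_rate_hz <= NARROWBAND_LIMIT_HZ]
    # The WCD can use either kind of codec: instruct it to use the second
    # (8,000 Hz or less) codec, as in the abstract.
    if wideband and narrowband:
        return narrowband[0]
    return None


first = Codec("wideband_codec", 16000)
second = Codec("narrowband_codec", 8000)
print(select_codec([first, second]).name)  # narrowband_codec
```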

Patent•

Method for Parametric Spatial Audio Coding and Decoding, Parametric Spatial Audio Coder and Parametric Spatial Audio Decoder

David Virette¹, Yue Lang¹, Jianfeng Xu¹•Institutions (1)

Huawei¹

05 Apr 2012

TL;DR: In this article, a method for parametric spatial audio coding of a multi-channel audio signal comprising a plurality of audio channel signals is provided, the method comprising: calculating at least two different spatial coding parameters for an audio channel signal of the plurality of audio channel signals, selecting at least one spatial coding parameter of the at least two different spatial coding parameters associated with the audio channel signal, and setting a parameter type flag in the parameter section of the audio bitstream indicating the type of the selected spatial coding parameter being included into the audio bitstream.

Abstract: A method for parametric spatial audio coding of a multi-channel audio signal comprising a plurality of audio channel signals is provided, the method comprising: calculating at least two different spatial coding parameters for an audio channel signal of the plurality of audio channel signals, selecting at least one spatial coding parameter of the at least two different spatial coding parameters associated with the audio channel signal on the basis of the values of the calculated spatial coding parameters; including a quantized representation of the selected spatial coding parameter into a parameter section of an audio bitstream; and setting a parameter type flag in the parameter section of the audio bitstream indicating the type of the selected spatial coding parameter being included into the audio bitstream.

Proceedings Article•DOI•

Non intrusive codec identification algorithm

Dushyant Sharma¹, Patrick A. Naylor¹, Nikolay D. Gaubitch¹, Mike Brookes¹•Institutions (1)

Imperial College London¹

25 Mar 2012

TL;DR: The method uses a number of speech features to train a CART classifier and can identify a codec and its bit rate with an accuracy of 92% and detect the presence of a codec with an accuracy of 97% at -5 dB SNR.

Abstract: We present a non-intrusive data driven method for codec detection and identification in the presence of background noise. The method uses a number of speech features which are then used to train a CART classifier. We demonstrate the performance of the method using several different noise types over a wide range of SNRs. Our results show that we can identify a codec and its bit rate to an accuracy of 92% and we are able to detect the presence of a codec with an accuracy of 97% at −5 dB SNR.

Patent•

Media codec devices providing universality for encoded signal origination and decoded signal distribution

Barrett Kreiner¹, Ryan Schaub¹•Institutions (1)

AT&T¹

18 May 2012

TL;DR: A media codec device includes an input port that receives media signals from service providers and also receives power for the media codec device, and a decoder that stores decoder keys associated with the service providers.

Abstract: A media codec device includes an input port that receives media signals from service providers and also receives power for the media codec device, and a decoder that stores decoder keys associated with the service providers. The decoder applies the decoder keys to decode the media signals. The media codec device operates to direct at a first time a decoded first media signal to a first output device and a decoded second media signal to a second output device, and to direct at a second time the decoded first media signal to the second output device and the decoded second media signal to the first output device.

Proceedings Article•DOI•

Enhanced Principal Component Using Polar Coordinate PCA for Stereo Audio Coding

Shi Dong¹, Ruimin Hu¹, Weiping Tu¹, Xiang Zheng¹, Junjun Jiang¹, Song Wang¹•Institutions (1)

Wuhan University¹

09 Jul 2012

TL;DR: It is shown that when multiple sound sources exist with different directions, the proposed method is better than the conventional PCA method under certain conditions, and a stereo codec based on PC-PCA is proposed to validate the performance improvement of the proposed method.

Abstract: High-efficiency audio compression is a basic technology in multimedia applications that involve audio. Down-mixing and parametric coding are efficient coding schemes widely applied in up-to-date audio codecs, such as PS in EAAC+ and MPEG-Surround; PCA stereo coding follows this idea by mapping two channels to one channel with maximum energy and parameterizing the secondary channel. This paper investigates the performance of the conventional PCA method under a general stereo model with multiple sound sources and different directions, and then proposes a Polar Coordinate based PCA (PC-PCA) stereo coding method. It is shown that when multiple sound sources exist with different directions, the proposed method is better than the conventional PCA method under certain conditions. A stereo codec based on PC-PCA is also proposed to validate the performance improvement of the proposed method.

Patent•

Dynamic video switching

Xin Fang¹, Wei Shi¹, Gerald Paul Michalak¹•Institutions (1)

Qualcomm¹

19 Jan 2012

TL;DR: In this article, a dynamic codec allocation method is provided, which includes receiving a plurality of datastreams and determining a respective codec loading factor for each of the data-streams.

Abstract: In an example, a dynamic codec allocation method is provided. The method includes receiving a plurality of datastreams and determining a respective codec loading factor for each of the datastreams. The datastreams are assigned to codecs, in order by respective codec loading factor, starting with the highest respective codec loading factor. Initially, the datastreams are assigned to a hardware codec, until the hardware codec is loaded to substantially maximum capacity. If the hardware codec is loaded to substantially maximum capacity, the remaining datastreams are assigned to a software codec. As new datastreams are received, the method repeats, and previously-assigned datastreams can be reassigned from a hardware codec to a software codec, and vice versa, based on their relative codec loading factors.

Book Chapter•DOI•

An Adaptive Codec Switching Scheme for SIP-Based VoIP

Ismet Aktas¹, Florian Schmidt¹, Elias Weingärtner¹, Cai-Julian Schnelke¹, Klaus Wehrle¹•Institutions (1)

RWTH Aachen University¹

27 Aug 2012

TL;DR: This paper proposes an adaptive end-to-end based codec switching scheme that fully conforms to the SIP standard, and evaluation with a real-world prototype based on Linphone shows that the scheme adapts well to changing network conditions, improving overall speech quality.

Abstract: Contemporary Voice-Over-IP (VoIP) systems typically negotiate only one codec for the entire VoIP session life time. However, as different codecs perform differently well under certain network conditions like delay, jitter or packet loss, this can lead to a reduction of quality if those conditions change during the call. This paper makes two core contributions: First, we compare the speech quality of a set of standard VoIP codecs given different network conditions. Second, we propose an adaptive end-to-end based codec switching scheme that fully conforms to the SIP standard. Our evaluation with a real-world prototype based on Linphone shows that our codec switching scheme adapts well to changing network conditions, improving overall speech quality.

Proceedings Article•DOI•

Scalable multi-rate iLBC

Koji Seto¹, Tokunbo Ogunfunmi¹•Institutions (1)

Santa Clara University¹

20 May 2012

TL;DR: The benefits of scalable structure constructed by the addition of enhancement layer to the core layer of multi-rate iLBC are presented and the experimental simulation results show that the proposed framework can improve speech quality especially at high bit rates.

Abstract: Rate flexibility and high robustness to packet loss are essential features of a speech codec for voice communications over Internet Protocol (IP) networks. The multi-rate internet Low Bit-rate Codec (iLBC), presented in our previous papers, is one of the speech codecs that possesses both properties. However, its speech quality is limited in clean channel conditions because of the limitations of the current frame-independent coding scheme in the time domain. In this paper, the benefits of a scalable structure, constructed by adding an enhancement layer to the core layer of multi-rate iLBC, are presented. The experimental simulation results show that the proposed framework can improve speech quality, especially at high bit rates.

Dissertation•

Low Delay Transform for High Quality Low Delay Audio Coding

David Virette¹•Institutions (1)

Institut de Recherche en Informatique et Systèmes Aléatoires¹

10 Dec 2012

TL;DR: A seamless reconstruction method has been derived to increase the flexibility of transform coding schemes with the possibility to select a transform for a frame independently from its neighbouring frames, based on a new low delay window design procedure derived to obtain an analytic definition for a new family of transforms.

Abstract: In recent years there has been a phenomenal increase in the number of products and applications which make use of audio coding formats. Among the most successful audio coding schemes, the MPEG-1 Layer III (mp3), the MPEG-2 Advanced Audio Coding (AAC) or its evolution MPEG-4 High Efficiency-Advanced Audio Coding (HE-AAC) can be cited. More recently, perceptual audio coding has been adapted to achieve low-delay coding so as to become suitable for conversational applications. Traditionally, a filter bank such as the Modified Discrete Cosine Transform (MDCT) is a central component of perceptual audio coding, and its adaptation to low delay audio coding has become an important research topic. Low delay transforms have been developed in order to retain the performance of standard audio coding while dramatically reducing the associated algorithmic delay. This work presents some elements that better accommodate the delay reduction constraint. Among the contributions is a low delay block switching tool which allows the direct transition between long transform and short transform without the insertion of a transition window. The same principle has been extended to define new perfect reconstruction conditions for the MDCT with relaxed constraints compared to the original definition. As a consequence, a seamless reconstruction method has been derived to increase the flexibility of transform coding schemes with the possibility to select a transform for a frame independently from its neighbouring frames. Finally, based on this new approach, a new low delay window design procedure has been derived to obtain an analytic definition for a new family of transforms, permitting high quality with a substantial coding delay reduction. The performance of the proposed transforms has been thoroughly evaluated, and an evaluation framework involving an objective measurement of the optimal transform sequence is proposed. It confirms the relevance of the proposed transforms used for audio coding. In addition, the new approaches have been successfully applied to recent standardisation work items, such as the low delay audio coding developed at MPEG (LD-AAC and ELD-AAC), and they have been evaluated in numerous subjective tests, showing a significant improvement in quality for transient signals. The new low delay window design has been adopted in G.718, a scalable speech and audio codec standardized in ITU-T, and has demonstrated its benefit in terms of delay reduction while maintaining the audio quality of a traditional MDCT.

Journal Article•

Watermark-Aided Pre-Echo Reduction in Low Bit-Rate Audio Coding

Imen Samaali, Gael Mahe, Monia Turki

09 Jul 2012-Journal of The Audio Engineering Society

Journal Article•DOI•

Overall performance evaluation of adaptive multi rate 06.90 speech codec based on code excited linear prediction algorithm using MATLAB

Ninad Bhatt¹, Yogeshwar Kosta•Institutions (1)

Veer Narmad South Gujarat University¹

01 Jun 2012-International Journal of Speech Technology

TL;DR: Simulation results clearly indicate that it is possible to produce variable bitrates (tuned to channel conditions) in a CELP coder by adjusting the coefficients of the coder while still maintaining good speech quality.

Abstract: Today, the primary constraint in wireless communication systems is limited bandwidth and power. Wireless systems involved in the transmission of speech require efficient and effective methods for maintaining quality of speech, especially at the receiving end, with maximum saving of bandwidth and power. Amongst all elements of the communication system (transmitter, channel and receiver), the transmission channel (the carrier of information/data, also called the medium) is the most critical and plays a key role in the transmission and reception of information/data. Channel conditions decide the quality of speech at the receiver. Modeling a channel is a complex task. Many techniques are adopted to mitigate the effect of the channel. AMR (Adaptive Multi Rate) is one such technique that counteracts the deleterious effect of the channel on speech. This technique employs a variable bit rate that dynamically switches between specific modes of operation (bit rates) depending upon the channel conditions. In this paper, the application of a Code Excited Linear Prediction (CELP) source coder on speech followed by an AMR codec is investigated and studied. An e-test bench using MATLAB is created to implement the CELP based AMR codec scheme, which is studied and investigated through a series of simulations. Here, both subjective and objective evaluations are carried out. Objective evaluations are categorized into waveform based, spectral based and perceptual based analysis. The results of the simulations are recorded and compared in various graphs and tables, which include calculation of various parameters like Absolute Error (ABS), Mean Square Error (MSE), Root Mean Square Error (RMSE), Signal to Noise Ratio (SNR), segmental SNR (segSNR) (Y. Hu and P. Loizou in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, pp. 153-156, 2006a; Proc. Interspeech, pp. 1447-1450, 2006b), Weighted-Slope Spectral distance (WSS) (Y. Hu and P. Loizou in Speech Commun. 49, 588-601, 2007), Perceptual Evaluation of Speech Quality (PESQ) (ITU-T rec. P.862, 2000), Log-Likelihood Ratio (LLR), Itakura-Saito Distance measure (ISD), Cepstrum Distance Measures (CEP) (V. Turbin and N. Faucheur in Proc. Online Workshop Meas. Speech Audio Quality Netw., pp. 81-84, 2005), Frequency Weighted Segmental SNR (fwSNRseg), Predicted rating of overall Quality (Covl), Rating of Speech Distortion (Csig), Rating of Background Distortion (Cbak) (ITU-T rec. P.835, 2003) and Mean Opinion Score (MOS). Simulation results clearly indicate that it is possible to produce variable bitrates (tuned to channel conditions) in a CELP coder by adjusting the coefficients of the coder while still maintaining good speech quality. Further, the higher the bit rate used, the better the quality of speech (which can be verified from the results obtained with PESQ and MOS analysis), while the simulation delay time also increases.
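
Three of the waveform-based measures listed above (MSE, RMSE and SNR) have standard textbook definitions, sketched below for a reference signal and its decoded version. This is not the paper's MATLAB test bench; the function names and sample values are made up for illustration, and the signals are assumed to be time-aligned and of equal length.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], deg: Sequence[float]) -> float:
    """Mean square error between reference and degraded samples."""
    if len(ref) != len(deg) or not ref:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)


def rmse(ref: Sequence[float], deg: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(mse(ref, deg))


def snr_db(ref: Sequence[float], deg: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (ref - deg) as the noise."""
    signal_energy = sum(r * r for r in ref)
    noise_energy = sum((r - d) ** 2 for r, d in zip(ref, deg))
    if noise_energy == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical samples: every decoded sample is off by 0.1.
reference = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -1.1, 1.1, -0.9]
print(f"MSE={mse(reference, decoded):.3f}  "
      f"RMSE={rmse(reference, decoded):.2f}  "
      f"SNR={snr_db(reference, decoded):.1f} dB")
```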

Proceedings Article•

A blind bandwidth extension method of audio signals based on Volterra series

Xingtao Zhang¹, Changchun Bao¹, Xin Liu¹, Liyan Zhang¹•Institutions (1)

Beijing University of Technology¹

01 Dec 2012

TL;DR: A blind bandwidth extension method of audio signals is proposed in which the fine structure of high-frequency information is recovered based on Volterra series and it is found that the proposed method outperforms the chaotic prediction method and nearest-neighbor matching method.

Abstract: In this paper, a blind bandwidth extension method for audio signals is proposed in which the fine structure of high-frequency information is recovered based on Volterra series. By combining this with a Gaussian mixture model and codebook mapping to adjust the spectrum envelope and energy gain of the extended high-frequency components separately, the bandwidth of audio signals is extended from wideband to super-wideband. Furthermore, the proposed method is applied to a real audio codec. The performance of the proposed method is evaluated through objective and subjective tests on audio signals selected from MPEG items, and it is found that the proposed method outperforms the chaotic prediction method and the nearest-neighbor matching method. When the proposed algorithm is applied to the ITU-T G.722.1 wideband audio codec, the performance is comparable with that of the G.722.1C super-wideband audio codec at 24 kbps.

Proceedings Article•DOI•

Packet loss protection for interactive audio object rendering: A multiple description approach

Xiguang Zheng¹, Christian Ritz¹•Institutions (1)

University of Wollongong¹

05 Jul 2012

TL;DR: Results from spectral distortion measurements indicate the proposed scheme maintains the perceptual quality of the audio objects across a wide variety of packet loss conditions.

Abstract: This paper presents a new framework for compression and transmission of simultaneously occurring audio objects through packet loss channels in order to maintain user Quality of Experience (QoE). The audio objects are compressed into two mono mixtures, by exploiting sparsity of multichannel audio signals to identify the two most dominant time-frequency components in terms of an energy measure. These mixtures are further compressed using the MP3 audio codec, with the optimised transmission model selected from several channel coding models based on the Forward Error Correction (FEC) and Multiple Description Coding (MDC) packet loss protection techniques. Audio objects can be recovered robustly from any received description(s) allowing real-time selective reproduction at the listeners' end. Results from spectral distortion measurements indicate the proposed scheme maintains the perceptual quality of the audio objects across a wide variety of packet loss conditions.

Proceedings Article•DOI•

Very low-delay H.264 codec for consumer applications

Hiroki Mizosoe¹, Mitsuhiro Okada¹, Hironori Komi¹, Manabu Sasamoto¹, Yoshinori Hatori²•Institutions (2)

Hitachi¹, Tokyo Institute of Technology²

01 Mar 2012

TL;DR: A very low-delay full HD codec with a relatively low bitrate, aimed at consumer and small business applications, is developed together with a new codec pipeline control scheme for the authors' existing versatile H.264 codec platform.

Abstract: We have developed a very low-delay full HD codec with a relatively low bitrate, aimed especially at consumer and small business applications. We analyzed several important factors for achieving low delay and developed a new codec pipeline control scheme for our existing versatile H.264 codec platform. We implemented the algorithm on the codec platform and obtained a minimum delay of 10 ms with several test sequences at bitrates of 8 to 10 Mbps.

Proceedings Article•DOI•

On the performance of IP and mobile based Automatic Speaker Verification

Nibedita Nandan¹, Goutam Saha¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

03 Apr 2012

TL;DR: It is shown that the loss due to the GSM-AMR codec is very significant for speaker verification compared to undecoded speech, whereas packet loss and bit rate, although they may degrade speech quality, do not significantly affect detection of the speaker's identity.

Abstract: Automatic Speaker Verification (ASV) is a challenging task over mobile/IP-based systems, as the coding introduces some loss in system performance. This paper reports on work in progress to examine the impact of the GSM-AMR codec used in mobile networks at its various bit rates and the G729 codec for VoIP, along with different kinds of noise and packet loss scenarios for the speech signal. The PURE YOHO database has been used for the evaluation of this task. The respective encoders and decoders are used back to back on wideband clean microphone speech to simulate the real-life situation. Performance is evaluated by measuring the Equal Error Rate (EER). It is shown that the loss due to the GSM-AMR codec is very significant for speaker verification compared to undecoded speech. Although packet loss and bit rate may degrade the quality of speech, their effect on detection of the speaker's identity is not significant.
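
The Equal Error Rate mentioned in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal Python sketch of one common way to estimate it from two lists of verification scores; it is not taken from the paper, and the function name `equal_error_rate` and the convention that higher scores mean "more likely the claimed speaker" are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostor trials wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With finite score lists the two error rates rarely cross exactly, so the sketch reports the average of the two rates at the closest threshold; interpolating between neighbouring thresholds is another common choice.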

Proceedings Article•DOI•

AMR speech codec realization and optimization based on Tmscx55 DSP

Yan Zhao¹, Yuyan Zhang¹, Mang Jing¹, Junhui Zhao²•Institutions (2)

Beijing University of Posts and Telecommunications¹, Beijing Jiaotong University²

24 Sep 2012

TL;DR: This paper verifies that the AMR codec realization and code optimization enhance the code efficiency and achieve satisfactory results on the TI® TMS320VC5509 DSP hardware platform.

Abstract: Timeliness is the key requirement when the AMR codec is realized on a DSP. In this paper, code optimization is implemented for the AMR codec on the TI® TMS320VC5509 DSP hardware platform. To improve code efficiency, assembly instructions are first rewritten following a comprehensive analysis of the AMR algorithm, and hybrid coding is then used to combine the advantages of C programming and DSP-based assembly language programming. Tests on the DSP verify that the AMR codec realization and code optimization enhance the code efficiency and achieve satisfactory results.

Proceedings Article•DOI•

Pitch prediction in frequency domain for ITU-T G.719 audio codec

Linlin Jiang¹, Shenghui Zhao¹, Jing Wang¹, Jingming Kuang¹•Institutions (1)

Beijing Institute of Technology¹

01 Oct 2012

TL;DR: The results of both the objective and the subjective evaluation show that the proposed method achieves higher reconstructed quality for the speech signal than the original method, especially for the lower coding rates.

Abstract: Pitch prediction in time domain is of great importance to improve the coding quality of speech. In this paper, we propose a method of frequency-domain pitch prediction for the full band ITU-T G.719 codec. The results of both the objective and the subjective evaluation show that the proposed method achieves higher reconstructed quality for the speech signal than the original method, especially for the lower coding rates.

Journal Article•DOI•

Index assignment-based channel coding

Huseyin Oztoprak¹, Stephane Pierre Villette¹, Ahmet M. Kondoz¹•Institutions (1)

University of Surrey¹

24 Jan 2012-Iet Communications

TL;DR: The results demonstrate that the performance of IACC and IACC concatenated with convolutional coding can be superior to that of conventional convolutional coding at high and moderate bit error rates, respectively.

Abstract: A scheme named index assignment-based channel coding (IACC) has been developed for resilience of speech/audio codecs against the bit errors commonly experienced in wireless channels. Although IACC is a type of joint source channel coding, it does not interfere with the source codec design. The proposed scheme takes into account source characteristics and adjusts the amount of coding according to the sensitivity of different values of the source parameters. It is shown that source characteristics play an important role in the performance of IACC. The performance of IACC has been evaluated by applying it to parameters generated by the adaptive multi-rate wideband (AMR-WB+) audio codec. A method for perceptual training of IACC codes is also proposed. The results demonstrate that the performance of IACC and IACC concatenated with convolutional coding can be superior to that of conventional convolutional coding at high and moderate bit error rates, respectively.

Proceedings Article•

Wide-band speech coding using kernel methods and bandwidth extension based on parametric stereo

Gh. Alipoor¹, M. H. Savoji¹•Institutions (1)

Shahid Beheshti University¹

18 Oct 2012

TL;DR: A novel wide-band speech coding scheme is developed, based on kernel methods and bandwidth extension, that is built on the nonlinear narrow-band codec at the extra cost of a small increase in the bit rate.

Abstract: In this paper, a novel wide-band speech coding scheme is developed based on kernel methods and bandwidth extension. The KLMS algorithm, a kernelized version of the well-known LMS algorithm, is employed in the framework of the backward ADPCM technique for encoding the narrow-band part of the wide-band speech. Simulation results show that using this nonlinear method yields an average improvement of up to 3.4 dB in the SNR and 0.28 in the PESQ measure of the decoded speech. The resulting narrow-band codec is subsequently extended to wide-band speech using a novel bandwidth extension technique inspired by parametric stereo coding. It is shown that the KLMS algorithm is also effective in this framework. This leads to a wide-band speech coding scheme built on the nonlinear narrow-band codec at the extra cost of a small increase in the bit rate.
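
To illustrate the KLMS idea named in this abstract, the following Python sketch runs a kernel least-mean-squares predictor with a Gaussian kernel over a signal and returns the prediction error at each sample. It is a simplified illustration, not the authors' codec: it predicts from the original past samples and leaves out the quantizer that a backward ADPCM coder would place in the loop, and the function name `klms_prediction_errors` and its parameters (`order`, `step`, `kernel_width`) are assumptions chosen for this example.

```python
import numpy as np

def klms_prediction_errors(signal, order=4, step=0.5, kernel_width=1.0):
    """Predict each sample from the previous `order` samples with KLMS.

    The predictor is a growing sum of Gaussian kernels: after every sample,
    the current input vector becomes a new center weighted by step * error.
    """
    x = np.asarray(signal, dtype=float)
    centers, weights = [], []
    errors = np.zeros(len(x))

    for n in range(order, len(x)):
        u = x[n - order:n]  # input vector of past samples
        if centers:
            c = np.array(centers)
            sq_dist = np.sum((c - u) ** 2, axis=1)
            k = np.exp(-sq_dist / (2.0 * kernel_width ** 2))
            y = float(np.dot(weights, k))  # kernel expansion output
        else:
            y = 0.0  # no centers yet, so the prediction is zero
        e = x[n] - y
        errors[n] = e
        # KLMS update: add the current input as a center, scaled by the error.
        centers.append(u.copy())
        weights.append(step * e)
    return errors
```

For a constant input every center coincides, so each update removes a fraction `step` of the remaining error: with `step=0.5` the errors on a signal of ones are 1, 0.5, 0.25, and so on. In practice the set of centers grows with every sample, so real implementations usually add some form of sparsification.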

Audio and speech compression using sinusoidal modeling and wavelet residuum coding

Martin Turi Nagy, Radoslav Vargic

25 Oct 2012

TL;DR: A new method for compression of audio and speech signals, based on sinusoidal modeling with added wavelet-based coding of the residual signal, is presented; the results show that it provides a promising tool for speech compression and speech databases.

Abstract: This paper presents a new method for compression of audio and speech signals based on sinusoidal modeling with added wavelet-based coding of the residual signal. Wavelets are introduced as an effective tool for representation and compression of atonal and transient signals. The method is proposed for use in speech databases. The presented method is evaluated by means of PSNR and PESQ/ODG, and the results are compared with the performance of common methods. The results show that the presented method provides a promising tool for speech compression and speech databases.
