scispace - formally typeset

Showing papers on "Adaptive Multi-Rate audio codec" published in 2012


01 Sep 2012
TL;DR: This document describes the Opus codec, designed for interactive speech and audio transmission over the Internet.
Abstract: This document describes the Opus codec, designed for interactive speech and audio transmission over the Internet.

212 citations


Journal Article · DOI
TL;DR: A new algorithm is proposed for steganography in low bit-rate VoIP audio streams by integrating information hiding into the process of speech encoding, thus maintaining synchronization between information hiding and speech encoding.
Abstract: Low bit-rate speech codecs are widely used in audio communications such as VoIP and mobile communications, so steganography in low bit-rate audio streams has broad practical applications. In this paper, the authors propose a new algorithm for steganography in low bit-rate VoIP audio streams that integrates information hiding into the speech encoding process. The proposed algorithm embeds data while pitch period prediction is performed during low bit-rate speech encoding, thus keeping information hiding and speech encoding synchronized. The algorithm not only achieves high speech quality and resists detection by steganalysis, but is also highly compatible with a standard low bit-rate speech codec, and data embedding and extraction cause no additional delay. Tests show that with the proposed algorithm the embedding rate of the secret message reaches 4 bits/frame (133.3 bits/second).

109 citations
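For the steganography entry above, the two quoted rates are consistent if each frame lasts 30 ms: 4 bits/frame × (1000 / 30) frames/s ≈ 133.3 bits/s. The abstract does not state the frame length or name the codec, so the 30 ms value is an inference, not a quoted fact. The Python sketch below pairs that arithmetic with one generic way of hiding a bit during pitch-period search: candidate lags are restricted to those whose parity equals the secret bit, and the receiver reads the bit back from the decoded lag. The parity rule, the lag range 20 to 143, and all function names are hypothetical illustrations, not the authors' algorithm.

```python
import math

FRAME_MS = 30.0  # assumed frame length, inferred from 4 bits/frame <-> 133.3 bit/s

def embedding_rate(bits_per_frame, frame_ms=FRAME_MS):
    """Secret-message rate in bit/s for a given per-frame payload."""
    return bits_per_frame * 1000.0 / frame_ms

def pitch_score(x, lag):
    """Open-loop pitch criterion: correlation with x[n - lag], normalized by its energy."""
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    energy = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
    return num / math.sqrt(energy) if energy > 0 else 0.0

def search_pitch_with_bit(x, bit, lag_min=20, lag_max=143):
    """Hypothetical embedding: only lags whose parity equals the secret bit are searched."""
    candidates = [t for t in range(lag_min, lag_max + 1) if t % 2 == bit]
    return max(candidates, key=lambda t: pitch_score(x, t))

def extract_bit(lag):
    """Receiver side: the hidden bit is the parity of the decoded pitch lag."""
    return lag % 2

# Usage: a periodic test signal (period 50 samples, i.e. 160 Hz at 8 kHz) carrying one bit.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
lag = search_pitch_with_bit(signal, bit=1)
print(lag, extract_bit(lag), round(embedding_rate(4), 1))
```

The constrained search scores at least as well as either neighbour of the unconstrained optimum (when that neighbour is in range), which is why such parity schemes tend to cost little prediction accuracy; the paper's own quality and undetectability results are not reproduced by this sketch.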


Journal Article
TL;DR: All aspects of this standardization effort are outlined, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.
Abstract: In early 2012 the ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard. The new codec brings together the previously separated worlds of general audio coding and speech coding. It does so by integrating elements from audio coding and speech coding into a unified system. The present publication outlines all aspects of this standardization effort, starting with the history and motivation of the MPEG work item, describing all technical features of the final system, and further discussing listening test results and performance numbers which show the advantages of the new system over current state-of-the-art codecs.

88 citations


Proceedings Article · DOI
25 Mar 2012
TL;DR: This paper considers a processing chain of two coding steps and proposes a method that aims at identifying the type of codec used in the first step, by analyzing its coding-based footprints, based on the fact that lossy coding is an almost idempotent operation.
Abstract: Video content is routinely acquired and distributed in digital format, so it is common for content to be encoded multiple times. In this paper we consider a processing chain of two coding steps and propose a method that aims at identifying the type of codec used in the first step by analyzing its coding-based footprints. The method relies on the fact that lossy coding is an almost idempotent operation, i.e., re-encoding the reconstructed sequence with the same codec and coding parameters produces a sequence that is highly correlated with the input one. As a consequence, this correlation can be analyzed to identify the first codec, provided that the second codec does not introduce severe quality degradation. The proposed solution has several applications in multimedia forensics, e.g., identifying the device that generated the original video stream or detecting collages of different sequences.

59 citations
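The idempotence argument in the entry above can be illustrated with a toy model in which each candidate "codec" is a uniform scalar quantizer with a different step size. Real video codecs are far more complex, the second coding step is omitted for simplicity, and the mean squared change used here is a simple stand-in for the correlation analysis the abstract describes. Re-quantizing a signal with the same quantizer leaves it unchanged, so the candidate that perturbs the received signal least is taken as the first codec.

```python
import math

def make_quantizer(step):
    """Toy lossy 'codec': uniform scalar quantization with the given step size."""
    return lambda x: [round(v / step) * step for v in x]

def reencode_distance(signal, codec):
    """Mean squared change caused by re-encoding; zero when the codec is idempotent on signal."""
    out = codec(signal)
    return sum((a - b) ** 2 for a, b in zip(signal, out)) / len(signal)

def identify_first_codec(signal, candidates):
    """Return the candidate name whose re-encoding perturbs the signal least."""
    return min(candidates, key=lambda name: reencode_distance(signal, candidates[name]))

codecs = {"q0.3": make_quantizer(0.3), "q0.5": make_quantizer(0.5), "q0.7": make_quantizer(0.7)}
original = [3 * math.sin(0.1 * n) + math.cos(0.37 * n) for n in range(500)]
received = codecs["q0.5"](original)  # first (unknown to the analyst) coding step
print(identify_first_codec(received, codecs))
```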


Journal Article · DOI
TL;DR: An adaptive suboptimal pulse combination constrained (ASOPCC) method is presented to embed data in the compressed speech signal of the AMR-WB codec; it takes advantage of the "redundancy" created by the non-exhaustive search of the algebraic codebook to encode secret information.

46 citations


Journal Article · DOI
TL;DR: An objective evaluation of WebP is provided by comparing it with the JPEG family of algorithms; the performance of WebP appears to be in line with that of the alternative methods, without achieving any major improvement, and it lacks several features.
Abstract: Research on multimedia data coding is yielding better performance in terms of compression ratio, coding features, and robustness against transmission errors. While rate-distortion performance is improving at a slower pace than it did up to a decade ago, remarkable advances are being made by adding complex features, such as fast adaptive transforms, lossy-to-lossless coding, and compressed-domain processing. One of the latest codecs expected to improve on the state of the art is the WebP algorithm released by Google. In this paper we provide an objective evaluation of WebP by comparing it with the JPEG family of algorithms. The results indicate that the performance of WebP is in line with that of the alternative methods, without achieving any major improvement, and that it lacks several features.

42 citations


Patent
11 Apr 2012
TL;DR: In this article, an audio coding terminal is presented, which includes a coding mode setting unit to set an operation mode, from plural operation modes, for input audio coding by a codec, configured to code the input audio based on the set operation mode.
Abstract: An audio coding terminal and method are provided. The terminal includes a coding mode setting unit that sets an operation mode, from plural operation modes, for coding input audio by a codec, which is configured to code the input audio based on the set operation mode such that, when the set operation mode is a high frame erasure rate (FER) mode, the codec codes a current frame of the input audio according to a selected frame erasure concealment (FEC) mode of one or more FEC modes. When the operation mode is set to the High FER mode, one FEC mode is selected from the one or more FEC modes predetermined for the High FER mode, to control the codec by incorporating redundancy within the coding of the input audio, or as redundancy information separate from the coded input audio, according to the selected FEC mode.

35 citations


Journal Article · DOI
TL;DR: The results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862).
Abstract: In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution that individual MFCCs make toward speech quality and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, eliminating additional decoding and feature extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and achieves better word accuracy rates than CELP and MELPe codecs.

27 citations
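The MFCC codec above transmits codebook indices instead of the coefficients themselves. The abstract gives no codebook size, vector dimension, or training procedure, so the Python sketch below shows only generic full-search VQ: each vector is replaced by the index of its nearest codeword (squared Euclidean distance), and a codebook of K entries costs ceil(log2 K) bits per vector. The plain k-means trainer is a simplified stand-in for LBG-style codebook design, not the paper's method.

```python
import math

def nearest(codebook, v):
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def vq_encode(vectors, codebook):
    return [nearest(codebook, v) for v in vectors]

def vq_decode(indices, codebook):
    return [list(codebook[i]) for i in indices]

def bits_per_vector(codebook):
    return math.ceil(math.log2(len(codebook)))

def train_kmeans(vectors, k, iters=20):
    """Plain k-means; the first k training vectors serve as initial codewords."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[i] = [sum(c) / len(members) for c in zip(*members)]
    return codebook

# Usage with toy 2-D "feature vectors" (real MFCC vectors would be longer).
train = [[0.0, 0.0], [0.2, 0.1], [10.0, 10.0], [10.1, 9.9]]
cb = sorted(train_kmeans(train, k=2))
print(cb, vq_encode([[0.1, 0.0], [9.8, 10.2]], cb), bits_per_vector(cb))
```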


Proceedings Article · DOI
25 Mar 2012
TL;DR: A novel scheme for non-intrusive identification of speech codecs based upon comparing a profile of a set of noise spectra and a time-domain histogram from the decoded speech to those from the candidate codecs.
Abstract: The ability to detect a particular speech codec from only the decoded audio has several useful applications in forensics and in improving system performance. This paper presents a novel scheme for non-intrusive identification of speech codecs. The identification approach is based on comparing a profile of a set of noise spectra and a time-domain histogram from the decoded speech to those from the candidate codecs. The presented results show very high accuracy in identifying contemporary speech codecs of diverse types and encoding rates. The codec identification scheme has a very low misidentification rate, including in the high coding rate regime, where it improves on previous works by achieving perfect identification. This performance is achieved while reducing the duration of the speech analysis window from 2 minutes to only 4 seconds.

25 citations
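The profile-matching idea in the entry above can be sketched for its time-domain histogram part alone; the noise-spectrum profile is omitted. The bin count, the amplitude range of [-1, 1), and the L1 distance are assumptions made for illustration: a normalized amplitude histogram of the decoded audio is compared with a stored reference histogram for each candidate codec, and the closest profile wins.

```python
def amplitude_histogram(samples, bins=16, lo=-1.0, hi=1.0):
    """Normalized histogram of sample amplitudes over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        idx = int((s - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range samples to edge bins
    total = float(len(samples))
    return [c / total for c in counts]

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def identify_codec(decoded, profiles, bins=16):
    """Pick the candidate whose reference histogram is closest (L1) to the decoded audio's."""
    h = amplitude_histogram(decoded, bins)
    return min(profiles, key=lambda name: l1_distance(h, profiles[name]))

# Hypothetical reference profiles, e.g. measured offline by passing speech through each codec.
profiles = {
    "codec_A": amplitude_histogram([-0.9] * 10 + [0.9] * 10),
    "codec_B": amplitude_histogram([0.0] * 20),
}
print(identify_codec([0.01, -0.01, 0.02] * 5, profiles))
```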


Patent
07 Nov 2012
TL;DR: In this article, the authors describe tools and techniques for providing services, such as voice services, within flexible bandwidth systems, including scaling spreading factors (with rate matching tuning in some embodiments), multi-code transmission, code rate increases, AMR codec rate adjustments, and higher order modulation.
Abstract: Methods, systems, and devices are disclosed for providing services, such as voice services, within flexible bandwidth systems. In general, the scaling of one or more aspects of a flexible bandwidth system may be compensated for by altering one or more aspects within a code domain. The tools and techniques may include scaling spreading factors (with rate matching tuning in some embodiments), multi-code transmission, code rate increases, AMR codec rate adjustments, and/or higher order modulation. Subframe decoding approaches for the reception scheme may also be utilized. These tools and techniques can be flexibly implemented on the mobile device and/or base station side. Some embodiments may also minimize the latency introduced by the transmission and/or reception process. Flexible bandwidth systems may utilize portions of spectrum that may be too big or too small to fit a normal bandwidth waveform.

20 citations


Patent
29 Aug 2012
TL;DR: In this article, a radio access network (RAN) device may receive a call setup request from a wireless communication device (WCD) seeking to set up a call, and it may be determined that the WCD is capable of conducting the call using either a first codec or a second codec.
Abstract: A radio access network (RAN) device may receive a call setup request from a wireless communication device (WCD) seeking to set up a call. In response to receiving the call setup request, it may be determined that (i) the WCD is capable of conducting the call using either a first codec or a second codec, (ii) the first codec is capable of using a sampling rate of over 8,000 Hz, and (iii) the second codec uses sampling rates of 8,000 Hz or less. Based at least on this determination, the WCD may be instructed to set up the call using the second codec.

Patent
David Virette, Yue Lang, Jianfeng Xu
05 Apr 2012
TL;DR: In this article, a method for parametric spatial audio coding of a multi-channel audio signal comprising a plurality of audio channel signals is provided, the method comprising: calculating at least two different spatial coding parameters for an audio channel signal of the plurality of audio channel signals, selecting at least one spatial coding parameter of the at least two different spatial coding parameters associated with the audio channel signal, and setting a parameter type flag in the parameter section of the audio bitstream indicating the type of the selected spatial coding parameter being included into the audio bitstream.
Abstract: A method for parametric spatial audio coding of a multi-channel audio signal comprising a plurality of audio channel signals is provided, the method comprising: calculating at least two different spatial coding parameters for an audio channel signal of the plurality of audio channel signals, selecting at least one spatial coding parameter of the at least two different spatial coding parameters associated with the audio channel signal on the basis of the values of the calculated spatial coding parameters; including a quantized representation of the selected spatial coding parameter into a parameter section of an audio bitstream; and setting a parameter type flag in the parameter section of the audio bitstream indicating the type of the selected spatial coding parameter being included into the audio bitstream.

Proceedings Article · DOI
25 Mar 2012
TL;DR: The method uses a number of speech features to train a CART classifier, and can identify a codec and its bit rate with an accuracy of 92% and detect the presence of a codec with an accuracy of 97% at −5 dB SNR.
Abstract: We present a non-intrusive data driven method for codec detection and identification in the presence of background noise. The method uses a number of speech features which are then used to train a CART classifier. We demonstrate the performance of the method using several different noise types over a wide range of SNRs. Our results show that we can identify a codec and its bit rate to an accuracy of 92% and we are able to detect the presence of a codec with an accuracy of 97% at −5 dB SNR.

Patent
Barrett Kreiner, Ryan Schaub
18 May 2012
TL;DR: A media codec device includes an input port that receives media signals from service providers and also receives power for the media codec device, and a decoder that stores decoder keys associated with the service providers.
Abstract: A media codec device includes an input port that receives media signals from service providers and also receives power for the media codec device, and a decoder that stores decoder keys associated with the service providers. The decoder applies the decoder keys to decode the media signals. The media codec device operates to direct at a first time a decoded first media signal to a first output device and a decoded second media signal to a second output device, and to direct at a second time the decoded first media signal to the second output device and the decoded second media signal to the first output device.

Proceedings Article · DOI
Shi Dong, Ruimin Hu, Weiping Tu, Xiang Zheng, Junjun Jiang, Song Wang
09 Jul 2012
TL;DR: It has been proved that when multiple sound sources exist in different directions, the proposed method outperforms the conventional PCA method under certain conditions, and a stereo codec based on PC-PCA has been proposed to validate the performance improvement of the proposed method.
Abstract: High-efficiency audio compression is a basic technology for multimedia applications involving audio. Downmixing and parametric coding are efficient coding schemes widely used in recent audio codecs, such as PS in EAAC+ and MPEG Surround, and PCA stereo coding follows this idea, mapping two channels to one channel with maximum energy and parameterizing the secondary channel. This paper investigates the performance of the conventional PCA method under a general stereo model with multiple sound sources in different directions, and then proposes a Polar Coordinate based PCA (PC-PCA) stereo coding method. It has been proved that when multiple sound sources exist in different directions, the proposed method outperforms the conventional PCA method under certain conditions. A stereo codec based on PC-PCA has also been proposed to validate the performance improvement of the proposed method.
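The conventional PCA stereo coding that the entry above starts from can be written as a rotation of the (L, R) pair. The angle θ = ½·atan2(2ΣLR, ΣLL − ΣRR) maximizes the energy of the principal channel m = L cos θ + R sin θ, and the secondary channel s = −L sin θ + R cos θ is what gets parameterized. The Python sketch below shows only this conventional method; the paper's PC-PCA variant is not reproduced, and the function names are illustrative.

```python
import math

def pca_angle(left, right):
    """Rotation angle that maximizes the energy of the principal (downmix) channel."""
    sll = sum(l * l for l in left)
    srr = sum(r * r for r in right)
    slr = sum(l * r for l, r in zip(left, right))
    return 0.5 * math.atan2(2.0 * slr, sll - srr)

def pca_downmix(left, right, theta):
    """Rotate (L, R) into a principal channel and a secondary (residual) channel."""
    c, s = math.cos(theta), math.sin(theta)
    principal = [c * l + s * r for l, r in zip(left, right)]
    residual = [-s * l + c * r for l, r in zip(left, right)]
    return principal, residual

def pca_upmix(principal, residual, theta):
    """Inverse rotation; passing zeros as residual gives a parametric-only reconstruction."""
    c, s = math.cos(theta), math.sin(theta)
    left = [c * m - s * e for m, e in zip(principal, residual)]
    right = [s * m + c * e for m, e in zip(principal, residual)]
    return left, right

# A single amplitude-panned source: R = 0.5 * L, so the residual should vanish.
src = [math.sin(0.05 * n) for n in range(200)]
left, right = src, [0.5 * v for v in src]
theta = pca_angle(left, right)
m, e = pca_downmix(left, right, theta)
```

For one panned source the residual is essentially zero, so the principal channel plus θ suffices. With several sources in different directions no single rotation aligns them all and the residual carries real energy, which is the case the abstract says the paper investigates.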

Patent
19 Jan 2012
TL;DR: In this article, a dynamic codec allocation method is provided, which includes receiving a plurality of datastreams and determining a respective codec loading factor for each of the datastreams.
Abstract: In an example, a dynamic codec allocation method is provided. The method includes receiving a plurality of datastreams and determining a respective codec loading factor for each of the datastreams. The datastreams are assigned to codecs, in order by respective codec loading factor, starting with the highest respective codec loading factor. Initially, the datastreams are assigned to a hardware codec, until the hardware codec is loaded to substantially maximum capacity. If the hardware codec is loaded to substantially maximum capacity, the remaining datastreams are assigned to a software codec. As new datastreams are received, the method repeats, and previously-assigned datastreams can be reassigned from a hardware codec to a software codec, and vice versa, based on their relative codec loading factors.
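The allocation rule described in the entry above can be sketched as a greedy pass over streams sorted by descending loading factor. The units of the loading factor and the hardware capacity are hypothetical. The sketch follows the abstract literally: the hardware codec is filled until the next stream no longer fits, and that stream and all remaining ones go to the software codec.

```python
def allocate(loads, hw_capacity):
    """Assign streams to a hardware codec, then a software codec, by descending loading factor.

    loads: dict mapping stream name -> codec loading factor (hypothetical units).
    Once a stream no longer fits on the hardware codec, it and all remaining
    (lighter) streams are assigned to the software codec.
    """
    ordered = sorted(loads, key=lambda name: loads[name], reverse=True)
    hardware, software, used = [], [], 0.0
    hw_full = False
    for name in ordered:
        if not hw_full and used + loads[name] <= hw_capacity:
            hardware.append(name)
            used += loads[name]
        else:
            hw_full = True
            software.append(name)
    return hardware, software

streams = {"a": 5, "b": 3, "c": 2, "d": 1}
print(allocate(streams, hw_capacity=8))  # -> (['a', 'b'], ['c', 'd'])

# When a new stream arrives, re-running the allocation can move existing streams.
streams["e"] = 6
print(allocate(streams, hw_capacity=8))  # -> (['e'], ['a', 'b', 'c', 'd'])
```

The literal reading can leave spare hardware capacity (2 units in the second call). A variant that keeps trying lighter streams after the first misfit would pack the hardware codec more tightly; which behaviour the patent intends is not stated in the abstract.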

Book Chapter · DOI
27 Aug 2012
TL;DR: This paper proposes an adaptive end-to-end based codec switching scheme that fully conforms to the SIP standard, and evaluation with a real-world prototype based on Linphone shows that the scheme adapts well to changing network conditions, improving overall speech quality.
Abstract: Contemporary Voice-over-IP (VoIP) systems typically negotiate only one codec for the entire lifetime of a VoIP session. However, because different codecs perform differently under network conditions such as delay, jitter or packet loss, this can reduce quality if those conditions change during the call. This paper makes two core contributions: first, we compare the speech quality of a set of standard VoIP codecs under different network conditions; second, we propose an adaptive end-to-end codec switching scheme that fully conforms to the SIP standard. Our evaluation with a real-world prototype based on Linphone shows that our codec switching scheme adapts well to changing network conditions, improving overall speech quality.

Proceedings Article · DOI
20 May 2012
TL;DR: The benefits of scalable structure constructed by the addition of enhancement layer to the core layer of multi-rate iLBC are presented and the experimental simulation results show that the proposed framework can improve speech quality especially at high bit rates.
Abstract: Rate flexibility and high robustness to packet loss are essential features of speech codecs for voice communication over Internet Protocol (IP) networks. The multi-rate internet Low Bit-rate Codec (iLBC), presented in our previous papers, is one speech codec that possesses both properties. However, its speech quality is limited in clean channel conditions because of the limitations of the current frame-independent, time-domain coding scheme. In this paper, we present the benefits of a scalable structure constructed by adding an enhancement layer to the core layer of the multi-rate iLBC. Experimental simulation results show that the proposed framework can improve speech quality, especially at high bit rates.
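The core-plus-enhancement structure in the entry above follows the general pattern of embedded (layered) coding: the core layer is decodable on its own, and the enhancement layer encodes what the core left behind. The Python sketch below shows that pattern with two uniform quantizers of assumed step sizes 0.5 and 0.1; it is a generic illustration, not the iLBC enhancement layer, whose actual design the abstract does not describe.

```python
def quantize(values, step):
    return [round(v / step) for v in values]

def dequantize(indices, step):
    return [i * step for i in indices]

def encode_layers(x, core_step=0.5, enh_step=0.1):
    """Core layer: coarse quantization. Enhancement layer: quantized core residual."""
    core = quantize(x, core_step)
    residual = [a - b for a, b in zip(x, dequantize(core, core_step))]
    return core, quantize(residual, enh_step)

def decode_layers(core, enh=None, core_step=0.5, enh_step=0.1):
    """Decode the core alone, or refine it when the enhancement layer was received."""
    y = dequantize(core, core_step)
    if enh is not None:
        y = [a + b for a, b in zip(y, dequantize(enh, enh_step))]
    return y

x = [0.123, -0.77, 1.49, 0.31]
core, enh = encode_layers(x)
core_only = decode_layers(core)      # error bounded by core_step / 2
both_layers = decode_layers(core, enh)  # error bounded by enh_step / 2
```

Dropping the enhancement packets degrades quality gracefully instead of breaking decoding, which is what lets a layered codec trade bit rate for quality.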

Dissertation
10 Dec 2012
TL;DR: A seamless reconstruction method has been derived to increase the flexibility of transform coding schemes with the possibility to select a transform for a frame independently from its neighbouring frames, based on a new low delay window design procedure derived to obtain an analytic definition for a new family of transforms.
Abstract: In recent years there has been a phenomenal increase in the number of products and applications that use audio coding formats. Among the most successful audio coding schemes are MPEG-1 Layer III (mp3), MPEG-2 Advanced Audio Coding (AAC), and its evolution, MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC). More recently, perceptual audio coding has been adapted to achieve low delay so as to become suitable for conversational applications. Traditionally, a filter bank such as the Modified Discrete Cosine Transform (MDCT) is a central component of perceptual audio coding, and its adaptation to low delay audio coding has become an important research topic. Low delay transforms have been developed to retain the performance of standard audio coding while dramatically reducing the associated algorithmic delay. This work presents several elements that better accommodate the delay reduction constraint. Among the contributions is a low delay block switching tool that allows direct transitions between long and short transforms without inserting a transition window. The same principle has been extended to define new perfect reconstruction conditions for the MDCT with relaxed constraints compared to the original definition. As a consequence, a seamless reconstruction method has been derived that increases the flexibility of transform coding schemes by allowing the transform for a frame to be selected independently of its neighbouring frames. Finally, based on this new approach, a new low delay window design procedure has been derived to obtain an analytic definition for a new family of transforms, permitting high quality with a substantial reduction in coding delay. The performance of the proposed transforms has been thoroughly evaluated, and an evaluation framework involving an objective measurement of the optimal transform sequence is proposed. It confirms the relevance of the proposed transforms for audio coding. In addition, the new approaches have been successfully applied to recent standardisation work items, such as the low delay audio coding developed at MPEG (LD-AAC and ELD-AAC), and they have been evaluated in numerous subjective tests, showing a significant quality improvement for transient signals. The new low delay window design has been adopted in G.718, a scalable speech and audio codec standardized in ITU-T, and has demonstrated its benefit in terms of delay reduction while maintaining the audio quality of a traditional MDCT.
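The perfect reconstruction conditions mentioned in the dissertation entry above relax those of the standard MDCT. For reference, the standard case can be checked numerically. With frames of 2N samples, 50% overlap, and the sine window w[n] = sin(π(n + 0.5)/(2N)), which satisfies the Princen-Bradley condition w[n]² + w[n + N]² = 1, the time-domain aliasing of adjacent frames cancels on overlap-add. The sketch uses a direct O(N²) transform and an inverse scaled by 2/N to pair with that window. It illustrates the conventional MDCT only, not the low-delay windows or relaxed conditions derived in the thesis.

```python
import math

def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame, N):
    """MDCT of 2N (already windowed) samples -> N coefficients."""
    n0 = 0.5 + N / 2.0
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5)) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    """Inverse MDCT -> 2N time-aliased samples (2/N scaling pairs with the sine window)."""
    n0 = 0.5 + N / 2.0
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def mdct_roundtrip(x, N):
    """50%-overlapped analysis/synthesis; aliasing cancels when frames are overlap-added."""
    w = sine_window(2 * N)
    extra = (-len(x)) % N  # pad so the signal length is a multiple of N
    padded = [0.0] * N + list(x) + [0.0] * (extra + N)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out[N:N + len(x)]

x = [math.sin(0.3 * i) + 0.05 * i for i in range(37)]
y = mdct_roundtrip(x, N=8)
print(max(abs(a - b) for a, b in zip(x, y)))
```

Each frame alone returns an aliased signal; only the sum of two overlapping frames is exact. This dependency on neighbouring frames is what makes block switching and low-delay window design constrained, and it is the constraint the thesis relaxes.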


Journal Article · DOI
TL;DR: Simulation results indicate that it is possible to produce variable bitrates (tuned to channel conditions) in a CELP coder by adjusting the coder's coefficients while still maintaining good speech quality.
Abstract: Today, the primary constraints in wireless communication systems are limited bandwidth and power. Wireless systems that transmit speech require efficient and effective methods for maintaining speech quality, especially at the receiving end, with maximum savings of bandwidth and power. Among all elements of the communication system (transmitter, channel and receiver), the transmission channel (the carrier of information/data, also called the medium) is the most critical and plays a key role in the transmission and reception of information/data. Channel conditions determine the quality of speech at the receiver. Modeling a channel is a complex task, and many techniques are adopted to mitigate the effect of the channel. AMR (Adaptive Multi-Rate) is one such technique that counteracts the deleterious effect of the channel on speech. It employs a variable bit rate that dynamically switches between specific modes of operation (bit rates) depending on the channel conditions. In this paper, the application of a Code Excited Linear Prediction (CELP) source coder to speech, followed by an AMR codec, is investigated. An e-test bench built in MATLAB implements the CELP-based AMR codec scheme, which is then studied through a series of simulations. Both subjective and objective evaluations are carried out; the objective evaluations are categorized into waveform-based, spectral-based and perceptual-based analyses. The simulation results are recorded and compared in various graphs and tables, which include calculations of parameters such as Absolute Error (ABS), Mean Square Error (MSE), Root Mean Square Error (RMSE), Signal to Noise Ratio (SNR), segmental SNR (segSNR) (Y. Hu and P. Loizou in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, pp. 153–156, 2006a; Proc. Interspeech, pp. 1447–1450, 2006b), Weighted-Slope Spectral distance (WSS) (Y. Hu and P. Loizou in Speech Commun. 49, 588–601, 2007), Perceptual Evaluation of Speech Quality (PESQ) (ITU-T rec. P.862, 2000), Log-Likelihood Ratio (LLR), Itakura-Saito Distance measure (ISD), Cepstrum Distance Measures (CEP) (V. Turbin and N. Faucheur in Proc. Online Workshop Meas. Speech Audio Quality Netw., pp. 81–84, 2005), Frequency Weighted Segmental SNR (fwSNRseg), Predicted rating of overall Quality (Covl), Rating of Speech Distortion (Csig), Rating of Background Distortion (Cbak) (ITU-T rec. P.835, 2003) and Mean Opinion Score (MOS). Simulation results indicate that it is possible to produce variable bitrates (tuned to channel conditions) in a CELP coder by adjusting the coder's coefficients while still maintaining good speech quality. Further, the higher the bit rate, the better the speech quality (as verified by the PESQ and MOS results), while the simulation delay also increases.
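Several of the waveform-based measures listed in the entry above are short formulas. The Python sketch below computes MSE, RMSE, overall SNR, and segmental SNR. For segSNR it assumes 160-sample frames (20 ms at 8 kHz) and clamps each frame's SNR to [−10, 35] dB, a common convention in objective speech-quality tools. The paper's exact framing and limits are not stated in the abstract, so treat both as assumptions.

```python
import math

def mse(ref, deg):
    return sum((a - b) ** 2 for a, b in zip(ref, deg)) / len(ref)

def rmse(ref, deg):
    return math.sqrt(mse(ref, deg))

def snr_db(ref, deg):
    """Overall SNR: total signal energy over total error energy, in dB."""
    noise = sum((a - b) ** 2 for a, b in zip(ref, deg))
    signal = sum(a * a for a in ref)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def seg_snr_db(ref, deg, frame=160, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB (assumed convention)."""
    vals = []
    for start in range(0, len(ref) - frame + 1, frame):
        r, d = ref[start:start + frame], deg[start:start + frame]
        sig = sum(a * a for a in r)
        noise = sum((a - b) ** 2 for a, b in zip(r, d))
        if noise == 0:
            val = hi
        elif sig == 0:
            val = lo
        else:
            val = 10 * math.log10(sig / noise)
        vals.append(min(max(val, lo), hi))
    return sum(vals) / len(vals)

ref = [1.0] * 320
deg = [0.9] * 320  # constant 10% amplitude error
print(mse(ref, deg), rmse(ref, deg), snr_db(ref, deg), seg_snr_db(ref, deg))
```

Segmental SNR weights every frame equally, so quiet frames with poor coding pull the score down even when loud frames dominate the overall SNR. This is why the two measures can rank codec modes differently.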

Proceedings Article
01 Dec 2012
TL;DR: A blind bandwidth extension method of audio signals is proposed in which the fine structure of high-frequency information is recovered based on Volterra series and it is found that the proposed method outperforms the chaotic prediction method and nearest-neighbor matching method.
Abstract: In this paper, a blind bandwidth extension method for audio signals is proposed in which the fine structure of the high-frequency information is recovered based on Volterra series. By combining a Gaussian mixture model and codebook mapping to adjust the spectral envelope and the energy gain of the extended high-frequency components separately, the bandwidth of audio signals is extended from wideband to super-wideband. Furthermore, the proposed method is applied to a real audio codec. The performance of the proposed method is evaluated through objective and subjective tests on audio signals selected from MPEG items, and the proposed method is found to outperform the chaotic prediction method and the nearest-neighbor matching method. When the proposed algorithm is applied to the ITU-T G.722.1 wideband audio codec, its performance is comparable with that of the G.722.1C super-wideband audio codec at 24 kbps.
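A second-order Volterra series, named in the entry above, extends a linear FIR filter with quadratic cross-products of past samples: y[n] = Σᵢ h1[i]·x[n−i] + Σᵢ Σⱼ h2[i][j]·x[n−i]·x[n−j]. The quadratic term is what lets such a model create energy at new frequencies; for example, cos²(ωn) = ½ + ½cos(2ωn) doubles the frequency. The Python sketch shows only the series itself. The kernel values are toy assumptions, and how the paper estimates its kernels is not described in the abstract.

```python
import math

def volterra2(x, h1, h2):
    """Second-order Volterra filter with memory M = len(h1); samples before n = 0 are zero."""
    M = len(h1)
    y = []
    for n in range(len(x)):
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(M)]
        linear = sum(h1[i] * past[i] for i in range(M))
        quadratic = sum(h2[i][j] * past[i] * past[j] for i in range(M) for j in range(M))
        y.append(linear + quadratic)
    return y

# A pure squaring kernel maps a tone at w to DC plus a tone at 2w (new high-frequency content).
w = 0.3
tone = [math.cos(w * n) for n in range(50)]
squared = volterra2(tone, h1=[0.0], h2=[[1.0]])
```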

Proceedings Article · DOI
05 Jul 2012
TL;DR: Results from spectral distortion measurements indicate the proposed scheme maintains the perceptual quality of the audio objects across a wide variety of packet loss conditions.
Abstract: This paper presents a new framework for compression and transmission of simultaneously occurring audio objects through packet loss channels in order to maintain user Quality of Experience (QoE). The audio objects are compressed into two mono mixtures, by exploiting sparsity of multichannel audio signals to identify the two most dominant time-frequency components in terms of an energy measure. These mixtures are further compressed using the MP3 audio codec, with the optimised transmission model selected from several channel coding models based on the Forward Error Correction (FEC) and Multiple Description Coding (MDC) packet loss protection techniques. Audio objects can be recovered robustly from any received description(s) allowing real-time selective reproduction at the listeners' end. Results from spectral distortion measurements indicate the proposed scheme maintains the perceptual quality of the audio objects across a wide variety of packet loss conditions.
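Multiple Description Coding (MDC), one of the protection techniques in the entry above, sends several independently decodable descriptions so that any received subset yields a usable signal. The Python sketch shows the simplest textbook form, swapped in for illustration and not the paper's scheme: even and odd samples form two descriptions, and a lost description is concealed by averaging the neighbouring received samples. It assumes at least two samples.

```python
def split_descriptions(x):
    """Two descriptions: even-indexed and odd-indexed samples."""
    return x[0::2], x[1::2]

def merge_descriptions(even, odd, n):
    """Rebuild n >= 2 samples; a lost description is passed as None and interpolated."""
    if even is None and odd is None:
        raise ValueError("both descriptions lost")
    y = [0.0] * n
    if even is not None:
        y[0::2] = even
    if odd is not None:
        y[1::2] = odd
    lost_start = 0 if even is None else (1 if odd is None else None)
    if lost_start is not None:
        for i in range(lost_start, n, 2):
            # neighbours of a lost sample always belong to the received description
            neighbours = [y[j] for j in (i - 1, i + 1) if 0 <= j < n]
            y[i] = sum(neighbours) / len(neighbours)
    return y

x = [0, 1, 2, 3, 4, 5]
even, odd = split_descriptions(x)
print(merge_descriptions(even, odd, len(x)))   # both received: exact
print(merge_descriptions(even, None, len(x)))  # odd description lost: interpolated
```

The redundancy MDC relies on is the correlation between neighbouring samples. Smooth signals survive the loss of one description well, while transients do not. This trade-off is one reason practical systems, as in the paper, compare MDC against FEC models.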

Proceedings Article · DOI
01 Mar 2012
TL;DR: A very low-delay full HD codec with a relatively low bitrate, aimed especially at consumer and small business applications, is developed, together with a new codec pipeline control scheme for a previously developed versatile H.264 codec platform.
Abstract: We have developed a new very low-delay full HD codec with a relatively low bitrate, aimed especially at consumer and small business applications. We analyzed several important factors for achieving low delay and developed a new codec pipeline control scheme for our previously developed versatile H.264 codec platform. We implemented the algorithm on the codec platform and obtained a minimum delay of 10 ms with several test sequences at bitrates of 8 to 10 Mbps.

Proceedings Article · DOI
03 Apr 2012
TL;DR: The loss due to the GSM-AMR codec is shown to be very significant for speaker verification compared to undecoded speech, whereas packet loss and bit rate, although they may degrade speech quality, do not significantly affect detection of the speaker's identity.
Abstract: Automatic Speaker Verification (ASV) is a challenging task over mobile/IP-based systems, as coding introduces some loss in system performance. This paper reports on work in progress examining the impact of the GSM-AMR codec used in mobile networks, at its various bit rates, and of the G.729 codec used for VoIP, along with different kinds of noise and packet loss scenarios for the speech signal. The PURE YOHO database has been used for the evaluation. The respective encoders and decoders are used back to back on wideband clean microphone speech to simulate real-life situations. Performance is evaluated by measuring the Equal Error Rate (EER). It has been shown that the loss due to the GSM-AMR codec is very significant for speaker verification compared to undecoded speech. Although packet loss and bit rate may degrade speech quality, they do not significantly affect detection of the speaker's identity.
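The Equal Error Rate used in the entry above is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The Python sketch below is a minimal version. It assumes higher scores mean "same speaker" and sweeps thresholds over the observed scores, reporting the mean of FAR and FRR at the threshold where they are closest. With finite data the two rates rarely meet exactly, so this is an approximation.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER; a trial is accepted when its score >= threshold."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Toy verification scores (hypothetical), e.g. before and after codec degradation.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.75]
print(equal_error_rate(genuine, impostor))
```

Coding distortion that pulls genuine and impostor score distributions closer together raises the EER. This is how the codec impact reported in the abstract would show up in such a measurement.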

Proceedings Article · DOI
24 Sep 2012
TL;DR: This paper verifies that the AMR codec realization and code optimization enhance the code efficiency and achieve satisfactory results on the TI® TMS320VC5509 DSP hardware platform.
Abstract: Timeliness is the key requirement when the AMR codec is realized on a DSP. In this paper, code optimization is performed for the AMR codec realized on the TI® TMS320VC5509 DSP hardware platform. To improve code efficiency, the code is first rewritten extensively in assembly instructions after a comprehensive analysis of the AMR algorithm, and then hybrid coding is used to combine the advantages of C programming and DSP assembly language programming. Tests on the DSP verify that the AMR codec realization and code optimization improve code efficiency and achieve satisfactory results.
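Fixed-point speech-codec reference code is commonly written in terms of small saturating arithmetic operators, and these are typical targets when C is replaced by DSP assembly as in the entry above. The Python sketch models three such operators on 16-bit Q15 data. The names follow the basic-operator naming convention used in fixed-point speech codec code (`add`, `mult`, `L_mult`), but treating these bodies as the exact AMR reference definitions is an assumption, and the paper's actual assembly routines are not shown in the abstract.

```python
MAX16, MIN16 = 32767, -32768
MAX32, MIN32 = 2**31 - 1, -(2**31)

def sat16(v):
    return max(MIN16, min(MAX16, v))

def sat32(v):
    return max(MIN32, min(MAX32, v))

def add(a, b):
    """16-bit saturating addition."""
    return sat16(a + b)

def mult(a, b):
    """Q15 x Q15 -> Q15 product: arithmetic shift right by 15, then saturate."""
    return sat16((a * b) >> 15)

def L_mult(a, b):
    """Q15 x Q15 -> Q31 product: doubled 32-bit result with saturation."""
    return sat32((a * b) << 1)

# 0.5 * 0.5 in Q15 is 0.25, i.e. 8192; (-1) * (-1) saturates because +1.0 is not representable.
print(mult(16384, 16384), mult(-32768, -32768), L_mult(-32768, -32768), add(30000, 10000))
```

On a DSP with hardware saturation modes, each of these functions collapses to one or two instructions, whereas portable C must spell out the range checks. That gap is the main source of the speed-up from hand-written assembly.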

Proceedings Article · DOI
01 Oct 2012
TL;DR: The results of both the objective and the subjective evaluation show that the proposed method achieves higher reconstructed quality for the speech signal than the original method, especially for the lower coding rates.
Abstract: Pitch prediction in the time domain is of great importance for improving the coding quality of speech. In this paper, we propose a method of frequency-domain pitch prediction for the full-band ITU-T G.719 codec. The results of both objective and subjective evaluations show that the proposed method achieves higher reconstructed quality for speech signals than the original method, especially at lower coding rates.
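For contrast with the frequency-domain approach in the entry above, conventional time-domain pitch prediction uses a one-tap long-term predictor e[n] = x[n] − g·x[n−T]. For a fixed lag T, the optimal gain is g = Σx[n]x[n−T] / Σx[n−T]². Substituting it shows that minimizing the residual energy is equivalent to maximizing (Σx[n]x[n−T])² / Σx[n−T]². The Python sketch implements this textbook search over an assumed lag range. It does not reproduce the paper's frequency-domain method.

```python
import math

def ltp_search(x, lag_min, lag_max):
    """Return (lag, gain) minimizing sum (x[n] - g*x[n-lag])^2 over n >= lag_max."""
    best_lag, best_gain, best_score = lag_min, 0.0, -1.0
    for T in range(lag_min, lag_max + 1):
        num = sum(x[n] * x[n - T] for n in range(lag_max, len(x)))
        den = sum(x[n - T] ** 2 for n in range(lag_max, len(x)))
        if den > 0 and num * num / den > best_score:
            best_lag, best_gain, best_score = T, num / den, num * num / den
    return best_lag, best_gain

def ltp_residual(x, lag, gain, start):
    """Prediction residual that a coder would quantize instead of x itself."""
    return [x[n] - gain * x[n - lag] for n in range(start, len(x))]

# Periodic test signal with a 40-sample period (fundamental plus second harmonic).
x = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40) for n in range(400)]
lag, gain = ltp_search(x, 20, 60)
res = ltp_residual(x, lag, gain, 60)
```

Summing over the same index range (n ≥ lag_max) for every candidate keeps the scores comparable across lags.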

Journal Article · DOI
TL;DR: The results demonstrate that the performance of IACC and of IACC concatenated with convolutional coding can be superior to that of conventional convolutional coding at high and moderate bit error rates, respectively.
Abstract: A scheme named index assignment-based channel coding (IACC) has been developed to make speech/audio codecs resilient against the bit errors commonly experienced in wireless channels. Although IACC is a type of joint source-channel coding, it does not interfere with the source codec design. The proposed scheme takes source characteristics into account and adjusts the amount of coding according to the sensitivity of different values of the source parameters. It is shown that source characteristics play an important role in the performance of IACC. The performance of IACC has been evaluated by applying it to parameters generated by the extended adaptive multi-rate wideband (AMR-WB+) audio codec. A method for perceptual training of IACC codes is also proposed. The results demonstrate that the performance of IACC and of IACC concatenated with convolutional coding can be superior to that of conventional convolutional coding at high and moderate bit error rates, respectively.

Proceedings Article
18 Oct 2012
TL;DR: A novel wide-band speech coding scheme based on kernel methods and bandwidth extension is developed; it is built on the nonlinear narrow-band codec at the extra cost of a small increase in bit rate.
Abstract: In this paper, a novel wide-band speech coding scheme is developed based on kernel methods and bandwidth extension. The KLMS algorithm, a kernelized version of the well-known LMS algorithm, is employed within the backward ADPCM framework to encode the narrow-band part of the wide-band speech. Simulation results show that this nonlinear method yields an average improvement of up to 3.4 dB in SNR and 0.28 in the PESQ measure of the decoded speech. The resulting narrow-band codec is subsequently extended to wide-band speech using a novel bandwidth extension technique inspired by parametric stereo coding. The KLMS algorithm is shown to be effective in this framework as well. This leads to a wide-band speech coding scheme built on the nonlinear narrow-band codec at the extra cost of a small increase in bit rate.
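Kernel LMS, named in the entry above, replaces the linear LMS predictor with f(u) = Σᵢ η·eᵢ·κ(uᵢ, u). Each update stores the input vector uᵢ as a new centre with coefficient η times its prediction error. The Python sketch places a Gaussian-kernel KLMS inside a minimal backward-adaptive predictive coding loop. The predictor adapts only on reconstructed samples, so the decoder can mirror it exactly without side information. The step size, kernel width, predictor order, and quantizer step are assumed values, not the paper's settings, and this unpruned KLMS grows by one centre per sample.

```python
import math

class KLMS:
    """Kernel LMS with a Gaussian kernel: f(u) = sum_i step * e_i * k(u_i, u)."""

    def __init__(self, step=0.5, sigma=1.0):
        self.step, self.sigma = step, sigma
        self.centers, self.coeffs = [], []

    def kernel(self, a, b):
        d2 = sum((p - q) ** 2 for p, q in zip(a, b))
        return math.exp(-d2 / (2 * self.sigma ** 2))

    def predict(self, u):
        return sum(c * self.kernel(ci, u) for ci, c in zip(self.centers, self.coeffs))

    def update(self, u, target):
        err = target - self.predict(u)
        self.centers.append(list(u))
        self.coeffs.append(self.step * err)
        return err

def klms_adpcm_encode(x, order=2, q_step=0.05):
    """Quantize the KLMS prediction error; adapt on reconstructed samples only."""
    model, recon, codes = KLMS(), [0.0] * order, []
    for s in x:
        u = recon[-order:]
        pred = model.predict(u)
        code = round((s - pred) / q_step)
        codes.append(code)
        r = pred + code * q_step
        model.update(u, r)
        recon.append(r)
    return codes, recon[order:]

def klms_adpcm_decode(codes, order=2, q_step=0.05):
    """Mirror of the encoder loop: same predictor, same updates, same reconstruction."""
    model, recon = KLMS(), [0.0] * order
    for code in codes:
        u = recon[-order:]
        r = model.predict(u) + code * q_step
        model.update(u, r)
        recon.append(r)
    return recon[order:]

speech_like = [0.5 * math.sin(0.2 * n) for n in range(200)]
codes, recon = klms_adpcm_encode(speech_like)
```

The bit rate of such a coder depends on how small the integer codes stay, so a better predictor reduces the codes' range for the same quantizer step. The abstract reports its gains as SNR and PESQ improvements at a given rate, which this toy loop does not attempt to reproduce.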

25 Oct 2012
TL;DR: A new method for compressing audio and speech signals is presented, based on sinusoidal modeling with wavelet-based coding of the residual signal; the results show that it provides a promising tool for speech compression and speech databases.
Abstract: This paper presents a new method for compression of audio and speech signals based on sinusoidal modeling with added wavelet-based coding of the residual signal. Wavelets are introduced as an effective tool for representing and compressing atonal and transient signals. The method is proposed for use in speech databases. It is evaluated by means of PSNR and PESQ/ODG, and the results are compared with the performance of common methods. The results show that the presented method provides a promising tool for speech compression and speech databases.
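Wavelet coding of a residual, as in the entry above, typically works by transforming the signal, discarding small coefficients, and inverting. The Python sketch uses a multi-level orthonormal Haar transform with hard thresholding as the simplest instance. The wavelet family, the number of levels, and the threshold are assumptions, since the abstract names none of them, and the sinusoidal-modeling stage is omitted.

```python
import math

def haar_forward(x, levels):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels."""
    coeffs, n = list(x), len(x)
    for _ in range(levels):
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail  # layout after all levels: [A_L, D_L, ..., D_1]
        n = half
    return coeffs

def haar_inverse(coeffs, levels):
    out = list(coeffs)
    n = len(coeffs) >> levels
    for _ in range(levels):
        approx, detail = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(approx, detail):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

def hard_threshold(coeffs, t):
    """Zero every coefficient smaller than t in magnitude (the lossy step)."""
    return [c if abs(c) >= t else 0.0 for c in coeffs]

# A step-like "transient" compacts into very few nonzero Haar coefficients.
step_signal = [1.0] * 8 + [3.0] * 8
c = haar_forward(step_signal, 3)
kept = hard_threshold(c, 1e-9)
rebuilt = haar_inverse(kept, 3)
```

Piecewise-smooth and transient segments compact into a few large coefficients, so most detail coefficients can be zeroed cheaply. This is the property that makes wavelets attractive for the atonal residual left after sinusoidal modeling.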