
Showing papers on "Adaptive Multi-Rate audio codec published in 2016"


Journal ArticleDOI
TL;DR: This paper considers a processing chain of two coding steps and proposes a method that exploits coding-based footprints to identify both the codec and the size of the group of pictures used in the first coding step. The method was extensively validated on a very large data set of video sequences generated by encoding content with a diversity of codecs and different encoding parameters.
Abstract: Video content is routinely acquired and distributed in a digital compressed format. In many cases, the same video content is encoded multiple times. This is the typical scenario that arises when a video, originally encoded directly by the acquisition device, is then re-encoded, either after an editing operation, or when uploaded to a sharing website. The analysis of the bitstream reveals details of the last compression step (i.e., the codec adopted and the corresponding encoding parameters), while masking the previous compression history. Therefore, in this paper, we consider a processing chain of two coding steps, and we propose a method that exploits coding-based footprints to identify both the codec and the size of the group of pictures (GOPs) used in the first coding step. This sort of analysis is useful in video forensics, when the analyst is interested in determining the characteristics of the originating source device, and in video quality assessment, since quality is determined by the whole compression history. The proposed method relies on the fact that lossy coding is an (almost) idempotent operation. That is, re-encoding a video sequence with the same codec and coding parameters produces a sequence that is similar to the former. As a consequence, if the second codec in the chain does not significantly alter the sequence, it is possible to analyze this sort of similarity to identify the first codec and the adopted GOP size. The method was extensively validated on a very large data set of video sequences generated by encoding content with a diversity of codecs (MPEG-2, MPEG-4, H.264/AVC, and DIRAC) and different encoding parameters. In addition, a proof of concept showing that the proposed method can also be used on videos downloaded from YouTube is reported.

41 citations
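The (almost) idempotency property the abstract above relies on can be demonstrated with a toy stand-in codec — a uniform quantizer rather than a real video coder (the quantizer, step sizes, and signal here are all illustrative assumptions, not the paper's method):

```python
import numpy as np

def toy_encode_decode(x, step):
    """Toy lossy 'codec': uniform quantization with a given step size."""
    return np.round(x / step) * step

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                  # stand-in for a video/audio signal
first = toy_encode_decode(x, step=0.25)    # "first coding step" with an unknown parameter

# Re-encode with several candidate parameters; idempotency means the
# candidate matching the first step size yields (near-)zero distortion.
candidates = [0.1, 0.2, 0.25, 0.3, 0.5]
errors = [np.mean((toy_encode_decode(first, s) - first) ** 2) for s in candidates]
detected = candidates[int(np.argmin(errors))]
print(detected)  # -> 0.25
```

Re-encoding with the matching parameter reproduces the signal exactly, so scanning candidates for minimum re-encoding distortion recovers the first-step parameter — the same principle the paper applies to candidate codecs and GOP sizes.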


Journal ArticleDOI
TL;DR: Results demonstrate the approach successfully encodes multiple spatial speech scenes at low bit rates whilst maintaining perceptual quality in both anechoic and reverberant environments.
Abstract: This paper describes a system for encoding and communicating navigable speech soundfields for applications such as immersive audio/visual conferencing, audio surveillance of large spaces and free viewpoint television. The system relies on recording speech soundfields using compact co-incident microphone arrays that are then processed to identify sources and their spatial location using the well-known assumption that speech signals are sparse in the time-frequency domain. A low-delay Direction of Arrival (DOA)-based frequency domain sound source separation approach is proposed that requires only 250 ms of speech signal. Joint compression is achieved through a previously proposed perceptual analysis-by-synthesis spatial audio coding scheme that encodes sources into a mixture signal that can be compressed by a standard speech codec at 32 kbps. By also transmitting side information representing the original spatial location of each source, the received mixtures can be decoded and then flexibly reproduced using loudspeakers at a chosen listening point within a synthesised speech scene. The system was implemented based on this framework for an example application encoding a three-talker navigable speech scene at a total bit rate of 48 kbps. Subjective listening tests were conducted to evaluate the quality of the reproduced speech scenes at a new listening point as compared to a true recording at that point. Results demonstrate the approach successfully encodes multiple spatial speech scenes at low bit rates whilst maintaining perceptual quality in both anechoic and reverberant environments.

20 citations


Journal ArticleDOI
TL;DR: A novel steganography algorithm is proposed that performs well in imperceptibility with a hiding capacity of 550 bits/s and the real-time and anti-detection performances are also satisfactory.
Abstract: The rapid development of speech communication technology has made it possible for low bit-rate speech to become appropriate steganographic cover media. To incorporate data hiding into the low bit-rate speech codec, a novel steganography algorithm is proposed in this paper. By analyzing the encoding rule of fixed codebook vector, the way of transposing encoding locations of adjacent pulses is found to be suitable for data embedding with good imperceptibility. Based on encoding location transposition of adjacent pulses, the relationship between adjacent pulse locations is used to embed secret data while the fixed codebook search is being conducted during the encoding process of G.729 codec, which can maintain synchronization between data embedding and speech encoding. The experimental results demonstrate that the proposed steganography algorithm performs well in imperceptibility with a hiding capacity of 550 bits/s. Furthermore, the real-time and anti-detection performances are also satisfactory.

17 citations
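The embedding idea above — hiding a bit in the transmission order of two adjacent fixed-codebook pulses, which decode to the same excitation either way — can be sketched as follows (the pulse positions and pairing are hypothetical, not the actual G.729 track layout):

```python
def embed_bit(pulse_pair, bit):
    """Encode one secret bit in the transmission order of two adjacent
    pulse positions; the decoded pulse *set* is identical either way,
    which is what gives the scheme its imperceptibility."""
    lo, hi = sorted(pulse_pair)
    return (lo, hi) if bit == 0 else (hi, lo)

def extract_bit(pulse_pair):
    a, b = pulse_pair
    return 0 if a < b else 1

# Hypothetical adjacent-pulse position pairs from a fixed-codebook search
track = [(5, 13), (21, 9), (2, 30)]
secret = [1, 0, 1]
stego = [embed_bit(p, b) for p, b in zip(track, secret)]
print([extract_bit(p) for p in stego])  # -> [1, 0, 1]
```

Because embedding happens while the fixed-codebook search runs, data hiding stays synchronized with speech encoding, as the abstract describes.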




Journal ArticleDOI
TL;DR: A universal VoIP steganography model is proposed that uses the PESQ deterioration rate and the decoding error to automatically choose a data-embedding algorithm for each VoIP bitstream, which enables one to achieve covert communications using a low bit-rate speech codec efficiently and securely.
Abstract: Low bit-rate speech codecs offer so many advantages over other codecs that they have become increasingly popular in audio communications such as mobile and Voice over Internet Protocol (VoIP) communications; researching steganography in low bit-rate speech codecs is therefore of significant importance. In this study, we propose a universal VoIP steganography model for low bit-rate speech codecs that uses the PESQ deterioration rate and the decoding error to automatically choose a data-embedding algorithm for each VoIP bitstream, which enables one to achieve covert communications using a low bit-rate speech codec efficiently and securely. Because little or no attention has been paid to steganography in the Internet Speech Audio Codec, it was chosen as the test codec to verify the effectiveness, security, and practicability of the proposed steganography model. The experimental results show that, with the proposed model, an average PESQ deterioration rate of 4.04% (less than 5%, indicating strong imperceptibility) and a high data-hiding capacity of up to 12 bits/frame (400 bits/s, three times larger than that of other methods) were achieved, and the proposed model could effectively resist the latest steganalysis. Copyright © 2015 John Wiley & Sons, Ltd.

11 citations
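The selection rule the model describes — choose an embedding algorithm per bitstream from the PESQ deterioration rate and the decoding error — might look like this minimal sketch (the algorithm names, capacities, and scores below are invented for illustration, not the paper's measurements):

```python
# Hypothetical candidates: (name, capacity in bits/frame,
#                           PESQ deterioration rate, decoded without error?)
candidates = [
    ("lsf-qim",     6, 0.031, True),
    ("pitch-mod",   4, 0.062, True),   # rejected: deterioration > 5 %
    ("gain-parity", 2, 0.012, True),
    ("fcb-swap",   12, 0.040, False),  # rejected: decoding error
]

def pick_algorithm(cands, max_deterioration=0.05):
    """Keep only algorithms that decode cleanly and stay under the
    deterioration threshold, then take the highest-capacity survivor."""
    ok = [c for c in cands if c[3] and c[2] <= max_deterioration]
    return max(ok, key=lambda c: c[1])[0] if ok else None

print(pick_algorithm(candidates))  # -> lsf-qim
```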


Proceedings ArticleDOI
03 Mar 2016
TL;DR: The test results show that EVS outperforms other speech codecs used in the test and it can be used to generate speaker models that are quite robust to varying compression levels.
Abstract: In this paper, we investigate the effect of lossy speech compression on the text-independent speaker verification task. We have evaluated voice biometrics performance over several state-of-the-art speech codecs, including the recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The test results show that EVS outperforms the other speech codecs used in our test and can be used to generate speaker models that are quite robust to varying compression levels. It was also shown that if a speech codec of higher quality (EVS, G.711) is included in the training data (mismatched and partially mismatched scenarios), automatic speaker verification (ASV) gives better results than in the matched scenario.

10 citations


Proceedings ArticleDOI
11 Jul 2016
TL;DR: This work proposes a codec and ROI detector independent pre- and postprocessing framework that enables the usage of off-the-shelf hard-/software and an easy adaption to the latest video coding technology.
Abstract: For low bit rate scenarios (video conferencing, aerial surveillance), conventional video coding is unable to meet the small bit rate and high quality requirements. In contrast, Region of Interest (ROI) coding provides efficient compression by improving the quality of ROIs at the expense of non-ROIs. We likewise transmit only the ROI, but reconstruct the non-ROI from already transmitted content by means of global motion compensation in order to provide high quality for the full frame. Previous ROI coding systems modified the video codec to control the coding of individual blocks. We instead propose a codec- and ROI-detector-independent pre- and postprocessing framework. This enables the use of off-the-shelf hard- and software and easy adaptation to the latest video coding technology. While maintaining the performance of subsequent computer vision tasks, we reduce the bit rate by 90–95% to less than 1 Mbit/s using HEVC for full HDTV videos.

8 citations
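A minimal sketch of the preprocessing half of such a framework, assuming a purely translational global motion (the real system estimates a full global motion model and operates on coded video, not raw arrays; the ROI mask and shift here are illustrative):

```python
import numpy as np

def preprocess_roi(frame, prev_recon, roi_mask, shift):
    """Keep ROI pixels; fill non-ROI from the previous reconstruction
    compensated by a (toy, purely translational) global motion, so the
    encoder spends almost no bits outside the ROI."""
    background = np.roll(prev_recon, shift, axis=(0, 1))
    return np.where(roi_mask, frame, background)

frame = np.arange(36, dtype=float).reshape(6, 6)
prev = frame - 1.0                       # pretend the scene changed slightly
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True                    # detector-supplied ROI

out = preprocess_roi(frame, prev, mask, shift=(0, 1))
assert np.array_equal(out[2:4, 2:4], frame[2:4, 2:4])  # ROI untouched
```

Because the substitution happens before encoding, any off-the-shelf encoder (e.g. an HEVC binary) can be used unchanged, which is the point of the codec-independent design.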


Proceedings ArticleDOI
04 Jun 2016
TL;DR: A steganography scheme using a 3D-sudoku matrix to enlarge capacity and improve quality of speech and theoretical analysis is provided to demonstrate that the concealment and the hidden capacity are greatly improved with the proposed scheme.
Abstract: The redundant information in low bit-rate speech is extremely small, making it very difficult to implement large-capacity steganography in low bit-rate speech. Based on the multiple vector quantization characteristics of the Line Spectrum Pair (LSP) in the speech codec, this paper proposes a steganography scheme using a 3D-Sudoku matrix to enlarge capacity and improve speech quality. A cyclic-shift algorithm to construct the 3D-Sudoku matrix for steganography is proposed in this paper, as well as embedding and extracting algorithms based on the 3D-Sudoku matrix in a low bit-rate speech codec. Theoretical analysis demonstrates that concealment and hiding capacity are greatly improved with the proposed scheme. Experimental results show the hiding capacity is raised to 200 bits/s in the ITU-T G.723.1 codec. Moreover, the Perceptual Evaluation of Speech Quality (PESQ) score of the stego speech is reduced by no more than 4%, indicating little impact on speech quality.

8 citations
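A schematic stand-in for the magic-matrix idea: a cyclically shifted 3D matrix in which every line along each axis is a permutation of 0..N-1, with embedding done by the shortest cyclic move of a single index (the paper's actual Sudoku construction and its minimal-distortion search over LSP indices are more elaborate; everything below is an illustrative simplification):

```python
N = 9  # base of the hypothetical 3D magic function

def cell(x, y, z):
    # Cyclically shifted 3D matrix: along any single axis the values run
    # through all of 0..N-1, the property the Sudoku construction needs.
    return (x + y + z) % N

def embed(triple, digit):
    """Move one coordinate the shortest cyclic distance so the cell value
    equals the secret digit (a schematic stand-in for the paper's scheme,
    which picks the least-distortion coordinate change)."""
    x, y, z = triple
    delta = (digit - cell(x, y, z)) % N
    step = delta if delta <= N // 2 else delta - N   # shortest direction
    return (x, y, (z + step) % N)

def extract(triple):
    return cell(*triple)

stego = embed((3, 7, 2), digit=5)
print(extract(stego))  # -> 5
```

Each embedded triple carries one base-N digit (log2(9) ≈ 3.17 bits) at the cost of perturbing a single quantization index by at most N//2 steps.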


Posted Content
TL;DR: Daala is a new royalty-free video codec based on perceptually-driven coding techniques; the authors explore using its keyframe format for still picture coding and show how it has improved over the past year.
Abstract: Daala is a new royalty-free video codec based on perceptually-driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala could be the basis of an excellent, royalty-free image format.

7 citations


Proceedings ArticleDOI
01 Sep 2016
TL;DR: This paper combines the designed decoder and encoder for HEVC and proposes a low complexity HEVC video codec processor that has an optimal mode decision with a simplified Rate Distortion Optimization (RDO) and a low power Skip mode.
Abstract: A future video codec processor will have to adopt the newly standardized High Efficiency Video Coding (HEVC/H.265) in a short time, due to the limited coding efficiency of H.264 for large UHD images. This paper combines our designed decoder and encoder for HEVC and proposes a low complexity HEVC video codec processor. We developed this codec processor in a Samsung 28 nm CMOS process this year, and the size of this low complexity codec stays within the bounds of that of a conventional H.264/AVC chip. This single-core-based processor has an optimal mode decision with a simplified Rate Distortion Optimization (RDO) and a low-power Skip mode. The encoder's BD-rate loss is 35% compared with HM-13.0, and the power consumption is below 250 mW when entering the Skip mode. The chip and its internal SRAM sizes are 7.3 × 7.5 mm² and 300 kB, respectively, and the maximum frequency is 600 MHz in 4K-UHD encoding mode at 30 fps.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: Experimental results show the proposed HEVC based multi-view video codec outperforms the anchor standard MV-HEVC codec in terms of bitrate and PSNR.
Abstract: This paper presents an HEVC based multi-view video codec. The frames of the multi-view videos are interleaved to generate a monoscopic video sequence. The interleaving is conducted in a way that increases the exploitation of the temporal and inter-view correlations. The MV-HEVC standard codec is configured to work as a single-layered codec, which functions as a monoscopic HEVC codec with AVC capabilities, and is used to encode the interleaved multi-view video frames. The performance of the codec is compared with the anchor standard MV-HEVC codec by coding three standard multi-view video sequences: "Balloon", "Kendo" and "Newspaper1". Experimental results show the proposed codec outperforms the anchor standard MV-HEVC codec in terms of bitrate and PSNR.


Journal ArticleDOI
TL;DR: A linear prediction technique for reversible data hiding of audio waveforms is improved and the proposed variable expansion method is able to control the payload size through varying the expansion factor.

Journal ArticleDOI
TL;DR: This project presents the algorithms for lossless audio coding communication, the higher efficiency of this standard, and other assets such as the compression ratio and the encoding and decoding speed of this standard.
Abstract: Developed by the IEEE-SA (Standards Association) and released in August 2013, IEEE 1857.2 is the latest standard for lossless audio compression. This recent standard for advanced audio coding (AAC) is an efficient lossless audio codec (coding/decoding) technique: it improves audio quality under compression and decompression, optimizes bandwidth during transmission, saves storage space, and speeds up the streaming of video and audio data. It defines a set of tools to support specific audio coding functions, including general audio coding and lossless coding. This project presents the algorithms for lossless audio coding communication, the higher efficiency of this standard, and other assets such as its compression ratio and its encoding and decoding speed. Different types of audio files have been compressed, and the results show a good compression ratio (CR), which depends on the audio file type, and good encoding and decoding speed.
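The IEEE 1857.2 tools themselves are not reproduced here, but the compression-ratio measurement the abstract reports can be illustrated with a generic lossless byte coder (zlib) as a stand-in, which also shows why the ratio depends on the audio content:

```python
import zlib
import numpy as np

def compression_ratio(samples: np.ndarray) -> float:
    """Ratio of raw 16-bit PCM size to losslessly compressed size
    (zlib is only a stand-in for a real lossless audio coder)."""
    raw = samples.astype(np.int16).tobytes()
    return len(raw) / len(zlib.compress(raw, level=9))

n = np.arange(48000)
tone = 8000 * np.sin(2 * np.pi * 440 * n / 48000)               # very predictable
noise = np.random.default_rng(1).integers(-8000, 8000, n.size)  # nearly incompressible

print(compression_ratio(tone) > compression_ratio(noise))  # -> True
```

A dedicated lossless audio coder exploits sample-level prediction rather than byte repetition, so its ratios are typically better than this stand-in, but the content dependence is the same.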

Patent
16 Mar 2016
TL;DR: In this article, upon handover of one of two communicating terminals from a first network to a second, different network, a communication node determines the codec and codec mode to be used by the two terminals, and a signaling generation unit 510 generates signaling requesting the two terminals to change to the established codec and codec mode.
Abstract: An IMS node 310 relates to a communication node that, upon handover of one of two terminals communicating in a first network to a second network different from the first network, determines a codec and a codec mode that are to be used by the two terminals. A determination unit 506 establishes, as the codec and codec mode to be used by the two terminals, the common parts among information indicating the codecs and codec modes used for the communications in the first network, information indicating the codecs and codec modes supported by one of the two terminals, and information indicating the codecs and codec modes supported by the second network. A signaling generation unit 510 generates a signaling for requesting the two terminals to change to the established codec and codec mode that are to be used by the two terminals.
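The determination step described above — taking the common parts of three codec/mode sets — reduces to a set intersection; a schematic sketch (the codec/mode tuples and the tie-breaking policy are illustrative, not the IMS signaling procedure):

```python
def negotiate(current, terminal_supported, target_network_supported):
    """Pick a codec/mode pair common to all three sets, as in the
    handover determination step (a schematic sketch, not the IMS spec)."""
    common = set(current) & set(terminal_supported) & set(target_network_supported)
    # Toy tie-break: lexicographically largest common pair.
    return max(common) if common else None

current = {("AMR-WB", "23.85"), ("AMR", "12.2")}    # in use on the first network
terminal = {("AMR-WB", "23.85"), ("AMR", "12.2"), ("EVS", "24.4")}
network = {("AMR", "12.2"), ("AMR-WB", "23.85")}    # supported by the second network

print(negotiate(current, terminal, network))  # -> ('AMR-WB', '23.85')
```

The chosen pair is what the signaling generation unit would then request both terminals to switch to.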

Journal ArticleDOI
TL;DR: A Normalized Split Vector Quantization (NSVQ) is proposed to quantize the variable dimension LT-DCM vectors, designed according to the properties of the DCM vectors obtained from a standard speech database.
Abstract: The long-term harmonic plus noise model (LT-HNM) for speech yields interesting data compression, since it exploits the smooth evolution of the time trajectories of the short-term harmonic plus noise model parameters by applying a discrete cosine model (DCM). In this paper, we extend the LT-HNM to a complete low bit-rate speech coder. A Normalized Split Vector Quantization (NSVQ) is proposed to quantize the variable-dimension LT-DCM vectors. The NSVQ is designed according to the properties of the DCM vectors obtained from a standard speech database. The obtained LT-HNM coder reaches an average bit-rate of 2.7 kbps for wideband speech. The proposed coder is evaluated in terms of modeling and coding errors, bit-rate, listening quality and intelligibility. Index Terms: low bit-rate, speech coding, long-term modeling, harmonic plus noise model, variable dimension vector quantization.

Proceedings ArticleDOI
16 May 2016
TL;DR: The results show that the EVS codec provides the best performance over all the scenarios investigated in this study, and deploying the G.729.1 codec in the training process of the verification system provides the best equal error rate in the fully codec-mismatched scenario.
Abstract: Automatic verification of a person's identity from their voice is part of modern telecommunication services. In order to execute a verification task, a speech signal has to be transmitted to a remote server, so the performance of the verification system can be influenced by various distortions that occur when transmitting a speech signal through a communication channel. This paper studies the effect of state-of-the-art wideband (WB) speech codecs on the performance of automatic speaker verification in the context of a channel/codec mismatch between enrollment and test utterances. The speaker verification system is based on the GMM-UBM method. The results show that the EVS codec provides the best performance over all the scenarios investigated in this study. Moreover, deploying the G.729.1 codec in the training process of the verification system provides the best equal error rate in the fully codec-mismatched scenario. However, the differences between the equal error rates reported for all of the codecs involved in this scenario are mostly insignificant.

Journal ArticleDOI
TL;DR: Compared to current VQ in MFCC codec, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility score, perceptual evaluation of speech quality score, and mean opinion score.
Abstract: This letter presents a perceptually weighted analysis-by-synthesis vector quantization (VQ) algorithm for low bit rate MFCC codec. Different from conventional VQ of mel-frequency cepstral coefficients (MFCCs) vector, this algorithm uses an analysis-by-synthesis technique and aims to minimize the perceptually weighted spectral reconstruction distortion rather than the distortion of MFCCs vector itself. Also, to reduce the computational complexity, we propose a practical suboptimal codebook searching technique and embed it into the split and multistage VQ framework. Objective and subjective experimental results on Mandarin speech show that the proposed algorithm yields intelligible and natural sounding speech for speech coding at 600–2400 bit/s. Compared to current VQ in MFCC codec, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility score, perceptual evaluation of speech quality score, and mean opinion score.
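The core selection idea above — scoring codebook candidates by perceptually weighted spectral reconstruction error rather than by distance in the MFCC parameter domain — can be sketched as follows (the spectra, codebook, and weights are random/illustrative, and the actual MFCC-to-spectrum synthesis step is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random(32)            # target magnitude spectrum (toy)
codebook = rng.random((64, 32))    # spectra *synthesized* from candidate codewords
w = np.linspace(2.0, 0.5, 32)      # toy perceptual weights: low bands matter more

def absynth_vq(spec, cb, weights):
    """Analysis-by-synthesis selection: score each candidate by the
    perceptually weighted spectral error of its reconstruction, not by
    distance in the parameter (e.g. MFCC) domain."""
    err = ((cb - spec) ** 2 * weights).sum(axis=1)
    return int(np.argmin(err))

plain = int(np.argmin(((codebook - target) ** 2).sum(axis=1)))
weighted = absynth_vq(target, codebook, w)
# The weighting can change which codeword wins versus unweighted search.
```

The letter's suboptimal search and split/multistage framework then limit how many candidates this weighted scoring is applied to.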

Journal ArticleDOI
Byonghwa Lee1, Kwangki Kim, Minsoo Hahn1
TL;DR: An efficient residual coding method of the SAOC-TSC is proposed to reduce the side information bitrate without audio quality degradation or complexity increase.
Abstract: In interactive audio services, users can render audio objects rather freely to match their desires, and the spatial audio object coding (SAOC) scheme is fairly good both in terms of bitrate and audio quality. However, rather perceptible audio quality degradation can occur when an object is suppressed or played alone. To complement this, the SAOC scheme with Two-Step Coding (SAOC-TSC) was proposed, but its side-information bitrate is twice that of the original SAOC due to the bits needed for the residual coding used to enhance the audio quality. In this paper, an efficient residual coding method for the SAOC-TSC is proposed to reduce the side-information bitrate without audio quality degradation or complexity increase. Key words: interactive audio service, spatial audio object coding, residual coding

Book ChapterDOI
01 Jan 2016
TL;DR: A method is proposed to estimate two codec-related parameters used in the E-model calculation, the equipment impairment factor (Ie) and the packet loss robustness factor (Bpl), for non-ITU-T codecs.
Abstract: The aim of this research is to improve the performance of the E-model, one of the most successful non-intrusive speech quality prediction models for voice communication over a packet-based network. However, the E-model still has limitations: its calculation method is restricted to a set of voice codecs from the ITU-T. This paper proposes a method to estimate two codec-related parameters used in the E-model calculation, the equipment impairment factor (Ie) and the packet loss robustness factor (Bpl), for non-ITU-T codecs. The process to estimate both parameters uses a curve-fitting method to calculate Ie values from PESQ results under various levels of network packet loss. The sets of Ie and Bpl for eight narrowband codecs (G.711, G.729, GSM, AMR, iLBC, Speex, Silk, and Opus) are presented. Statistical analysis was also performed for model validation. The results show that the E-model with our Ie and Bpl parameters achieved good accuracy and good correspondence with PESQ MOS for all eight codecs.
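The curve-fitting step can be sketched with the G.107-style effective impairment formula and a brute-force least-squares grid search (the "measured" points below are synthetic stand-ins for PESQ-derived Ie,eff values, and the paper's actual fitting procedure may differ):

```python
import numpy as np

def ie_eff(ppl, ie, bpl):
    # Effective equipment impairment under random packet loss (G.107 form,
    # burst ratio assumed 1): Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl + Bpl)
    return ie + (95.0 - ie) * ppl / (ppl + bpl)

# Hypothetical measurement: Ie,eff at several packet loss rates (%),
# here generated from known parameters so the fit can be checked.
ppl = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
measured = ie_eff(ppl, ie=11.0, bpl=19.0)

# Least-squares fit of (Ie, Bpl) by exhaustive grid search.
grid = [(ie / 2, bpl / 2) for ie in range(0, 80) for bpl in range(2, 80)]
best = min(grid, key=lambda p: float(np.sum((ie_eff(ppl, *p) - measured) ** 2)))
print(best)  # -> (11.0, 19.0)
```

With real data one would use a proper nonlinear least-squares routine, but the structure — fit (Ie, Bpl) so the model tracks PESQ-derived impairments across loss rates — is the same.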

Proceedings ArticleDOI
22 May 2016
TL;DR: Experimental simulation results show that the DWT is a promising technique to use for encoding highly non-stationary signals such as the speech coding error and it is shown that the best choice of wavelet depends on many factors including the order and number of levels of the wavelet tree, delay and how well it approximates the human auditory system.
Abstract: We developed scalable narrowband and wideband speech coding schemes based on the internet low bitrate codec (iLBC). Some of these newer codecs used the Discrete Wavelet Transform (DWT) instead of the Modified Discrete Cosine Transform (MDCT). This paper explores the choice of wavelet packet transform (WPT) for a new scalable speech codec for IP networks that uses the DWT to encode the core-layer coding error in the enhancement layer. The issues regarding the design, and in particular the choice of wavelet for the wideband codec, are discussed. Experimental simulation results show that the DWT is a promising technique for encoding highly non-stationary signals such as the speech coding error. The wideband codec achieved speech quality equivalent to ITU-T G.718 and similar codecs and is more robust. We also show that the best choice of wavelet depends on many factors, including the order and number of levels of the wavelet tree, the delay, and how well it approximates the human auditory system.
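A one-level Haar DWT — the simplest member of the wavelet family under discussion — illustrates the enhancement-layer idea of transforming the core-layer coding error and spending bits per band (the signal and the keep/drop policy below are illustrative, not the codec's actual bit allocation):

```python
import numpy as np

def haar_analysis(x):
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low) band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high) band
    return a, d

def haar_synthesis(a, d):
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(0)
err = rng.normal(size=256)                 # stand-in for core-layer coding error
a, d = haar_analysis(err)
assert np.allclose(haar_synthesis(a, d), err)   # perfect reconstruction

# Enhancement-layer idea: spend the available bits on one band only.
coarse = haar_synthesis(a, np.zeros_like(d))    # drop the detail band entirely
print(np.mean((coarse - err) ** 2) < np.mean(err ** 2))  # -> True
```

Deeper wavelet packet trees refine this band split; the paper's point is that the choice of wavelet and tree depth trades off delay, complexity, and the match to the auditory system.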

Proceedings ArticleDOI
01 Jul 2016
TL;DR: A robust and full-duplex real-time speech communication system is presented, based on Texas Instruments' 32-bit floating point DSP TMS320C6748 with ITU-T G.723.1 as the audio codec; several methods are introduced to optimize the algorithm.
Abstract: High-quality speech communication is quite important in modern communication, and the implementation of security algorithms is likewise very important for real-time applications. In this paper, we design a robust and full-duplex real-time speech communication system based on Texas Instruments' 32-bit floating point DSP TMS320C6748. We use ITU-T G.723.1 as the audio codec in this system. According to the hardware of the system, and to meet the requirements of high-quality real-time communication, several methods are introduced in this paper to optimize the algorithm. Through this optimization, the time spent in compression and decompression is reduced from more than 1000 ms to about 10 ms, which guarantees real-time communication. Our system has been successfully used for speech communication in a railway system.

Journal ArticleDOI
TL;DR: This letter presents a triangular inequality elimination algorithm equipped with a dynamic and an intersection mechanism, named DI‐TIE, as a means to considerably improve the performance of ISF coefficient quantization in AMR‐WB speech codec.
Abstract: The adaptive multi-rate wideband (AMR-WB) speech codec, with a sampling rate of 16 kHz, is one of the speech codecs applied in 4G mobile communication systems as a way to remarkably improve the speech quality of a smartphone. However, a major drawback is that the vector quantization of the immittance spectral frequency (ISF) coefficients accounts for the second-largest share of the total computational load of the AMR-WB encoder. In other words, the speech quality is improved at the cost of high battery power consumption. Accordingly, this letter presents a triangular inequality elimination algorithm equipped with a dynamic mechanism and an intersection mechanism, named DI-TIE, as a means to considerably improve the performance of ISF coefficient quantization in the AMR-WB speech codec. Both mechanisms are designed in a way that recursively enhances the performance of the TIE algorithm. Experimental results demonstrate that this proposal is a superior search algorithm relative to the conventional TIE and multiple TIE (MTIE) approaches. With a full search algorithm as the benchmark for search-load comparison, this work provides a search load reduction of more than 77%, a figure far beyond the 36% of the TIE and 49% of the MTIE approach. © 2016 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
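The plain TIE principle underlying DI-TIE can be sketched as follows: precompute each codeword's distance to an anchor point offline, then during search skip any codeword whose triangle-inequality lower bound already meets or exceeds the best distance found (the anchor choice and codebook are illustrative, and the letter's dynamic and intersection mechanisms are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 16))               # toy ISF-like codebook
anchor = codebook.mean(axis=0)
d_ca = np.linalg.norm(codebook - anchor, axis=1)    # precomputed offline

def tie_search(x):
    """Nearest codeword with triangle-inequality elimination:
    |d(x,anchor) - d(c,anchor)| <= d(x,c), so if that lower bound
    already reaches the current best distance, d(x,c) cannot win."""
    d_xa = np.linalg.norm(x - anchor)
    best_i, best_d, skipped = -1, np.inf, 0
    for i in np.argsort(np.abs(d_ca - d_xa)):       # most promising first
        if abs(d_ca[i] - d_xa) >= best_d:
            skipped += 1                             # eliminated without a distance computation
            continue
        d = np.linalg.norm(x - codebook[i])
        if d < best_d:
            best_d, best_i = d, i
    return best_i, skipped

x = rng.normal(size=16)
i_tie, skipped = tie_search(x)
i_full = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
assert i_tie == i_full          # same winner as the exhaustive search
```

The elimination is exact (never changes the winner); the search-load savings come from the distance computations it avoids, which is the figure the letter's 77% result refers to.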

Patent
Zhe Wang1
28 Oct 2016
TL;DR: In this article, an audio coding method and a related apparatus are disclosed; the method includes estimating the reference linear prediction efficiency of a current audio frame, determining an audio coding scheme that matches this reference linear prediction efficiency, and performing audio coding on the current frame according to that scheme.
Abstract: An audio coding method and a related apparatus are disclosed. The audio coding method includes: estimating reference linear prediction efficiency of a current audio frame; determining an audio coding scheme that matches the reference linear prediction efficiency of the foregoing current audio frame; and performing audio coding on the foregoing current audio frame according to the audio coding scheme that matches the reference linear prediction efficiency of the foregoing current audio frame. The technical solutions provided in embodiments of the present disclosure help reduce overheads of audio coding.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: An encoder-based H.265/HEVC 4K slim codec whose decoder shares the prediction unit of the encoder, except for an entropy decoder, saving 40% of the total size compared to an independent codec with separate encoder and decoder.
Abstract: H.265/High Efficiency Video Coding (HEVC) is the latest next-generation video compression standard, the successor to H.264/AVC. However, despite its coding efficiency being superior to previous video coding standards, its implementation complexity is an obstacle to overcome. In particular, combining a separate encoder and decoder is disadvantageous in terms of size and power consumption. To solve these problems, we design an encoder-based H.265/HEVC 4K slim codec. The decoder within this codec shares the prediction unit of the encoder, except for an entropy decoder. The proposed shared prediction unit architecture saves 40% of the total size compared to an independent codec with separate encoder and decoder. The logic size is 2.8 M gates with 120 kB of internal SRAM, and the power consumption of this slim codec is at the level of the encoder alone. The function of the slim codec is verified on our Xilinx Virtex-7 platform, and the 4K UHD codec chip operating at 600 MHz is to be implemented in a 28 nm CMOS process this year.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: An additional speech compression step is presented that introduces perceptual frequency masking, which yields a significant bit rate reduction when applying the p-LT-HNM codec to the TIMIT speech database while preserving the same listening quality.
Abstract: The long-term (LT) modeling of speech is used to exploit the time correlation between successive speech parameters. A very low bit-rate speech codec was developed based on this property, by applying LT modeling to the harmonic plus noise model. This paper presents an additional speech compression step that introduces perceptual frequency masking: inaudible components are not considered in the LT modeling, reducing the amount of data to be coded. The resulting wideband speech codec (the p-LT-HNM codec) is based on the perceptual LT harmonic plus noise model. A significant bit rate reduction is obtained when applying the p-LT-HNM codec to the TIMIT speech database, while preserving the same listening quality.

Patent
08 Dec 2016
TL;DR: In this article, the authors present techniques for inline switching of video and/or audio codecs for video and audio data, where a first device encodes data that includes portions(s) that are encoded using a first codec and portion(s), which are encoded by a second codec.
Abstract: Techniques described herein are directed to the inline switching of video and/or audio codecs for video and/or audio data. A first device encodes data that includes portion(s) that are encoded using a first codec and portion(s) that are encoded using a second codec. The encoder may further encode supplemental information in the data. The supplemental information is used by the decoder to determine a transition between the first and second portion(s). The decoder can thus anticipate the transition and properly switch the codec used to decode the data in real-time. Techniques described herein are also directed to the splicing of locally-stored content into content received from a remotely-located source. For example, targeted advertisements that are stored locally may be played back during commercial breaks of live content. The locally-stored targeted advertisements may replace the commercials provided via the remotely-located source.

Patent
Lei Miao1, Zexin Liu1
01 Feb 2016
TL;DR: In this paper, an audio signal coding method and apparatus is described, which includes categorizing audio signals into high-frequency audio signals and low-frequency signals, and coding the lowfrequency signals by using a corresponding lowfrequency coding manner according to the characteristics of lowfrequency signal.
Abstract: The present invention relates to an audio signal coding method and apparatus. The method includes: categorizing audio signals into high-frequency audio signals and low-frequency audio signals; coding the low-frequency audio signals by using a corresponding low-frequency coding manner according to characteristics of low-frequency audio signals; and selecting a bandwidth extension mode to code the high-frequency audio signals according to the low-frequency coding manner and/or characteristics of the audio signals.

Book ChapterDOI
23 Nov 2016
TL;DR: An error mitigation scheme combining two approaches: a replacement super-vector technique, which provides replacements to reconstruct both the LPC coefficients and the excitation signal along bursts of lost packets, and a Forward Error Code (FEC) technique that minimizes error propagation after the last lost frame.
Abstract: In this paper, we propose an error mitigation scheme which combines two different approaches: a replacement super-vector technique, which provides replacements to reconstruct both the LPC coefficients and the excitation signal along bursts of lost packets, and a Forward Error Code (FEC) technique that minimizes error propagation after the last lost frame. Moreover, this FEC code is embedded into the bitstream to avoid any bitrate increase and keep the codec working in a compliant way over clean transmissions. The success of our recovery technique relies heavily on a quantization of the speech parameters (the LPC coefficients and the excitation signal), especially in the case of the excitation signal, where a modified version of the well-known Linde-Buzo-Gray (LBG) algorithm is applied. The performance of our proposal is evaluated over the AMR codec in terms of speech quality using the PESQ algorithm. Our proposal achieves a noticeable improvement over the standard AMR legacy codec under adverse channel conditions without incurring high computational costs or delays during the decoding stage, and without consuming any additional bitrate.
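The codebook-training ingredient the abstract names — the Linde-Buzo-Gray algorithm — can be sketched in its classic form (the paper applies a modified version to excitation-signal quantization; the training data, codebook size, and stopping rule here are illustrative):

```python
import numpy as np

def lbg(train, size, eps=0.01, iters=20):
    """Classic LBG: grow the codebook by perturbation-splitting, then
    refine each stage with Lloyd (nearest-neighbour / centroid) iterations."""
    cb = train.mean(axis=0, keepdims=True)            # start from the global centroid
    while cb.shape[0] < size:
        cb = np.vstack([cb * (1 + eps), cb * (1 - eps)])   # split every codeword
        for _ in range(iters):
            # Assign each training vector to its nearest codeword...
            idx = np.argmin(((train[:, None] - cb[None]) ** 2).sum(-1), axis=1)
            # ...then move each non-empty codeword to its cell centroid.
            for k in range(cb.shape[0]):
                if np.any(idx == k):
                    cb[k] = train[idx == k].mean(axis=0)
    return cb

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))      # stand-in for excitation parameter vectors
cb = lbg(train, size=8)
print(cb.shape)  # -> (8, 4)
```

The resulting codebook is what allows the decoder to replace lost-frame parameters with plausible quantized substitutes, which is the role it plays in the recovery scheme above.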