
Showing papers on "Code-excited linear prediction" published in 2016


Proceedings ArticleDOI
01 Jul 2016
TL;DR: This paper proposes adding two new prediction tools to the HEVC framework to improve its coding efficiency: one based on local linear embedding-based prediction and one based on self-similarity compensated prediction.
Abstract: Light field imaging is a promising new technology that allows the user not only to change the focus and perspective after taking a picture, but also to generate 3D content, among other applications. However, light field images are characterized by large amounts of data and there is a lack of coding tools to efficiently encode this type of content. Therefore, this paper proposes the addition of two new prediction tools to the HEVC framework, to improve its coding efficiency. The first tool is based on the local linear embedding-based prediction and the second one is based on the self-similarity compensated prediction. Experimental results show improvements over JPEG and HEVC in terms of average bitrate savings of 71.44% and 31.87%, and average PSNR gains of 4.73 dB and 0.89 dB, respectively.

78 citations


Posted Content
TL;DR: An overview of Speex, the technology involved in it and how it can be used in applications is presented.
Abstract: The Speex project was started in 2002 to address the need for a free, open-source speech codec. Speex is based on the Code Excited Linear Prediction (CELP) algorithm and, unlike the previously existing Vorbis codec, is optimised for transmitting speech for low-latency communication over an unreliable packet network. This paper presents an overview of Speex, the technology involved in it and how it can be used in applications. The most recent developments in Speex, such as the fixed-point port, acoustic echo cancellation and noise suppression, are also addressed.

52 citations


Journal ArticleDOI
TL;DR: This paper proposes to solve the optimization problem by combining splitting methods with two approaches, the Douglas-Rachford method and the alternating direction method of multipliers, which yields solutions with higher computational efficiency, orders of magnitude faster than general-purpose software based on interior-point methods.

16 citations


Journal ArticleDOI
TL;DR: This paper introduces an interpretation of directional prediction as a particular case of linear prediction, which uses first-order linear filters and a set of geometric transformations, and motivates the proposal of a generalized intra prediction framework, whereby the first-order linear filters are replaced by adaptive linear filters with sparsity constraints.
Abstract: Directional intra prediction plays an important role in current state-of-the-art video coding standards. In directional prediction, neighbouring samples are projected along a specific direction to predict a block of samples. Ultimately, each prediction mode can be regarded as a set of very simple linear predictors, a different one for each pixel of a block. Therefore, a natural question that arises is whether one could use the theory of linear prediction in order to generate intra prediction modes that provide increased coding efficiency. However, such an interpretation of each directional mode as a set of linear predictors is too poor to provide useful insights for their design. In this paper, we introduce an interpretation of directional prediction as a particular case of linear prediction, which uses the first-order linear filters and a set of geometric transformations. This interpretation motivated the proposal of a generalized intra prediction framework, whereby the first-order linear filters are replaced by adaptive linear filters with sparsity constraints. In this context, we investigate the use of efficient sparse linear models, adaptively estimated for each block through the use of different algorithms, such as matching pursuit, least angle regression, least absolute shrinkage and selection operator, or elastic net. The proposed intra prediction framework was implemented and evaluated within the state-of-the-art high efficiency video coding standard. Experiments demonstrated the advantage of this predictive solution, mainly in the presence of images with complex features and textured areas, achieving higher average bitrate savings than other related sparse representation methods proposed in the literature.

8 citations


Journal ArticleDOI
TL;DR: It is demonstrated with the experimental results that the proposed RDPCM coding technique provides a significant coding gain over the state-of-the-art reference codec for screen content video coding.
Abstract: In this paper, a residual differential pulse code modulation (RDPCM) coding technique using a weighted linear combination of neighboring residual samples is proposed to provide coding efficiency in the screen content video coding. The RDPCM performs the sample-based prediction of residue to reduce spatial redundancies. The proposed method uses the $l_1$ optimization in the weight derivation by considering the statistical characteristics of graphical components in videos in an intracoding. Specifically we use the least absolute shrinkage and selection operator to derive the weights because the solution is accurate in high variance residue. Furthermore, we enhance parallelism in a line processing by restricting the support to the row-wise prediction to above samples or the column-wise prediction to the left samples. The proposed method uses an explicit RDPCM scheme, so a coding mode determined by rate-distortion optimization is transmitted to a decoder. For coding the overhead, we develop a context design in CABAC based on correlation between an intraprediction direction and an RDPCM prediction mode. It is demonstrated with the experimental results that the proposed method provides a significant coding gain over the state-of-the-art reference codec for screen content video coding.

8 citations


Proceedings ArticleDOI
01 Dec 2016
TL;DR: A Compressive Sensing based CELP coder that allows bit rate scalability by varying the dimension of the measurement vectors is designed and implemented in this paper.
Abstract: Code Excited Linear Prediction (CELP), one of the most famous hybrid speech coders, exploits the advantages of parametric coders and waveform coders. The quality of reconstructed speech increases with the size of Gaussian codebook used for quantizing excitation sequences. But this results in increased transmission bit rate and search complexity of the codebook. This can be dealt with using tools from Compressed Sensing (CS) domain that transfers complexity of transmitter to space of sparse recovery at receiver. Sparse signal recovery gained much interest in signal processing research as it allows data sampling below Nyquist rate. A Compressive Sensing based CELP coder that allows bit rate scalability by varying the dimension of the measurement vectors is designed and implemented in this paper. Vector quantization of CS measurements and Linear Predictive Coding (LPC) coefficients using Gaussian and LPC codebooks respectively resulted in a bit rate of 11.9kbps which is less than that of CELP coder of the same speech quality. By optimizing the number of bits allocated for parameters and interpolating the LP coefficients, the bit rate is further reduced to 8.1kbps without much degradation in the quality of the reconstructed speech.

5 citations


Proceedings Article
16 Mar 2016
TL;DR: The proposed technique, AC with MFCC, performs better in terms of bit rate, word error rate and compression ratio than existing techniques such as ADPCM, LD-CELP, CS-ACELP, CELP, LPC, and MFCC.
Abstract: This paper presents speech feature extraction for the Telugu language through proper compression. The speech is compressed using digital arithmetic coding, features are extracted by MFCC, and classification is then done by an ANN. Speech feature extraction and feature classification are the major steps in ASR. This paper presents a technique to extract the speech features after speech compression. Combining arithmetic coding with MFCC reduces the average number of bits. Arithmetic coding and MFCC stand out in terms of magnificence and potency. A text-dependent Telugu ASR is designed. The feature extraction process is done for 140 bits/frame and 80 bits/frame, and the features extracted are LSP, pitch prediction filter, codebook indexes, gain, synchronization, FEC, and future expansion. The proposed technique, AC with MFCC, has been compared with various existing techniques such as ADPCM, LD-CELP, CS-ACELP, CELP, LPC, and MFCC. The proposed technique performed better in terms of bit rate, word error rate and compression ratio.

5 citations


Proceedings ArticleDOI
23 Mar 2016
TL;DR: It is shown that the CELP technique improves on the Linear Predictive Coder (LPC) and is an efficient coding technique for bit rates of 16–9.6 kbps, while the MELP coder discussed here helps to remove the voicing error in the two-state excitation model of LPC.
Abstract: Speech is one of the natural ways of communication amongst humans. Nowadays there is an insatiable demand for speech communication, as it carries additional information such as speaker identity, emotional state and prosodic nuance, which adds naturalness to communication. With the rapid growth and increased number of applications, there is a need for data compression techniques that reduce communication cost by using the available bandwidth and storage space effectively. Speech coding techniques help to reduce the bit rate while maintaining the original speech quality. In this paper, a hybrid speech coding technique, Code Excited Linear Prediction (CELP), and a parametric coding technique, Mixed Excitation Linear Prediction (MELP), are discussed, and the CELP technique is implemented using MATLAB. Parameters such as mean square error (MSE), Mean Opinion Score (MOS), and Signal to Noise Ratio are calculated for the CELP technique, which show that it is an improvement over the Linear Predictive Coder (LPC). It is an efficient coding technique for bit rates of 16–9.6 kbps. The MELP coder discussed here helps to remove the voicing error in the two-state excitation model of LPC. It is a low bit rate coder with a bit rate of 2.4 kbps and is mainly used in military and federal standards.
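The objective measures named in this abstract (MSE and signal-to-noise ratio) can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation the paper describes; the function names and the toy sample values are assumptions, not taken from the paper.

```python
import math

def mse(reference, decoded):
    """Mean square error between two equal-length sample sequences."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)

def snr_db(reference, decoded):
    """SNR in dB: 10*log10(reference energy / coding-error energy)."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise_energy == 0:
        return math.inf          # perfect reconstruction
    if signal_energy == 0:
        return -math.inf         # silent reference, non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy values (hypothetical): an original frame and its decoded version.
original = [1.0, -2.0, 3.0, -4.0]
decoded = [1.1, -1.9, 2.9, -4.1]
```

MOS, the third measure in the abstract, is a subjective listening score and cannot be computed from the samples alone.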

4 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: The objective of this paper is to implement SPEEX decoding on ARM microprocessor, based on the voice compression algorithm technology of Code Excited Linear Prediction (CELP), which can effectively compress voice and retain the integrity of speech.
Abstract: The objective of this paper is to implement SPEEX decoding on ARM microprocessor. SPEEX [1] is based on the voice compression algorithm technology of Code Excited Linear Prediction (CELP) [2], which can effectively compress voice and retain the integrity of speech. For hardware part, we give up the high-cost, high-power consumption digital signal processor, and select the STM32 series ARM microprocessor produced by STMicroelectronics. Through coding at PC end, Bluetooth wireless transmission to the ARM processor, and SPEEX decoding, the voice is then played back. Finally, voice quality verification is carried out through Perceptual Evaluation of Speech Quality (PESQ).

4 citations


Journal ArticleDOI
TL;DR: An insight into the capabilities of compressive sensing (CS) in speech processing and a novel idea in the quantized framework are provided, and the results indicate that the proposed scheme offers better compression in comparison with basic Gaussian codebook CELP.
Abstract: Speech compression or speech coding is inevitable for effective communication of speech signals in resource-limited scenarios, and researchers have been working on achieving lower and lower transmission bit rates (BR) without much compromise on the quality of speech. Medium BR hybrid speech coding schemes have gained much interest in recent years, with most of them based on CELP, the basic medium bit-rate coding scheme. In this work, we provide an insight into the capabilities of compressive sensing (CS) in speech processing and propose a novel idea in the quantized framework. Three major aspects demonstrated in this paper are (1) inherent de-noising of noisy speech by the CS based coder along with compression, (2) quantization of CS measurements to achieve medium transmission bit-rates and (3) enhancement of the quality and compression performance of the coder with better sparse representations of speech using dictionaries. The results indicate that the proposed scheme offers better compression in comparison with basic Gaussian codebook CELP. The CS scheme has the added advantage of inherent noise suppression and provides more robustness to background noise in comparison with parameter extraction based medium bit-rate speech coding systems.

3 citations


Proceedings ArticleDOI
01 Jul 2016
TL;DR: Two types of codebooks, namely the Gaussian codebook and the fixed codebook, are used in this analysis, and the changes in the performance of the CELP codec are evaluated for these two codebooks.
Abstract: The design of high quality speech coders with low data rate is a very challenging task in speech processing. CELP provides good quality coded speech at a very low bit rate of 4.8 kbps. Two types of codebooks, namely the Gaussian codebook and the fixed codebook, are used in this analysis. The changes in the performance of the CELP codec are evaluated for these two codebooks. Optimization of the codebook can be done by improving the training process in the codebook generation stage. The two types of codebooks are generated and implemented in the G.723.1 CELP codec, and the variation in the performance is evaluated.
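To make the role of the codebook concrete, here is a minimal sketch of an analysis-by-synthesis search over a Gaussian codebook. It is not the G.723.1 procedure evaluated in the paper: perceptual weighting and the adaptive codebook are omitted, and the function names, codebook size, and filter coefficient are assumptions for illustration only.

```python
import random

def synthesize(excitation, lpc):
    """All-pole synthesis 1/A(z), A(z) = 1 + sum_k lpc[k-1] z^-k, zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain) minimizing ||target - gain * synthesize(code)||^2."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy      # error reduction with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

def gaussian_codebook(size, length, seed=0):
    """Codebook of zero-mean, unit-variance Gaussian excitation vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]
```

For each entry the optimal gain has a closed form, so the search only needs to maximize the squared correlation divided by the energy of the synthesized codevector. The cost grows linearly with the codebook size.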

Posted Content
TL;DR: In this article, the authors improved the noise shaping of CELP using a more modern psychoacoustic model, which has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream.
Abstract: One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper, we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20% reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.
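For context, the "simple weighting filter" of classical CELP is commonly written as W(z) = A(z/γ1)/A(z/γ2), where A(z/γ) is obtained by scaling the k-th LPC coefficient by γ^k. The sketch below shows that baseline filter only, not the psychoacoustic model the paper proposes; the function names and the default values γ1 = 0.9 and γ2 = 0.6 are illustrative assumptions.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma) for A(z) = 1 + sum_k lpc[k-1] z^-k."""
    return [a * gamma ** k for k, a in enumerate(lpc, start=1)]

def weighting_filter(signal, lpc, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) with zero initial state."""
    num = bandwidth_expand(lpc, gamma1)   # zeros of W(z)
    den = bandwidth_expand(lpc, gamma2)   # poles of W(z)
    out = []
    for n, x in enumerate(signal):
        acc = x
        for k in range(1, len(lpc) + 1):
            if n - k >= 0:
                acc += num[k - 1] * signal[n - k] - den[k - 1] * out[n - k]
        out.append(acc)
    return out
```

Because the filter is applied only when the encoder computes its error criterion, changing it alters which codevectors are chosen but not the bit-stream format, which is the property the abstract relies on.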

Proceedings ArticleDOI
01 Dec 2016
TL;DR: Theoretical and simulation analyses show that the proposed method for bandwidth extension of NB speech is robust to quantization and channel noises, and the log spectral distortion test shows that the reconstructed wideband signal gives a much better performance in terms of speech quality when compared to some of the existing speech bandwidth extension methods employing data hiding.
Abstract: Public telephone systems transmit speech across a limited frequency range, about 300–3400 Hz, called narrowband (NB), which results in a significant reduction of quality and intelligibility of speech. This paper proposes a fully backward compatible novel method for bandwidth extension of NB speech. The method uses a magnitude spectrum data hiding technique to provide a perceptually better wideband speech signal. Code excited linear prediction (CELP) parameters are extracted from the down-sampled, frequency-shifted version of the high frequency components of the speech signal existing above NB, which are spread by using pseudo-noise codes and are embedded in the low-amplitude high-frequency regions of the magnitude spectrum of the NB speech signal. The embedded information is extracted at the receiving end to reconstruct the wideband speech signal. Theoretical and simulation analyses show that the proposed method is robust to quantization and channel noises. The log spectral distortion test clearly shows that the reconstructed wideband signal gives a much better performance in terms of speech quality when compared to some of the existing speech bandwidth extension methods employing data hiding.

Journal ArticleDOI
TL;DR: A media-specific Forward Error Correction (FEC) method using a Pitch-Pulse Codebook (PPCB)-based approach to model the ACB contribution for voiced frame (frame onset) determined under Zero Crossing Rate constraint is proposed.
Abstract: One of the well-known problems of Code-Excited Linear Prediction (CELP)-type codec is its vulnerability to a frame erasure. When a frame is erased, the inter-frame dependency introduced by the Long Term Prediction causes a desynchronization of the Adaptive Codebook (ACB) which introduces in its turn an error propagation through the correctly received frames. In this paper, we propose a media-specific Forward Error Correction (FEC) method using a Pitch-Pulse Codebook (PPCB)-based approach to model the ACB contribution for voiced frame (frame onset) determined under Zero Crossing Rate constraint. The PPCB uses a single pulse optimized by Multipulse Maximum Likelihood Quantization algorithm to model the pitch-like contribution at the encoder side while the quantized version of that pulse will be sent as FEC information to resynchronize the ACB at the decoder side after a frame erasure. Through this approach a noticeable improvement of the synthesis speech quality is achieved under adverse channel conditions with the advantage of low computational complexity while the legacy bit-rate of the codec is kept unchanged.

Patent
24 Nov 2016
TL;DR: In this article, a speech encoding device for encoding a speech signal, comprising a speech encoder and an auxiliary information encoding unit, is proposed to recover speech quality without increasing an algorithm delay for a packet loss in speech encoding.
Abstract: PROBLEM TO BE SOLVED: To recover speech quality without increasing an algorithm delay for a packet loss in speech encoding. SOLUTION: Provided is a speech encoding device for encoding a speech signal, comprising a speech encoding unit for encoding a speech signal and an auxiliary information encoding unit for calculating the parameter of a lookahead signal in CELP encoding as auxiliary information used for packet loss concealment in CELP encoding. The speech encoding unit calculates an index representing the nature of a frame to be encoded and transmits it to the auxiliary information encoding unit, a pitch lag is included as auxiliary information in a packet immediately preceding a packet to be decoded for only a specific frame class and the pitch lag is not included for frame classes other than the specific frame class. SELECTED DRAWING: Figure 4

Journal ArticleDOI
TL;DR: An ACFBD-MPC (Amplitude Compensation Frequency Band Division-Multi Pulse Coding) system is presented, which applies amplitude compensation to the multi-pulses in each pitch interval and specific frequency to reduce the distortion of the synthesis speech waveform.
Abstract: Recently, the use of signal compression methods to improve the efficiency of wireless networks has increased. In particular, the MPC system was used in the pitch extraction method and the excitation source of voiced and unvoiced to reduce the bit rate. In general, the MPC system using an excitation source of voiced and unvoiced would result in a distortion of the synthesis speech waveform in the case of voiced and unvoiced consonants in a frame. This is caused by normalization of the synthesis speech waveform in the process of restoring the multi-pulses of the representation segment. This paper presents an ACFBD-MPC (Amplitude Compensation Frequency Band Division-Multi Pulse Coding) system, which applies amplitude compensation to the multi-pulses in each pitch interval and specific frequency to reduce the distortion of the synthesis speech waveform. The experiments were performed with 16 sentences of male and female voices. The voice signal was A/D converted at 10 kHz with 12 bits. In addition, the ACFBD-MPC system was realized and the SNR of the ACFBD-MPC was estimated in the coding condition of 8 kbps. As a result, the SNR of ACFBD-MPC was 13.6 dB for the female voice and 14.2 dB for the male voice. The ACFBD-MPC improved the male and female voice by 1 dB and 0.9 dB, respectively, compared to the traditional MPC. This method is expected to be used for cellular telephones and smartphones using the excitation source with a low bit rate.

Journal ArticleDOI
TL;DR: This document presents an algorithm for switched orthogonalization of the fixed-codebook (FCB) search in a code-excited linear-predictive (CELP) speech coder, together with a derivation of the conditions for switching; the algorithm was evaluated in the G.729.1 coder.

Patent
27 Dec 2016
TL;DR: In this article, a method for co-quantizing gains of an adaptive contribution and a fixed contribution of an excitation signal in a frame of a coded sound signal is proposed.
Abstract: PROBLEM TO BE SOLVED: To quantize a gain of adaptive and fixed contributions of the excitation signal to improve the tolerance of a codec to frame loss or packet loss that has possibility of occurring during transmission of an encoding parameter from an encoder to a decoder. SOLUTION: A device and method for co-quantizing gains of an adaptive contribution and a fixed contribution of an excitation signal in a frame of a coded sound signal. For retrieving a quantized gain of a fixed contribution of an excitation signal in a sub-frame in the frame, the gain of the fixed excitation contribution is estimated using a frame classification parameter, a gain codebook supplies a correction factor in response to a received gain codebook index, and a multiplier multiplies the estimated gain by the correction factor to provide the quantized gain of the fixed excitation contribution. SELECTED DRAWING: Figure 5
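The retrieval step in this abstract (estimated gain multiplied by a correction factor looked up from a gain codebook index) can be sketched as below. The abstract does not say how the gain is estimated from the frame classification parameter, so the sketch takes the estimate as an input; the function names, the codebook values, and the encoder-side selection rule are hypothetical.

```python
def decode_fixed_gain(estimated_gain, gain_codebook, index):
    """Quantized fixed-contribution gain = estimated gain * correction factor."""
    if not 0 <= index < len(gain_codebook):
        raise IndexError("gain codebook index out of range")
    return estimated_gain * gain_codebook[index]

def encode_fixed_gain(optimal_gain, estimated_gain, gain_codebook):
    """Hypothetical encoder rule: pick the index whose decoded gain is closest."""
    return min(
        range(len(gain_codebook)),
        key=lambda i: abs(optimal_gain - estimated_gain * gain_codebook[i]),
    )

# Hypothetical table of correction factors (not from the patent).
CORRECTION_FACTORS = [0.5, 0.8, 1.0, 1.25, 2.0]
```

Because only a correction to the estimate is transmitted, the decoded gain does not have to be predicted from previously decoded gains, which is one way such a scheme could limit error propagation after a lost frame.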

Patent
10 Sep 2016
TL;DR: In this article, a transform-domain path is configured to obtain a set of spectral coefficients and noise-shaping information on the basis of a time-domain representation of a portion of the audio content to be encoded in a transform-domain mode.
Abstract: FIELD: computer engineering. SUBSTANCE: invention relates to computer engineering. Audio signal encoder comprises a transform-domain path configured to obtain a set of spectral coefficients and noise-shaping information on the basis of a time-domain representation of a portion of the audio content to be encoded in a transform-domain mode. Transform-domain path comprises a time-domain-to-frequency-domain converter which performs window weighting in time domain of audio representation and outputs a set of spectral coefficients using time-domain-to-frequency-domain conversion window-weighted time representation of audio. Audio signal encoder includes a code-excited linear-prediction-domain path (CELP), which extracts information on code excitation and parameters of field of linear prediction of fragment audio encoded in CELP mode. Audio signal encoder allows selective formation of anti-aliasing information, when current fragment of audio follows fragment of audio coded by a CELP mode. EFFECT: technical result consists in improvement of efficiency of encoding successive fragments of audio. 28 cl, 32 dwg

Proceedings ArticleDOI
08 Sep 2016
TL;DR: This work demonstrates that the performance of such combinations of speech enhancement and coding methods can be improved by joining the two methods into a single block, based on incorporating Wiener filtering into the objective function used for optimization of the quantization in code excited linear prediction (CELP)-based codecs.
Abstract: The performance of speech communication applications in the field of mobile devices is often hampered by background noises and distortions. Therefore, noise attenuation methods are commonly used as a pre-processing method, cascaded with the speech-codec. We demonstrate that the performance of such combinations of speech enhancement and coding methods can be improved by joining the two methods into a single block. The proposed method is based on incorporating Wiener filtering into the objective function used for optimization of the quantization in code excited linear prediction (CELP)-based codecs. The benefits are that 1) the non-linear components of CELP codecs, including quantization and error feedback, are taken into account in the joint minimization function thereby improving quality and 2) by merging blocks both delay and computational complexity can be minimized. Our experiments demonstrate that the proposed joint enhancement and coding approach consistently improves subjective and objective quality. The proposed method is compatible with any CELP-based codecs without changing the bit-stream, whereby it can be readily applied in mobile phones or speech communication devices applying the concepts of CELP codecs for improving perceptual quality in adverse conditions.

Book ChapterDOI
14 Jul 2016
TL;DR: SPEEX voice compression technology will be transplanted to the STM32 processor, and SPEEX encoded data transmitted through the transmitter to the receiver; and finally through the decoding side the encoded data is restored to playback.
Abstract: The goal of this paper is to implement SPEEX speech compression technology on an ARM processor. SPEEX is a speech compression technology based on the Code Excited Linear Prediction (CELP) algorithm. It has a low bit rate and high resolution, and supports a variety of signal sampling rates, so the voice can be compressed effectively while preserving speech integrity. For the hardware part, we abandon high-cost, high-power digital signal processors (DSPs) and select the STM32 series ARM processor produced by STMicroelectronics. SPEEX voice compression technology is ported to the STM32 processor; SPEEX encoded data is transmitted from the transmitter to the receiver, and finally the decoding side restores the encoded data for playback. The experiment is divided into two parts. First, we input a SPEEX encoded sine wave signal to the ARM decoder and use an oscilloscope to compare the signals before encoding and after decoding. Second, voice quality is verified through PESQ and compared with the voice quality of traditional ADPCM.

Book ChapterDOI
23 Nov 2016
TL;DR: A Forward Error Correction (FEC)-based technique which relies on energy constraint to determine frame onset which will be considered for sending the FEC information and greatly improves the CELP-based codec robustness to packet losses with no increase in coder storage capacity.
Abstract: The strong interframe dependency present in Code Excited Linear Prediction (CELP) codecs renders the decoder very vulnerable when the Adaptive Codebook (ACB) is desynchronized. Hence, errors affect not only the concealed frame but also all the subsequent frames. In this paper, we have developed a Forward Error Correction (FEC)-based technique which relies on energy constraint to determine frame onset which will be considered for sending the FEC information. The extra information contains an optimized FEC pulse excitation which models the contribution of the ACB to offer a resynchronization procedure at the decoder. In fact, under the energy constraint the number of Fixed Codebook (FCB) pulses can be reduced in order to be exploited by the FEC intervention. In return, the error propagation is considerably prevented with no overload of added-pulses. Furthermore, the proposed method greatly improves the CELP-based codec robustness to packet losses with no increase in coder storage capacity.

Patent
Hiroyuki Ehara, Takako Hori
20 Jan 2016
TL;DR: In this article, a CELP-type speech coding apparatus includes a parameter quantizer that selects an adaptive codebook vector and a fixed codebook vector so as to minimize an error between a synthesized speech signal and an input speech signal.
Abstract: In a CELP-type speech coding apparatus, switching between an orthogonal search of a fixed codebook and a non-orthogonal search is performed in a practical and effective manner. The CELP-type speech coding apparatus includes a parameter quantizer that selects an adaptive codebook vector and a fixed codebook vector so as to minimize an error between a synthesized speech signal and an input speech signal. The parameter quantizer includes a fixed codebook searcher that switches between the orthogonal fixed codebook search and the non-orthogonal fixed codebook search based on a correlation value between a target vector for the fixed codebook search and the adaptive codebook vector obtained as a result of a synthesis filtering process.
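The switching idea in this abstract can be sketched as follows. The abstract only states that the switch depends on a correlation value between the fixed-codebook target and the filtered adaptive codebook vector; the normalization, the threshold value, and the direction of the decision used below are assumptions made for illustration, and all function names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalized_correlation(target, acb_filtered):
    """Cosine of the angle between the target and the filtered ACB vector."""
    denom = math.sqrt(dot(target, target) * dot(acb_filtered, acb_filtered))
    return 0.0 if denom == 0.0 else dot(target, acb_filtered) / denom

def orthogonalize(vector, reference):
    """Remove from `vector` its projection onto `reference` (Gram-Schmidt step)."""
    energy = dot(reference, reference)
    if energy == 0.0:
        return list(vector)
    scale = dot(vector, reference) / energy
    return [v - scale * r for v, r in zip(vector, reference)]

def select_search_mode(target, acb_filtered, threshold=0.5):
    """Assumed rule: use the orthogonal search when |correlation| exceeds threshold."""
    corr = abs(normalized_correlation(target, acb_filtered))
    return "orthogonal" if corr > threshold else "non-orthogonal"
```

In an orthogonal search the filtered fixed-codebook candidates would be passed through `orthogonalize` against the filtered adaptive codebook vector before the usual correlation-over-energy criterion is evaluated; the non-orthogonal search skips that step and is cheaper.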