
Showing papers on "Code-excited linear prediction published in 2019"


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, a perceptual weighting filter loss, motivated by the weighting filter employed in analysis-by-synthesis speech coding, is proposed.
Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus intensively being studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech coding, e.g., in code-excited linear prediction (CELP). The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation. The proposed loss function can be advantageously applied to an existing DNN-based speech enhancement system, without modification of the DNN topology for speech enhancement.
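The weighting filter in CELP coding is conventionally W(z) = A(z/γ1)/A(z/γ2), built from the LPC polynomial A(z) by bandwidth expansion. A minimal sketch of a loss built on this idea follows; it is an illustration of the concept, not the authors' implementation, and the function names, γ values, and plain direct-form filter are assumptions:

```python
def bandwidth_expand(a, gamma):
    # A(z) -> A(z/gamma): scale the k-th LPC coefficient by gamma**k
    return [c * gamma ** k for k, c in enumerate(a)]

def pole_zero_filter(b, a, x):
    # Direct-form IIR: y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y

def weighted_mse_loss(clean, estimate, lpc, g1=0.92, g2=0.6):
    # Filter the error through W(z) = A(z/g1)/A(z/g2) before squaring,
    # de-emphasizing errors under formant peaks where they are masked.
    err = [c - e for c, e in zip(clean, estimate)]
    w = pole_zero_filter(bandwidth_expand(lpc, g1),
                         bandwidth_expand(lpc, g2), err)
    return sum(v * v for v in w) / len(w)
```

With g1 == g2 the filter reduces to identity and the loss falls back to plain MSE, which makes the relationship between the two losses easy to check.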

12 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: This work introduces a method to improve the quality of simple scalar quantization in the context of acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks and weighting filters.
Abstract: We introduce a method to improve the quality of simple scalar quantization in the context of acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks and weighting filters. We start from the observation that optimization methods based on sparse reconstruction resemble the structure of a neural network. Hence, building upon a successful enhancement method, we unroll the algorithms and use this to build a neural network which we train to obtain enhanced decoding. In addition, the weighting filter from code-excited linear predictive (CELP) speech coding is integrated into the loss function of the neural network, achieving perceptually improved reconstructed speech. Our experiments show that our proposed trained methods allow for better speech reconstruction than the reference optimization methods.
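Unrolling an iterative sparse-reconstruction solver into network layers can be illustrated with ISTA, whose fixed iteration count becomes the layer count; in a trained (LISTA-style) network the matrices and thresholds would then be learned end-to-end. This is a generic sketch, not the paper's specific algorithm; the names, step size, and threshold are assumptions:

```python
def soft(x, t):
    # Soft-thresholding: the proximal operator of the L1 penalty
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)

def unrolled_ista(A, y, layers=10, lam=0.1, step=0.5):
    # Each "layer" is one ISTA iteration:
    #   x <- soft(x + step * A^T (y - A x), lam * step)
    # Fixing the layer count is what makes the loop a feed-forward network.
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(layers):
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft(x[j] + step * g[j], lam * step) for j in range(n)]
    return x
```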

8 citations


Journal ArticleDOI
01 May 2019-Heliyon
TL;DR: A novel scalable speech coding scheme based on compressive sensing, operating at bit rates from 3.275 to 7.275 kbps, is designed and implemented; it offers listening quality for reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps.

7 citations


Journal ArticleDOI
TL;DR: A recently introduced quantity, the log ratio of entropy powers, can be used to estimate mutual informations in terms of bits/sample and it is shown that for many common distributions and for Gaussian autoregressive processes, the entropy powers in the ratio can be replaced by the corresponding minimum mean squared errors.
Abstract: We write the mutual information between an input speech utterance and its reconstruction by a code-excited linear prediction (CELP) codec in terms of the mutual information between the input speech and the contributions due to the short-term predictor, the adaptive codebook, and the fixed codebook. We then show that a recently introduced quantity, the log ratio of entropy powers, can be used to estimate these mutual informations in terms of bits/sample. A key result is that for many common distributions and for Gaussian autoregressive processes, the entropy powers in the ratio can be replaced by the corresponding minimum mean squared errors. We provide examples of estimating CELP codec performance using the new results and compare these to the performance of the adaptive multirate (AMR) codec and other CELP codecs. Similar to rate distortion theory, this method only needs the input source model and the appropriate distortion measure.
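The core estimate replaces entropy powers with MMSEs, giving mutual information in bits/sample as half the log ratio of the prediction error variances before and after a coding stage. A minimal worked example for a Gaussian AR(1) source, where one-step prediction reduces the error variance from σ² to σ²(1 − a²) (the coefficient value is illustrative):

```python
import math

def bits_per_sample_from_mmse(mmse_before, mmse_after):
    # Log ratio of entropy powers, with entropy powers replaced by
    # the corresponding MMSEs (valid for Gaussian AR processes):
    #   I ~ 0.5 * log2(mmse_before / mmse_after)  [bits/sample]
    return 0.5 * math.log2(mmse_before / mmse_after)

# Gaussian AR(1) source with coefficient a = 0.9 and unit variance:
sigma2, a = 1.0, 0.9
gain = bits_per_sample_from_mmse(sigma2, sigma2 * (1 - a * a))
# gain ~ 1.2 bits/sample attributable to short-term prediction
```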

5 citations


Journal ArticleDOI
TL;DR: Improvements to a scalable wideband speech codec based on the iLBC are presented, employing the wavelet packet transform (WPT) instead of the modified discrete cosine transform (MDCT) to enhance quality; the proposed codec outperforms G.729.1 at most bit rates in terms of objective quality.

3 citations


Posted Content
TL;DR: The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation.
Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus intensively being studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech coding, e.g., in code-excited linear prediction (CELP). The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation. The proposed loss function can be advantageously applied to an existing DNN-based speech enhancement system, without modification of the DNN topology for speech enhancement. The source code for the proposed approach is made available.

3 citations


Proceedings ArticleDOI
01 Mar 2019
TL;DR: A codec based on the code-excited linear prediction (CELP) speech compression method, adopting a denoising autoencoder with spectral compensation (DAE-SC) for quality and intelligibility enhancement, is proposed in this paper.
Abstract: A codec based on the code-excited linear prediction (CELP) speech compression method, adopting a denoising autoencoder with spectral compensation (DAE-SC) for quality and intelligibility enhancement, is proposed in this paper. The sizes of the CELP parameters in the encoder are carefully pruned to achieve a higher compression rate. To recover the speech quality and intelligibility degradation due to the pruned CELP parameters, a DAE-SC network with three hidden layers is employed in the decoder. Compared with the conventional CELP codec at a 9.6 kbps transmission rate, the proposed speech codec achieves an extra 21.9% bit-rate reduction with comparable speech quality and intelligibility, as evaluated by four commonly used speech performance metrics.

2 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: It is shown that it is possible to reconstruct key acoustic parameters such as linear predictive coefficients (LPCs) when these parameters are encoded using scalar quantization and speech reconstruction is possible when it is known that the vocoder belongs to the family of code excited linear prediction (CELP) models, but the coded bit frame structure is unknown.
Abstract: The problem of reconstructing vocoder acoustic parameters using only encoded bit stream data is considered with applications to forensics and reverse engineering. Wasserstein generative adversarial networks (GANs) and CycleGANs, that map two unpaired domains, are used. It is shown that it is possible to reconstruct key acoustic parameters such as linear predictive coefficients (LPCs) when these parameters are encoded using scalar quantization. It is further shown that speech reconstruction is possible to some extent when it is known that the vocoder belongs to the family of code excited linear prediction (CELP) models, but the coded bit frame structure is unknown.

1 citation


Proceedings ArticleDOI
01 Nov 2019
TL;DR: The Perceptual Evaluation of Speech Quality and Enhanced Modified Bark Spectral Distortion tests under various packet loss conditions confirm that the proposed algorithm, namely MDC, is superior to the concealment algorithm embedded in the G.722.2 speech coder.
Abstract: In Voice over Internet Protocol (VoIP) applications, packet loss is a major source of speech impairment. This work uses packet loss concealment methods based on Multiple Description Coding (MDC) and Forward Error Correction (FEC) to mitigate the speech quality deterioration caused by packet losses for Code-Excited Linear Prediction (CELP) based coders in packet networks. We applied our approach to the ITU-T G.722.2 standard speech coder to evaluate its performance. The Perceptual Evaluation of Speech Quality (PESQ) and Enhanced Modified Bark Spectral Distortion (EMBSD) tests under various packet loss conditions confirm that the proposed algorithm, namely MDC, is superior to the concealment algorithm embedded in G.722.2. The performance measures demonstrate that the FEC-based concealment method is better than the interleaving and MDC methods, at the expense of extra delay.
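Multiple description coding can be illustrated with the simplest two-description scheme: even- and odd-indexed samples travel in separate packets, so losing one packet still allows the decoder to interpolate the missing description from the received one. This is a generic sketch of the MDC principle, not the G.722.2-specific algorithm of the paper:

```python
def mdc_split(x):
    # Two descriptions: even- and odd-indexed samples in separate packets.
    return x[0::2], x[1::2]

def mdc_merge(even, odd):
    # Re-interleave the two received descriptions.
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

def conceal_odd_from_even(even):
    # If the odd-sample packet is lost, estimate each missing sample as
    # the average of its two received even-indexed neighbours.
    odd = []
    for i in range(len(even)):
        nxt = even[i + 1] if i + 1 < len(even) else even[i]
        odd.append((even[i] + nxt) / 2.0)
    return odd
```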

1 citation


Proceedings ArticleDOI
27 Nov 2019
TL;DR: Speech synthesized from LSP parameters derived from the trained prototypes shows that the quality of the Bayes error estimation is clearly supported by the speech synthesis ability preserved in training.
Abstract: To increase the utility of Regularization, which is a common framework for avoiding the underestimation of ideal Bayes error, for speech recognizer training, we propose a new classifier training concept that incorporates a regularization term that represents the speech synthesis ability of classifier parameters. To implement our new concept, we first introduce a speech recognizer that embeds Line Spectral Pairs-Conjugate Structure-Algebraic Code Excited Linear Prediction (LSP-CS-ACELP) in a Multi-Prototype State-Transition-Model (MP-STM) classifier, define a regularization term that represents the speech synthesis ability by the distance between a training sample and its nearest MP-STM word model, and formalize a new Minimum Classification Error (MCE) training method for jointly minimizing a conventional smooth classification error count loss and the newly defined regularization term. We evaluated the proposed training method in an isolated-word, closed-vocabulary, and speaker-independent speech recognition task whose Bayes error is estimated to be about 20% and found that our method successfully produced an estimate of Bayes error (about 18.4%) with a single training run over a training dataset without such data resampling as Cross-Validation or the assumptions of sample distribution. Moreover, we investigated the quality of the synthesized speech using LSP parameters derived from the trained prototypes and found that the quality of the Bayes error estimation is clearly supported by the speech synthesis ability preserved in the training.

1 citation


Proceedings ArticleDOI
17 Jul 2019
TL;DR: A survey of various effective codec implementation and performance analyzing methods with different applications, such as CELP Codec, MELP codec, SPEEX speech codec, and OPUS speech codec is proposed.
Abstract: A codec is a device that encodes or decodes digital data signals: an encoder converts a data stream into an encoded form, and a decoder performs the reverse process. Codecs are applied in signal editing, video editing, signal conferencing, video conferencing, etc., and rest on two major principles: pulse code modulation and delta modulation. In this paper, we present a survey of effective codec implementations and performance-analysis methods across different applications. The codecs surveyed are the CELP codec, MELP codec, SPEEX speech codec, and OPUS speech codec. CELP, the code-excited linear prediction method, offers a high bit detection rate, highly efficient speech coding, and peak quality at low bit rates; it is applied in real-time technologies such as automatic watermarking, communications, Zigbee applications, filtering, power spectrum analysis, quantization methods, Gaussian processing, and ARM processors. MELP, the mixed excitation linear prediction method, helps to eliminate the voicing errors that arise in multiple-state excitation methods. SPEEX helps to analyze speech quality and bit rate, and OPUS is a highly flexible speech codec that plays a vital role in real-time applications. These methods are implemented and their performance analyzed on different real-world applications.

Book ChapterDOI
01 Jan 2019
TL;DR: The paper addresses the use of the forward-mode excitation codebook structure of extended G.729, with two non-zero pulses in each track, in the ITU-T 8 kbps CS-ACELP speech codec, and proposes a modification of the legacy codec that determines the optimized excitation codebook codevector using a least-significant search engine.
Abstract: This paper addresses the use of the forward-mode excitation codebook structure of extended G.729 (ITU-T Recommendation, 2007 [1]), with two non-zero pulses in each track, in the ITU-T 8 kbps CS-ACELP speech codec (ITU-T Recommendation, 2007 [1]). The proposed modification of the legacy speech codec, which determines the optimized excitation codebook codevector using a least-significant search engine, avoids maintaining two different codebook structures and thereby minimizes the complexity of the search engine. The modified excitation codebook structure raises the bit rate of the legacy speech codec to 11.6 kbps. A comparative performance analysis between the proposed speech codec and a CELP-based speech codec is carried out using different subjective and objective quality-assessment parameters.

Patent
Dmitry V. Shmunk1, Dmitry Rusanov
05 Feb 2019
TL;DR: In this paper, a method for achieving bitstream scalability in a multi-channel audio encoder is presented, which includes receiving audio input data, organizing said input data by a Code Excited Linear Predictor (CELP) processing module for further encoding by arranging said data according to significance of data.
Abstract: The present invention provides methods and apparatuses for processing audio data. In one embodiment, there is provided a method for achieving bitstream scalability in a multi-channel audio encoder, said method comprising: receiving audio input data; organizing said input data by a Code Excited Linear Predictor (CELP) processing module for further encoding, by arranging said data according to significance, where more significant data is placed ahead of less significant data; and providing a scalable output bitstream, in which a higher-bitrate bitstream is scaled to a lower bitrate by discarding less significant data from the frame ends. The organized CELP data comprises a first part and a second part. The first part comprises a frame header, sub-frame parameters, and innovation vector quantization data from the first frame of all channels. The innovation vector quantization data from the first frames of all channels is arranged according to channel number.
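The significance-ordered frame layout described above can be sketched as follows: more significant data is packed first, so scaling to a lower bitrate is a plain truncation from the frame end. The function names and byte-level packing are illustrative assumptions, not the patent's actual format:

```python
def pack_scalable_frame(header, parts):
    # parts: list of (significance, payload_bytes); pack the most
    # significant payloads first so the frame degrades gracefully.
    payload = b"".join(p for _, p in sorted(parts, key=lambda t: -t[0]))
    return header + payload

def scale_to_bitrate(frame, header_len, budget):
    # Scaling to a lower rate discards the least significant tail of
    # the frame, never the header or sub-frame parameters.
    return frame[:max(header_len, budget)]
```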

01 Jan 2019
TL;DR: The proposed method has brought down the bit rate of CELP to 6.4 kbps or reduced the bit requirement by 12% without compromising on the perceptual quality of reconstructed speech.
Abstract: Speech is a highly complex and dynamic acoustic wave produced by the vocal tract as a result of the excitation in the form of air expelled from the lungs. The vocal tract characteristics vary in different manners during production of various speech categories. This time-variant acoustic filter has been represented by a Linear Prediction (LP) filter in the Speech Production Model, on which Code Excited Linear Prediction (CELP) and many other speech coders are built. The periodic nature of voiced speech due to vocal chord vibration causes slow variation of the vocal tract characteristics, and thus similarity exists among nearby portions of voiced speech. This similarity property is explored to reduce the count of transmitted Linear Predictive Coding (LPC) coefficients and excitation, which are bit-consuming and also significant parameters of the LP filter. This has been implemented in 7.3 kbps CELP by determining an appropriate threshold for the similarity values of both parameters. The proposed method has brought down the bit rate of CELP to 6.4 kbps, i.e., reduced the bit requirement by 12%, without compromising the perceptual quality of reconstructed speech.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this article, the similarity among nearby portions of voiced speech is explored to reduce the count of transmitted LPC coefficients and excitation, which are bit-consuming and also significant parameters of the LP filter.
Abstract: Speech is a highly complex and dynamic acoustic wave produced by the vocal tract as a result of the excitation in the form of air expelled from lungs. The vocal tract characteristics vary in different manner during production of various speech categories. This time variant acoustic filter has been represented by a Linear Prediction (LP) filter in Speech Production Model based on which Code Excited Linear Prediction (CELP) and many other speech coders are built. The periodic nature of voiced speech due to vocal chord vibration causes slow variation for vocal tract characteristics and thus, similarity exists among nearby portions of voiced speech. This similarity property is explored to reduce the count of transmitted Linear Predictive Coding (LPC) coefficients and excitation that are bit consuming and also significant parameters of LP filter. This has been implemented in 7.3 kbps CELP by determining appropriate threshold for similarity values of both parameters. The proposed method has brought down the bit rate of CELP to 6.4 kbps or reduced the bit requirement by 12% without compromising on the perceptual quality of reconstructed speech.
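The similarity-driven reduction can be illustrated by transmitting a frame's LPC set only when it differs sufficiently from the last transmitted set, with the decoder reusing the previous set otherwise. This is a sketch under an assumed cosine-similarity measure and threshold; the paper's actual similarity measure and threshold selection may differ:

```python
def cosine_similarity(u, v):
    # Normalized inner product between two parameter vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return num / den

def frames_to_send(lpc_frames, threshold=0.95):
    # Transmit a frame's LPC set only when it differs enough from the
    # last transmitted set; otherwise the decoder reuses the previous
    # one, saving the bits those coefficients would have consumed.
    sent, last = [], None
    for i, f in enumerate(lpc_frames):
        if last is None or cosine_similarity(f, last) < threshold:
            sent.append(i)
            last = f
    return sent
```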