
Showing papers on "Code-excited linear prediction published in 2019"


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, a perceptual weighting filter loss, motivated by the weighting filter employed in analysis-by-synthesis speech coding, is proposed.
Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus intensively being studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech coding, e.g., in code-excited linear prediction (CELP). The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation. The proposed loss function can be advantageously applied to an existing DNN-based speech enhancement system, without modification of the DNN topology for speech enhancement.
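The weighting filter in CELP coding is conventionally W(z) = A(z/γ1)/A(z/γ2), built from the LPC polynomial A(z) by bandwidth expansion. A minimal sketch of a loss built on this idea follows; it is an illustration of the concept, not the authors' implementation, and the function names, γ values, and plain direct-form filter are assumptions:

```python
def bandwidth_expand(a, gamma):
    # A(z) -> A(z/gamma): scale the k-th LPC coefficient by gamma**k
    return [c * gamma ** k for k, c in enumerate(a)]

def pole_zero_filter(b, a, x):
    # Direct-form IIR: y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y

def weighted_mse_loss(clean, estimate, lpc, g1=0.92, g2=0.6):
    # Filter the error through W(z) = A(z/g1)/A(z/g2) before squaring,
    # de-emphasizing errors under formant peaks where they are masked.
    err = [c - e for c, e in zip(clean, estimate)]
    w = pole_zero_filter(bandwidth_expand(lpc, g1),
                         bandwidth_expand(lpc, g2), err)
    return sum(v * v for v in w) / len(w)
```

With g1 == g2 the filter reduces to identity and the loss falls back to plain MSE, which makes the relationship between the two losses easy to check.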

12 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: This work introduces a method to improve the quality of simple scalar quantization in the context of acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks and weighting filters.
Abstract: We introduce a method to improve the quality of simple scalar quantization in the context of acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks and weighting filters. We start from the observation that optimization methods based on sparse reconstruction resemble the structure of a neural network. Hence, building upon a successful enhancement method, we unroll the algorithms and use this to build a neural network which we train to obtain enhanced decoding. In addition, the weighting filter from code-excited linear predictive (CELP) speech coding is integrated into the loss function of the neural network, achieving perceptually improved reconstructed speech. Our experiments show that our proposed trained methods allow for better speech reconstruction than the reference optimization methods.
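Unrolling an iterative sparse-reconstruction solver into network layers can be illustrated with ISTA, whose fixed iteration count becomes the layer count; in a trained (LISTA-style) network the matrices and thresholds would then be learned end-to-end. This is a generic sketch, not the paper's specific algorithm; the names, step size, and threshold are assumptions:

```python
def soft(x, t):
    # Soft-thresholding: the proximal operator of the L1 penalty
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)

def unrolled_ista(A, y, layers=10, lam=0.1, step=0.5):
    # Each "layer" is one ISTA iteration:
    #   x <- soft(x + step * A^T (y - A x), lam * step)
    # Fixing the layer count is what makes the loop a feed-forward network.
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(layers):
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft(x[j] + step * g[j], lam * step) for j in range(n)]
    return x
```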

8 citations


Journal ArticleDOI
01 May 2019-Heliyon
TL;DR: A novel scalable speech coding scheme based on compressive sensing, operating at bit rates from 3.275 to 7.275 kbps, is designed and implemented; it offers listening quality for reconstructed speech similar to that of the Adaptive Multi-Rate Narrowband (AMR-NB) codec at 6.7 kbps and Enhanced Voice Services (EVS) at 7.2 kbps.

7 citations


Journal ArticleDOI
TL;DR: A recently introduced quantity, the log ratio of entropy powers, can be used to estimate mutual informations in terms of bits/sample and it is shown that for many common distributions and for Gaussian autoregressive processes, the entropy powers in the ratio can be replaced by the corresponding minimum mean squared errors.
Abstract: We write the mutual information between an input speech utterance and its reconstruction by a code-excited linear prediction (CELP) codec in terms of the mutual information between the input speech and the contributions due to the short-term predictor, the adaptive codebook, and the fixed codebook. We then show that a recently introduced quantity, the log ratio of entropy powers, can be used to estimate these mutual informations in terms of bits/sample. A key result is that for many common distributions and for Gaussian autoregressive processes, the entropy powers in the ratio can be replaced by the corresponding minimum mean squared errors. We provide examples of estimating CELP codec performance using the new results and compare these to the performance of the adaptive multirate (AMR) codec and other CELP codecs. Similar to rate distortion theory, this method only needs the input source model and the appropriate distortion measure.
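The core estimate replaces entropy powers with MMSEs, giving mutual information in bits/sample as half the log ratio of the prediction error variances before and after a coding stage. A minimal worked example for a Gaussian AR(1) source, where one-step prediction reduces the error variance from σ² to σ²(1 − a²) (the coefficient value is illustrative):

```python
import math

def bits_per_sample_from_mmse(mmse_before, mmse_after):
    # Log ratio of entropy powers, with entropy powers replaced by
    # the corresponding MMSEs (valid for Gaussian AR processes):
    #   I ~ 0.5 * log2(mmse_before / mmse_after)  [bits/sample]
    return 0.5 * math.log2(mmse_before / mmse_after)

# Gaussian AR(1) source with coefficient a = 0.9 and unit variance:
sigma2, a = 1.0, 0.9
gain = bits_per_sample_from_mmse(sigma2, sigma2 * (1 - a * a))
# gain ~ 1.2 bits/sample attributable to short-term prediction
```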

5 citations


Journal ArticleDOI
TL;DR: Improvements to a scalable wideband speech codec based on the iLBC are presented, employing the wavelet packet transform (WPT) instead of the modified discrete cosine transform (MDCT) to enhance quality; the proposed codec outperforms G.729.1 at most bit rates in terms of objective quality.

3 citations


Posted Content
TL;DR: The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation.
Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus intensively being studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech coding, e.g., in code-excited linear prediction (CELP). The experimental results show that the proposed simple loss function improves the speech enhancement performance compared to a reference DNN with MSE loss in terms of perceptual quality and noise attenuation. The proposed loss function can be advantageously applied to an existing DNN-based speech enhancement system, without modification of the DNN topology for speech enhancement. The source code for the proposed approach is made available.

3 citations


Proceedings ArticleDOI
01 Mar 2019
TL;DR: A codec based on the code-excited linear prediction (CELP) speech compression method, adopting a denoising autoencoder with spectral compensation (DAE-SC) for quality and intelligibility enhancement, is proposed in this paper.
Abstract: A codec based on the code-excited linear prediction (CELP) speech compression method, adopting a denoising autoencoder with spectral compensation (DAE-SC) for quality and intelligibility enhancement, is proposed in this paper. The sizes of the CELP parameters in the encoder are carefully pruned to achieve a higher compression rate. To recover the speech quality and intelligibility degradation due to the pruned CELP parameters, a DAE-SC network with three hidden layers is employed in the decoder. Compared with the conventional CELP codec at a 9.6 kbps transmission rate, the proposed speech codec achieves an extra 21.9% bit-rate reduction with comparable speech quality and intelligibility, as evaluated by four commonly used speech performance metrics.

2 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: It is shown that it is possible to reconstruct key acoustic parameters such as linear predictive coefficients (LPCs) when these parameters are encoded using scalar quantization and speech reconstruction is possible when it is known that the vocoder belongs to the family of code excited linear prediction (CELP) models, but the coded bit frame structure is unknown.
Abstract: The problem of reconstructing vocoder acoustic parameters using only encoded bit stream data is considered with applications to forensics and reverse engineering. Wasserstein generative adversarial networks (GANs) and CycleGANs, that map two unpaired domains, are used. It is shown that it is possible to reconstruct key acoustic parameters such as linear predictive coefficients (LPCs) when these parameters are encoded using scalar quantization. It is further shown that speech reconstruction is possible to some extent when it is known that the vocoder belongs to the family of code excited linear prediction (CELP) models, but the coded bit frame structure is unknown.

1 citation


Proceedings ArticleDOI
01 Nov 2019
TL;DR: The Perceptual Evaluation of Speech Quality and Enhanced Modified Bark Spectral Distortion tests under various packet loss conditions confirm that the proposed algorithm, namely MDC, is superior to the concealment algorithm embedded in the G.722.2 speech coder.
Abstract: In Voice over Internet Protocol (VoIP) applications, packet loss is a major source of speech impairment. This work uses packet loss concealment methods based on Multiple Description Coding (MDC) and Forward Error Correction (FEC) to mitigate the speech quality deterioration caused by packet losses for Code-Excited Linear Prediction (CELP) based coders in packet networks. We applied our approach to the ITU-T G.722.2 standard speech coder to evaluate its performance. The Perceptual Evaluation of Speech Quality (PESQ) and Enhanced Modified Bark Spectral Distortion (EMBSD) tests under various packet loss conditions confirm that the proposed algorithm, namely MDC, is superior to the concealment algorithm embedded in G.722.2. The performance measures demonstrate that the FEC-based concealment method is better than the interleaving and MDC methods, at the expense of extra delay.
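Multiple description coding can be illustrated with the simplest two-description scheme: even- and odd-indexed samples travel in separate packets, so losing one packet still allows the decoder to interpolate the missing description from the received one. This is a generic sketch of the MDC principle, not the G.722.2-specific algorithm of the paper:

```python
def mdc_split(x):
    # Two descriptions: even- and odd-indexed samples in separate packets.
    return x[0::2], x[1::2]

def mdc_merge(even, odd):
    # Re-interleave the two received descriptions.
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

def conceal_odd_from_even(even):
    # If the odd-sample packet is lost, estimate each missing sample as
    # the average of its two received even-indexed neighbours.
    odd = []
    for i in range(len(even)):
        nxt = even[i + 1] if i + 1 < len(even) else even[i]
        odd.append((even[i] + nxt) / 2.0)
    return odd
```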

1 citation


Proceedings ArticleDOI
27 Nov 2019
TL;DR: Speech synthesized from LSP parameters derived from the trained prototypes shows that the quality of the Bayes error estimation is clearly supported by the speech synthesis ability preserved in training.
Abstract: To increase the utility of Regularization, which is a common framework for avoiding the underestimation of ideal Bayes error, for speech recognizer training, we propose a new classifier training concept that incorporates a regularization term that represents the speech synthesis ability of classifier parameters. To implement our new concept, we first introduce a speech recognizer that embeds Line Spectral Pairs-Conjugate Structure-Algebraic Code Excited Linear Prediction (LSP-CS-ACELP) in a Multi-Prototype State-Transition-Model (MP-STM) classifier, define a regularization term that represents the speech synthesis ability by the distance between a training sample and its nearest MP-STM word model, and formalize a new Minimum Classification Error (MCE) training method for jointly minimizing a conventional smooth classification error count loss and the newly defined regularization term. We evaluated the proposed training method in an isolated-word, closed-vocabulary, and speaker-independent speech recognition task whose Bayes error is estimated to be about 20% and found that our method successfully produced an estimate of Bayes error (about 18.4%) with a single training run over a training dataset without such data resampling as Cross-Validation or the assumptions of sample distribution. Moreover, we investigated the quality of the synthesized speech using LSP parameters derived from the trained prototypes and found that the quality of the Bayes error estimation is clearly supported by the speech synthesis ability preserved in the training.

1 citation


Proceedings ArticleDOI
17 Jul 2019
TL;DR: A survey of various effective codec implementation and performance analyzing methods with different applications, such as CELP Codec, MELP codec, SPEEX speech codec, and OPUS speech codec is proposed.
Abstract: A codec is a device that encodes or decodes digital data signals: an encoder converts a data stream into an encoded form, and a decoder performs the reverse process. Codecs are applied in signal editing, video editing, signal conferencing, video conferencing, etc., and rest on two major principles: pulse code modulation and delta modulation. In this paper, we present a survey of effective codec implementations and performance-analysis methods across different applications. The codecs surveyed are the CELP codec, MELP codec, SPEEX speech codec, and OPUS speech codec. CELP, the code-excited linear prediction method, offers a high bit detection rate, highly efficient speech coding, and peak quality at low bit rates; it is applied in real-time technologies such as automatic watermarking, communications, Zigbee applications, filtering, power spectrum analysis, quantization methods, Gaussian processing, and ARM processors. MELP, the mixed excitation linear prediction method, helps to eliminate the voicing errors that arise in multiple-state excitation methods. SPEEX helps to analyze speech quality and bit rate, and OPUS is a highly flexible speech codec that plays a vital role in real-time applications. These methods are implemented and their performance analyzed on different real-world applications.

Book ChapterDOI
01 Jan 2019
TL;DR: The paper addresses the use of the forward-mode excitation codebook structure of extended G.729, with two non-zero pulses in each track, in the ITU-T 8 kbps CS-ACELP speech codec, and proposes a modification of the legacy codec that determines the optimized excitation codebook codevector using a least-significant search engine.
Abstract: This paper addresses the use of the forward-mode excitation codebook structure of extended G.729 (ITU-T Recommendation, 2007 [1]), with two non-zero pulses in each track, in the ITU-T 8 kbps CS-ACELP speech codec (ITU-T Recommendation, 2007 [1]). The proposed modification of the legacy speech codec, which determines the optimized excitation codebook codevector using a least-significant search engine, avoids maintaining two different codebook structures and thereby minimizes the complexity of the search engine. The modified excitation codebook structure raises the bit rate of the legacy speech codec to 11.6 kbps. A comparative performance analysis between the proposed speech codec and a CELP-based speech codec is carried out using different subjective and objective quality-assessment parameters.

Patent
Dmitry V. Shmunk1, Dmitry Rusanov
05 Feb 2019
TL;DR: In this paper, a method for achieving bitstream scalability in a multi-channel audio encoder is presented, which includes receiving audio input data, organizing said input data by a Code Excited Linear Predictor (CELP) processing module for further encoding by arranging said data according to significance of data.
Abstract: The present invention provides methods and apparatuses for processing audio data. In one embodiment, there is provided a method for achieving bitstream scalability in a multi-channel audio encoder, said method comprising: receiving audio input data; organizing said input data by a Code Excited Linear Predictor (CELP) processing module for further encoding, by arranging said data according to significance, where more significant data is placed ahead of less significant data; and providing a scalable output bitstream, in which a higher-bitrate bitstream is scaled to a lower bitrate by discarding less significant data from the frame ends. The organized CELP data comprises a first part and a second part. The first part comprises a frame header, sub-frame parameters, and innovation vector quantization data from the first frame of all channels. The innovation vector quantization data from the first frames of all channels is arranged according to channel number.
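The significance-ordered frame layout described above can be sketched as follows: more significant data is packed first, so scaling to a lower bitrate is a plain truncation from the frame end. The function names and byte-level packing are illustrative assumptions, not the patent's actual format:

```python
def pack_scalable_frame(header, parts):
    # parts: list of (significance, payload_bytes); pack the most
    # significant payloads first so the frame degrades gracefully.
    payload = b"".join(p for _, p in sorted(parts, key=lambda t: -t[0]))
    return header + payload

def scale_to_bitrate(frame, header_len, budget):
    # Scaling to a lower rate discards the least significant tail of
    # the frame, never the header or sub-frame parameters.
    return frame[:max(header_len, budget)]
```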

01 Jan 2019
TL;DR: The proposed method has brought down the bit rate of CELP to 6.4 kbps or reduced the bit requirement by 12% without compromising on the perceptual quality of reconstructed speech.
Abstract: Speech is a highly complex and dynamic acoustic wave produced by the vocal tract as a result of the excitation in the form of air expelled from the lungs. The vocal tract characteristics vary in different manners during production of various speech categories. This time-variant acoustic filter has been represented by a Linear Prediction (LP) filter in the Speech Production Model, on which Code Excited Linear Prediction (CELP) and many other speech coders are built. The periodic nature of voiced speech due to vocal chord vibration causes slow variation of the vocal tract characteristics, and thus similarity exists among nearby portions of voiced speech. This similarity property is explored to reduce the count of transmitted Linear Predictive Coding (LPC) coefficients and excitation, which are bit-consuming and also significant parameters of the LP filter. This has been implemented in 7.3 kbps CELP by determining an appropriate threshold for the similarity values of both parameters. The proposed method has brought down the bit rate of CELP to 6.4 kbps, i.e., reduced the bit requirement by 12%, without compromising the perceptual quality of reconstructed speech.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this article, the similarity among nearby portions of voiced speech is explored to reduce the count of transmitted LPC coefficients and excitation, which are bit-consuming and also significant parameters of the LP filter.
Abstract: Speech is a highly complex and dynamic acoustic wave produced by the vocal tract as a result of the excitation in the form of air expelled from lungs. The vocal tract characteristics vary in different manner during production of various speech categories. This time variant acoustic filter has been represented by a Linear Prediction (LP) filter in Speech Production Model based on which Code Excited Linear Prediction (CELP) and many other speech coders are built. The periodic nature of voiced speech due to vocal chord vibration causes slow variation for vocal tract characteristics and thus, similarity exists among nearby portions of voiced speech. This similarity property is explored to reduce the count of transmitted Linear Predictive Coding (LPC) coefficients and excitation that are bit consuming and also significant parameters of LP filter. This has been implemented in 7.3 kbps CELP by determining appropriate threshold for similarity values of both parameters. The proposed method has brought down the bit rate of CELP to 6.4 kbps or reduced the bit requirement by 12% without compromising on the perceptual quality of reconstructed speech.
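The similarity-driven reduction can be illustrated by transmitting a frame's LPC set only when it differs sufficiently from the last transmitted set, with the decoder reusing the previous set otherwise. This is a sketch under an assumed cosine-similarity measure and threshold; the paper's actual similarity measure and threshold selection may differ:

```python
def cosine_similarity(u, v):
    # Normalized inner product between two parameter vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return num / den

def frames_to_send(lpc_frames, threshold=0.95):
    # Transmit a frame's LPC set only when it differs enough from the
    # last transmitted set; otherwise the decoder reuses the previous
    # one, saving the bits those coefficients would have consumed.
    sent, last = [], None
    for i, f in enumerate(lpc_frames):
        if last is None or cosine_similarity(f, last) < threshold:
            sent.append(i)
            last = f
    return sent
```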