Open Access Proceedings Article DOI

Wavenet Based Low Rate Speech Coding

TL;DR: In this article, a WaveNet generative speech model is used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s.
Abstract
Traditional parametric coding of speech achieves low rates but poor reconstruction quality because of the inadequacy of the underlying signal model. We describe how a WaveNet generative speech model can be used to generate high-quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet-based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker by human listeners, even when that speaker was not included in the training of the generative model.
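To make the described pipeline concrete, below is a minimal Python sketch of the idea: an autoregressive generative model synthesizes audio sample by sample, conditioned on features recovered from the parametric bit stream. Everything here, `decode_parametric_frame`, `sample_next`, the frame size, and the receptive field, is a hypothetical stand-in rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_SAMPLES = 160      # 10 ms frames at 16 kHz (assumed)
RECEPTIVE_FIELD = 1024   # past samples the model conditions on (assumed)

def decode_parametric_frame(bits):
    """Hypothetical stand-in for the 2.4 kb/s parametric decoder: maps one
    frame's bits to conditioning features (spectral envelope, pitch, ...)."""
    return rng.standard_normal(16)

def sample_next(past, features):
    """Hypothetical stand-in for one WaveNet sampling step: predict a
    distribution over the next sample from past audio and the frame's
    conditioning features, then draw from it (a toy AR(1) model here)."""
    mean = 0.9 * past[-1] + 0.01 * float(features.mean())
    return mean + 0.01 * rng.standard_normal()

def synthesize(frames_of_bits):
    audio = [0.0] * RECEPTIVE_FIELD          # zero history to start
    for bits in frames_of_bits:
        cond = decode_parametric_frame(bits)
        for _ in range(FRAME_SAMPLES):
            past = np.asarray(audio[-RECEPTIVE_FIELD:])
            audio.append(sample_next(past, cond))
    return np.asarray(audio[RECEPTIVE_FIELD:])

speech = synthesize([None] * 100)            # 100 frames of (dummy) bits
print(speech.shape)                          # (16000,) -> 1 s at 16 kHz
```

The key design point is that the bit stream only constrains the conditioning features; the waveform itself is drawn from the model, which is how the rate can stay at 2.4 kb/s while the output remains natural-sounding.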


Citations
Proceedings Article DOI

LPCNET: Improving Neural Speech Synthesis through Linear Prediction

TL;DR: This paper proposes LPCNet, a WaveRNN variant that combines linear prediction with recurrent neural networks, achieving high-quality speech synthesis at a complexity under 3 GFLOPS.
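The division of labour behind LPCNet can be illustrated in a few lines: a cheap linear predictor explains the predictable part of each sample from its recent past, leaving only a small residual for the recurrent network to model. The coefficients and signal below are toy values chosen for the demonstration, not taken from the paper.

```python
import numpy as np

def lpc_predict(history, coeffs):
    """Short-term linear prediction: estimate the next sample as a weighted
    sum of the most recent samples (order = len(coeffs))."""
    order = len(coeffs)
    return float(np.dot(coeffs, history[-order:][::-1]))

# Toy example: a damped sinusoid is captured exactly by a 2nd-order
# predictor, so the residual left for the neural network is ~0 here.
r, theta = 0.95, 0.2
coeffs = np.array([2 * r * np.cos(theta), -r * r])  # hypothetical LPC coeffs
n = np.arange(50)
signal = r ** n * np.sin(theta * n)

pred = lpc_predict(signal[:10], coeffs)
residual = signal[10] - pred   # the part a network like LPCNet's would model
print(pred, residual)          # residual on the order of 1e-17
```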
Proceedings Article DOI

Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder

TL;DR: This work demonstrates that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.
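The bit-rate arithmetic behind such a coder follows from the VQ-VAE bottleneck: each latent vector is replaced by the index of its nearest codebook entry, and only that index needs to be transmitted to the WaveNet decoder. A minimal sketch with an assumed codebook size and latent rate (the paper's actual configuration may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

K, D = 256, 64                       # codebook size and latent dim (assumed)
codebook = rng.standard_normal((K, D))

def vector_quantize(z):
    """VQ-VAE bottleneck: snap an encoder output to its nearest codebook
    entry; the decoder sees only codebook vectors, so transmitting the
    index alone is enough."""
    idx = int(np.argmin(np.linalg.norm(codebook - z, axis=1)))
    return idx, codebook[idx]

z = rng.standard_normal(D)           # one (dummy) encoder output
idx, z_q = vector_quantize(z)

bits_per_latent = np.log2(K)         # 8 bits per index
latents_per_sec = 50                 # e.g. one latent per 20 ms (assumed)
print(bits_per_latent * latents_per_sec, "b/s for the indices")  # 400.0
```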
Journal Article DOI

Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications

TL;DR: This tutorial summarizes efforts to date on semantic-aware and task-oriented communications, from early adaptations through foundations, algorithms, and potential implementations, focusing on approaches that use information theory to provide those foundations.
Proceedings Article

It's Raw! Audio Generation with State-Space Models

TL;DR: This paper proposes SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long-sequence modeling; it identifies that S4 can be unstable during autoregressive generation and provides a simple improvement to its parameterization by drawing connections to Hurwitz matrices.
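The stability issue mentioned in the TL;DR has a compact illustration: a linear state recurrence x' = Ax is stable exactly when A is Hurwitz, i.e. every eigenvalue has a strictly negative real part, and one generic way to enforce this is to parameterize the real parts through an exponential. The sketch below shows that generic construction, not SaShiMi's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16                                         # state size (assumed)

def hurwitz_diagonal(log_neg_real, imag):
    """Build diagonal eigenvalues whose real part is -exp(.) < 0, so the
    state matrix is Hurwitz and x' = A x cannot blow up, no matter how
    the unconstrained parameters move during training."""
    return -np.exp(log_neg_real) + 1j * imag

eig = hurwitz_diagonal(rng.standard_normal(N), rng.standard_normal(N))
print(bool(np.all(eig.real < 0)))              # True: stable by construction
```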
Journal Article DOI

High Fidelity Neural Audio Compression

TL;DR: In this article, the authors propose a loss-balancer mechanism to stabilize training, in which the weight of each loss defines the fraction of the overall gradient it should represent, decoupling the choice of this hyper-parameter from the typical scale of the loss.
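A simplified sketch of that balancing idea: normalize each loss's gradient before applying its weight, so a weight w_i fixes the fraction of the combined update contributed by loss i regardless of the loss's raw scale. (Practical balancers typically also smooth the gradient norms over batches; that refinement is omitted here.)

```python
import numpy as np

def balance(grads, weights):
    """Combine per-loss gradients so each weight fixes that loss's share
    of the total update, independent of the loss's natural scale."""
    total = np.zeros_like(next(iter(grads.values())))
    for name, g in grads.items():
        total += weights[name] * g / (np.linalg.norm(g) + 1e-12)
    return total

# Two losses with wildly different scales still contribute 50/50.
grads = {"reconstruction": np.array([100.0, 0.0]),
         "adversarial":    np.array([0.0, 0.001])}
print(balance(grads, {"reconstruction": 0.5, "adversarial": 0.5}))
# -> [0.5 0.5]
```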