Wavenet Based Low Rate Speech Coding
W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters
ICASSP 2018, pp. 676-680
TLDR: In this article, a WaveNet generative speech model is used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s.

Abstract: Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker was not used during training of the generative model.
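The decoding scheme the abstract describes, an autoregressive generative model that samples speech one value at a time, conditioned on the parametric coder's features, can be sketched minimally. This is an illustrative stand-in, not the paper's implementation: `predict` below is a hypothetical placeholder for the trained WaveNet (which in practice outputs a 256-way distribution over mu-law quantized sample levels).

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(cond, steps, predict):
    """Autoregressive decoding: each output sample is drawn from a
    distribution conditioned on past samples and coder features."""
    x = []
    for _ in range(steps):
        probs = predict(x, cond)  # network's predictive distribution
        x.append(rng.choice(len(probs), p=probs))
    return np.array(x)

# Toy stand-in for the network: a uniform distribution over 4 levels.
out = generate(cond=None, steps=8, predict=lambda past, c: np.ones(4) / 4)
```

The bitrate is set entirely by the conditioning features (here `cond`), since the sample-level distribution is generated at the decoder rather than transmitted.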
Citations
Proceedings ArticleDOI
LPCNET: Improving Neural Speech Synthesis through Linear Prediction
Jean-Marc Valin, Jan Skoglund
TL;DR: This article proposes LPCNet, a WaveRNN variant that combines linear prediction with recurrent neural networks to improve the efficiency of speech synthesis, achieving high-quality synthesis at a complexity under 3 GFLOPS.
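The linear-prediction component that LPCNet combines with the neural network can be illustrated in isolation: each sample is approximated as a weighted sum of past samples, so the network only has to model the (much simpler) residual. A minimal numpy sketch of the classic autocorrelation (Yule-Walker) estimate, not LPCNet's implementation:

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate LPC coefficients by the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # r[0], r[1], ...
    # Normal equations: R a = r[1:order+1], R Toeplitz in r
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])  # x[n] ~= sum_k a[k] * x[n-1-k]

# Synthesize an AR(2) signal; LPC should recover the known predictor taps.
rng = np.random.default_rng(0)
e = 0.1 * rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, len(x)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

a = lpc_coefficients(x[100:], order=2)  # expected close to [1.3, -0.6]
```

Because the predictor taps are cheap to compute and apply, offloading this part of the modeling from the network is what buys the complexity reduction.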
Proceedings ArticleDOI
Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
Cristina Garbacea, Aaron van den Oord, Yazhe Li, Felicia S. C. Lim, Alejandro Luebs, Oriol Vinyals, Thomas C. Walters
TL;DR: This work demonstrates that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.
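The core coding step in a VQ-VAE is nearest-neighbor quantization of encoder latents against a learned codebook: only the codebook indices are transmitted, at log2(K) bits each for K codes. A minimal sketch of that quantization step (illustrative codebook and latents, not the paper's):

```python
import numpy as np

def vq(z, codebook):
    """Quantize each latent vector to its nearest codebook entry (L2)."""
    # Pairwise squared distances: shape (num_vectors, num_codes)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)       # transmitted indices, log2(K) bits each
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2]])
zq, idx = vq(z, codebook)        # idx picks the closest code per vector
```

In training, the non-differentiable `argmin` is bypassed with a straight-through gradient estimator; at the decoder, a WaveNet reconstructs the waveform from the dequantized latents `zq`.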
Journal ArticleDOI
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications
Deniz Gunduz, Zhijin Qin, Inaki Estella Aguerri, Harpreet S. Dhillon, Zhaohui Yang, Aylin Yener, Kai-Kit Wong, Chan-Byoung Chae
TL;DR: This tutorial summarizes efforts to date on semantic-aware and task-oriented communications, from early adaptations to current work, covering foundations, algorithms, and potential implementations, with a focus on approaches that use information theory to provide those foundations.
Proceedings Article
It's Raw! Audio Generation with State-Space Models
TL;DR: This work proposes SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long-sequence modeling. The authors identify that S4 can be unstable during autoregressive generation and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices.
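The stability criterion referenced in the TL;DR is the classical one: a continuous-time state matrix is Hurwitz exactly when all of its eigenvalues have strictly negative real part, which keeps the recurrence from blowing up during autoregressive rollout. A minimal check (illustrating the criterion only, not S4's actual parameterization):

```python
import numpy as np

def is_hurwitz(A):
    """True iff every eigenvalue of A has strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

stable = np.array([[-1.0, 2.0], [0.0, -0.5]])    # eigenvalues -1, -0.5
unstable = np.array([[0.1, 0.0], [0.0, -1.0]])   # eigenvalue 0.1 > 0
```

Parameterizing the state matrix so it is Hurwitz by construction, rather than checking after the fact, is the shape of the fix the paper describes.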
Journal ArticleDOI
High Fidelity Neural Audio Compression
TL;DR: In this article, the authors propose a loss-balancer mechanism to stabilize training, where the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss.
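The balancing idea in the TL;DR can be sketched simply: normalize each loss's gradient to unit norm, then recombine with the user-chosen weights, so each weight directly sets that loss's share of the total update regardless of the loss's raw scale. This is a simplified illustration (the paper's balancer additionally smooths gradient norms over time and operates on gradients w.r.t. the model output):

```python
import numpy as np

def balance_gradients(grads, weights):
    """Combine per-loss gradients so each contributes a fraction of the
    total update equal to its (normalized) weight, not its raw scale."""
    total = sum(weights.values())
    out = 0.0
    for name, g in grads.items():
        norm = np.linalg.norm(g)
        out = out + (weights[name] / total) * g / (norm + 1e-12)
    return out

grads = {"recon": np.array([100.0, 0.0]),   # large-scale loss
         "adv":   np.array([0.0, 0.001])}   # tiny-scale loss
w = {"recon": 0.7, "adv": 0.3}
g = balance_gradients(grads, w)  # contributions 0.7 / 0.3 despite scales
```

Without the normalization, the reconstruction loss's gradient would dwarf the adversarial one by five orders of magnitude; with it, the weights alone control the mix.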