Wavenet Based Low Rate Speech Coding
W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters
ICASSP 2018, pp. 676-680
TLDR: In this article, a WaveNet generative speech model is used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s.

Abstract: Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the system additionally performs implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker was not used during training of the generative model.
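The decoding scheme the abstract describes, an autoregressive generative model that samples speech one value at a time, conditioned on the parametric coder's features, can be sketched minimally. This is an illustrative stand-in, not the paper's implementation: `predict` below is a hypothetical placeholder for the trained WaveNet (which in practice outputs a 256-way distribution over mu-law quantized sample levels).

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(cond, steps, predict):
    """Autoregressive decoding: each output sample is drawn from a
    distribution conditioned on past samples and coder features."""
    x = []
    for _ in range(steps):
        probs = predict(x, cond)  # network's predictive distribution
        x.append(rng.choice(len(probs), p=probs))
    return np.array(x)

# Toy stand-in for the network: a uniform distribution over 4 levels.
out = generate(cond=None, steps=8, predict=lambda past, c: np.ones(4) / 4)
```

The bitrate is set entirely by the conditioning features (here `cond`), since the sample-level distribution is generated at the decoder rather than transmitted.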
Citations
Proceedings ArticleDOI
LPCNET: Improving Neural Speech Synthesis through Linear Prediction
Jean-Marc Valin, Jan Skoglund
TL;DR: This article proposes LPCNet, a WaveRNN variant that combines linear prediction with recurrent neural networks to improve the efficiency of speech synthesis, achieving high-quality synthesis at a complexity under 3 GFLOPS.
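The linear-prediction component that LPCNet combines with the neural network can be illustrated in isolation: each sample is approximated as a weighted sum of past samples, so the network only has to model the (much simpler) residual. A minimal numpy sketch of the classic autocorrelation (Yule-Walker) estimate, not LPCNet's implementation:

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate LPC coefficients by the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # r[0], r[1], ...
    # Normal equations: R a = r[1:order+1], R Toeplitz in r
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])  # x[n] ~= sum_k a[k] * x[n-1-k]

# Synthesize an AR(2) signal; LPC should recover the known predictor taps.
rng = np.random.default_rng(0)
e = 0.1 * rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, len(x)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

a = lpc_coefficients(x[100:], order=2)  # expected close to [1.3, -0.6]
```

Because the predictor taps are cheap to compute and apply, offloading this part of the modeling from the network is what buys the complexity reduction.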
Proceedings ArticleDOI
Low Bit-rate Speech Coding with VQ-VAE and a WaveNet Decoder
Cristina Garbacea, Aaron van den Oord, Yazhe Li, Felicia S. C. Lim, Alejandro Luebs, Oriol Vinyals, Thomas C. Walters
TL;DR: This work demonstrates that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality.
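The core coding step in a VQ-VAE is nearest-neighbor quantization of encoder latents against a learned codebook: only the codebook indices are transmitted, at log2(K) bits each for K codes. A minimal sketch of that quantization step (illustrative codebook and latents, not the paper's):

```python
import numpy as np

def vq(z, codebook):
    """Quantize each latent vector to its nearest codebook entry (L2)."""
    # Pairwise squared distances: shape (num_vectors, num_codes)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)       # transmitted indices, log2(K) bits each
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2]])
zq, idx = vq(z, codebook)        # idx picks the closest code per vector
```

In training, the non-differentiable `argmin` is bypassed with a straight-through gradient estimator; at the decoder, a WaveNet reconstructs the waveform from the dequantized latents `zq`.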
Journal ArticleDOI
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications
Deniz Gunduz, Zhijin Qin, Inaki Estella Aguerri, Harpreet S. Dhillon, Zhaohui Yang, Aylin Yener, Kai-Kit Wong, Chan-Byoung Chae
TL;DR: This tutorial summarizes efforts to date on semantic-aware and task-oriented communications, from early adaptations to current work, covering foundations, algorithms, and potential implementations, with a focus on approaches that use information theory to provide those foundations.
Proceedings Article
It's Raw! Audio Generation with State-Space Models
TL;DR: This work proposes SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long-sequence modeling. The authors identify that S4 can be unstable during autoregressive generation and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices.
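The stability criterion referenced in the TL;DR is the classical one: a continuous-time state matrix is Hurwitz exactly when all of its eigenvalues have strictly negative real part, which keeps the recurrence from blowing up during autoregressive rollout. A minimal check (illustrating the criterion only, not S4's actual parameterization):

```python
import numpy as np

def is_hurwitz(A):
    """True iff every eigenvalue of A has strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

stable = np.array([[-1.0, 2.0], [0.0, -0.5]])    # eigenvalues -1, -0.5
unstable = np.array([[0.1, 0.0], [0.0, -1.0]])   # eigenvalue 0.1 > 0
```

Parameterizing the state matrix so it is Hurwitz by construction, rather than checking after the fact, is the shape of the fix the paper describes.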
Journal ArticleDOI
High Fidelity Neural Audio Compression
TL;DR: In this article, the authors propose a loss-balancer mechanism to stabilize training, where the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss.
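The balancing idea in the TL;DR can be sketched simply: normalize each loss's gradient to unit norm, then recombine with the user-chosen weights, so each weight directly sets that loss's share of the total update regardless of the loss's raw scale. This is a simplified illustration (the paper's balancer additionally smooths gradient norms over time and operates on gradients w.r.t. the model output):

```python
import numpy as np

def balance_gradients(grads, weights):
    """Combine per-loss gradients so each contributes a fraction of the
    total update equal to its (normalized) weight, not its raw scale."""
    total = sum(weights.values())
    out = 0.0
    for name, g in grads.items():
        norm = np.linalg.norm(g)
        out = out + (weights[name] / total) * g / (norm + 1e-12)
    return out

grads = {"recon": np.array([100.0, 0.0]),   # large-scale loss
         "adv":   np.array([0.0, 0.001])}   # tiny-scale loss
w = {"recon": 0.7, "adv": 0.3}
g = balance_gradients(grads, w)  # contributions 0.7 / 0.3 despite scales
```

Without the normalization, the reconstruction loss's gradient would dwarf the adversarial one by five orders of magnitude; with it, the weights alone control the mix.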