Open Access, Proceedings Article

Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder

TLDR
This article proposes Deep Vocoder, a direct end-to-end low bit rate speech compression method in which a deep autoencoder (DAE) extracts the latent representing features (LRFs) of speech, which are then efficiently quantized by an analysis-by-synthesis vector quantization (AbS VQ) method.
Abstract
Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method based on a deep autoencoder (DAE). In Deep Vocoder, the DAE extracts the latent representing features (LRFs) of speech, which are then efficiently quantized by an analysis-by-synthesis vector quantization (AbS VQ) method. AbS VQ aims to minimize the perceptual spectral reconstruction distortion rather than the distortion of the LRF vector itself. In addition, a suboptimal codebook search technique is proposed to further reduce the computational complexity. Experimental results demonstrate that Deep Vocoder yields substantial improvements in frequency-weighted segmental SNR, STOI and PESQ score compared with the output of a conventional SQ- or VQ-based codec. The PESQ scores obtained on the TIMIT corpus are 3.34 and 3.08 for speech coding at 2400 bit/s and 1200 bit/s, respectively.
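
The difference between AbS VQ and a plain latent-domain VQ search can be illustrated with a minimal sketch. This is not the authors' implementation: the decoder `toy_decode`, its random weights `W`, the codebook size, the vector dimensions, and the optional `weights` argument (standing in for a perceptual weighting) are all hypothetical placeholders, and the search shown is exhaustive rather than the suboptimal search the abstract mentions.

```python
import numpy as np

def abs_vq_search(target_spectrum, codebook, decode, weights=None):
    """Analysis-by-synthesis search: decode every codeword and keep the one
    whose reconstructed spectrum is closest to the target spectrum."""
    if weights is None:
        weights = np.ones_like(target_spectrum)  # assumption: flat weighting
    best_index, best_dist = -1, np.inf
    for i, codeword in enumerate(codebook):
        recon = decode(codeword)  # synthesis step
        dist = float(np.sum(weights * (target_spectrum - recon) ** 2))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist

def plain_vq_search(latent, codebook):
    """Conventional search: nearest codeword to the latent vector itself."""
    dists = np.sum((codebook - latent) ** 2, axis=1)
    i = int(np.argmin(dists))
    return i, float(dists[i])

# Toy setup (all values hypothetical): a fixed nonlinear "decoder"
# mapping a 4-dimensional latent vector to a 16-bin spectrum.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))

def toy_decode(z):
    return np.tanh(W @ z)

codebook = rng.standard_normal((32, 4))
latent = rng.standard_normal(4)
target = toy_decode(latent)  # stand-in for the spectrum to be reproduced

abs_index, abs_dist = abs_vq_search(target, codebook, toy_decode)
vq_index, _ = plain_vq_search(latent, codebook)
# Spectral distortion of the codeword chosen in the latent domain.
vq_dist = float(np.sum((target - toy_decode(codebook[vq_index])) ** 2))
```

Because the exhaustive AbS search minimizes the spectral distortion over the whole codebook, `abs_dist` can never exceed `vq_dist`; the price is one decoder evaluation per codeword, which is the cost a reduced-complexity search would aim to lower.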


Citations
Journal Article

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

TL;DR: In this paper, the authors proposed a neural network-based speech coding framework for end-to-end speech analysis and synthesis without HMMs, which relies on a phonological subphonetic representation of speech.
Proceedings Article

Vector-Quantized Zero-Delay Deep Autoencoders for the Compression of Electrical Stimulation Patterns of Cochlear Implants using STOI

TL;DR: In this paper, a zero-delay deep autoencoder (DAE) was proposed for coding the electrical stimulation patterns of cochlear implants (CIs).
References
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Journal Article

Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator

TL;DR: In this article, a system which utilizes a minimum mean square error (MMSE) estimator is proposed and then compared with other widely used systems which are based on Wiener filtering and the "spectral subtraction" algorithm.
Posted Content

Generative Adversarial Networks

TL;DR: In this article, a generative adversarial network (GAN) is proposed to estimate generative models via an adversarial process, in which two models are simultaneously trained: a generator G and a discriminator D that estimates the probability that a sample came from the training data rather than G.
Proceedings Article

Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs

TL;DR: A new model has been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss and variable delay, known as perceptual evaluation of speech quality (PESQ).
Journal Article

Signal estimation from modified short-time Fourier transform

TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.