Proceedings Article

A novel scheme for SVAC audio encoder

TLDR
A novel scheme is proposed in which the speech coding module based on Algebraic Code Excited Linear Prediction (ACELP) is removed completely and speech waveforms are reconstructed from mel-frequency cepstral coefficients (MFCCs) during decoding, which greatly simplifies the structure of SVAC.
Abstract
In the audio encoder of Surveillance Video and Audio Coding (SVAC), both the audio signals and the mel-frequency cepstral coefficients (MFCCs) are coded, which leads to high computational complexity. This paper proposes a novel scheme for SVAC in which the speech coding module based on Algebraic Code Excited Linear Prediction (ACELP) is removed completely and speech waveforms are reconstructed from the MFCCs during decoding. The novel scheme greatly simplifies the structure of SVAC, and the decoded speech signals also perform well in quality evaluation.
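To illustrate what reconstructing speech from MFCCs involves, the following is a minimal sketch of the spectral part of such a decoder: MFCCs are computed for one frame, then mapped back to an approximate power spectrum. It is not the scheme proposed in the paper. The sample rate, FFT size, number of mel filters, number of retained coefficients, the triangular filterbank and the pseudo-inverse step are all illustrative assumptions, and every function name is hypothetical. Phase estimation and waveform synthesis, which a full decoder would also need, are omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n):
    """Orthonormal DCT-II matrix, so its transpose is its inverse."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2.0 * n))
    d[0] /= np.sqrt(2.0)
    return d

def log_mel_from_mfcc(ceps, dct):
    """Zero-pad the truncated cepstrum and apply the inverse DCT."""
    padded = np.zeros(dct.shape[0])
    padded[: len(ceps)] = ceps
    return dct.T @ padded

def power_from_log_mel(log_mel, fb):
    """Approximate the power spectrum via the filterbank pseudo-inverse."""
    return np.maximum(np.linalg.pinv(fb) @ np.exp(log_mel), 0.0)

# Assumed parameters for a single synthetic test frame.
sample_rate, n_fft, n_filters, n_ceps = 8000, 256, 20, 13
t = np.arange(n_fft) / sample_rate
frame = (np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)) * np.hanning(n_fft)
power = np.abs(np.fft.rfft(frame)) ** 2

fb = mel_filterbank(n_filters, n_fft, sample_rate)
dct = dct_matrix(n_filters)

# Encoder side: log mel energies -> DCT -> keep the first n_ceps coefficients.
log_mel = np.log(fb @ power + 1e-10)
ceps = (dct @ log_mel)[:n_ceps]

# Decoder side: smoothed log mel energies -> approximate power spectrum.
approx_log_mel = log_mel_from_mfcc(ceps, dct)
approx_power = power_from_log_mel(approx_log_mel, fb)
```

Because the mel filterbank and the cepstral truncation both discard information, `approx_power` is only a smoothed estimate of `power`. Fine harmonic structure such as pitch is lost at this stage and has to be supplied or predicted separately.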

