A novel scheme for SVAC audio encoder

doi:10.1109/ISCIT.2014.7011970

Proceedings article

Ruo Shu, +3 more

pp. 531-534

TLDR

A novel scheme is proposed in which the speech coding module based on Algebraic Code Excited Linear Prediction (ACELP) is removed completely and speech waveforms are reconstructed from MFCCs during decoding, which greatly simplifies the structure of SVAC.

Abstract:

In the audio encoder of Surveillance Video and Audio Coding (SVAC), both the audio signal and its mel-frequency cepstral coefficients (MFCCs) are coded, which leads to high computational complexity. This paper proposes a novel scheme for SVAC in which the speech coding module based on Algebraic Code Excited Linear Prediction (ACELP) is removed completely and speech waveforms are reconstructed from the MFCCs during decoding. The proposed scheme greatly simplifies the structure of SVAC, and the decoded speech still performs well in quality evaluation.
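
The abstract does not say how the decoder gets from MFCCs back to a waveform. A common first stage in MFCC-based reconstruction, shown here only as a minimal sketch under stated assumptions and not as this paper's method, is to zero-pad the truncated cepstrum, invert the DCT, exponentiate to recover smoothed mel energies, and spread those energies back onto linear-frequency bins. The triangular filterbank layout, the 8 kHz / 512-point FFT / 20-filter / 13-coefficient settings, the made-up power spectrum, the weighted-average mapping back to linear bins, and every function name below are hypothetical. The output is a magnitude envelope only; phase (and pitch, for voiced speech) must still be supplied separately, as in the Griffin-Lim and MFCC-plus-pitch approaches listed under References.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular filters equally spaced on the mel scale between 0 Hz and Nyquist."""
    f_max = sample_rate / 2.0
    top = hz_to_mel(f_max)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    freqs = [f_max * k / (n_bins - 1) for k in range(n_bins)]
    return [[(f - lo) / (mid - lo) if lo < f <= mid
             else (hi - f) / (hi - mid) if mid < f < hi
             else 0.0
             for f in freqs]
            for lo, mid, hi in zip(edges, edges[1:], edges[2:])]


def dct_ortho(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def idct_ortho(c):
    """Exact inverse of dct_ortho (an orthonormal DCT-III)."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for k in range(n))
            for i in range(n)]


def mel_energies(power_spectrum, bank):
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


def mfcc_frame(power_spectrum, bank, n_ceps):
    """Encoder side: mel energies -> log -> DCT -> keep the first n_ceps coefficients."""
    log_mel = [math.log(max(e, 1e-10)) for e in mel_energies(power_spectrum, bank)]
    return dct_ortho(log_mel)[:n_ceps]


def mel_from_mfcc(ceps, n_mels):
    """Decoder side: zero-pad the truncated cepstrum, invert the DCT, undo the log."""
    padded = list(ceps) + [0.0] * (n_mels - len(ceps))
    return [math.exp(v) for v in idct_ortho(padded)]


def mel_to_linear(mel, bank):
    """Hypothetical mapping: each bin gets the filter-weighted average of mel energies."""
    out = []
    for k in range(len(bank[0])):
        weight = sum(row[k] for row in bank)
        out.append(sum(row[k] * e for row, e in zip(bank, mel)) / weight if weight > 0 else 0.0)
    return out


# Toy example: 8 kHz audio, 512-point FFT (257 bins), 20 mel filters.
sample_rate, n_fft, n_mels = 8000, 512, 20
bank = mel_filterbank(n_mels, n_fft // 2 + 1, sample_rate)
spectrum = [1.0 / (1.0 + k) for k in range(n_fft // 2 + 1)]  # made-up power spectrum

full = mfcc_frame(spectrum, bank, n_mels)        # keep every coefficient
recovered = mel_from_mfcc(full, n_mels)          # equals the mel energies up to rounding
ceps13 = mfcc_frame(spectrum, bank, 13)          # a typical truncation
envelope = mel_to_linear(mel_from_mfcc(ceps13, n_mels), bank)  # smoothed, 257 bins
```

With all 20 coefficients kept, the round trip returns the original mel energies up to rounding; truncating to 13 keeps only a smoothed envelope. Least-squares mappings from mel energies back to linear bins are a common alternative to the simple weighted average used here.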

References

Journal article

Signal estimation from modified short-time Fourier transform

D. Griffin, +1 more

01 Apr 1984

IEEE Transactions on Acoustics, Speech, ...

TL;DR: An algorithm is presented that estimates a signal from its modified short-time Fourier transform (STFT) by iteratively minimizing the mean squared error between the STFT magnitude of the estimated signal and the modified STFT magnitude.

Proceedings article

Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Dan Chazan, +3 more

TL;DR: A novel low-complexity, frequency-domain algorithm reconstructs speech from mel-frequency cepstral coefficients (MFCCs), which are commonly used by speech recognition systems, together with pitch frequency values; the reconstructed speech is natural sounding, of good quality, and intelligible.

Journal article

Prediction of Fundamental Frequency and Voicing From Mel-Frequency Cepstral Coefficients for Unconstrained Speech Reconstruction

Ben Milner, +1 more

01 Jan 2007

IEEE Transactions on Audio, Speech, and ...

TL;DR: Spectrogram analysis of the reconstructed speech shows that it is highly intelligible, with slightly higher quality in the speaker-dependent case owing to more accurate fundamental frequency and voicing predictions.

Journal article

Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Laura E. Boucheron, +2 more

01 Feb 2012

IEEE Transactions on Audio, Speech, and ...

TL;DR: The results show that the MFCC-based codec outperforms the state-of-the-art MELPe codec across the entire 600-2400 bps range when evaluated with perceptual evaluation of speech quality (PESQ, ITU-T Recommendation P.862).

Proceedings article

Hybrid Scalar/Vector Quantization of Mel-Frequency Cepstral Coefficients for Low Bit-Rate Coding of Speech

Laura E. Boucheron, +2 more

TL;DR: The results show that the perceptual evaluation of speech quality (PESQ) score of the MFCC-based codec matches that of the state-of-the-art MELPe codec at 600 bps and exceeds that of the CELP codec at coding rates of 2000-4000 bps.
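
Griffin and Lim's magnitude-only algorithm, the first entry under References, is one standard way to supply the missing phase once a magnitude spectrogram is available. The sketch below is a minimal pure-Python illustration, not a claim about what SVAC or this paper uses; the naive DFT, sine window, hop size, random real starting signal, toy two-tone target and all function names are assumptions. Each iteration keeps the target magnitude, borrows the current phase, and re-synthesizes with the least-squares overlap-add estimate, so the squared magnitude error never increases from one iteration to the next.

```python
import cmath
import math
import random


def sine_window(n_fft):
    """Sine window: strictly positive, so every covered sample can be normalized."""
    return [math.sin(math.pi * (n + 0.5) / n_fft) for n in range(n_fft)]


def dft(frame):
    n_fft = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]


def idft(spectrum):
    n_fft = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)) / n_fft
            for n in range(n_fft)]


def stft(x, window, hop):
    """Complex STFT; frame m covers samples m*hop .. m*hop + len(window) - 1."""
    n_fft = len(window)
    n_frames = (len(x) - n_fft) // hop + 1
    return [dft([window[n] * x[m * hop + n] for n in range(n_fft)]) for m in range(n_frames)]


def istft_lsee(frames, window, hop, length):
    """Least-squares estimate: overlap-add window * frame, divide by summed window^2."""
    n_fft = len(window)
    num, den = [0.0] * length, [0.0] * length
    for m, spectrum in enumerate(frames):
        y = idft(spectrum)
        for n in range(n_fft):
            num[m * hop + n] += window[n] * y[n].real
            den[m * hop + n] += window[n] ** 2
    return [a / b for a, b in zip(num, den)]


def magnitude_error(spec, target_mag):
    return sum((abs(c) - a) ** 2 for frame, mags in zip(spec, target_mag)
               for c, a in zip(frame, mags))


def griffin_lim(target_mag, hop, n_iter=20, seed=0):
    """Estimate a real signal whose STFT magnitude approximates target_mag."""
    n_fft = len(target_mag[0])
    window = sine_window(n_fft)
    length = hop * (len(target_mag) - 1) + n_fft
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(length)]  # random real starting signal
    errors = []
    for _ in range(n_iter):
        spec = stft(x, window, hop)
        errors.append(magnitude_error(spec, target_mag))
        # Keep the target magnitude, borrow the current phase, re-synthesize.
        constrained = [[a * cmath.exp(1j * cmath.phase(c)) for c, a in zip(frame, mags)]
                       for frame, mags in zip(spec, target_mag)]
        x = istft_lsee(constrained, window, hop, length)
    errors.append(magnitude_error(stft(x, window, hop), target_mag))
    return x, errors


# Toy target: magnitude-only STFT of a two-tone signal (16-point frames, hop 8).
n_fft, hop = 16, 8
signal = [math.sin(2 * math.pi * 3 * t / n_fft) + 0.5 * math.sin(2 * math.pi * 5 * t / n_fft)
          for t in range(hop * 6 + n_fft)]
target = [[abs(c) for c in frame] for frame in stft(signal, sine_window(n_fft), hop)]
estimate, errors = griffin_lim(target, hop)
```

Starting from a real signal keeps every constrained spectrum conjugate-symmetric, so taking the real part after the inverse DFT only discards rounding noise. A practical implementation would replace the naive DFT with an FFT.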

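The two Boucheron et al. entries code speech by quantizing MFCCs directly. The sketch below illustrates hybrid scalar/vector quantization only in general terms: it assumes c0 (which tracks overall log energy) goes to a uniform scalar quantizer and the remaining coefficients are vector-quantized against one codebook. The split, step size, 4-bit scalar allocation, four-entry toy codebook, 100 frames/s rate and all function names are hypothetical and do not reproduce either paper's configuration. The bit rate is bits per frame times frames per second: 4 + log2(4) = 6 bits per frame at 100 frames/s gives 600 bps.

```python
import math


def scalar_quantize(value, step, n_bits):
    """Uniform quantizer; the index is clipped to the signed n_bits range."""
    half = 2 ** (n_bits - 1)
    index = max(-half, min(half - 1, round(value / step)))
    return index, index * step


def vector_quantize(vector, codebook):
    """Nearest codeword by squared Euclidean distance."""
    best = min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])))
    return best, list(codebook[best])


def hybrid_quantize(ceps, step, n_bits_scalar, codebook):
    """Scalar-quantize c0, vector-quantize c1..cK as one vector (hypothetical split)."""
    i0, c0 = scalar_quantize(ceps[0], step, n_bits_scalar)
    iv, rest = vector_quantize(ceps[1:], codebook)
    bits = n_bits_scalar + math.ceil(math.log2(len(codebook)))
    return (i0, iv), [c0] + rest, bits


def bit_rate(bits_per_frame, frames_per_second):
    return bits_per_frame * frames_per_second


codebook = [[0.0, 0.0, 0.0], [1.0, -0.5, 0.2], [-1.0, 0.5, -0.2], [0.5, 0.5, 0.5]]
indices, decoded, bits = hybrid_quantize([3.1, 0.9, -0.4, 0.3], step=0.5,
                                         n_bits_scalar=4, codebook=codebook)
rate = bit_rate(bits, frames_per_second=100)
```

Here c0 = 3.1 maps to index 6 (decoded as 3.0), and the vector (0.9, -0.4, 0.3) maps to codeword 1, so each frame costs 6 bits.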
