Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux

doi:10.1109/LSP.2013.2237903

Open Access Journal Article

Seyed Omid Sadjadi, +1 more

- 04 Jan 2013 -

IEEE Signal Processing Letters

- Vol. 20, Iss: 3, pp 197-200

Abstract:

Effective speech activity detection (SAD) is a necessary first step for robust speech applications. In this letter, we propose a robust and unsupervised SAD solution that leverages four different speech voicing measures combined with a perceptual spectral flux feature, for audio-based surveillance and monitoring applications. Effectiveness of the proposed technique is evaluated and compared against several commonly adopted unsupervised SAD methods under simulated and actual harsh acoustic conditions with varying distortion levels. Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
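
As a rough illustration of the two feature families the abstract names, the sketch below computes a frame-level spectral flux and one autocorrelation-based voicing measure. It is not the letter's implementation: it uses plain (not perceptually weighted) spectral flux, a single voicing measure rather than four, and omits any feature combination or thresholding. All function names, the 60-400 Hz pitch range, and the frame sizes are assumptions made for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_flux(frames):
    """L1 distance between unit-norm magnitude spectra of adjacent frames.

    Plain spectral flux; no perceptual weighting is applied (assumption).
    """
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    mag /= np.linalg.norm(mag, axis=1, keepdims=True) + 1e-12
    flux = np.sum(np.abs(np.diff(mag, axis=0)), axis=1)
    # The first frame has no predecessor, so its flux is set to 0.
    return np.concatenate([[0.0], flux])

def autocorr_voicing(frames, fs, f_lo=60.0, f_hi=400.0):
    """Peak of the normalized autocorrelation within an assumed pitch-lag range.

    Values near 1 suggest a strongly periodic (voiced) frame.
    """
    lag_min = int(fs / f_hi)
    lag_max = int(fs / f_lo)
    out = np.zeros(len(frames))
    for i, f in enumerate(frames):
        f = f - f.mean()
        energy = np.dot(f, f)
        if energy < 1e-12:
            continue  # silent frame: leave the voicing score at 0
        ac = np.correlate(f, f, mode="full")[len(f) - 1:] / energy
        out[i] = ac[lag_min:lag_max + 1].max()
    return out

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 150.0 * t)               # periodic, speech-like pitch
    noise = np.random.default_rng(0).standard_normal(fs)
    for name, sig in [("tone", tone), ("noise", noise)]:
        fr = frame_signal(sig, frame_len=400, hop=80)
        print(name, autocorr_voicing(fr, fs).mean(), spectral_flux(fr).mean())
```

On a periodic tone the voicing score should be high and the flux low, while white noise should show the opposite pattern; a detector would combine such cues before deciding speech versus non-speech.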

Citations

Journal Article

Speaker Recognition by Machines and Humans: A tutorial review

John H. L. Hansen, +1 more

- 14 Oct 2015 -

IEEE Signal Processing Magazine

TL;DR: The review concludes with a comparative study of human versus machine speaker recognition, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.

Journal Article

MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception

Carlos Busso, +5 more

- 01 Jan 2017 -

IEEE Transactions on Affective Computing

TL;DR: The MSP-IMPROV corpus is presented, a multimodal emotional database, where the goal is to have control over lexical content and emotion while also promoting naturalness in the recordings, leveraging the large size of the audiovisual database.

Journal Article

Applications of Artificial Intelligence in Machine Learning: Review and Prospect

Sumit Das, +3 more

- 22 Apr 2015 -

International Journal of Computer Applic...

TL;DR: A brief review of the many applications of machine learning is given, along with their future prospects.

Journal Article

Boosting contextual information for deep neural network based voice activity detection

Xiao-Lei Zhang, +1 more

- 01 Feb 2016 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: When trained on a large number of noise types and a wide range of signal-to-noise ratios, the MRS-based VAD demonstrates surprisingly good generalization performance on unseen test scenarios, approaching the performance with noise-dependent training.

Proceedings Article

Speech activity detection on YouTube using deep neural networks

Neville Ryant, +2 more

TL;DR: It is demonstrated that a DNN with input consisting of multiple frames of mel-frequency cepstral coefficients (MFCCs) yields drastically lower frame-wise error rates on YouTube videos compared to a conventional GMM-based system.

References

Journal Article

A statistical model-based voice activity detection

Jongseo Sohn, +2 more

- 01 Jan 1999 -

IEEE Signal Processing Letters

TL;DR: An effective hang-over scheme is proposed that considers previous observations by modeling speech occurrences as a first-order Markov process, and it shows significantly better performance than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.

Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound

Paul Boersma

TL;DR: In this article, the author presents an autocorrelation-based method in which the acoustic pitch period of a sound is found from the position of the maximum of the autocorrelation function, while the degree of periodicity (the harmonics-to-noise ratio) is found from the relative height of this maximum.

Proceedings Article

Construction and evaluation of a robust multifeature speech/music discriminator

Eric D. Scheirer, +1 more

TL;DR: A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed and extensive data on system performance and the cross-validated training/test setup used to evaluate the system is provided.

Journal Article

ITU-T recommendation G.729 Annex B : A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications : Standardization and characterization of G.729

A. Benyassine, +5 more

- 01 Jan 1997 -

IEEE Communications Magazine

Journal Article

Efficient voice activity detection algorithms using long-term speech information

Javier Ramírez, +4 more

- 01 Apr 2004 -

Speech Communication

TL;DR: A new VAD algorithm is presented for improving speech detection robustness in noisy environments and the performance of speech recognition systems. It formulates the speech/non-speech decision rule by comparing the long-term spectral envelope to the average noise spectrum, yielding a highly discriminating decision rule and minimizing the average number of decision errors.