Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux
TL;DR: Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
Abstract:
Effective speech activity detection (SAD) is a necessary first step for robust speech applications. In this letter, we propose a robust and unsupervised SAD solution that leverages four different speech voicing measures combined with a perceptual spectral flux feature, for audio-based surveillance and monitoring applications. Effectiveness of the proposed technique is evaluated and compared against several commonly adopted unsupervised SAD methods under simulated and actual harsh acoustic conditions with varying distortion levels. Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
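The abstract does not spell out how the perceptual spectral flux feature is computed. As a rough illustration of the underlying idea only, the sketch below computes a plain (non-perceptual) spectral flux: the distance between normalized magnitude spectra of consecutive frames. The function names, the frame and hop sizes, the unit-sum normalization, and the toy signal are all assumptions made for this example and are not taken from the letter.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(signal, frame_len=64, hop=32):
    """L2 distance between unit-sum magnitude spectra of consecutive frames."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = magnitude_spectrum(signal[start:start + frame_len])
        total = sum(mag) or 1.0  # avoid division by zero on silent frames
        spectra.append([m / total for m in mag])
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(cur, prev)))
        for prev, cur in zip(spectra, spectra[1:])
    ]


# Toy input (hypothetical): one steady tone followed by a different steady tone.
tone_a = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
tone_b = [math.sin(2 * math.pi * 20 * t / 64) for t in range(256)]
flux = spectral_flux(tone_a + tone_b)
```

On this toy input the flux stays near zero while a tone is steady and rises at the frames that straddle the change of tone. A detector built on such a feature would still need a decision rule, which this sketch does not attempt.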
Citations
Journal Article
Speaker Recognition by Machines and Humans: A tutorial review
John H. L. Hansen, Taufiq Hasan
TL;DR: The review concludes with a comparative study of human versus machine speaker recognition, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.
Journal Article
MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception
Carlos Busso, Srinivas Parthasarathy, Alec Burmania, Mohammed Abdelwahab, Najmeh Sadoughi, Emily Mower Provost
TL;DR: The MSP-IMPROV corpus is presented, a multimodal emotional database, where the goal is to have control over lexical content and emotion while also promoting naturalness in the recordings, leveraging the large size of the audiovisual database.
Journal Article
Applications of Artificial Intelligence in Machine Learning: Review and Prospect
TL;DR: A brief review of the wide-ranging applications of machine learning is presented, along with future prospects.
Journal Article
Boosting contextual information for deep neural network based voice activity detection
Xiao-Lei Zhang, DeLiang Wang
TL;DR: When trained on a large number of noise types and a wide range of signal-to-noise ratios, the MRS-based VAD demonstrates surprisingly good generalization performance on unseen test scenarios, approaching the performance with noise-dependent training.
Proceedings Article
Speech activity detection on YouTube using deep neural networks
TL;DR: It is demonstrated that a DNN with input consisting of multiple frames of mel frequency cepstral coefficients (MFCCs) yields drastically lower frame-wise error rates on YouTube videos compared to a conventional GMM based system.
References
Journal Article
A statistical model-based voice activity detection
TL;DR: An effective hang-over scheme is proposed that takes previous observations into account by modeling speech occurrences as a first-order Markov process; it performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound
TL;DR: An autocorrelation-based method is presented in which the acoustic pitch period of a sound is found from the position of the maximum of the autocorrelation function, and the degree of periodicity (the harmonics-to-noise ratio) is found from the relative height of this maximum.
Proceedings Article
Construction and evaluation of a robust multifeature speech/music discriminator
Eric D. Scheirer, Malcolm Slaney
TL;DR: A real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input is constructed and extensive data on system performance and the cross-validated training/test setup used to evaluate the system is provided.
Journal Article
ITU-T Recommendation G.729 Annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications: Standardization and characterization of G.729
Journal Article
Efficient voice activity detection algorithms using long-term speech information
TL;DR: A new VAD algorithm is presented for improving speech detection robustness in noisy environments and the performance of speech recognition systems. It formulates the speech/non-speech decision rule by comparing the long-term spectral envelope to the average noise spectrum, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors.
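The autocorrelation entry in the reference list above describes finding the pitch period from the position of the autocorrelation maximum and the degree of periodicity from its relative height. A minimal sketch of that idea follows. It omits the window correction and interpolation that a careful implementation would need, and the function name, the pitch search range, and the test signals are hypothetical choices made for illustration. The peak height r is converted to a harmonics-to-noise ratio in dB as 10·log10(r / (1 − r)).

```python
import math
import random


def autocorr_pitch_and_hnr(frame, sample_rate, f_min=75.0, f_max=500.0):
    """Return (pitch_hz, hnr_db) from the peak of the normalized autocorrelation."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]  # remove DC offset
    if not any(x):
        return None, None  # silent frame: no pitch, no HNR
    lag_lo = max(1, int(sample_rate / f_max))
    lag_hi = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_lo, lag_hi + 1):
        head, tail = x[:n - lag], x[lag:]
        # Normalize per lag so the peak height does not shrink as the lag grows.
        den = math.sqrt(sum(v * v for v in head) * sum(v * v for v in tail))
        r = sum(a * b for a, b in zip(head, tail)) / den if den > 0 else 0.0
        # Small margin keeps the shortest lag when multiples of the period tie.
        if r > best_r + 1e-9:
            best_lag, best_r = lag, r
    if best_lag is None:
        return None, None
    r = min(best_r, 1.0 - 1e-9)  # clamp so a perfectly periodic frame stays finite
    return sample_rate / best_lag, 10.0 * math.log10(r / (1.0 - r))


# Hypothetical test signals: a clean 200 Hz tone and the same tone with added noise.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
rng = random.Random(0)
noisy = [s + rng.gauss(0.0, 0.3) for s in tone]

pitch_clean, hnr_clean = autocorr_pitch_and_hnr(tone, sample_rate)
pitch_noisy, hnr_noisy = autocorr_pitch_and_hnr(noisy, sample_rate)
```

The clean tone gives a peak height close to 1 and therefore a very high HNR, while the added noise lowers the peak and the HNR with it. That drop is what makes such a measure usable as a voicing cue.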