Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection

doi:10.1109/JSTSP.2010.2080821

Open Access | Journal Article


David Cournapeau, +3 more

27 Sep 2010

IEEE Journal of Selected Topics in Signa...

Vol. 4, Iss. 6, pp. 1071-1083


TLDR

The proposed VAD method, based on the Variational Bayes approach to online Expectation Maximization (EM), adapts the decision level and the statistical model simultaneously and outperforms conventional VAD algorithms, especially in the remote recording condition.

Abstract:

A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated signal-to-noise ratio (SNR). The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so it can adapt the decision level and the statistical model automatically and simultaneously. We consider two parallel classifiers, one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online within the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier online and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Thanks to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
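The idea of two models trained in parallel and compared frame by frame can be illustrated with a much simpler stand-in. The sketch below is not the paper's algorithm: it replaces the VB posterior updates with plain maximum-likelihood online EM, and it replaces the free energy with an exponentially smoothed predictive log-likelihood. The 1-D "log-energy" feature, the class and function names, the initial means, and the step sizes are all assumptions made for illustration.

```python
import math
import random


class OnlineGMM1D:
    """1-D Gaussian mixture fitted by online EM (running sufficient statistics)."""

    def __init__(self, means, var=1.0, step=0.05):
        k = len(means)
        self.step = step
        self.w = [1.0 / k] * k
        self.mu = list(means)
        self.var = [var] * k
        # Running averages of r, r*x and r*x^2 for each component.
        self.s0 = list(self.w)
        self.s1 = [w * m for w, m in zip(self.w, self.mu)]
        self.s2 = [w * (var + m * m) for w, m in zip(self.w, self.mu)]

    def update(self, x):
        """Score x under the current parameters, then adapt the model to it."""
        joint = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(self.w, self.mu, self.var)
        ]
        total = sum(joint) + 1e-300  # guard against log(0)
        resp = [j / total for j in joint]
        for k, r in enumerate(resp):
            self.s0[k] += self.step * (r - self.s0[k])
            self.s1[k] += self.step * (r * x - self.s1[k])
            self.s2[k] += self.step * (r * x * x - self.s2[k])
            self.w[k] = max(self.s0[k], 1e-12)
            self.mu[k] = self.s1[k] / self.w[k]
            self.var[k] = max(self.s2[k] / self.w[k] - self.mu[k] ** 2, 1e-3)
        return math.log(total), resp


def run_vad(frames, smooth=0.02):
    """Run a noise-only model and a speech-and-noise model side by side.

    Returns per-frame speech decisions and the final smoothed score of each
    model. The score is a stand-in for the free energy, not the free energy.
    """
    noise_model = OnlineGMM1D(means=[0.0])       # one component: noise only
    mixed_model = OnlineGMM1D(means=[0.0, 3.0])  # two components: noise, speech
    score_noise = 0.0
    score_mixed = 0.0
    decisions = []
    for x in frames:
        ll_noise, _ = noise_model.update(x)
        ll_mixed, resp = mixed_model.update(x)
        score_noise += smooth * (ll_noise - score_noise)
        score_mixed += smooth * (ll_mixed - score_mixed)
        # Treat the component with the larger mean as the speech component.
        speech_idx = 0 if mixed_model.mu[0] > mixed_model.mu[1] else 1
        # Declare speech only if the two-component model currently scores
        # better and the frame is assigned to its speech component.
        decisions.append(score_mixed > score_noise and resp[speech_idx] > 0.5)
    return decisions, score_noise, score_mixed


if __name__ == "__main__":
    # Synthetic stream with i.i.d. frame labels, unlike real speech.
    random.seed(0)
    labels = [random.random() < 0.5 for _ in range(2000)]
    frames = [random.gauss(6.0 if s else 0.0, 1.0) for s in labels]
    decisions, s_noise, s_mixed = run_vad(frames)
    print("noise-only score:", s_noise, "speech-and-noise score:", s_mixed)
```

Because each frame is scored before the model adapts to it, the smoothed score behaves like a running predictive fit, which is only a rough proxy for the evidence-based comparison described in the abstract. A faithful implementation would keep a variational posterior over the parameters and compare the free energies of the two models instead.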



