Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection
Reads0
Chats0
TLDR
The proposed VAD method, based on the Variational Bayes approach to the online Expectation Maximization (EM), can automatically adapt the decision level and the statistical model at the same time, and outperforms the conventional VAD algorithms, especially in the remote recording condition.Abstract:
A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. The conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed VAD method is based on the Variational Bayes (VB) approach to the online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case, and the other for speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence called free energy. It is used to assess the reliability of the classifier in an online fashion, and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database designed for VAD evaluations. With the effect of the model comparison, the proposed scheme outperforms the conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust with respect to changes of the noise type.read more
Citations
More filters
Journal ArticleDOI
Deep Belief Networks Based Voice Activity Detection
Xiao-Lei Zhang,Ji Wu +1 more
TL;DR: Extensive experimental results on the AURORA2 corpus show that the DBN-based VAD not only outperforms eleven referenced VADs, but also can meet the real-time detection demand of VAD.
Book
Bayesian Speech and Language Processing
Shinji Watanabe,Jen-Tzung Chien +1 more
TL;DR: A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models and latent topic models, along with applications including automatic speech recognition, speaker verification, and information retrieval.
Journal ArticleDOI
Robust muscle activity onset detection using an unsupervised electromyogram learning framework
TL;DR: This study presents an unsupervised EMG learning framework based on a sequential Gaussian mixture model (GMM) to detect muscle activity onsets and demonstrated robust performance for low and changing signal to noise ratios in a dynamic environment.
Book ChapterDOI
Multidomain Voice Activity Detection during Human-Robot Interaction
TL;DR: A robust VAD system that, by means of the microphones located in a robot, is able to detect when a person starts to talk and when he ends, and shows a high percentage of success in the classification of different audio signal as voice or unvoice.
Journal ArticleDOI
Linearithmic Time Sparse and Convex Maximum Margin Clustering
Xiao-Lei Zhang,Ji Wu +1 more
TL;DR: A new linearithmic time sparse and convex MMC algorithm, called support-vector-regression-based MMC (SVR-MMC), is proposed, which first uses the SVR as the core of the MMC, and is relaxed as a convex optimization problem, which is iteratively solved by the cutting-plane algorithm.
References
More filters
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Book
Pattern Recognition and Machine Learning
TL;DR: Probability Distributions, linear models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, Sequential Data are studied.
Journal ArticleDOI
Pattern Recognition and Machine Learning
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in very mathematical fashion and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of manually modeling speech.
Book
Information Theory, Inference and Learning Algorithms
TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.