Open Access · Journal Article

Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection

TLDR
The proposed VAD method, based on the Variational Bayes approach to online Expectation Maximization (EM), adapts the decision level and the statistical model simultaneously and outperforms conventional VAD algorithms, especially in the remote recording condition.
Abstract
A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated signal-to-noise ratio (SNR). The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so that it adapts the decision level and the statistical model automatically and at the same time. We consider two parallel classifiers, one for the noise-only case and the other for the speech-plus-noise case. Both models are trained concurrently and online within the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of each classifier online and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Owing to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
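The two-classifier idea in the abstract can be illustrated with a much simplified sketch. It is not the paper's algorithm: it assumes a single scalar feature per frame (for instance a log-energy), replaces the VB updates with plain stepwise online EM plus a weak prior, and uses a running average of the predictive log-likelihood as a crude stand-in for the free energy. The names `OnlineGMM1D` and `detect`, and every default value (`step`, `kappa`, `margin`, the prior means), are hypothetical choices made for illustration only.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


class OnlineGMM1D:
    """1-D Gaussian mixture fitted by stepwise online EM with forgetting.

    A weak prior (pseudo-count `kappa`) keeps idle components well defined.
    All default values are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, prior_means, prior_var=1.0, step=0.05, kappa=0.01):
        k = len(prior_means)
        self.m0, self.v0 = list(prior_means), prior_var
        self.step, self.kappa = step, kappa
        # Running averages of r, r*x and r*x^2 per component (r = responsibility).
        self.s0 = [1.0 / k] * k
        self.s1 = [m / k for m in self.m0]
        self.s2 = [(prior_var + m * m) / k for m in self.m0]
        self.score = 0.0  # running average of the predictive log-likelihood
        self._refresh()

    def _refresh(self):
        """Recompute weights, means and variances from the running statistics."""
        n = [s + self.kappa for s in self.s0]
        total = sum(n)
        self.w = [nk / total for nk in n]
        self.mean = [(s1 + self.kappa * m0) / nk
                     for s1, m0, nk in zip(self.s1, self.m0, n)]
        second = [(s2 + self.kappa * (self.v0 + m0 * m0)) / nk
                  for s2, m0, nk in zip(self.s2, self.m0, n)]
        self.var = [max(sec - mu * mu, 1e-2) for sec, mu in zip(second, self.mean)]

    def update(self, x):
        """Score frame x with the current model, then adapt; return responsibilities."""
        logp = [math.log(w) + log_gauss(x, mu, var)
                for w, mu, var in zip(self.w, self.mean, self.var)]
        top = max(logp)
        loglik = top + math.log(sum(math.exp(v - top) for v in logp))
        resp = [math.exp(v - loglik) for v in logp]
        self.score += self.step * (loglik - self.score)
        for k, r in enumerate(resp):
            self.s0[k] += self.step * (r - self.s0[k])
            self.s1[k] += self.step * (r * x - self.s1[k])
            self.s2[k] += self.step * (r * x * x - self.s2[k])
        self._refresh()
        return resp


def detect(frames, margin=0.2):
    """Label each frame as speech (True) or non-speech (False).

    Two models run in parallel. The speech-plus-noise model is trusted only
    while its running score beats the noise-only model by `margin`.
    """
    noise_only = OnlineGMM1D([0.0])
    speech_noise = OnlineGMM1D([0.0, 5.0])
    labels = []
    for x in frames:
        noise_only.update(x)
        resp = speech_noise.update(x)
        if speech_noise.score > noise_only.score + margin:
            hi = max(range(2), key=lambda k: speech_noise.mean[k])
            labels.append(resp[hi] > 0.5)
        else:
            labels.append(False)  # noise-only model explains the data better
    return labels
```

Calling `detect(features)` on a list of per-frame feature values returns one boolean per frame. The comparison step is what prevents the two-component model from splitting pure noise into a spurious "speech" cluster: when a single Gaussian explains the recent frames about as well, every frame is labelled non-speech.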

