Universal speech models for speaker independent single channel source separation
Dennis L. Sun, Gautham J. Mysore
pp. 141-145
TL;DR: This work proposes a method to learn a universal speech model from a general corpus of speech, shows how to use this model to separate speech from other sound sources, and shows that the method improves performance when training data for the non-speech source is available.
Abstract:
Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data of the non-speech source is available.
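The abstract describes substituting a universal speech model for a speaker-dependent one inside an NMF separation framework. Below is a minimal sketch of one way this could look, under assumptions the abstract does not state: the universal model is the column-wise concatenation of per-speaker NMF dictionaries and is held fixed; the noise dictionary is learned from the mixture itself (the semi-supervised case); updates minimize the generalized Kullback-Leibler divergence; and a group-sparsity penalty lam * sum_g log(eps + ||H_g||_1) on each speaker's block of activations encourages only a few training speakers to stay active. The function name separate_with_universal_model, its parameters, the penalty form, and the soft-mask reconstruction are illustrative choices, not the authors' published implementation.

```python
import numpy as np

def separate_with_universal_model(V, speaker_dicts, n_noise=4, lam=1.0,
                                  eps=1e-3, n_iter=200, seed=0):
    """Split a non-negative magnitude spectrogram V (F x T) into speech and noise.

    speaker_dicts: list of non-negative F x K_g dictionaries, one per training
    speaker (hypothetical inputs, e.g. learned beforehand by NMF on each speaker).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    tiny = 1e-12
    W_s = np.hstack(speaker_dicts)                   # universal speech model (fixed)
    bounds = np.cumsum([0] + [D.shape[1] for D in speaker_dicts])
    W_n = rng.random((F, n_noise)) + 0.01            # noise dictionary (learned)
    H_s = rng.random((W_s.shape[1], T)) + 0.01       # speech activations
    H_n = rng.random((n_noise, T)) + 0.01            # noise activations
    ones = np.ones_like(V)

    for _ in range(n_iter):
        # Speech activations: KL multiplicative update plus the derivative of the
        # assumed penalty lam * sum_g log(eps + ||H_g||_1), constant within a group.
        R = V / (W_s @ H_s + W_n @ H_n + tiny)
        penalty = np.empty_like(H_s)
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            penalty[lo:hi] = lam / (eps + H_s[lo:hi].sum())
        H_s *= (W_s.T @ R) / (W_s.T @ ones + penalty)

        # Noise activations and dictionary: plain KL multiplicative updates.
        R = V / (W_s @ H_s + W_n @ H_n + tiny)
        H_n *= (W_n.T @ R) / (W_n.T @ ones + tiny)
        R = V / (W_s @ H_s + W_n @ H_n + tiny)
        W_n *= (R @ H_n.T) / (ones @ H_n.T + tiny)

    # Wiener-style soft mask: each source receives its share of the mixture.
    S, N = W_s @ H_s, W_n @ H_n
    mask = S / (S + N + tiny)
    return mask * V, (1.0 - mask) * V
```

When training data for the non-speech source is available, one option (again an assumption, not taken from the abstract) is to pre-train W_n on that data and skip its update, which turns the sketch into the fully supervised case.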
Citations
Proceedings Article
Adversarial Semi-Supervised Audio Source Separation Applied to Singing Voice Extraction
TL;DR: This work adopts adversarial training for music source separation, with the aim of driving the separator towards outputs that discriminator networks, trained to distinguish real samples from separator outputs, deem realistic.
Proceedings Article
Speaker and noise independent voice activity detection
TL;DR: This paper proposes a voice activity detection (VAD) method based on non-negative matrix factorization that is robust to a variety of non-stationary noises mixed at a wide range of signal-to-noise ratios and significantly outperforms baseline algorithms.
Book Chapter
Single-channel audio source separation with NMF: divergences, constraints and algorithms
TL;DR: Presents the standard majorisation-minimisation strategy for NMF optimisation with the common β-divergence, a family of measures of fit that includes the quadratic cost, the generalised Kullback-Leibler divergence and the Itakura-Saito divergence as special cases.
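For positive x and y, the β-divergence is d_β(x|y) = (x^β + (β-1)y^β - βxy^(β-1)) / (β(β-1)) for β not in {0, 1}, with the generalised Kullback-Leibler (β = 1) and Itakura-Saito (β = 0) divergences as limits; β = 2 gives half the squared error. The sketch below implements multiplicative NMF updates under this divergence, raising the usual update ratio to the exponent γ(β) = 1/(2-β) for β < 1, 1 for 1 ≤ β ≤ 2, and 1/(β-1) for β > 2, as given by the majorisation-minimisation analysis. The function names are hypothetical, and inputs are assumed strictly positive.

```python
import numpy as np

def beta_divergence(X, Y, beta):
    """Sum of elementwise beta-divergences d_beta(X | Y) for strictly positive X, Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if beta == 0:  # Itakura-Saito divergence
        R = X / Y
        return float(np.sum(R - np.log(R) - 1.0))
    if beta == 1:  # generalised Kullback-Leibler divergence
        return float(np.sum(X * np.log(X / Y) - X + Y))
    return float(np.sum((X**beta + (beta - 1.0) * Y**beta
                         - beta * X * Y**(beta - 1.0)) / (beta * (beta - 1.0))))

def mm_exponent(beta):
    """Exponent gamma(beta) applied to the multiplicative update ratio."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta > 2:
        return 1.0 / (beta - 1.0)
    return 1.0

def nmf_beta(V, K, beta, n_iter=200, seed=0, eps=1e-12):
    """Factor a strictly positive V (F x T) as W @ H with K components."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    g = mm_exponent(beta)
    for _ in range(n_iter):
        # Ratio of the negative to the positive part of the gradient w.r.t. H.
        WH = W @ H + eps
        H *= ((W.T @ (WH**(beta - 2.0) * V)) / (W.T @ WH**(beta - 1.0) + eps)) ** g
        # Same construction for W, using the updated H.
        WH = W @ H + eps
        W *= (((WH**(beta - 2.0) * V) @ H.T) / (WH**(beta - 1.0) @ H.T + eps)) ** g
    return W, H
```

Setting beta to 1 or 0 selects the generalised Kullback-Leibler or Itakura-Saito cost, respectively.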
Proceedings Article
Speech enhancement by sparse, low-rank, and dictionary spectrogram decomposition
Zhuo Chen, Daniel P. W. Ellis
TL;DR: Proposes a speech enhancement system that decomposes the spectrogram into a sparse activation of a dictionary of target speech templates plus a low-rank background model. The system makes few assumptions about the noise other than its limited spectral variation.
Journal Article
Mixtures of Local Dictionaries for Unsupervised Speech Enhancement
Minje Kim, Paris Smaragdis
TL;DR: The proposed Mixture of Local Dictionaries (MLD) outperforms state-of-the-art methods by up to 2 dB in signal-to-distortion ratio, especially in the unsupervised setting where neither the speaker identity nor the type of noise is known in advance.
References
Journal Article
The Elements of Statistical Learning
TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Journal Article
Model selection and estimation in regression with grouped variables
Ming Yuan, Yi Lin
TL;DR: Instead of selecting factors by stepwise backward elimination, the authors focus on estimation accuracy and consider extensions of the lasso, the LARS algorithm, and the non-negative garrotte for factor selection.
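The group lasso penalizes the Euclidean norm of each group of coefficients, so whole groups are set to zero at once. Below is a minimal sketch, assuming the objective (1/2)||y - Xb||^2 + lam * sum_g sqrt(p_g) ||b_g||_2 with group sizes p_g, solved by proximal gradient descent with block soft-thresholding. This solver is a standard choice and not necessarily the algorithm used in the paper; the function names are hypothetical.

```python
import numpy as np

def block_soft_threshold(z, t):
    """Proximal operator of t * ||z||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

def group_lasso(X, y, groups, lam, n_iter=500):
    """groups: list of index arrays that partition the columns of X."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient step on the squared-error term...
        z = beta - step * (X.T @ (X @ beta - y))
        # ...then the group-wise proximal step, weighted by sqrt(group size).
        for g in groups:
            beta[g] = block_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta
```

With lam = 0 the loop reduces to gradient descent on least squares; as lam grows, entire groups drop out of the model rather than individual coefficients.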
Journal Article
Enhancing Sparsity by Reweighted ℓ1 Minimization
TL;DR: Proposes a method for sparse signal recovery that, in many situations, outperforms ℓ1 minimization in the sense that substantially fewer measurements are needed for exact recovery.
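Reweighted ℓ1 minimization alternates between solving a weighted ℓ1 problem and updating the weights as w_i = 1/(|x_i| + ε), so small entries are penalized more heavily and large ones less. The paper's formulation minimizes the weighted ℓ1 norm subject to exact measurement constraints; to stay NumPy-only, the sketch below swaps in a penalized weighted lasso solved by ISTA (iterative soft-thresholding), so it illustrates the reweighting loop rather than the paper's solver. The names lam, eps, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding with (possibly per-entry) threshold t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(Phi, y, lam, w, n_iter=500):
    """ISTA for 0.5 * ||Phi x - y||^2 + lam * sum_i w_i |x_i|."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - step * (Phi.T @ (Phi @ x - y))
        x = soft_threshold(z, step * lam * w)
    return x

def reweighted_l1(Phi, y, lam, n_outer=4, eps=0.1, n_iter=500):
    """The first pass uses unit weights (a plain lasso); later passes reweight."""
    w = np.ones(Phi.shape[1])
    for _ in range(n_outer):
        x = weighted_lasso(Phi, y, lam, w, n_iter)
        w = 1.0 / (np.abs(x) + eps)   # large entries get small weights, and vice versa
    return x
```

With Phi = np.eye(2), y = [3.0, 0.05], and lam = 0.1, a single pass shrinks the large entry to about 2.9, while later passes shrink it much less and keep the small entry at zero, which is the bias reduction that reweighting aims for.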
Journal Article
Speaker Verification Using Adapted Gaussian Mixture Models
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
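In the adapted-GMM approach, a speaker model is derived from a universal background model (UBM) by maximum a posteriori (MAP) adaptation. Below is a minimal sketch of mean-only adaptation with diagonal covariances. With posteriors Pr(i|x_t), soft counts n_i = sum_t Pr(i|x_t), and E_i(x) = (1/n_i) sum_t Pr(i|x_t) x_t, each mean becomes alpha_i E_i(x) + (1 - alpha_i) mu_i with alpha_i = n_i / (n_i + r). The relevance-factor default r = 16 and the function name are assumptions for illustration.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, r=16.0):
    """X: (N, D) features; UBM weights (M,), means (M, D), diagonal variances (M, D)."""
    # Log-density of each frame under each diagonal Gaussian component: (N, M).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])
    # Component posteriors Pr(i | x_t), normalized stably in the log domain.
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                                   # soft counts n_i
    Ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]      # first-order statistics E_i(x)
    alpha = (n / (n + r))[:, None]                         # data-dependent mixing weight
    return alpha * Ex + (1.0 - alpha) * means
```

Components that see little adaptation data (small n_i) stay close to the UBM means, while well-observed components move toward the speaker's data.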